Purify.
Return the matrix x with rows containing more than th0 * x.shape[1] NaNs and columns containing more than th1 * x.shape[0] NaNs removed.
Output
- xout, v0, v1
where v0 holds the indices of the retained rows (dimension 0) and v1 holds the indices of the retained columns (dimension 1).
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4     ],
...               [2,      9,      np.nan],
...               [2,      5,      8     ],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4     ]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
New in version 2.0.4.
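The thresholding rule described for purify can be sketched in plain NumPy. This is only an illustration, not mlpy's implementation: the name purify_sketch is hypothetical, and the sketch assumes that row and column NaN counts are both taken on the original matrix (the text above does not say whether rows are removed before columns are counted).

```python
import numpy as np

def purify_sketch(x, th0, th1):
    """Hypothetical illustration of the purify rule; not mlpy's code."""
    x = np.asarray(x, dtype=float)
    nan_mask = np.isnan(x)
    # Assumption: both counts are computed on the original, unfiltered matrix.
    nan_per_row = nan_mask.sum(axis=1)
    nan_per_col = nan_mask.sum(axis=0)
    # A row is dropped only if it has MORE than th0 * (#columns) NaNs;
    # a column is dropped only if it has MORE than th1 * (#rows) NaNs.
    v0 = np.where(nan_per_row <= th0 * x.shape[1])[0]
    v1 = np.where(nan_per_col <= th1 * x.shape[0])[0]
    return x[np.ix_(v0, v1)], v0, v1
```

With the 5 x 3 matrix from the example and th0 = th1 = 0.4, the row threshold is 1.2 NaNs and the column threshold is 2 NaNs. Row 3 holds two NaNs and is dropped, while the last column holds exactly two and is kept, because the comparison is strict.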
KNN imputing.
Input
- x - [2D numpy array float] (#samples x #features) data to impute
- y - [1D numpy array integer/float] labels
- k - [integer] number of nearest neighbors
- dist - [string] adopted distance (‘se’ = SQUARED EUCLIDEAN, ‘e’ = EUCLIDEAN)
- method - [string] method to compute the missing values (‘mean’, ‘median’)
- ldep - [bool] label dependent
New in version 2.0.4.
Example:
>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4     ],
...               [2,      9,      np.nan],
...               [2,      5,      8     ],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4     ]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
>>> y = y[v0]
>>> x = mlpy.knn_imputing(x, y, 2, dist='e', method='mean', ldep=False)
>>> x
array([[ 1. ,  4. ,  4. ],
       [ 2. ,  9. ,  6. ],
       [ 2. ,  5. ,  8. ],
       [ 1.5,  4. ,  4. ]])
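To make the imputed values above easier to follow, here is a minimal NumPy sketch of k-nearest-neighbor imputation. It is not mlpy's implementation. The name knn_impute_sketch is hypothetical, and the sketch makes several assumptions that the documentation does not confirm: distances are Euclidean, they are computed only over features observed in both rows, the neighbors for a missing entry are the k nearest rows that observe that feature, and labels are ignored (as with ldep=False).

```python
import numpy as np

def knn_impute_sketch(x, k, method="mean"):
    """Hypothetical kNN imputation sketch; not mlpy's implementation."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    missing = np.isnan(x)
    reduce = np.mean if method == "mean" else np.median
    for i in np.where(missing.any(axis=1))[0]:
        # Assumption: Euclidean distance over the features observed in both rows.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, x - x[i], 0.0)
        dist = np.sqrt((diff ** 2).sum(axis=1))
        dist[~shared.any(axis=1)] = np.inf  # no shared feature: not comparable
        dist[i] = np.inf                    # a row is not its own neighbor
        order = np.argsort(dist, kind="stable")
        for j in np.where(missing[i])[0]:
            # Take the k nearest rows that actually observe feature j.
            donors = [r for r in order
                      if not missing[r, j] and np.isfinite(dist[r])][:k]
            if donors:
                out[i, j] = reduce(x[donors, j])
    return out
```

Applied to the purified matrix with k = 2, this sketch fills the two gaps with 6 (the mean of 8 and 4) and 1.5 (the mean of 1 and 2), which agrees with the output shown above. That agreement on one small example does not establish that mlpy uses the same distance or neighbor-selection rule.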