Imputing

Purify

mlpy.purify(x, th0=0.1, th1=0.1)

Return the matrix x after removing the rows that contain more than th0 * x.shape[1] NaNs and the columns that contain more than th1 * x.shape[0] NaNs.

Returns :
(xout, v0, v1) : (2d ndarray, 1d ndarray int, 1d ndarray int)

v0 holds the indices of the rows that were kept (dimension 0) and v1 holds the indices of the columns that were kept (dimension 1).

Example:

>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4     ],
...               [2,      9,      np.nan],
...               [2,      5,      8     ],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4     ]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
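
The thresholding rule described above can be sketched in plain NumPy. This is only an illustration, not mlpy's implementation: the name purify_sketch is hypothetical, and the sketch assumes that the NaN counts for rows and columns are both taken on the original matrix, before anything is removed.

```python
import numpy as np

def purify_sketch(x, th0=0.1, th1=0.1):
    """Hypothetical re-implementation of the rule documented for mlpy.purify."""
    x = np.asarray(x, dtype=float)
    nan_mask = np.isnan(x)
    # Assumption: NaNs are counted on the original matrix for both axes.
    row_nans = nan_mask.sum(axis=1)
    col_nans = nan_mask.sum(axis=0)
    # Keep rows with at most th0 * (number of columns) NaNs.
    v0 = np.where(row_nans <= th0 * x.shape[1])[0]
    # Keep columns with at most th1 * (number of rows) NaNs.
    v1 = np.where(col_nans <= th1 * x.shape[0])[0]
    xout = x[np.ix_(v0, v1)]
    return xout, v0, v1
```

On the 5 x 3 matrix from the example, with th0 = th1 = 0.4, the row threshold is 1.2 NaNs and the column threshold is 2.0 NaNs, so only the row with two NaNs is dropped and all three columns are kept, which agrees with the v0 and v1 shown above.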

New in version 2.0.4.

KNN imputing

mlpy.knn_imputing(x, k, dist='e', method='mean', y=None, ldep=False)

Impute the missing values (NaNs) in x using the k nearest neighbors of each sample.

Parameters :
x : 2d ndarray float (samples x feats)

data to impute

k : integer

number of nearest neighbors

dist : string ('se' = squared Euclidean, 'e' = Euclidean)

distance used to find the nearest neighbors

method : string ('mean', 'median')

method used to compute the missing values

y : 1d ndarray

labels

ldep : bool

label dependent (used only if y is not None)

Returns :
xout : 2d ndarray float (samples x feats)

imputed data

Example:

>>> import numpy as np
>>> import mlpy
>>> x = np.array([[1,      4,      4     ],
...               [2,      9,      np.nan],
...               [2,      5,      8     ],
...               [8,      np.nan, np.nan],
...               [np.nan, 4,      4     ]])
>>> y = np.array([1, -1, 1, -1, -1])
>>> x, v0, v1 = mlpy.purify(x, 0.4, 0.4)
>>> x
array([[  1.,   4.,   4.],
       [  2.,   9.,  NaN],
       [  2.,   5.,   8.],
       [ NaN,   4.,   4.]])
>>> v0
array([0, 1, 2, 4])
>>> v1
array([0, 1, 2])
>>> y = y[v0]
>>> x = mlpy.knn_imputing(x, 2, dist='e', method='median')
>>> x
array([[ 1. ,  4. ,  4. ],
       [ 2. ,  9. ,  6. ],
       [ 2. ,  5. ,  8. ],
       [ 1.5,  4. ,  4. ]])
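
To make the idea behind the example concrete, here is a minimal sketch of one common way to do KNN imputation. It is not mlpy's implementation, and the name knn_impute_sketch is hypothetical. It assumes that the donors for a missing entry are the other samples in which that feature is observed, that the Euclidean distance is computed only over features observed in both samples, and that labels (y, ldep) are ignored.

```python
import numpy as np

def knn_impute_sketch(x, k, method="mean"):
    """Hypothetical KNN imputation sketch (Euclidean distance, no labels)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    agg = np.mean if method == "mean" else np.median
    missing = np.isnan(x)
    for i, j in zip(*np.nonzero(missing)):
        # Donors: rows where feature j is observed (row i is excluded by construction).
        donors = np.nonzero(~missing[:, j])[0]
        dists = []
        for d in donors:
            # Assumption: compare only features observed in both rows.
            shared = ~missing[i] & ~missing[d]
            if shared.any():
                diff = x[i, shared] - x[d, shared]
                dists.append(np.sqrt(np.sum(diff ** 2)))
            else:
                dists.append(np.inf)  # no common features: treat as farthest
        nearest = donors[np.argsort(dists, kind="stable")[:k]]
        # Fill from the original values of the k nearest donors.
        out[i, j] = agg(x[nearest, j])
    return out
```

Under these assumptions, calling knn_impute_sketch on the purified 4 x 3 matrix with k=2 and method='median' fills the two missing entries with 6.0 and 1.5, the same values shown in the example. In this sketch the squared Euclidean distance would select the same neighbors, because squaring does not change the ordering of non-negative distances.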

New in version 2.0.4.
