Module: clfs.distance

Distance functions to be used in kernels and elsewhere

Functions

mvpa.clfs.distance.absminDistance(a, b)

Returns the distance max(|a-b|). XXX There must be a better name! XXX Actually, why is it absmin and not absmax?

Useful to select a whole cube of a given “radius”
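
The quantity max(|a-b|) is commonly known as the Chebyshev (L-infinity) distance. The following is a minimal NumPy sketch of that formula, not the library's implementation; the helper name absmax_distance_sketch is hypothetical.

```python
import numpy as N

def absmax_distance_sketch(a, b):
    """Largest absolute coordinate difference between a and b (hypothetical helper)."""
    a = N.asarray(a, dtype=float)
    b = N.asarray(b, dtype=float)
    return N.abs(a - b).max()

center = [0, 0, 0]
# A corner of the cube of "radius" 1 around the center is at distance 1,
# although its Euclidean distance from the center is sqrt(3).
d_corner = absmax_distance_sketch(center, [1, -1, 1])
# A point two steps away along a single axis lies outside that cube.
d_outside = absmax_distance_sketch(center, [2, 0, 0])
print(d_corner, d_outside)
```

All points with a distance of at most r from a center form a cube with side length 2r, which is why this measure is convenient for selecting cubic neighborhoods.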

mvpa.clfs.distance.cartesianDistance(a, b)

Return Cartesian distance between a and b
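
As an illustration, the sketch below assumes that "Cartesian distance" means the ordinary Euclidean distance, i.e. the square root of the sum of squared coordinate differences. The helper name cartesian_distance_sketch is hypothetical and is not part of the module.

```python
import numpy as N

def cartesian_distance_sketch(a, b):
    """Euclidean distance between two points (assumed meaning of 'Cartesian distance')."""
    a = N.asarray(a, dtype=float)
    b = N.asarray(b, dtype=float)
    return N.sqrt(N.sum((a - b) ** 2))

# Classic 3-4-5 right triangle
d = cartesian_distance_sketch([0, 0], [3, 4])
print(d)
```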

mvpa.clfs.distance.mahalanobisDistance(x, y=None, w=None)

Calculate Mahalanobis distance of the pairs of points.

Parameters:
  • x – first list of points. Rows are samples, columns are features.
  • y – second list of points (optional)
  • w (N.ndarray) – optional inverse covariance matrix between the points. It is computed if not given

The inverse covariance matrix can be calculated with the following:

w = N.linalg.solve(N.cov(x.T), N.identity(x.shape[1]))

or

w = N.linalg.inv(N.cov(x.T))
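
To make the two recipes above concrete, the following sketch checks that they give the same matrix on random data and then uses it for a single pair of points. The helper mahalanobis_pair_sketch is hypothetical; it applies the textbook formula sqrt((u-v)^T w (u-v)) and is not a copy of mahalanobisDistance.

```python
import numpy as N

rng = N.random.RandomState(0)
x = rng.rand(50, 3)  # rows are samples, columns are features

# Both recipes from the text; they should agree up to round-off
w_solve = N.linalg.solve(N.cov(x.T), N.identity(x.shape[1]))
w_inv = N.linalg.inv(N.cov(x.T))
same = N.allclose(w_solve, w_inv)

def mahalanobis_pair_sketch(u, v, w):
    """Textbook Mahalanobis distance between two points (hypothetical helper)."""
    d = N.asarray(u, dtype=float) - N.asarray(v, dtype=float)
    return N.sqrt(N.dot(N.dot(d, w), d))

d01 = mahalanobis_pair_sketch(x[0], x[1], w_inv)
print(same, d01)
```

With w set to the identity matrix the formula reduces to the Euclidean distance, which is a quick sanity check for any implementation.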

mvpa.clfs.distance.manhattenDistance(a, b)

Return Manhattan distance between a and b
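
The Manhattan (city-block, L1) distance is the sum of absolute coordinate differences. A minimal sketch of that definition follows; manhattan_distance_sketch is a hypothetical name, not the module's function.

```python
import numpy as N

def manhattan_distance_sketch(a, b):
    """Sum of absolute coordinate differences (hypothetical helper)."""
    a = N.asarray(a, dtype=float)
    b = N.asarray(b, dtype=float)
    return N.abs(a - b).sum()

# 3 steps along the first axis plus 4 along the second
d = manhattan_distance_sketch([0, 0], [3, 4])
print(d)
```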

mvpa.clfs.distance.oneMinusCorrelation(X, Y)

Return one minus the correlation matrix between the rows of two matrices.

This function computes a matrix of correlations between all pairs of rows of two matrices. Unlike NumPy’s corrcoef(), this function only considers pairs across matrices and not within, i.e. the two elements of a pair never come from the same source matrix.

Both arrays need to have the same number of columns.

Parameters:
  • X (2D-array) –
  • Y (2D-array) –

Example:

>>> X = N.random.rand(20,80)
>>> Y = N.random.rand(5,80)
>>> C = oneMinusCorrelation(X, Y)
>>> print C.shape
(20, 5)
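
The relation to corrcoef() described above can be sketched in plain NumPy. The function one_minus_correlation_sketch below is a hypothetical stand-in, assuming Pearson correlation between rows; it is cross-checked against the off-diagonal block of N.corrcoef(X, Y), which holds exactly the across-matrix pairs.

```python
import numpy as N

def one_minus_correlation_sketch(X, Y):
    """1 - Pearson correlation between every row of X and every row of Y."""
    X = N.asarray(X, dtype=float)
    Y = N.asarray(Y, dtype=float)
    # Center each row, then scale it to unit length
    Xc = X - X.mean(axis=1)[:, None]
    Yc = Y - Y.mean(axis=1)[:, None]
    Xn = Xc / N.sqrt((Xc ** 2).sum(axis=1))[:, None]
    Yn = Yc / N.sqrt((Yc ** 2).sum(axis=1))[:, None]
    # Dot products of unit-length centered rows are correlations
    return 1.0 - N.dot(Xn, Yn.T)

rng = N.random.RandomState(0)
X = rng.rand(20, 80)
Y = rng.rand(5, 80)
C = one_minus_correlation_sketch(X, Y)
print(C.shape)  # (20, 5)

# corrcoef stacks the rows of X and Y into one 25 x 25 matrix;
# the upper-right 20 x 5 block contains the X-versus-Y pairs only.
full = N.corrcoef(X, Y)
matches = N.allclose(C, 1.0 - full[:20, 20:])
print(matches)
```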

mvpa.clfs.distance.pnorm_w_python(data1, data2=None, weight=None, p=2, heuristic='auto', use_sq_euclidean=True)

Weighted p-norm between two datasets (pure Python implementation)

||x - x’||_w = (sum_{i=1...N} (w_i*|x_i - x’_i|)**p)**(1/p)

Parameters:
  • data1 (N.ndarray) – First dataset
  • data2 (N.ndarray or None) – Optional second dataset
  • weight (N.ndarray or None) – Optional weights per 2nd dimension (features)
  • p – Power
  • heuristic (basestring) – Which heuristic to use:
      * ‘samples’ – Python sweep over the 0th dimension
      * ‘features’ – Python sweep over the 1st dimension
      * ‘auto’ – decide automatically: use ‘samples’ if the number of features (shape[1]) is much larger than the number of samples (shape[0]), and ‘features’ otherwise
  • use_sq_euclidean (bool) – Whether to use squared_euclidean_distance_matrix for the computation if p==2
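
The weighted p-norm formula given for this function can be written compactly with NumPy broadcasting. The sketch below is not the library's implementation: pnorm_w_sketch is a hypothetical name, it ignores the heuristic and use_sq_euclidean options (which only affect how the computation is arranged), and it assumes that data2=None means "compare data1 with itself" and that the result is a samples-by-samples matrix.

```python
import numpy as N

def pnorm_w_sketch(data1, data2=None, weight=None, p=2):
    """Pairwise weighted p-norm distances between rows of data1 and data2."""
    data1 = N.asarray(data1, dtype=float)
    # Assumption: a missing second dataset means data1 versus itself
    data2 = data1 if data2 is None else N.asarray(data2, dtype=float)
    if weight is None:
        weight = N.ones(data1.shape[1])
    # Shape (S1, S2, F): w_i * |x_i - x'_i| for every pair of rows
    diff = N.abs(data1[:, None, :] - data2[None, :, :]) * weight
    # Sum over features, then take the p-th root
    return (diff ** p).sum(axis=2) ** (1.0 / p)

pts = [[0, 0], [3, 4]]
D2 = pnorm_w_sketch(pts)                         # p=2: Euclidean
D1 = pnorm_w_sketch(pts, p=1)                    # p=1: Manhattan
D1w = pnorm_w_sketch(pts, weight=[2, 1], p=1)    # first feature counts double
print(D2[0, 1], D1[0, 1], D1w[0, 1])
```

Broadcasting builds an intermediate array of size S1 x S2 x F, which is simple but memory-hungry; looping over samples or over features, as the heuristic options describe, trades speed for a smaller footprint.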

mvpa.clfs.distance.squared_euclidean_distance(data1, data2=None, weight=None)

Compute weighted squared Euclidean distance matrix between two datasets.

Parameters:
  • data1 (N.ndarray) – first dataset
  • data2 (N.ndarray) – second dataset. If None, distances are computed between the first dataset and itself. (Defaults to None)
  • weight (N.ndarray) – vector of weights, one per dimension (feature) of the dataset (Defaults to None)
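
A pairwise squared Euclidean distance matrix can be computed without explicit loops through the expansion ||u - v||^2 = ||u||^2 - 2 u.v + ||v||^2. The sketch below illustrates this under two assumptions that the text does not confirm: each feature is multiplied by its weight before squaring, and the result has shape (samples in data1, samples in data2). The name squared_euclidean_distance_sketch is hypothetical.

```python
import numpy as N

def squared_euclidean_distance_sketch(data1, data2=None, weight=None):
    """Pairwise weighted squared Euclidean distances between rows."""
    data1 = N.asarray(data1, dtype=float)
    # If no second dataset is given, compare data1 with itself
    data2 = data1 if data2 is None else N.asarray(data2, dtype=float)
    if weight is None:
        weight = N.ones(data1.shape[1])
    # Assumption: weights scale the features before squaring
    d1 = data1 * weight
    d2 = data2 * weight
    # ||u||^2 - 2 u.v + ||v||^2 for all pairs of rows at once
    sq = ((d1 ** 2).sum(axis=1)[:, None]
          - 2 * N.dot(d1, d2.T)
          + (d2 ** 2).sum(axis=1)[None, :])
    # Round-off can produce tiny negative values; clip them to zero
    return N.maximum(sq, 0)

pts = [[0, 0], [3, 4]]
D = squared_euclidean_distance_sketch(pts)
Dw = squared_euclidean_distance_sketch(pts, weight=[2, 1])
print(D[0, 1], Dw[0, 1])
```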