mvpa.clfs.stats

Estimator for classifier error distributions.

The comprehensive API documentation for this module, including all technical details, is available in the Epydoc-generated API reference for mvpa.clfs.stats (for developers).

Classes

AdaptiveNormal

class mvpa.clfs.stats.AdaptiveNormal(dist, **kwargs)

Bases: mvpa.clfs.stats.AdaptiveNullDist

Adaptive normal distribution: params are (0, sqrt(1/nfeatures))
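A minimal sketch of the distribution AdaptiveNormal's parameters (0, sqrt(1/nfeatures)) describe, assuming they map onto the loc and scale arguments of scipy.stats.norm. The feature count and the observed statistic are made-up values for illustration; this builds the distribution directly in SciPy and does not call PyMVPA.

```python
import numpy as np
from scipy import stats

# Hypothetical feature-space dimensionality (illustration only).
nfeatures = 100

# Assumed mapping: (0, sqrt(1/nfeatures)) -> loc=0, scale=sqrt(1/nfeatures)
null = stats.norm(loc=0, scale=np.sqrt(1.0 / nfeatures))

observed = 0.25              # hypothetical observed statistic
p_right = null.sf(observed)  # right-tail probability, i.e. 1 - cdf(observed)
print(null.std(), p_right)
```

The spread shrinks as nfeatures grows, so the same observed value becomes less probable under the null in higher-dimensional feature spaces.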

Parameters:
  • dist (distribution object) – This can be any object that has a cdf() method to report the cumulative distribution function values.

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the AdaptiveNormal documentation.

Full API documentation of AdaptiveNormal in module mvpa.clfs.stats.

AdaptiveNullDist

class mvpa.clfs.stats.AdaptiveNullDist(dist, **kwargs)

Bases: mvpa.clfs.stats.FixedNullDist

Adaptive distribution which adjusts parameters according to the data

Work in progress: the internal implementation might change.

Parameters:
  • dist (distribution object) – This can be any object that has a cdf() method to report the cumulative distribution function values.
fit(measure, wdata, vdata=None)
Takes care of the dimensionality of the feature space in measure, adapting the distribution parameters to it.
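The idea behind an adaptive null distribution can be sketched as a class that derives the number of features from the data and re-instantiates a SciPy distribution family with parameters computed from it. This is a hypothetical, simplified stand-in: the class name SketchAdaptiveNormal and its _adapt helper are made up, and fit() takes only the data instead of the (measure, wdata, vdata) signature documented above. It assumes samples are rows and features are the remaining axes.

```python
import numpy as np
from scipy import stats


class SketchAdaptiveNormal:
    """Hypothetical adaptive null distribution (not PyMVPA's implementation)."""

    def __init__(self, dist_gen=stats.norm):
        self._dist_gen = dist_gen  # unfrozen SciPy distribution family
        self._dist = None          # frozen distribution, set by fit()

    def _adapt(self, nfeatures):
        # Parameters depend only on the dimensionality of the feature space.
        return 0.0, np.sqrt(1.0 / nfeatures)

    def fit(self, wdata):
        # Assumption: first axis indexes samples, the rest are features.
        nfeatures = int(np.prod(np.shape(wdata)[1:]))
        loc, scale = self._adapt(nfeatures)
        self._dist = self._dist_gen(loc=loc, scale=scale)
        return self

    def cdf(self, x):
        # Delegate to the frozen, adapted distribution.
        return self._dist.cdf(x)


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 25))  # 20 samples x 25 features (made up)
d = SketchAdaptiveNormal().fit(data)
print(d.cdf(0.0))  # 0.5 for a zero-centred normal
```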

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the AdaptiveNullDist documentation.

Full API documentation of AdaptiveNullDist in module mvpa.clfs.stats.

AdaptiveRDist

class mvpa.clfs.stats.AdaptiveRDist(dist, **kwargs)

Bases: mvpa.clfs.stats.AdaptiveNullDist

Adaptive rdist: params are (nfeatures-1, 0, 1)
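A minimal sketch, assuming the three parameters (nfeatures-1, 0, 1) map onto the shape c, loc and scale of scipy.stats.rdist. That distribution lives on [-1, 1] with density proportional to (1 - x**2)**(c/2 - 1) and variance 1/(c + 1), so with c = nfeatures - 1 its variance is 1/nfeatures. For nfeatures = 3 it reduces to a uniform distribution on [-1, 1]. The helper adapted_rdist is hypothetical.

```python
import numpy as np
from scipy import stats


def adapted_rdist(nfeatures):
    # Assumed mapping: (nfeatures-1, 0, 1) -> shape c, loc, scale
    return stats.rdist(nfeatures - 1, loc=0, scale=1)


# nfeatures = 3 gives c = 2, i.e. a uniform distribution on [-1, 1]
uniform_case = adapted_rdist(3)
print(uniform_case.cdf([-1.0, 0.0, 0.5, 1.0]))  # approximately 0, 0.5, 0.75, 1

# With more features the distribution concentrates around 0
print(adapted_rdist(3).var(), adapted_rdist(101).var())  # about 1/3 and 1/101
```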

Parameters:
  • dist (distribution object) – This can be any object that has a cdf() method to report the cumulative distribution function values.
cdf(x)

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the AdaptiveRDist documentation.

Full API documentation of AdaptiveRDist in module mvpa.clfs.stats.

FixedNullDist

class mvpa.clfs.stats.FixedNullDist(dist, **kwargs)

Bases: mvpa.clfs.stats.NullDist

Proxy/Adaptor class for SciPy distributions.

All distributions from SciPy’s ‘stats’ module can be used with this class.

>>> import numpy as N
>>> from scipy import stats
>>> from mvpa.clfs.stats import FixedNullDist
>>>
>>> dist = FixedNullDist(stats.norm(loc=2, scale=4))
>>> dist.p(2)
0.5
>>>
>>> dist.cdf(N.arange(5))
array([ 0.30853754,  0.40129367,  0.5       ,  0.59870633,  0.69146246])
>>>
>>> dist = FixedNullDist(stats.norm(loc=2, scale=4), tail='right')
>>> dist.p(N.arange(5))
array([ 0.69146246,  0.59870633,  0.5       ,  0.40129367,  0.30853754])

Parameters:
  • dist (distribution object) – This can be any object that has a cdf() method to report the cumulative distribution function values.
cdf(x)
Return value of the cumulative distribution function at x.
fit(measure, wdata, vdata=None)
Does nothing since the distribution is already fixed.
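The proxy/adaptor role of FixedNullDist can be sketched as a thin wrapper that delegates cdf() to the wrapped object and treats fit() as a no-op. SketchFixedNullDist is a hypothetical name; the class only illustrates the pattern and omits tail handling and p(). It reuses the norm(loc=2, scale=4) distribution from the doctest above, so cdf() should reproduce the same values.

```python
import numpy as np
from scipy import stats


class SketchFixedNullDist:
    """Hypothetical minimal proxy around a frozen SciPy distribution."""

    def __init__(self, dist):
        # Any object with a cdf() method is accepted.
        self._dist = dist

    def fit(self, measure, wdata, vdata=None):
        # Nothing to estimate: the distribution is fixed in advance.
        return self

    def cdf(self, x):
        # Delegate to the wrapped distribution.
        return self._dist.cdf(x)


proxy = SketchFixedNullDist(stats.norm(loc=2, scale=4))
proxy.fit(None, None)  # no-op
print(proxy.cdf(np.arange(5)))
```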

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the FixedNullDist documentation.

Full API documentation of FixedNullDist in module mvpa.clfs.stats.

MCNullDist

class mvpa.clfs.stats.MCNullDist(dist_class=<class 'mvpa.clfs.stats.Nonparametric'>, permutations=100, **kwargs)

Bases: mvpa.clfs.stats.NullDist

Null-hypothesis distribution is estimated from randomly permuted data labels.

The distribution is estimated by calling fit() with an appropriate DatasetMeasure or TransferError instance, together with a training dataset and, in the case of a TransferError, a validation dataset. For a customizable number of cycles, the training data labels are permuted and the corresponding measure is computed. In the case of a TransferError, this is the error in predicting the correct labels of the validation dataset.

The distribution can be queried using the cdf() method, which can be configured to report probabilities/frequencies from the left or right tail, i.e. the fraction of the distribution that is lower or higher than some critical value.

This class also supports FeaturewiseDatasetMeasure. In that case cdf() returns an array of featurewise probabilities/frequencies.
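The Monte Carlo permutation procedure described above can be sketched with plain NumPy: compute a measure on the true labels, recompute it on many label permutations, and report the fraction of permuted values at least as extreme as the observed one. Everything here is hypothetical (mean_difference stands in for a DatasetMeasure, and the data are synthetic); the simple count fraction is one common convention, not necessarily the one MCNullDist uses. For an error measure such as a TransferError, small values are the interesting ones, so the left tail would be used instead.

```python
import numpy as np


def mean_difference(samples, labels):
    # Hypothetical scalar measure: difference of class means.
    return samples[labels == 1].mean() - samples[labels == 0].mean()


def permutation_null(measure, samples, labels, permutations=100, seed=0):
    """Collect measure values computed on randomly permuted labels."""
    rng = np.random.default_rng(seed)
    return np.array([measure(samples, rng.permutation(labels))
                     for _ in range(permutations)])


def right_tail_p(null_values, observed):
    # Fraction of null values at least as large as the observed value.
    return np.mean(null_values >= observed)


rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 20)
samples = rng.normal(size=40) + 3.0 * labels  # class 1 shifted upward

observed = mean_difference(samples, labels)
null_values = permutation_null(mean_difference, samples, labels)
print(observed, right_tail_p(null_values, observed))
```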

Initialize Monte-Carlo Permutation Null-hypothesis testing

Parameters:
  • dist_class (class) – This can be any class that provides a fit() method to estimate the parameters used to initialize the instance, and a cdf(x) method to evaluate the CDF at x. All distributions from SciPy’s ‘stats’ module can be used.
  • permutations (int) – Number of label permutations to perform to determine the distribution under the null hypothesis.
cdf(x)
Return value of the cumulative distribution function at x.
clean()

Clean stored distributions

Storing all of the distributions might be too expensive (e.g. in the case of Nonparametric), and the scope of the object might be too broad to wait for it to be destroyed. clean() rebinds dist_samples to an empty list so that the garbage collector can reclaim the memory.

fit(measure, wdata, vdata=None)

Fit the distribution by performing multiple cycles, each of which permutes the labels in the training dataset.

Parameters:
  • measure ((Featurewise)DatasetMeasure | TransferError) – (Featurewise)DatasetMeasure or TransferError instance used to compute all errors.
  • wdata (Dataset) – Dataset which gets permuted and used to compute the measure/transfer error multiple times.
  • vdata (Dataset) – Dataset used for validation. If provided, measure is assumed to be a TransferError, and the working and validation datasets are passed on to it.

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the MCNullDist documentation.

Full API documentation of MCNullDist in module mvpa.clfs.stats.

Nonparametric

class mvpa.clfs.stats.Nonparametric(dist_samples)

Bases: object

Non-parametric 1d distribution – derives cdf based on stored values.

Introduced to complement parametric distributions present in scipy.stats.
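A non-parametric 1-D distribution that derives its cdf from stored values can be sketched as an empirical CDF. SketchNonparametric is a hypothetical class, and it assumes cdf(x) is the fraction of stored samples less than or equal to x; the exact convention Nonparametric uses is not stated in this section.

```python
import numpy as np


class SketchNonparametric:
    """Hypothetical empirical 1-D distribution built from stored samples."""

    def __init__(self, dist_samples):
        # Flatten and sort once so cdf() can use binary search.
        self._samples = np.sort(np.ravel(dist_samples))

    def cdf(self, x):
        # searchsorted with side='right' counts samples <= x
        counts = np.searchsorted(self._samples, np.ravel(x), side='right')
        return counts / float(len(self._samples))


d = SketchNonparametric([3, 1, 2, 2, 5])
print(d.cdf([0, 2, 2.5, 5]))  # 0, 0.6, 0.6, 1.0
```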

cdf(x)
Returns the cdf value at x.
fit

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the Nonparametric documentation.

Full API documentation of Nonparametric in module mvpa.clfs.stats.

NullDist

class mvpa.clfs.stats.NullDist(tail='both', **kwargs)

Bases: mvpa.misc.state.ClassWithCollections

Base class for null-hypothesis testing.

Cheap initialization.

Parameters:
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen by comparing the CDF value with 0.5. For ‘any’, significance is taken as in a one-tailed test.
cdf(x)
Implementations return the value of the cumulative distribution function (left or right tail depending on the setting).
fit(measure, wdata, vdata=None)
To be implemented by subclasses to fit the distribution to the data.
p(x, **kwargs)

Returns the p-value for values of x. Depending on the constructor setting, the returned values are taken from the left tail, the right tail, or whichever tail x falls in.

If a FeaturewiseDatasetMeasure was used to estimate the distribution, the method returns an array. In that case x can be a scalar or an array of matching shape.
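The mapping from CDF values to tail probabilities can be sketched as a small function. p_from_cdf is hypothetical and covers only ‘left’, ‘right’ and ‘any’; ‘both’ is left out because this section does not spell out how its value is scaled. With the norm(loc=2, scale=4) distribution from the FixedNullDist doctest, the ‘right’ values should match that doctest's right-tail output.

```python
import numpy as np
from scipy import stats


def p_from_cdf(cdf_values, tail):
    """Convert CDF values to tail probabilities (sketch, not PyMVPA code).

    'left'  : P(X <= x), i.e. the CDF itself
    'right' : 1 - CDF
    'any'   : whichever tail x falls in (CDF compared with 0.5),
              reported like a one-tailed value
    """
    # Guard against CDF values that drift slightly outside [0, 1].
    cdf = np.clip(np.asarray(cdf_values, dtype=float), 0.0, 1.0)
    if tail == 'left':
        return cdf
    if tail == 'right':
        return 1.0 - cdf
    if tail == 'any':
        return np.where(cdf >= 0.5, 1.0 - cdf, cdf)
    raise ValueError("unsupported tail: %r" % (tail,))


null = stats.norm(loc=2, scale=4)
cdf_vals = null.cdf(np.arange(5))
print(p_from_cdf(cdf_vals, 'right'))
print(p_from_cdf(cdf_vals, 'any'))
```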

tail

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the NullDist documentation.

Full API documentation of NullDist in module mvpa.clfs.stats.

Functions

mvpa.clfs.stats.autoNullDist(dist)

Convenience helper for humans: wraps dist in some NullDist if needed.

tail and all other arguments are assumed to take their default values, as in NullDist/MCNullDist.

See also

Full API documentation of autoNullDist() in module mvpa.clfs.stats.

mvpa.clfs.stats.nanmean(x, axis=0)

Compute the mean over the given axis, ignoring NaNs.

Parameters:
  • x (ndarray) – input array
  • axis (int) – axis along which the mean is computed.
Returns:
  • m (float) – the mean.
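A NaN-ignoring mean can be sketched by masking out NaNs, summing the remaining entries and dividing by their count along the axis. nanmean_sketch is a hypothetical re-implementation for illustration; NumPy's own numpy.nanmean computes the same quantity. A slice made entirely of NaNs yields NaN (with a RuntimeWarning) because its count is zero.

```python
import numpy as np


def nanmean_sketch(x, axis=0):
    """Mean along `axis`, skipping NaNs (hypothetical re-implementation)."""
    x = np.asarray(x, dtype=float)
    mask = ~np.isnan(x)
    # Sum only the non-NaN entries and divide by how many there were.
    total = np.where(mask, x, 0.0).sum(axis=axis)
    count = mask.sum(axis=axis)
    return total / count


a = np.array([[1.0, np.nan],
              [3.0, 4.0]])
print(nanmean_sketch(a, axis=0))  # column means: 2.0 and 4.0
print(np.nanmean(a, axis=0))      # NumPy's built-in gives the same
```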

See also

Full API documentation of nanmean() in module mvpa.clfs.stats.