Analysis Scenarios

Searchlight

The term Searchlight refers to an algorithm that runs a scalar DatasetMeasure on all possible spheres of a certain size within a dataset. The measure typically computed is a cross-validated transfer error (see CrossValidatedTransferError). The idea of using a searchlight as a sensitivity analyzer stems from a paper by Kriegeskorte and colleagues [1].

A searchlight analysis can be performed easily. The following code snippet shows a draft of a complete analysis.

>>> from mvpa.datasets.maskeddataset import MaskedDataset
>>> from mvpa.datasets.splitter import OddEvenSplitter
>>> from mvpa.clfs.svm import LinearCSVMC
>>> from mvpa.clfs.transerror import TransferError
>>> from mvpa.algorithms.cvtranserror import CrossValidatedTransferError
>>> from mvpa.measures.searchlight import Searchlight
>>> from mvpa.misc.data_generators import normalFeatureDataset
>>>
>>> # overcomplicated way to generate an example dataset
>>> ds = normalFeatureDataset(perlabel=10, nlabels=2, nchunks=2,
...                           nfeatures=10, nonbogus_features=[3, 7],
...                           snr=5.0)
>>> dataset = MaskedDataset(samples=ds.samples, labels=ds.labels,
...                         chunks=ds.chunks)
>>>
>>> # setup measure to be computed in each sphere (cross-validated
>>> # generalization error on odd/even splits)
>>> cv = CrossValidatedTransferError(
...          TransferError(LinearCSVMC()),
...          OddEvenSplitter())
>>>
>>> # setup searchlight with 5 mm radius and measure configured above
>>> sl = Searchlight(cv, radius=5)
>>>
>>> # run searchlight on dataset
>>> sl_map = sl(dataset)

If this analysis is done on an fMRI dataset using NiftiDataset, the resulting searchlight map (sl_map) can be mapped back into the original dataspace and viewed as a brain overlay. The example section contains a typical application of this algorithm.
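A minimal sketch of such a back-projection, continuing the example above; it assumes the mapReverse() method of the mapped dataset classes (here MaskedDataset), which reverses the flattening applied to the samples (for this toy dataset the "original" space is trivially one-dimensional):

>>> import numpy as N
>>>
>>> # project the per-feature searchlight errors back into the shape of
>>> # the original data space
>>> orig_map = dataset.mapReverse(N.array(sl_map))
>>> orig_map.shape
(10,)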

[1] Kriegeskorte, N., Goebel, R. & Bandettini, P. (2006). ‘Information-based functional brain mapping.’ Proceedings of the National Academy of Sciences of the United States of America, 103, 3863-3868.

Statistical Testing of Classifier-based Analyses

It is often desirable to make statements like “Performance is significantly above chance-level”. However, as with other applications of statistics in classifier-based analyses, there is the problem that the distribution of a variable like error or performance under the null hypothesis (H0) is not known, and therefore the coveted p-values, i.e. the probability of a result given that there is no signal, cannot be assigned analytically. Even worse, the chance-level or guess probability of a classifier depends on the content of the validation dataset, e.g. a balanced or unbalanced number of samples per label and the total number of labels.
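To illustrate the latter point with a hypothetical toy computation: the accuracy achieved by always guessing the most frequent label, i.e. the effective chance-level, already differs between a balanced and an unbalanced validation set:

>>> balanced = [0] * 50 + [1] * 50
>>> unbalanced = [0] * 90 + [1] * 10
>>> # accuracy of always guessing the majority label
>>> max(balanced.count(l) for l in set(balanced)) / float(len(balanced))
0.5
>>> max(unbalanced.count(l) for l in set(unbalanced)) / float(len(unbalanced))
0.9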

One approach to deal with this situation is to estimate the NULL distribution empirically. A generic way to do this is a permutation test (also known as a Monte Carlo test). The NULL distribution is estimated by computing some measure multiple times, each time using a dataset with no relevant signal in it. Such datasets are generated by permuting the labels of all samples in the training dataset before each computation of the measure, thereby randomizing or removing any possibly relevant information.
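The following toy sketch illustrates the principle with plain NumPy (it is not the PyMVPA API, just an illustration): a simple mean-difference measure is recomputed many times on randomly permuted labels to build up a NULL distribution:

>>> import numpy as N
>>>
>>> rng = N.random.RandomState(0)
>>> # toy data: two classes of 50 samples whose means differ by 1
>>> data = N.r_[rng.normal(0, 1, 50), rng.normal(1, 1, 50)]
>>> labels = N.r_[N.zeros(50), N.ones(50)]
>>>
>>> # toy measure: difference of the two class means
>>> def measure(d, l):
...     return d[l == 1].mean() - d[l == 0].mean()
>>>
>>> # NULL distribution: recompute the measure with permuted labels
>>> null = [measure(data, rng.permutation(labels)) for i in range(1000)]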

Given the measures computed from the permuted datasets, one can now determine the probability of the empirical measure (i.e. the one computed from the original training dataset) under the no-signal condition. This is simply the fraction of measures from the permutation runs that are larger or smaller than the empirical measure (depending on whether one is looking at performances or errors).
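Continuing the sketch from above, the p-value is just this fraction; here the toy measure behaves like a performance, so larger values are the interesting ones:

>>> empirical = measure(data, labels)
>>> # fraction of permutation measures at least as large as the empirical one
>>> p = N.mean(N.array(null) >= empirical)
>>> # the toy data contains signal, so this should be highly significant
>>> p < 0.01
True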

PyMVPA supports such permutation tests for transfer errors and all dataset measures. In both cases the object computing the measure or transfer error takes an optional constructor argument null_dist, whose value is an instance of some Distribution estimator. If it is provided, the respective TransferError or DatasetMeasure instance will automatically use it to estimate the NULL distribution and store the associated p-values in a state variable named null_prob.

>>> # lazy import
>>> from mvpa.suite import *
>>>
>>> # some example data with signal
>>> train = normalFeatureDataset(perlabel=50, nlabels=2, nfeatures=3,
...                              nonbogus_features=[0,1], snr=3, nchunks=1)
>>>
>>> # define class to estimate NULL distribution of errors
>>> # use the left tail of the distribution since the measure is an error
>>> # and lower is better
>>> # in a real analysis the number of permutations should be MUCH larger
>>> terr = TransferError(clf=SMLR(),
...                      null_dist=MCNullDist(permutations=10,
...                                           tail='left'))
>>>
>>> # compute classifier error on training dataset (should be low :)
>>> err = terr(train, train)
>>> err < 0.4
True
>>> # check that the result is highly significant since we know that the
>>> # data has signal
>>> terr.null_prob < 0.01
True