mvpa.datasets.splitter

Collection of dataset splitters.

The comprehensive API documentation for this module, including all technical details, is available in the Epydoc-generated API reference for mvpa.datasets.splitter (for developers).

Classes

CustomSplitter

class mvpa.datasets.splitter.CustomSplitter(splitrule, **kwargs)

Bases: mvpa.datasets.splitter.Splitter

Split a dataset using an arbitrary custom rule.

The splitter is configured by passing a custom splitting rule (splitrule) to its constructor. Such a rule is a sequence of split definitions. Every element in this sequence results in exactly one split generated by the Splitter. Each element is itself a sequence of sequences of sample ids, one for each dataset that shall be generated in the split.

Example:

  • Generate two splits. In the first split the second dataset contains all samples with sample attributes corresponding to either 0, 1 or 2. The first dataset of the first split contains all samples which are not split into the second dataset.

    The second split yields three datasets. The first contains all samples corresponding to sample attributes 1 and 2, the second contains only samples with attribute 3, and the last contains the samples with attributes 5 and 6.

    CustomSplitter([(None, [0, 1, 2]), ([1,2], [3], [5, 6])])
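To make the rule above concrete, here is a plain-Python sketch that applies a split rule to a list of per-sample attribute values. It does not use PyMVPA: `apply_splitrule` is a hypothetical helper, plain index lists stand in for datasets, and the handling of `None` (all samples not placed in another dataset of the same split) is taken from the description above.

```python
def apply_splitrule(attr_values, splitrule):
    """Return, for each split, one list of sample indices per dataset.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    splits = []
    for split_def in splitrule:
        # Attribute values explicitly claimed by any dataset of this split.
        claimed = set()
        for ids in split_def:
            if ids is not None:
                claimed.update(ids)

        datasets = []
        for ids in split_def:
            if ids is None:
                # None: every sample not assigned to another dataset.
                chosen = [i for i, v in enumerate(attr_values)
                          if v not in claimed]
            else:
                chosen = [i for i, v in enumerate(attr_values) if v in ids]
            datasets.append(chosen)
        splits.append(datasets)
    return splits


# One sample per attribute value 0..6, so indices equal attribute values.
chunks = [0, 1, 2, 3, 4, 5, 6]
rule = [(None, [0, 1, 2]), ([1, 2], [3], [5, 6])]
splits = apply_splitrule(chunks, rule)
print(splits[0])  # the two datasets of the first split
print(splits[1])  # the three datasets of the second split
```

Note that in the second split the samples with attributes 0, 4 and 6... are simply not placed anywhere unless listed: only values named in the rule (or covered by a `None` entry) end up in a dataset.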

Cheap init.

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the CustomSplitter documentation.

Full API documentation of CustomSplitter in module mvpa.datasets.splitter.

HalfSplitter

class mvpa.datasets.splitter.HalfSplitter(**kwargs)

Bases: mvpa.datasets.splitter.Splitter

Split a dataset into two halves of the sample attribute.

The splitter yields two splits: first (1st half, 2nd half) and second (2nd half, 1st half).
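The two splits can be sketched in plain Python on the unique values of the sample attribute. This is an illustration under assumptions, not PyMVPA code: `half_splits` is a hypothetical helper, the unique values are assumed to be taken in sorted order, and the example uses an even number of unique values because the handling of an odd number is not described here.

```python
def half_splits(attr_values):
    """Return the two (first dataset, second dataset) value lists.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    unique = sorted(set(attr_values))
    middle = len(unique) // 2
    first_half, second_half = unique[:middle], unique[middle:]
    # Second split is the first one with the two datasets swapped.
    return [(first_half, second_half), (second_half, first_half)]


chunks = [0, 0, 1, 1, 2, 2, 3, 3]
for first, second in half_splits(chunks):
    print(first, second)
```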

Cheap init.

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the HalfSplitter documentation.

Full API documentation of HalfSplitter in module mvpa.datasets.splitter.

NFoldSplitter

class mvpa.datasets.splitter.NFoldSplitter(cvtype=1, **kwargs)

Bases: mvpa.datasets.splitter.Splitter

Generic N-fold data splitter.

XXX: This docstring is a shame for such an important class!

Initialize the N-fold splitter.

Parameters:
  • cvtype (Int) – Type of cross-validation: N-(cvtype). With the default cvtype=1 this is leave-one-out over the N unique values of the sample attribute.
  • kwargs – Additional parameters are passed to the Splitter base class.
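One way to read "N-(cvtype)" is that, of the N unique values of the sample attribute, every combination of cvtype values is held out once while the remaining N - cvtype values form the other dataset. The sketch below enumerates splits under that reading. It is plain Python rather than PyMVPA code; `nfold_splits` is a hypothetical helper, and placing the held-out values in the second dataset is an assumption.

```python
from itertools import combinations


def nfold_splits(attr_values, cvtype=1):
    """Enumerate (remaining values, held-out values) pairs.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    unique = sorted(set(attr_values))
    splits = []
    for held_out in combinations(unique, cvtype):
        remaining = [v for v in unique if v not in held_out]
        # Assumption: held-out values go into the second dataset.
        splits.append((remaining, list(held_out)))
    return splits


chunks = [0, 0, 1, 1, 2, 2, 3, 3]
for remaining, held_out in nfold_splits(chunks, cvtype=1):
    print(remaining, held_out)
```

With four unique values, cvtype=1 gives four splits and cvtype=2 gives six, so the number of splits grows quickly with cvtype.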

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the NFoldSplitter documentation.

Full API documentation of NFoldSplitter in module mvpa.datasets.splitter.

NoneSplitter

class mvpa.datasets.splitter.NoneSplitter(mode='second', **kwargs)

Bases: mvpa.datasets.splitter.Splitter

This is a dataset splitter that does not split. It simply returns the full dataset that it is called with.

With the default mode (‘second’), the passed dataset is returned as the second element of the 2-tuple and the first element of that tuple is ‘None’. With mode ‘first’, the dataset is returned as the first element instead.

Cheap init – nothing special

Parameters:
  • mode – Either ‘first’ or ‘second’ (default) – which output dataset would actually contain the samples

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the NoneSplitter documentation.

Full API documentation of NoneSplitter in module mvpa.datasets.splitter.

OddEvenSplitter

class mvpa.datasets.splitter.OddEvenSplitter(usevalues=False, **kwargs)

Bases: mvpa.datasets.splitter.Splitter

Split a dataset into odd and even values of the sample attribute.

The splitter yields two splits: first (odd, even) and second (even, odd).

Cheap init.

Parameters:
  • usevalues (Boolean) – If True, the values of the attribute used for splitting determine odd and even samples. If False, odd and even chunks are defined by the order of attribute values, i.e. the first unique attribute is odd and the second is even, even though the corresponding values might indicate the opposite (e.g. in the case of [2, 3]).
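The difference between the two settings of usevalues is easiest to see on the [2, 3] case mentioned above. The following plain-Python sketch is an illustration, not PyMVPA code: `odd_even_splits` is a hypothetical helper, and it assumes that unique values are taken in sorted order and that "first is odd" means 1-based position.

```python
def odd_even_splits(attr_values, usevalues=False):
    """Return the (odd, even) and (even, odd) value lists.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    unique = sorted(set(attr_values))
    if usevalues:
        # Parity of the attribute values themselves.
        odd = [v for v in unique if v % 2 == 1]
        even = [v for v in unique if v % 2 == 0]
    else:
        # Parity of the 1-based position: first, third, ... are "odd".
        odd = unique[0::2]
        even = unique[1::2]
    return [(odd, even), (even, odd)]


chunks = [2, 2, 3, 3]
print(odd_even_splits(chunks, usevalues=False))
print(odd_even_splits(chunks, usevalues=True))
```

With usevalues=False the value 2 counts as odd because it comes first; with usevalues=True it counts as even because of its value.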

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the OddEvenSplitter documentation.

Full API documentation of OddEvenSplitter in module mvpa.datasets.splitter.

Splitter

class mvpa.datasets.splitter.Splitter(nperlabel='all', nrunspersplit=1, permute=False, count=None, strategy='equidistant', attr='chunks')

Bases: object

Base class of dataset splitters.

Each splitter should be initialized with all its necessary parameters. The final splitting is done by running the splitter object on a certain Dataset via __call__(). This method has to be implemented as a generator, i.e. it has to return every possible split with a yield statement.

Each split has to be returned as a sequence of Datasets. The properties of the split datasets may vary between implementations. It is possible to declare a sequence element as ‘None’.

Please note that even if only one Dataset is returned, it has to be an element in a sequence and not just the Dataset object!
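The calling convention described above can be sketched with a toy class. This is not a PyMVPA class and does not derive from Splitter: `ToySplitter` and its `boundary` argument are hypothetical, and plain lists stand in for Dataset objects. It only shows the shape of the protocol: `__call__` is a generator, and every yielded split is a sequence.

```python
class ToySplitter(object):
    """Hypothetical splitter, for illustration only (not a PyMVPA class)."""

    def __init__(self, boundary):
        # All configuration happens at construction time.
        self.boundary = boundary

    def __call__(self, samples):
        head, tail = samples[:self.boundary], samples[self.boundary:]
        # Each split is yielded as a sequence of datasets.
        yield (head, tail)
        # A sequence element may be None.
        yield (None, tail)
        # A single dataset is still wrapped in a sequence.
        yield (head,)


splitter = ToySplitter(boundary=2)
for split in splitter([1, 2, 3]):
    print(split)
```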

Initialize splitter base.

Parameters:
  • nperlabel (int or str (or list of them)) – Number of dataset samples per label to be included in each split. Two special strings are recognized: ‘all’ uses all available samples (default) and ‘equal’ uses the maximum number of samples that can be provided by all of the classes. This value may also be provided as a sequence whose length matches the number of datasets per split; each entry then gives the configuration for the respective dataset in each split.
  • nrunspersplit (int) – Number of times samples for each split are chosen. This is mostly useful if a subset of the available samples is used in each split and the subset is randomly selected for each run (see the nperlabel argument).
  • permute (bool) – If set to True, the labels of each generated dataset will be permuted on a per-chunk basis.
  • count (None or int) – Desired number of splits to be output. It is limited by the number of splits possible for a given splitter (e.g. OddEvenSplitter can have only up to 2 splits). If None, all splits are output (default).
  • strategy (str) – Strategy used to choose splits if count is not None. Possible values:
      first – The first count splits are chosen.
      random – count splits are chosen at random (without replacement).
      equidistant – Splits which are equidistant from each other are chosen.
  • attr (str) – Sample attribute used to determine splits.
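How count and strategy might interact can be sketched as follows. This is plain Python, not PyMVPA's implementation: `select_splits` is a hypothetical helper, the index formula used for ‘equidistant’ is one plausible choice and an assumption, and keeping randomly chosen splits in their original order is likewise an assumption.

```python
import random


def select_splits(splits, count=None, strategy='equidistant', rng=random):
    """Pick `count` of the available splits according to `strategy`.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    # None means all splits; count cannot exceed what is available.
    if count is None or count >= len(splits):
        return list(splits)
    if count < 1:
        raise ValueError("count must be a positive integer or None")
    if strategy == 'first':
        return list(splits[:count])
    if strategy == 'random':
        # Sampling without replacement; original order is kept (assumption).
        indexes = sorted(rng.sample(range(len(splits)), count))
        return [splits[i] for i in indexes]
    if strategy == 'equidistant':
        # Assumed formula: evenly spaced positions, rounded to indexes.
        step = float(len(splits)) / count
        return [splits[int(round(step * i))] for i in range(count)]
    raise ValueError("unknown strategy: %r" % (strategy,))


all_splits = list(range(10))  # stand-ins for ten generated splits
print(select_splits(all_splits, count=3, strategy='first'))
print(select_splits(all_splits, count=3, strategy='equidistant'))
print(select_splits(all_splits, count=3, strategy='random'))
```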
setNPerLabel(value)

Set the number of samples per label in the split datasets.

‘equal’ sets sample size to highest possible number of samples that can be provided by each class. ‘all’ uses all available samples (default).
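The meaning of ‘equal’ can be illustrated with a short plain-Python sketch: the largest sample count that every class can supply is the size of the smallest class. `samples_per_label` is a hypothetical helper, not a PyMVPA function, and for an integer argument it simply echoes the requested number without checking availability.

```python
from collections import Counter


def samples_per_label(labels, nperlabel='all'):
    """Return how many samples of each label would be used.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    counts = Counter(labels)
    if nperlabel == 'all':
        # Use every available sample of each label.
        return dict(counts)
    if nperlabel == 'equal':
        # Largest number that every label can provide.
        smallest = min(counts.values())
        return dict((label, smallest) for label in counts)
    # Integer: the requested number for every label (no availability check).
    return dict((label, int(nperlabel)) for label in counts)


labels = ['a', 'a', 'a', 'b', 'b']
print(samples_per_label(labels, 'all'))
print(samples_per_label(labels, 'equal'))
```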

splitDataset(dataset, specs)

Split a dataset by separating the samples where the configured sample attribute matches an element of specs.

Parameters:
  • dataset (Dataset) – The source dataset.
  • specs (sequence of sequences) – Contains ids of a sample attribute that shall be split into another dataset.
Returns:

Tuple of split datasets.

splitcfg(dataset)
Return splitcfg for a given dataset
strategy

See also

Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the Splitter documentation.

Full API documentation of Splitter in module mvpa.datasets.splitter.