Multivariate Pattern Analysis in Python
Dataset container
The comprehensive API documentation for this module, including all technical details, is available in the Epydoc-generated API reference for mvpa.datasets.base (for developers).
Bases: object
The Dataset.
This class provides a container for all data needed to perform an MVPA analysis: the data samples and the labels associated with them. Additionally, samples can be grouped into chunks.
Important: labels are assumed to be immutable, i.e. no one should modify them externally by accessing indexed items; something like dataset.labels[1] += "_bad" should not be used. If a label has to be modified, a full copy of the labels should be obtained, operated on, and assigned back to the dataset; otherwise dataset.uniquelabels would not work. The same applies to any other attribute that has a corresponding unique* access property.
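The reason for this rule can be illustrated with a small stand-in class. This is only a sketch: `TinyDataset` and its caching scheme are hypothetical and do not reproduce the library's implementation. It merely assumes that the unique values are computed when labels are assigned as a whole.

```python
import numpy as np


class TinyDataset:
    """Hypothetical stand-in that caches unique labels on assignment."""

    def __init__(self, labels):
        self.labels = labels  # goes through the property setter below

    @property
    def labels(self):
        return self._labels

    @labels.setter
    def labels(self, value):
        self._labels = np.asarray(value)
        # The cache is refreshed only when labels are assigned as a whole.
        self.uniquelabels = np.unique(self._labels)


ds = TinyDataset(["rest", "task", "rest"])

# Discouraged: in-place modification bypasses the setter, so the cache is stale.
ds.labels[1] = "bad"
stale = list(ds.uniquelabels)

# Recommended: take a full copy, modify the copy, assign it back.
new_labels = ds.labels.copy()
new_labels[1] = "good"
ds.labels = new_labels
fresh = list(ds.uniquelabels)
```

After the in-place change, `stale` still lists the old label values, whereas `fresh` reflects the labels that are actually stored.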
Initialize dataset instance
There are basically two different ways to create a dataset:
Create a new dataset from samples and sample attributes. In this mode a two-dimensional ndarray has to be passed to the samples keyword argument, and the corresponding sample attributes are provided via the labels and chunks arguments.
The second way is used internally to perform quick copying of datasets, e.g. when performing feature selection. In this mode the two dictionaries (data and dsattr) are required. For performance reasons this mode bypasses most of the sanity checks performed by the previous mode, as data integrity is assumed for internal operations.
Each of the keyword arguments overwrites whatever is, or might already be, in the data container.
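The first creation mode implies some consistency between the samples array and its attributes. The page does not list the exact sanity checks, so the following is a minimal sketch of the kind of check one might expect; `check_sample_attributes` is a hypothetical helper, not a library function.

```python
import numpy as np


def check_sample_attributes(samples, labels, chunks):
    """Hypothetical check: 2-D samples and one attribute value per sample."""
    samples = np.asarray(samples)
    if samples.ndim != 2:
        raise ValueError("samples must be a two-dimensional array")
    n_samples = samples.shape[0]
    for name, attr in (("labels", labels), ("chunks", chunks)):
        if len(attr) != n_samples:
            raise ValueError(
                f"{name} has {len(attr)} entries, expected {n_samples}"
            )
    return samples.shape


# Four samples with three features each, two labels, two chunks.
samples = np.arange(12, dtype=float).reshape(4, 3)
shape = check_sample_attributes(samples, labels=[0, 1, 0, 1], chunks=[0, 0, 1, 1])
```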
Obtain new dataset by applying mappers over features and/or samples.
While featuresmappers leave the sample attributes information unchanged, as the number of samples in the dataset is invariant, samplesmappers are also applied to the samples attributes themselves!
Applying a featuresmapper will destroy any feature grouping information.
Returns a boolean mask with all features in ids selected.
Return type: ndarray
Returns: All selected features are set to True; False otherwise.
Returns feature ids corresponding to non-zero elements in the mask.
Return type: ndarray
Returns: Ids of non-zero (non-False) mask elements.
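The two conversions above are inverses of each other and can be sketched with plain NumPy. The function names below are hypothetical and chosen for illustration only; they are not the method names of the Dataset class.

```python
import numpy as np


def feature_ids_to_mask(ids, n_features):
    """Boolean mask of length n_features with the given ids set to True."""
    mask = np.zeros(n_features, dtype=bool)
    mask[ids] = True
    return mask


def feature_mask_to_ids(mask):
    """Ids of the non-zero (non-False) elements of the mask."""
    return np.flatnonzero(mask)


mask = feature_ids_to_mask([0, 3], n_features=5)
ids = feature_mask_to_ids(mask)
```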
Select a random set of samples.
If ‘nperlabel’ is an integer value, the specified number of samples is randomly chosen from each group of samples sharing a unique label value (total number of selected samples: nperlabel x len(uniquelabels)).
If ‘nperlabel’ is a list, its length has to match the number of unique label values. In this case ‘nperlabel’ specifies the number of samples that shall be selected from the samples with the corresponding label.
The method returns a Dataset object containing the selected samples.
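The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `random_sample_ids` is a hypothetical helper, it returns sample ids rather than a Dataset, and it assumes that a list-valued `nperlabel` follows the sorted order of the unique labels.

```python
import numpy as np


def random_sample_ids(labels, nperlabel, rng=None):
    """Pick random sample ids per unique label (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    unique = np.unique(labels)
    if np.isscalar(nperlabel):
        counts = [nperlabel] * len(unique)
    else:
        counts = list(nperlabel)
        if len(counts) != len(unique):
            raise ValueError("nperlabel must have one entry per unique label")
    selected = []
    for label, n in zip(unique, counts):
        candidates = np.flatnonzero(labels == label)
        # Draw without replacement so that no sample is selected twice.
        selected.extend(rng.choice(candidates, size=n, replace=False))
    return np.sort(np.array(selected, dtype=int))


labels = np.array([0, 0, 0, 1, 1, 1, 1])
ids_equal = random_sample_ids(labels, 2, rng=np.random.default_rng(0))
ids_per_label = random_sample_ids(labels, [1, 3], rng=np.random.default_rng(0))
```

With an integer, the result contains `nperlabel` ids for each unique label; with a list, each label contributes its own count.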
Used to verify whether the dataset is in the same state as when something else was done, e.g. whether a classifier was trained on the same dataset as the one in question.
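The idea of comparing dataset states can be sketched with a content-based fingerprint. Note that this is an assumption made for illustration: the page does not say how the state is identified, and `state_fingerprint` is a hypothetical helper rather than the library's mechanism.

```python
import hashlib

import numpy as np


def state_fingerprint(samples, labels, chunks):
    """Hash the content of the arrays (hypothetical, content-based)."""
    digest = hashlib.sha1()
    for array in (samples, labels, chunks):
        array = np.ascontiguousarray(array)
        # Include dtype and shape so that reshaped data hashes differently.
        digest.update(str(array.dtype).encode())
        digest.update(str(array.shape).encode())
        digest.update(array.tobytes())
    return digest.hexdigest()


samples = np.zeros((3, 2))
labels = np.array([0, 1, 0])
chunks = np.array([0, 0, 1])

# Remember the state at "training" time, then compare later.
trained_on = state_fingerprint(samples, labels, chunks)
unchanged = state_fingerprint(samples, labels, chunks) == trained_on

relabeled = labels.copy()
relabeled[0] = 1
changed = state_fingerprint(samples, relabeled, chunks) != trained_on
```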
Find samples which are on the boundaries of the blocks.
Such samples might need to be removed. By default (with prior=0, post=0), the ids of the first samples in a ‘block’ are reported.
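The default behaviour can be sketched with NumPy. This sketch assumes that a new block starts wherever the label or the chunk differs from that of the preceding sample, and that the very first sample counts as a block start; both are assumptions, and `block_start_ids` is a hypothetical helper.

```python
import numpy as np


def block_start_ids(labels, chunks):
    """Ids of the first sample of every block (hypothetical helper)."""
    labels = np.asarray(labels)
    chunks = np.asarray(chunks)
    # True at position i if sample i + 1 differs from sample i.
    changed = (labels[1:] != labels[:-1]) | (chunks[1:] != chunks[:-1])
    # Assumption: the very first sample always opens a block.
    return np.concatenate(([0], np.flatnonzero(changed) + 1))


labels = [0, 0, 1, 1, 1, 0, 0]
chunks = [0, 0, 0, 0, 1, 1, 1]
starts = block_start_ids(labels, chunks)
```

Here a block boundary is reported at sample 4 even though the label stays the same, because the chunk changes at that point.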
Universal indexer to obtain indexes of interesting samples/features. See .select() for more information.
Returns: tuple of (sample indexes, feature indexes). Each item can also be None if no selection on samples or features was requested (to discriminate between no selected items and no selection).
Permute the labels.
TODO: rename status into something closer in semantics.
Universal selector
WARNING: if you need to select duplicate samples (e.g. samples=[5,5]), or if the order of the selected samples or features is important and must not be sorted (e.g. samples=[3,2,1]), please use the selectFeatures or selectSamples functions directly.
Mimic plain selectSamples:
dataset.select([1,2,3])
dataset[[1,2,3]]
Mimic plain selectFeatures:
dataset.select(slice(None), [1,2,3])
dataset.select('all', [1,2,3])
dataset[:, [1,2,3]]
Mixed (select features and samples):
dataset.select([1,2,3], [1, 2])
dataset[[1,2,3], [1, 2]]
Select samples matching some attributes:
dataset.select(labels=[1,2], chunks=[2,4])
dataset.select('labels', [1,2], 'chunks', [2,4])
dataset['labels', [1,2], 'chunks', [2,4]]
Mixed – out of first 100 samples, select only those with labels 1 or 2 and belonging to chunks 2 or 4, and select features 2 and 3:
dataset.select(slice(0,100), [2,3], labels=[1,2], chunks=[2,4])
dataset[:100, [2,3], 'labels', [1,2], 'chunks', [2,4]]
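For readers who want to see what the last, mixed selection amounts to, here is a rough NumPy equivalent on plain arrays. It is a sketch of the selection semantics as described above, not of the library's implementation; the toy data and the variable names are made up for illustration.

```python
import numpy as np

# Toy data: 200 samples, 5 features, labels cycling 1..4, 5 chunks of 40.
samples = np.arange(200 * 5).reshape(200, 5)
labels = np.tile([1, 2, 3, 4], 50)
chunks = np.repeat(np.arange(5), 40)

# Restrict to the first 100 samples, then filter by attribute values.
candidate = np.zeros(len(samples), dtype=bool)
candidate[:100] = True
keep = candidate & np.isin(labels, [1, 2]) & np.isin(chunks, [2, 4])

sample_ids = np.flatnonzero(keep)
feature_ids = [2, 3]

# np.ix_ selects the full samples-by-features cross product.
selected = samples[np.ix_(sample_ids, feature_ids)]
```

The conditions are combined with a logical AND, so only samples 80 to 99 (chunk 2 within the first 100) with labels 1 or 2 remain.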
Select a number of features from the current set.
Returns a new Dataset object with a view of the original samples array (no copying is performed).
WARNING: The order of ids determines the order of features in the returned dataset. This might be useful sometimes, but can also cause major headaches! The order is verified when running non-optimized code (if __debug__).
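The effect of the id order can be seen on a plain NumPy array. This sketch only illustrates the ordering issue with ordinary array indexing; it makes no claim about how the Dataset class stores or shares its samples.

```python
import numpy as np

samples = np.array([[10, 11, 12, 13],
                    [20, 21, 22, 23]])

# Features come back in the order in which the ids are given ...
as_given = samples[:, [3, 1]]

# ... so sort the ids first if the original feature order must be kept.
in_order = samples[:, sorted([3, 1])]
```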
Choose a subset of samples defined by samples IDs.
Returns a new dataset object containing the selected sample subset.
TODO: yoh, we might need to sort the mask if the mask is a list of ids and is not ordered. Clarify with Michael what is our intent here!
Set labels map.
Checks for the validity of the mapping – values should cover all existing labels in the dataset
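A validity check of this kind could look like the sketch below. It assumes that the map is a dictionary whose values are the label values stored in the dataset; `check_labels_map` and the example names are hypothetical.

```python
import numpy as np


def check_labels_map(labels_map, labels):
    """Raise if the map's values do not cover every label in use."""
    missing = set(np.unique(labels).tolist()) - set(labels_map.values())
    if missing:
        raise ValueError(f"labels map does not cover labels: {sorted(missing)}")
    return True


labels = np.array([0, 1, 1, 0, 2])
ok = check_labels_map({"rest": 0, "face": 1, "house": 2}, labels)

try:
    # Label 2 occurs in the data but is absent from this map.
    check_labels_map({"rest": 0, "face": 1}, labels)
    incomplete_rejected = False
except ValueError:
    incomplete_rejected = True
```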
String summary over the object
Provide summary statistics over the labels and chunks
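One common form of such a summary is a table of sample counts per chunk and label. The page does not show the actual output format, so the following is only a sketch of the counting step; `label_chunk_counts` is a hypothetical helper.

```python
import numpy as np


def label_chunk_counts(labels, chunks):
    """Count samples per (chunk, label) pair (hypothetical helper)."""
    ulabels, label_idx = np.unique(np.asarray(labels), return_inverse=True)
    uchunks, chunk_idx = np.unique(np.asarray(chunks), return_inverse=True)
    counts = np.zeros((len(uchunks), len(ulabels)), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (chunk_idx, label_idx), 1)
    return uchunks, ulabels, counts


uchunks, ulabels, counts = label_chunk_counts(
    labels=[0, 1, 1, 0, 1, 1],
    chunks=[0, 0, 0, 1, 1, 1],
)
```

Rows of `counts` correspond to chunks and columns to labels, which makes unbalanced designs easy to spot.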
Obtain indexes of interesting samples/features. See select() for more information
XXX somewhat obsoletes idsby...
See also
Derived classes might provide additional methods via their base classes. Please refer to the list of base classes (if it exists) at the beginning of the Dataset documentation.
Full API documentation of Dataset in module mvpa.datasets.base.
See also
Full API documentation of datasetmethod() in module mvpa.datasets.base.