Package mvpa :: Package datasets :: Module base :: Class Dataset

Class Dataset


This class provides a container to store all data necessary to perform MVPA analyses: the data samples and the labels associated with these patterns. Additionally, samples can be grouped into chunks.

Important: labels are assumed to be immutable, i.e. no one should modify them externally by accessing indexed items; something like dataset.labels[1] += "_bad" should not be used. If a label has to be modified, a full copy of the labels should be obtained, operated on, and assigned back to the dataset; otherwise dataset.uniquelabels would not work. The same applies to any other attribute which has a corresponding unique* access property.
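The reason for this rule can be illustrated with a minimal sketch. TinyDataset below is a hypothetical pure-Python stand-in, not PyMVPA's implementation; it only assumes that a unique* property caches its result and that the cache is reset when the attribute is assigned.

```python
class TinyDataset:
    """Hypothetical stand-in that caches unique labels like a unique* property."""

    def __init__(self, labels):
        self._labels = list(labels)
        self._uniquelabels = None  # cache, filled on first access

    @property
    def labels(self):
        return self._labels

    @labels.setter
    def labels(self, value):
        self._labels = list(value)
        self._uniquelabels = None  # assignment invalidates the cache

    @property
    def uniquelabels(self):
        if self._uniquelabels is None:
            self._uniquelabels = sorted(set(self._labels))
        return self._uniquelabels


ds = TinyDataset(["a", "b", "a"])
first = list(ds.uniquelabels)   # computes and caches the unique labels

ds.labels[1] += "_bad"          # discouraged: in-place edit, cache is not reset
stale = list(ds.uniquelabels)   # cache no longer reflects the labels

new_labels = list(ds.labels)    # recommended: copy, modify, assign back
new_labels[0] = "c"
ds.labels = new_labels
fresh = list(ds.uniquelabels)   # recomputed from the assigned labels
```

In this sketch the in-place edit leaves the cached unique values out of date, while the copy-modify-assign route goes through the setter and resets them.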

Instance Methods
 
_resetallunique(self, force=False)
Set to None all unique* attributes of corresponding dictionary
 
_getuniqueattr(self, attrib, dict_)
Provide common facility to return unique attributes
 
_setdataattr(self, attrib, value)
Provide common facility to set attributes
 
_getNSamplesPerAttr(self, attrib='labels')
Returns the number of samples per unique label.
 
_getSampleIdsByAttr(self, values, attrib='labels')
Return indices of samples given a list of attributes
 
_shapeSamples(self, samples, dtype, copy)
Adapt different kinds of samples
 
_checkData(self)
Checks that all _data members have the same number of samples.
 
_expandSampleAttribute(self, attr, attr_name)
If a sample attribute is given as a scalar expand/repeat it to a length matching the number of samples in the dataset.
 
__str__(self)
String summary over the object
 
__repr__(self)
repr(x)
 
summary(self, uniq=True, stats=True, idhash=False)
String summary over the object
 
__iadd__(self, other)
Merge the samples of one Dataset object to another (in-place).
 
__add__(self, other)
Merge the samples of two Dataset objects.
 
getRandomSamples(self, nperlabel)
Select a random set of samples.
 
getNSamples(self)
Currently available number of patterns.
 
getNFeatures(self)
Number of features per pattern.
 
setSamplesDType(self, dtype)
Set the data type of the samples array.
 
convertFeatureIds2FeatureMask(self, ids)
Returns a boolean mask with all features in ids selected.
 
convertFeatureMask2FeatureIds(self, mask)
Returns feature ids corresponding to non-zero elements in the mask.
 
idsbychunks(self, x)
attrib
 
idsbylabels(self, x)
attrib

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

    Creators
 
__init__(self, data=None, dsattr=None, dtype=None, samples=None, labels=None, chunks=None, check_data=True, copy_samples=False, copy_data=True, copy_dsattr=True)
Initialize dataset instance
 
selectFeatures(self, ids, sort=True)
Select a number of features from the current set.
 
applyMapper(self, featuresmapper=None, samplesmapper=None)
Obtain new dataset by applying mappers over features and/or samples.
 
selectSamples(self, mask)
Choose a subset of samples.
    Mutators
 
permuteLabels(self, status, perchunk=True)
Permute the labels.
Class Methods
 
_registerAttribute(cls, key, dictname='_data', hasunique=False, default_setter=True)
Register an attribute for any Dataset class.
Class Variables
  _uniqueattributes = ['uniquelabels', 'uniquelabels', 'uniquech...
Unique attributes associated with the data
  _registeredattributes = ['samples', 'labels', 'chunks']
Registered attributes (stored in _data)
  _requiredattributes = ['samples', 'labels']
Attributes which have to be provided to __init__, or otherwise no default values would be assumed and construction of the instance would fail
Instance Variables
  _data
What makes a dataset.
  _dsattr
Dataset attributes.
Properties
  idhash
To verify whether the dataset is in the same state as when something else was done
  nsamples
Currently available number of patterns.
  nfeatures
Number of features per pattern.
  chunks
chunks
  labels
labels
  samples
samples
  samplesperchunk
attrib
  samplesperlabel
attrib
  uniquechunks
attrib
  uniquelabels
attrib

Inherited from object: __class__

Method Details

__init__(self, data=None, dsattr=None, dtype=None, samples=None, labels=None, chunks=None, check_data=True, copy_samples=False, copy_data=True, copy_dsattr=True)
(Constructor)

Initialize dataset instance

Each of the keyword arguments overwrites what is, or might already be, in the data container.

Parameters:
  • data (dict) - Dictionary with an arbitrary number of entries. The value for each key in the dict has to be an ndarray with the same length as the number of rows in the samples array. A special entry in this dictionary is 'samples', a 2d array (samples x features). A shallow copy is stored in the object.
  • dsattr (dict) - Dictionary of dataset attributes. An arbitrary number of arbitrarily named and typed objects can be stored here. A shallow copy of the dictionary is stored in the object.
  • dtype - If None -- do not change data type if samples is an ndarray. Otherwise convert samples to dtype.
  • samples (ndarray) - a 2d array (samples x features)
  • labels - array or scalar value defining labels for each sample
  • chunks - array or scalar value defining chunks for each sample
Overrides: object.__init__
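
As a rough illustration of how a scalar labels or chunks value could be expanded to one entry per sample (the job described for _expandSampleAttribute), consider the sketch below. expand_sample_attribute is a hypothetical helper operating on plain lists; the actual method works on arrays and may differ in detail.

```python
def expand_sample_attribute(attr, nsamples):
    """Return one attribute value per sample (hypothetical helper)."""
    if isinstance(attr, (list, tuple)):
        # Sequences must already match the number of samples.
        if len(attr) != nsamples:
            raise ValueError(
                "got %d values for %d samples" % (len(attr), nsamples))
        return list(attr)
    # A scalar is repeated once per sample.
    return [attr] * nsamples


labels = expand_sample_attribute(1, 4)             # scalar label for all samples
chunks = expand_sample_attribute([0, 0, 1, 1], 4)  # already one value per sample
```

A sequence of the wrong length raises ValueError in this sketch, mirroring the requirement that every entry has as many values as there are samples.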

_getuniqueattr(self, attrib, dict_)

Provide common facility to return unique attributes

XXX dict_ can be simply replaced now with self._dsattr

_shapeSamples(self, samples, dtype, copy)

Adapt different kinds of samples

Handle all possible input values for 'samples' and transform them into a 2d (samples x features) representation.

_registerAttribute(cls, key, dictname='_data', hasunique=False, default_setter=True)
Class Method

Register an attribute for any Dataset class.

Creates a property, assigning getters/setters depending on the availability of the corresponding _get and _set functions.
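
The general technique of registering an attribute as a property at class level can be sketched as follows. Registry and register_attribute are hypothetical names; this simplified version always installs a default getter and setter and does not look up existing _get/_set functions, so it should not be read as the actual implementation.

```python
class Registry:
    """Hypothetical container whose attributes live in a dict."""

    def __init__(self):
        self._data = {}

    @classmethod
    def register_attribute(cls, key, dictname="_data"):
        # Default getter/setter read and write the named dictionary.
        def getter(self):
            return getattr(self, dictname)[key]

        def setter(self, value):
            getattr(self, dictname)[key] = value

        # Attach the property to the class, so all instances get it.
        setattr(cls, key, property(getter, setter, doc=key))


Registry.register_attribute("labels")

reg = Registry()
reg.labels = [1, 2]   # stored in reg._data["labels"] via the setter
```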

__str__(self)
(Informal representation operator)

String summary over the object
Overrides: object.__str__

__repr__(self)
(Representation operator)

repr(x)
Overrides: object.__repr__
(inherited documentation)

summary(self, uniq=True, stats=True, idhash=False)

String summary over the object
Parameters:
  • uniq (bool) - include summary over data attributes which have a corresponding unique* property
  • idhash (bool) - include idhash value for dataset and samples
  • stats (bool) - include some basic statistics (mean, std, var) over dataset samples

__iadd__(self, other)

Merge the samples of one Dataset object to another (in-place).

No dataset attributes will be merged!

__add__(self, other)
(Addition operator)

Merge the samples of two Dataset objects.

All data of both datasets is copied, concatenated and a new Dataset is returned.

NOTE: This can be a costly operation (both memory and time). If performance is important consider the '+=' operator.

selectFeatures(self, ids, sort=True)

Select a number of features from the current set.

Returns a new Dataset object with a view of the original samples array (no copying is performed).

WARNING: The order of ids determines the order of features in the returned dataset. This might be useful sometimes, but can also cause major headaches! The order is verified when running non-optimized code (if __debug__).

Parameters:
  • ids - iterable container to select ids
  • sort (bool) - whether to sort the ids. Order matters, and selectFeatures assumes ids in increasing order. If they are not, selectFeatures verifies the order and sorts the ids when running non-optimized code.
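
The effect of id order on the result can be sketched with plain lists. select_features below is a hypothetical helper, not the method itself: it copies the selected columns, whereas the method is documented to return a view of the original samples array.

```python
def select_features(samples, ids, sort=True):
    """Pick feature columns from a list of rows (hypothetical helper)."""
    if sort:
        ids = sorted(ids)  # enforce increasing feature order
    return [[row[i] for i in ids] for row in samples]


samples = [[10, 11, 12],
           [20, 21, 22]]

unsorted = select_features(samples, [2, 0], sort=False)  # features follow ids
ordered = select_features(samples, [2, 0], sort=True)    # features in id order
```

With sort=False the returned features appear in the order the ids were given, which is the situation the warning above refers to.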

applyMapper(self, featuresmapper=None, samplesmapper=None)

Obtain new dataset by applying mappers over features and/or samples.

WARNING: At the moment, handling of samplesmapper is not yet implemented since there has been no real use case.

TODO: selectFeatures is pretty much applyMapper(featuresmapper=MaskMapper(...))

Parameters:
  • featuresmapper (Mapper) - Mapper to somehow transform each sample's features
  • samplesmapper (Mapper) - Mapper to transform each feature across samples

selectSamples(self, mask)

Choose a subset of samples.

Returns a new dataset object containing the selected sample subset.

TODO: yoh, we might need to sort the mask if the mask is a list of ids and is not ordered. Clarify with Michael what is our intent here!

permuteLabels(self, status, perchunk=True)

Permute the labels.

TODO: rename status into something closer in semantics.

When this method is called with 'status' set to True, the labels are permuted among all samples.

If 'perchunk' is True, permutation is limited to samples sharing the same chunk value. Therefore only the association of a certain sample with a label is permuted, while the absolute number of occurrences of each label value within a certain chunk is kept constant.

If 'status' is False the original labels are restored.
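
A minimal sketch of chunk-wise permutation, assuming plain lists of labels and chunks, is given below. permute_labels and its rng argument are hypothetical; the sketch covers only the permutation itself, not the restoring of the original labels.

```python
import random
from collections import Counter


def permute_labels(labels, chunks, perchunk=True, rng=None):
    """Return a permuted copy of labels (hypothetical helper)."""
    rng = rng or random.Random()
    permuted = list(labels)
    if not perchunk:
        rng.shuffle(permuted)  # permute among all samples
        return permuted
    for chunk in set(chunks):
        # Shuffle only the labels of samples belonging to this chunk.
        idx = [i for i, c in enumerate(chunks) if c == chunk]
        shuffled = [labels[i] for i in idx]
        rng.shuffle(shuffled)
        for i, label in zip(idx, shuffled):
            permuted[i] = label
    return permuted


labels = ["a", "b", "a", "b", "a", "a"]
chunks = [0, 0, 0, 0, 1, 1]
permuted = permute_labels(labels, chunks, perchunk=True, rng=random.Random(0))

# Label counts within each chunk are unchanged by the permutation.
for chunk in set(chunks):
    before = Counter(l for l, c in zip(labels, chunks) if c == chunk)
    after = Counter(l for l, c in zip(permuted, chunks) if c == chunk)
    assert before == after
```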

getRandomSamples(self, nperlabel)

Select a random set of samples.

If 'nperlabel' is an integer value, the specified number of samples is randomly chosen from the group of samples sharing a unique label value (total number of selected samples: nperlabel x len(uniquelabels)).

If 'nperlabel' is a list, its length has to match the number of unique label values. In this case 'nperlabel' specifies the number of samples that shall be selected from the samples with the corresponding label.

The method returns a Dataset object containing the selected samples.
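
The selection logic described above can be sketched on a plain list of labels. random_sample_ids is a hypothetical helper that returns sample indices rather than a Dataset, and it assumes that the entries of a list-valued 'nperlabel' correspond to the unique labels in sorted order.

```python
import random


def random_sample_ids(labels, nperlabel, rng=None):
    """Return sorted indices of randomly chosen samples (hypothetical helper)."""
    rng = rng or random.Random()
    unique = sorted(set(labels))  # assumption: sorted unique label order
    if isinstance(nperlabel, int):
        nperlabel = [nperlabel] * len(unique)
    if len(nperlabel) != len(unique):
        raise ValueError("need one count per unique label")
    selected = []
    for label, n in zip(unique, nperlabel):
        ids = [i for i, l in enumerate(labels) if l == label]
        selected.extend(rng.sample(ids, n))  # n distinct samples of this label
    return sorted(selected)


labels = ["a", "a", "a", "b", "b", "b"]
even_ids = random_sample_ids(labels, 2, rng=random.Random(0))
uneven_ids = random_sample_ids(labels, [1, 3], rng=random.Random(0))
```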

convertFeatureIds2FeatureMask(self, ids)

Returns a boolean mask with all features in ids selected.
Parameters:
  • ids (list or 1d array) - Feature ids to be selected.
Returns:
ndarray: dtype='bool'
All selected features are set to True; False otherwise.

convertFeatureMask2FeatureIds(self, mask)

Returns feature ids corresponding to non-zero elements in the mask.
Parameters:
  • mask (1d ndarray) - Feature mask.
Returns:
ndarray: integer
Ids of non-zero (non-False) mask elements.
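
The relationship between feature ids and feature masks can be sketched with plain lists. Both helpers below are hypothetical; in particular, feature_ids_to_mask takes the number of features as an explicit argument, whereas the method obtains it from the dataset.

```python
def feature_ids_to_mask(ids, nfeatures):
    """Boolean mask with True at every selected feature id."""
    mask = [False] * nfeatures
    for i in ids:
        mask[i] = True
    return mask


def feature_mask_to_ids(mask):
    """Ids of all non-zero (non-False) mask elements."""
    return [i for i, selected in enumerate(mask) if selected]


mask = feature_ids_to_mask([0, 3], nfeatures=5)
ids = feature_mask_to_ids(mask)
```

Converting ids to a mask and back yields the ids in increasing order, regardless of the order in which they were given.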

Class Variable Details

_uniqueattributes

Unique attributes associated with the data
Value:
['uniquelabels', 'uniquelabels', 'uniquechunks', 'uniquechunks']

Property Details

idhash

To verify whether the dataset is in the same state as when something else was done

For example, whether a classifier was trained on the same dataset as the one in question

Get Method:
unreachable.idhash(self) - To verify whether the dataset is in the same state as when something else was done

nsamples

Currently available number of patterns.
Get Method:
getNSamples(self) - Currently available number of patterns.

nfeatures

Number of features per pattern.
Get Method:
getNFeatures(self) - Number of features per pattern.

chunks

chunks
Get Method:
unreachable(x) - chunks
Set Method:
unreachable(self, x) - attrib

labels

labels
Get Method:
unreachable(x) - labels
Set Method:
unreachable(self, x) - attrib

samples

samples
Get Method:
unreachable(x) - samples
Set Method:
unreachable(self, x) - attrib

samplesperchunk

attrib
Get Method:
unreachable(x) - attrib

samplesperlabel

attrib
Get Method:
unreachable(x) - attrib

uniquechunks

attrib
Get Method:
unreachable(x) - attrib

uniquelabels

attrib
Get Method:
unreachable(x) - attrib