misc.data_generators

Module: misc.data_generators

Miscellaneous data generators for unit tests and demos

Functions

mvpa.misc.data_generators.chirpLinear(n_instances, n_features=4, n_nonbogus_features=2, data_noise=0.4, noise=0.1)

Generates a simple dataset for linear regression.

Generates a chirp signal, populates n_nonbogus_features out of the n_features features with it at a different noise level, and then provides the signal itself, with additional noise, as labels.
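The recipe described above can be sketched with the standard library alone. This is an illustrative stand-in, not the PyMVPA implementation: the function name chirp_linear_sketch, the seed argument, the particular chirp formula, and the reading of data_noise as the feature noise and noise as the label noise are all assumptions.

```python
import math
import random


def chirp_linear_sketch(n_instances, n_features=4, n_nonbogus_features=2,
                        data_noise=0.4, noise=0.1, seed=0):
    """Hypothetical helper: chirp-driven features with noisy chirp labels."""
    rng = random.Random(seed)
    # Assumed chirp: a sine whose frequency grows linearly over [0, 1).
    chirp = [math.sin(2 * math.pi * 5 * (i / n_instances) ** 2)
             for i in range(n_instances)]
    samples = []
    for value in chirp:
        # The first n_nonbogus_features columns carry the chirp plus noise;
        # the remaining (bogus) columns contain noise only.
        row = [(value if j < n_nonbogus_features else 0.0)
               + rng.gauss(0.0, data_noise)
               for j in range(n_features)]
        samples.append(row)
    # Labels are the chirp itself with a separate noise term.
    labels = [value + rng.gauss(0.0, noise) for value in chirp]
    return samples, labels
```

With both noise levels set to zero, the informative columns equal the labels and the bogus columns are zero, which makes the split between the two kinds of features easy to inspect.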

mvpa.misc.data_generators.dumbFeatureBinaryDataset()

Very simple binary (2 labels) dataset

mvpa.misc.data_generators.dumbFeatureDataset()

Create a very simple dataset with 2 features and 3 labels

mvpa.misc.data_generators.getMVPattern(s2n)

Simple multivariate dataset

mvpa.misc.data_generators.linear_awgn(size=10, intercept=0.0, slope=0.4, noise_std=0.01, flat=False)

Generate a dataset from a linear function with additive white Gaussian noise (AWGN).

The dataset can be multidimensional if 'slope' is a vector. If flat is True (in 1 dimension), equally spaced samples are generated instead of random ones. This is useful for the test phase.
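A minimal one-dimensional sketch of this behaviour follows. It is not the library code: the name linear_awgn_sketch, the seed argument, and the sampling interval [0, 1] are assumptions made for illustration, and the vector-slope case is left out.

```python
import random


def linear_awgn_sketch(size=10, intercept=0.0, slope=0.4, noise_std=0.01,
                       flat=False, seed=0):
    """Hypothetical helper: y = intercept + slope * x + Gaussian noise."""
    rng = random.Random(seed)
    if flat:
        # Equally spaced inputs (requires size >= 2); the [0, 1] range is assumed.
        xs = [i / (size - 1) for i in range(size)]
    else:
        # Randomly placed inputs on the same assumed interval.
        xs = [rng.random() for _ in range(size)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys
```

Passing flat=True gives a regular grid of inputs, which is convenient when plotting or comparing predictions against the noise-free line.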

mvpa.misc.data_generators.multipleChunks(func, n_chunks, *args, **kwargs)

Replicate a dataset multiple times, assigning a different chunk to each replica.

Given a randomized (noisy) generator of a dataset with a single chunk, call the generator multiple times and place the results into distinct chunks.
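The idea can be sketched as follows, using plain tuples in place of a real dataset object. The names multiple_chunks_sketch and toy_generator are hypothetical, and representing a dataset as a list of (sample, label) pairs is an assumption made to keep the example self-contained.

```python
import random


def multiple_chunks_sketch(func, n_chunks, *args, **kwargs):
    """Hypothetical helper: call func once per chunk and tag its output."""
    combined = []
    for chunk_id in range(n_chunks):
        # Each call yields a fresh noisy realisation of the single-chunk data.
        for sample, label in func(*args, **kwargs):
            combined.append((sample, label, chunk_id))
    return combined


def toy_generator(n_per_label, rng):
    """Hypothetical noisy generator: one feature centred on the label value."""
    return [(rng.gauss(label, 0.1), label)
            for label in (0, 1) for _ in range(n_per_label)]


data = multiple_chunks_sketch(toy_generator, 3, 2, random.Random(0))
```

Here data holds three independent draws of the toy dataset, each tagged with its own chunk id, which is the structure needed for chunk-wise cross-validation.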

mvpa.misc.data_generators.noisy_2d_fx(size_per_fx, dfx, sfx, center, noise_std=1)

mvpa.misc.data_generators.normalFeatureDataset(perlabel=50, nlabels=2, nfeatures=4, nchunks=5, means=None, nonbogus_features=None, snr=3.0)

Generate a univariate dataset with normal noise and specified means.

Keywords :
perlabel : int

Number of samples per label

nlabels : int

Number of labels in the dataset

nfeatures : int

Total number of features (including bogus features which carry no label-related signal)

nchunks : int

Number of chunks (perlabel should be a multiple of nchunks)

means : None or list of float or ndarray

Means specified for each of the nfeatures features.

nonbogus_features : None or list of int

Indexes of non-bogus features (1 per label)

snr : float

Signal-to-noise ratio, assuming that the signal has std 1.0, so the random normal noise is simply divided by snr

This is probably a generalization of pureMultivariateSignal, where means=[ [0,1], [1,0] ]

Specify either means or nonbogus_features so that the means get assigned accordingly
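The parameter descriptions above can be illustrated with a small sketch. It is not the PyMVPA code: the name normal_feature_sketch, the seed argument, the signal mean of 1.0 on each label's non-bogus feature, and the way samples are dealt into chunks are all assumptions.

```python
import random


def normal_feature_sketch(perlabel=50, nlabels=2, nfeatures=4, nchunks=5,
                          nonbogus_features=(0, 1), snr=3.0, seed=0):
    """Hypothetical helper returning (sample, label, chunk) tuples."""
    rng = random.Random(seed)
    rows = []
    for label in range(nlabels):
        for i in range(perlabel):
            # Unit-variance normal noise divided by snr, as described above.
            sample = [rng.gauss(0.0, 1.0) / snr for _ in range(nfeatures)]
            # Assumed: a mean of 1.0 on the single feature tied to this label.
            sample[nonbogus_features[label]] += 1.0
            # Spread each label's samples evenly over the chunks.
            chunk = i * nchunks // perlabel
            rows.append((sample, label, chunk))
    return rows
```

In this sketch the features not listed in nonbogus_features contain only noise, so they carry no label-related signal.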

mvpa.misc.data_generators.normalFeatureDataset__(dataset=None, labels=None, nchunks=None, perlabel=50, activation_probability_steps=1, randomseed=None, randomvoxels=False)

NOT FINISHED

mvpa.misc.data_generators.pureMultivariateSignal(patterns, signal2noise=1.5, chunks=None)

Create a 2d dataset with a clear multivariate signal, but no univariate information.

%%%%%%%%%
% O % X %
%%%%%%%%%
% X % O %
%%%%%%%%%
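
The layout in the diagram is an XOR pattern: each class occupies one diagonal, so neither feature separates the classes on its own. A minimal sketch of such data follows; the names xor_pattern_sketch and class_mean, the cell centres at +/-1, and the noise level 1 / signal2noise are assumptions, not the library's behaviour.

```python
import random


def xor_pattern_sketch(per_cell=100, signal2noise=1.5, seed=0):
    """Hypothetical helper returning (x, y, label) tuples in an XOR layout."""
    rng = random.Random(seed)
    # Cell centres as in the diagram: "O" on one diagonal, "X" on the other.
    centres = {"O": [(-1.0, 1.0), (1.0, -1.0)],
               "X": [(1.0, 1.0), (-1.0, -1.0)]}
    noise_std = 1.0 / signal2noise  # assumed relation between noise and s2n
    data = []
    for label, cells in centres.items():
        for cx, cy in cells:
            for _ in range(per_cell):
                data.append((cx + rng.gauss(0.0, noise_std),
                             cy + rng.gauss(0.0, noise_std), label))
    return data


def class_mean(data, label, value):
    """Average value(x, y) over the samples of one class."""
    picked = [value(x, y) for x, y, lab in data if lab == label]
    return sum(picked) / len(picked)
```

Comparing class_mean for a single feature with class_mean for the product x * y shows the point of the dataset: the per-feature means of both classes sit near zero, while the product differs in sign between the classes.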

mvpa.misc.data_generators.sinModulated(n_instances, n_features, flat=False, noise=0.4)

Generate a (quite) complex multidimensional non-linear dataset

Used for regression testing. The label is the sine of x^2 plus uniform noise.
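
The label rule can be sketched for a single feature as below. This is only an illustration: the name sin_modulated_sketch, the seed argument, the input range [0, pi], and the symmetric noise interval are assumptions, and the multi-feature case and the flat option are not covered.

```python
import math
import random


def sin_modulated_sketch(n_instances, noise=0.4, seed=0):
    """Hypothetical helper: label = sin(x^2) + uniform noise (one feature)."""
    rng = random.Random(seed)
    # Assumed input range; the real generator may sample differently.
    xs = [rng.uniform(0.0, math.pi) for _ in range(n_instances)]
    # Uniform noise on [-noise, noise] (the interval is an assumption).
    labels = [math.sin(x ** 2) + rng.uniform(-noise, noise) for x in xs]
    return xs, labels
```

Because the argument of the sine grows quadratically, the labels oscillate faster as x increases, which is what makes the target non-linear and hard for simple regressors.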

mvpa.misc.data_generators.wr1996(size=200)

Generate the '6d robot arm' dataset (Williams and Rasmussen, 1996).

It was originally created to test the correctness of the kernel ARD implementation. For full details see: http://www.gaussianprocess.org/gpml/code/matlab/doc/regression.html#ard

x_1 picked randomly in [-1.932, -0.453]
x_2 picked randomly in [0.534, 3.142]
r_1 = 2.0
r_2 = 1.3
f(x_1,x_2) = r_1 cos (x_1) + r_2 cos(x_1 + x_2) + N(0,0.0025)
etc.

Expected relevances:
ell_1 1.804377
ell_2 1.963956
ell_3 8.884361
ell_4 34.417657
ell_5 1081.610451
ell_6 375.445823
sigma_f 2.379139
sigma_n 0.050835
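
The generating function for the first two inputs can be sketched as below. The names robot_arm_target and sample_two_inputs and the seed argument are hypothetical, uniform sampling within the stated ranges is an assumption, and N(0, 0.0025) is read as a variance (standard deviation 0.05), which is consistent with the expected sigma_n of about 0.0508. The remaining four input dimensions, covered only by "etc." above, are deliberately left out.

```python
import math
import random

R1, R2 = 2.0, 1.3  # arm segment lengths r_1 and r_2 from the text


def robot_arm_target(x1, x2):
    """Noise-free target f(x_1, x_2) = r_1 cos(x_1) + r_2 cos(x_1 + x_2)."""
    return R1 * math.cos(x1) + R2 * math.cos(x1 + x2)


def sample_two_inputs(size=200, seed=0):
    """Hypothetical helper returning (x1, x2, y) tuples with additive noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(size):
        x1 = rng.uniform(-1.932, -0.453)
        x2 = rng.uniform(0.534, 3.142)
        # Assumed: 0.0025 is the noise variance, so the std is 0.05.
        y = robot_arm_target(x1, x2) + rng.gauss(0.0, 0.05)
        rows.append((x1, x2, y))
    return rows
```

Only x_1 and x_2 enter the target, which fits the small expected length scales ell_1 and ell_2 relative to the much larger ell_5 and ell_6.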