Multivariate Pattern Analysis in Python
This example runs a number of classifiers on a simple 2D dataset and plots the decision surface of each classifier.
First compose some sample data – no PyMVPA involved.
>>> import numpy as N
>>>
>>> # set up the labeled data
>>> # two skewed 2-D distributions
>>> num_dat = 200
>>> dist = 4
>>> feat_pos = N.random.randn(2, num_dat)
>>> feat_pos[0, :] *= 2.
>>> feat_pos[1, :] *= .5
>>> feat_pos[0, :] += dist
>>> feat_neg = N.random.randn(2, num_dat)
>>> feat_neg[0, :] *= .5
>>> feat_neg[1, :] *= 2.
>>> feat_neg[0, :] -= dist
>>>
>>> # set up the testing features
>>> x1 = N.linspace(-10, 10, 100)
>>> x2 = N.linspace(-10, 10, 100)
>>> x, y = N.meshgrid(x1, x2)
>>> feat_test = N.array((N.ravel(x), N.ravel(y)))
>>>
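As a quick sanity check (not part of the original example, and using only the arrays defined above), the array shapes can be printed; feat_test should hold one column per point of the 100x100 grid:

>>> # sanity check (assumed addition): shapes of the generated arrays
>>> print feat_pos.shape, feat_neg.shape
(2, 200) (2, 200)
>>> print feat_test.shape
(2, 10000)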
Now load PyMVPA and convert the data into a proper Dataset.
>>> from mvpa.suite import *
>>>
>>> # create the pymvpa dataset from the labeled features
>>> patternsPos = Dataset(samples=feat_pos.T, labels=1)
>>> patternsNeg = Dataset(samples=feat_neg.T, labels=0)
>>> patterns = patternsPos + patternsNeg
>>>
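A quick look at the merged dataset confirms that the two classes were combined as intended. This is a minimal sketch, assuming the nsamples and nfeatures attributes of the PyMVPA 0.x Dataset class:

>>> # assumed Dataset attributes: number of samples and features
>>> print patterns.nsamples, patterns.nfeatures
400 2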
This demo utilizes a number of classifiers. Instantiating a classifier involves almost no runtime cost, so it is easy to compile a long list if necessary.
>>> # set up classifiers to try out
>>> clfs = {'Ridge Regression': RidgeReg(),
...         'Linear SVM': LinearNuSVMC(probability=1,
...                                    enable_states=['probabilities']),
...         'RBF SVM': RbfNuSVMC(probability=1,
...                              enable_states=['probabilities']),
...         'SMLR': SMLR(lm=0.01),
...         'Logistic Regression': PLR(criterion=0.00001),
...         'k-Nearest-Neighbour': kNN(k=10)}
>>>
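Before generating plots, a classifier can be smoke-tested by scoring it on the training data. This is a sketch, not part of the original example, and it yields an optimistic accuracy estimate; it uses only the train() and predict() calls that also appear in the loop below:

>>> # minimal sketch: training-set accuracy for a single classifier
>>> check = SMLR(lm=0.01)
>>> check.train(patterns)
>>> preds = N.asarray(check.predict(patterns.samples))
>>> print "Training accuracy: %.2f" % N.mean(preds == patterns.labels)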
Now we are ready to run the classifiers. The following loop trains and queries each classifier, and generates a plot showing the decision surface of each one.
>>> # loop over classifiers and show how they do
>>> fig = 0
>>>
>>> # make a new figure
>>> P.figure(figsize=(8, 12))
>>> for c in clfs:
...     # tell which one we are doing
...     print "Running %s classifier..." % c
...
...     # make a new subplot for each classifier
...     fig += 1
...     P.subplot(3, 2, fig)
...
...     # plot the training points
...     P.plot(feat_pos[0, :], feat_pos[1, :], "r.")
...     P.plot(feat_neg[0, :], feat_neg[1, :], "b.")
...
...     # select the classifier
...     clf = clfs[c]
...
...     # enable saving of the values used for the prediction
...     clf.states.enable('values')
...
...     # train with the known points
...     clf.train(patterns)
...
...     # run the predictions on the test values
...     pre = clf.predict(feat_test.T)
...
...     # ridge and kNN yield usable predictions directly;
...     # the other classifiers provide values or probabilities
...     if c == 'Ridge Regression' or c == 'k-Nearest-Neighbour':
...         # use the prediction
...         res = N.asarray(pre)
...     elif c == 'Logistic Regression':
...         # get out the values used for the prediction
...         res = N.asarray(clf.values)
...     elif c == 'SMLR':
...         # use the estimates for the positive class (label 1)
...         res = N.asarray(clf.values[:, 1])
...     else:
...         # get the probabilities from the svm
...         res = N.asarray([(q[1][1] - q[1][0] + 1) / 2
...                          for q in clf.probabilities])
...
...     # reshape the results to match the grid
...     z = N.asarray(res).reshape((100, 100))
...
...     # plot the predictions
...     P.pcolor(x, y, z, shading='interp')
...     P.clim(0, 1)
...     P.colorbar()
...     P.contour(x, y, z, linewidths=1, colors='black', hold=True)
...
...     # add the title
...     P.title(c)
>>>
>>> if cfg.getboolean('examples', 'interactive', True):
...     # show all the cool figures
...     P.show()
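When the example runs non-interactively, the composite figure can be written to disk instead of shown. savefig is standard matplotlib; the file name below is arbitrary:

>>> # optional: save the figure instead of (or in addition to) showing it
>>> P.savefig('pylab_2d_surfaces.png', dpi=100)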
See also
The full source code of this example is included in the PyMVPA source distribution (doc/examples/pylab_2d.py).