Multivariate Pattern Analysis in Python
PyMVPA is a Python module intended to ease pattern classification analysis of large datasets. It provides high-level abstractions of typical processing steps and implementations of a number of popular algorithms. While it is not limited to neuroimaging data, it is eminently suited for such datasets. PyMVPA is truly free software (in every respect) and, additionally, requires nothing but free software to run. Theoretically, PyMVPA should run on anything that can run a Python interpreter, although the proof is yet to come.
PyMVPA stands for Multivariate Pattern Analysis in Python.
This manual does not attempt to be a comprehensive introduction to machine learning theory or pattern recognition techniques. Many high-quality textbooks on this field are available. A very good example is Pattern Recognition and Machine Learning by Christopher M. Bishop.
Good starting points for learning about the application of machine learning algorithms to (f)MRI data are two recent reviews by Norman et al. [1] and Haynes and Rees [2].
This manual also does not describe every bit and piece of the PyMVPA package. For more information, please have a look at the API documentation, which is a comprehensive and up-to-date description of the whole package.
More examples and usage patterns extending the ones described here can be found in the examples shipped with the PyMVPA source distribution (doc/examples/) or even in the unit test battery, which is also part of the source distribution (in the tests/ directory).
[1] Norman, K.A., Polyn, S.M., Detre, G.J. & Haxby, J.V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10, 424–430.
[2] Haynes, J.D. & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7, 523–534.
The roots of PyMVPA date back to early 2005. At that time it was a C++ library (no Python yet) developed by Michael Hanke and Sebastian Krüger, intended to make it easy to apply artificial neural networks to pattern recognition problems.
During a visit to Princeton University in spring 2005, Michael Hanke was introduced to the MVPA toolbox for Matlab, which had several advantages over a C++ library. Most importantly, it was easier to use. While a user of a C++ library is forced to write a significant amount of front-end code, users of the MVPA toolbox could simply load their data and start analyzing it, since the toolbox provided a common interface to functions drawn from a variety of libraries.
However, there are some disadvantages to writing a toolbox in Matlab. While users in general benefit from the power of Matlab, they are at the same time bound to the goodwill of a commercial company. That this is indeed a problem becomes obvious when one considers the time when the vendor of Matlab was not willing to support the Mac platform. Therefore, even though the MVPA toolbox is GPL-licensed, it cannot fully benefit from the enormous advantages of the free software development model (free as in free speech, not only free beer).
For these reasons, Michael thought that a successor to the C++ library should remain truly free software and be fully object-oriented (in contrast to the MVPA toolbox), while being at least as easy to use and extend as the MVPA toolbox.
After evaluating several possibilities, Michael decided that Python was the most promising candidate and fully capable of fulfilling these development goals. Python is a very powerful language that magically combines the ability to write really fast code with a simplicity that allows one to learn the basic concepts within a few days.
One of the major advantages of Python is the availability of a huge number of so-called modules. Modules can include extensions written in a hardcore language like C (or even FORTRAN) and therefore allow one to incorporate high-performance code without having to leave the Python environment. Additionally, some Python modules even provide links to other toolkits. For example, RPy allows one to use the full functionality of R from inside Python. Even Matlab can be used via some Python modules (see PyMatlab for an example).
After the decision for Python was made, Michael started development with a simple k-Nearest-Neighbour classifier and a cross-validation class. Using the mighty NumPy package made it easy to support data of any dimensionality. Therefore, PyMVPA can easily be used with 4D fMRI datasets, but equally well with EEG/MEG data (3D) or even non-neuroimaging datasets.
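As a hedged illustration of the two ideas in this paragraph (an array of any dimensionality reduced to a samples-by-features matrix, and a k-nearest-neighbour classifier evaluated by cross-validation), here is a minimal NumPy-only sketch. It does not reproduce PyMVPA's own classes: to_samples, knn_predict and cross_validate are hypothetical names, and the leave-one-chunk-out scheme (e.g. one chunk per fMRI run) is an assumption chosen for illustration.

```python
import numpy as np


def to_samples(data, sample_axis=-1):
    """Return a 2D (samples x features) array from an array of any dimensionality.

    The sample axis is moved to the front and all remaining axes are flattened
    into features, e.g. a 4D fMRI array (x, y, z, time) becomes (time, x*y*z)
    and a 3D EEG/MEG array (channels, timepoints, trials) becomes
    (trials, channels*timepoints).
    """
    data = np.moveaxis(np.asarray(data), sample_axis, 0)
    return data.reshape(data.shape[0], -1)


def knn_predict(train_x, train_y, test_x, k=1):
    """Predict labels by majority vote among the k nearest training samples.

    Uses Euclidean distance; ties in the vote go to the smallest label.
    """
    # Squared Euclidean distances, shape (n_test, n_train).
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for neighbour_labels in train_y[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


def cross_validate(samples, labels, chunks, k=1):
    """Leave-one-chunk-out cross-validation of the kNN classifier.

    Each unique value in `chunks` is held out once as the test set;
    returns the mean accuracy over all folds.
    """
    samples, labels, chunks = map(np.asarray, (samples, labels, chunks))
    accuracies = []
    for chunk in np.unique(chunks):
        test = chunks == chunk
        predicted = knn_predict(samples[~test], labels[~test], samples[test], k=k)
        accuracies.append(np.mean(predicted == labels[test]))
    return float(np.mean(accuracies))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volumes = rng.normal(size=(4, 4, 3, 20))  # toy 4D data: x, y, z, time
    labels = np.tile([0, 1], 10)              # alternating conditions
    volumes[..., labels == 1] += 1.0          # add a signal to condition 1
    chunks = np.repeat(np.arange(4), 5)       # 4 hypothetical runs of 5 volumes
    samples = to_samples(volumes)             # shape (20, 48)
    print("mean accuracy:", cross_validate(samples, labels, chunks, k=3))
```

Because the classifier only ever sees the flattened matrix, the same code path serves 4D fMRI, 3D EEG/MEG or any other array once the sample axis is chosen.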
By September 2007 PyMVPA included support for reading and writing datasets from and to the NIfTI format, kNN and Support Vector Machine classifiers, as well as several analysis algorithms (e.g. searchlight and incremental feature search).
During another visit to Princeton in October 2007, Michael met with Yaroslav Halchenko and Per B. Sederberg. This meeting and the subsequent discussions and hacking sessions of Michael and Yaroslav led to a major refactoring of the PyMVPA codebase, making it much more flexible/extensible, faster and easier to use than ever before.
Like every other Python module, PyMVPA requires at least a basic knowledge of the Python language. However, if one has no prior experience with Python, one can benefit from the simplicity of the language and acquire this knowledge within a few days by studying some of the many tutorials available on the web.
As PyMVPA is about pattern recognition, a basic understanding of machine learning principles is necessary to apply its methods correctly and to ensure that the results are interpretable.
While most parts of PyMVPA will work without any additional software, some functionality makes use of additional software packages. It is strongly recommended to install these packages as well.
- SciPy: linear algebra, standard distributions
- SciPy is mainly used by the statistical testing code and the logistic regression classifier. However, in the long run SciPy might be used a lot more and could become a required dependency of PyMVPA.
- PyNIfTI: access to NIfTI files
- PyMVPA provides a convenient wrapper for datasets stored in the NIfTI format. If you do not need NIfTI support, PyNIfTI is not necessary; otherwise it makes reading from and writing to NIfTI images really easy.
- Shogun: various classifiers
- PyMVPA can currently make use of several SVM implementations of the Shogun toolbox. This requires the modular Python interface of Shogun to be installed. Any version from 0.6 on should work.
- R and RPy: more classifiers
- Currently PyMVPA provides a wrapper around the LARS library.
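To see which of these optional packages are present on a given machine, one can ask Python whether their modules are importable without actually importing them. The following is a minimal sketch using importlib.util.find_spec from the Python 3 standard library. The import names in OPTIONAL_MODULES (scipy, nifti, shogun, rpy) are assumptions about how each package is imported and may differ between versions; this is not how PyMVPA itself detects optional dependencies.

```python
import importlib.util

# Assumed top-level import names of the optional packages listed above;
# these may differ between package versions (e.g. rpy vs. rpy2).
OPTIONAL_MODULES = {
    "scipy": "SciPy",
    "nifti": "PyNIfTI",
    "shogun": "Shogun (modular Python interface)",
    "rpy": "RPy",
}


def available_modules(names):
    """Map each top-level module name to True if Python can find it, else False."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for module, found in available_modules(OPTIONAL_MODULES).items():
        print(f"{OPTIONAL_MODULES[module]:<40} {'found' if found else 'missing'}")
```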
The following software is not required by PyMVPA, but it can make life a lot easier and make working with PyMVPA more efficient.
- IPython: frontend
- If you want to use PyMVPA interactively, it is strongly recommended to use IPython. If you think “Oh no, not another one, I already have to learn about PyMVPA”, please invest a tiny bit of time to watch the Five Minutes with IPython screencasts at showmedo.com, so that at least you know what you are missing.
- FSL: preprocessing and analysis of (f)MRI data
- PyMVPA provides some simple bindings to FSL output and file types (e.g. EV files and MELODIC output directories). This makes it fairly easy, for example, to use FSL’s implementation of ICA for data reduction and then analyze the estimated ICs in PyMVPA.
- AFNI: preprocessing and analysis of (f)MRI data
- Similar to FSL, AFNI is a free package for processing (f)MRI data. Although its primary data file format is BRIK, it can read and write NIfTI files, which integrate easily with PyMVPA.
- libsvm: fast SVM classifier
- Only the C library is required, not the Python bindings available from the upstream website: PyMVPA provides its own Python wrapper for libsvm, forked from the one included in the libsvm package. Additionally, the upstream libsvm distribution floods the console with a huge amount of debugging messages. Please see the Building from Source section for information on how to build an alternative version that does not have this problem.
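For illustration of what an FSL EV file contains: the FSL three-column format stores one event per line as onset, duration and weight, with onset and duration in seconds. The following NumPy sketch parses such a file and marks which volumes start inside an event, e.g. to derive per-volume labels. read_fsl_ev3 and active_volumes are hypothetical helpers, not PyMVPA's FSL bindings, and treating a volume's acquisition start (index × TR) as its time stamp is an assumption.

```python
import io

import numpy as np


def read_fsl_ev3(source):
    """Parse an FSL three-column EV file into a list of event dicts.

    `source` may be a filename or a file-like object. Each line holds onset
    (s), duration (s) and weight, separated by whitespace.
    """
    values = np.loadtxt(source, ndmin=2)
    if values.shape[1] != 3:
        raise ValueError("expected three columns: onset, duration, weight")
    return [
        {"onset": float(onset), "duration": float(duration), "weight": float(weight)}
        for onset, duration, weight in values
    ]


def active_volumes(events, n_volumes, tr):
    """Boolean mask of volumes whose start time (index * tr) lies inside an event."""
    times = np.arange(n_volumes) * tr
    mask = np.zeros(n_volumes, dtype=bool)
    for event in events:
        start, stop = event["onset"], event["onset"] + event["duration"]
        mask |= (times >= start) & (times < stop)
    return mask


if __name__ == "__main__":
    # Two hypothetical 10 s blocks starting at 0 s and 30 s.
    ev_file = io.StringIO("0 10 1\n30 10 1\n")
    events = read_fsl_ev3(ev_file)
    print(active_volumes(events, n_volumes=10, tr=5.0))
```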
The easiest way to obtain PyMVPA is to use pre-built binary packages. Currently the Debian/Ubuntu family is the only environment for which binary packages are available (see below). If you manage to build PyMVPA on Windows or OS X, we would be glad to hear from you.
PyMVPA is available as an official Debian package (python-mvpa; since lenny). The documentation is provided by the optional python-mvpa-doc package.
Backports for the current Debian stable release and binary packages for recent Ubuntu releases are available from a repository at the University of Magdeburg. Please read the package repository instructions to learn about how to obtain them.
If a binary package for your platform and operating system is provided, you do not have to build the packages on your own; use the corresponding pre-built packages instead. However, if there are no binary packages for your system, you can easily build PyMVPA on your own. Any recent Linux distribution should be capable of doing it. Additionally, we are aware of successful builds on Mac OS X.
The first step is obtaining the sources. The source code tarballs of all PyMVPA releases are available from the PyMVPA project website. Alternatively, one can also download a tarball of the latest development snapshot (i.e. the current state of the master branch of the PyMVPA source code repository).
If you want access to both the full PyMVPA history and the latest development code, you can use the publicly available PyMVPA Git repository. To view the repository, please point your web browser to gitweb:
http://git.debian.org/?p=pkg-exppsy/pymvpa.git
The gitweb browser also allows you to download arbitrary development snapshots of PyMVPA. For a full clone (aka checkout) of the PyMVPA repository, simply do:
git clone git://git.debian.org/git/pkg-exppsy/pymvpa.git
After a short while you will have a pymvpa directory containing the PyMVPA repository below your current working directory.
To build PyMVPA from source simply enter the root of the source tree (obtained by either extracting the source package or cloning the repository) and run:
python setup.py build_ext
If you are using a Python version older than 2.5, you need to have python-ctypes (>= 1.0.1) installed to be able to do this.
Now, you are ready to install the package. Do this by invoking:
python setup.py install
Most likely you need superuser privileges for this step. If you want to install in a non-standard location, please take a look at the --prefix option. You also might want to consider --optimize.
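A --prefix install places the package below that prefix, so Python will only find it once the matching library directory is on PYTHONPATH. The following minimal sketch, which assumes the standard POSIX posix_prefix install scheme, asks Python's sysconfig module which directory that is. The actual layout can differ between Python versions and distributions, so compare the result with the paths that setup.py install reports.

```python
import sysconfig


def site_packages_for_prefix(prefix):
    """Return the pure-Python library directory used by a --prefix install.

    Assumes the standard 'posix_prefix' install scheme; some distributions
    and Python versions may use a different layout.
    """
    return sysconfig.get_path(
        "purelib",
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},
    )


if __name__ == "__main__":
    # Hypothetical prefix; substitute the one passed to setup.py install.
    print(site_packages_for_prefix("/opt/pymvpa"))
```

The printed directory is what PYTHONPATH would need to include for the installed package to be importable.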
Now you should be ready to use PyMVPA on your system.
From the 0.2 release of PyMVPA on, the libsvm classifier extension is no longer built by default. However, it is still shipped with PyMVPA and can be enabled at build time. To do this, you need to have SWIG and the development files of libsvm (headers and library) installed on your system. Depending on where you installed them, it might be necessary to specify their full paths with the --include-dirs, --library-dirs and --swig options.
PyMVPA needs a patched libsvm version, as the original distribution generates a huge amount of debugging messages and therefore makes the console and PyMVPA output almost unusable. Debian (since lenny: 2.84.0-1) and Ubuntu (since gutsy) already include the patched version. For all other systems it is easy to build a patched libsvm (see Building patched libsvm from Source).
The command to build all extensions including the libsvm wrapper is:
PYMVPA_LIBSVM=1 python setup.py build_ext --swig-opts="-c++ -noproxy"
The installation procedure is the same as for a build without libsvm.
To build a patched libsvm, first get the patched sources from:
http://packages.debian.org/source/sid/libsvm
Download the diff.gz and the orig.tar.gz files offered at the bottom of the page. Once downloaded, extract the tar.gz file and patch it. The following example refers to libsvm version 2.85.0; please adjust the filenames and versions if you use a later version:
tar xvzf libsvm_2.85.0.orig.tar.gz
cd libsvm-2.85
zcat ../libsvm_2.85.0-1.diff.gz | patch -p1
If zcat does not work for you (which might happen on Mac OS X), simply decompress the diff manually and do:
patch -p1 < ../libsvm_2.85.0-1.diff
instead to patch the sources. Then build the library and install it:
make libsvm.so.2.85.0
DESTDIR=/usr/local make install
Set DESTDIR to your preferred installation path. For those running Mac OS X, there is also a Makefile.osx.
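Where zcat is unavailable or misbehaves, the unpacking steps can also be done with Python's standard library. The following is a minimal sketch using gzip, shutil and tarfile; gunzip_file and extract_tarball are hypothetical helper names, and the patch itself is still applied with patch -p1 as described above.

```python
import gzip
import shutil
import tarfile


def gunzip_file(src, dst):
    """Decompress a gzip file to dst, the equivalent of 'zcat src > dst'."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def extract_tarball(src, dest="."):
    """Extract a .tar.gz archive into dest, the equivalent of 'tar xzf src'."""
    with tarfile.open(src, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

For the 2.85.0 example, one would call extract_tarball("libsvm_2.85.0.orig.tar.gz") and gunzip_file("libsvm_2.85.0-1.diff.gz", "libsvm_2.85.0-1.diff") in the download directory, then run patch -p1 < ../libsvm_2.85.0-1.diff from inside libsvm-2.85.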
Alternatively, if you are developing PyMVPA, or if you simply do not want to (or do not have sufficient permissions to) install PyMVPA system-wide, you can simply call make (the same as make build) in the top-level directory of the source tree to build PyMVPA. Then extend or define your PYTHONPATH environment variable to point to the root of the PyMVPA sources (i.e. the directory from which you invoked all previous commands):
export PYTHONPATH=$PWD
However, please note that this procedure always builds the libsvm extension and therefore also requires the patched libsvm version to be available.
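PYTHONPATH only affects interpreters started after it has been set. Inside an already running session (e.g. IPython), a similar effect can be approximated by prepending the source root to sys.path. This is a minimal sketch; use_source_tree is a hypothetical helper, and the default package name mvpa is an assumption about PyMVPA's top-level module.

```python
import importlib
import os
import sys


def use_source_tree(root, package="mvpa"):
    """Make a package built in place under `root` importable in this process.

    For the running interpreter only, this approximates having started Python
    with PYTHONPATH=<root>. The default package name 'mvpa' is an assumption
    about PyMVPA's top-level module name.
    """
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    importlib.invalidate_caches()  # pick up directories created after startup
    return importlib.import_module(package)
```

For example, use_source_tree("/path/to/pymvpa") would import the in-place build, where the path is a placeholder for your source tree.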
If there are no binary packages for your operating system or platform yet, you need to build from source. Please refer to Building from Source for more information.
Otherwise just install the binary packages as you would do with any other package. For example on Debian or Ubuntu simply do:
sudo aptitude install python-mvpa
The PyMVPA toolbox was first presented with a poster at the annual meeting of the German Society for Psychophysiology and its Application in Magdeburg, 2008. This poster is currently the preferred way to cite PyMVPA. However, we have submitted a paper introducing the toolbox, which should replace the poster as the reference soon.
(needs some more words, for now just a list)
- NumPy, SciPy
- libsvm
- Shogun
- IPython
- Debian (for hosting, environment, ...)
- FOSS community
- Credits to individual labs if they officially donate time ;-)