1.1 What is Biopython?
The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology. The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research.
Basically, we just like to program in Python and want to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and scripts.
1.1.1 What can I find in the Biopython package?
The main Biopython releases have lots of functionality, including:
- The ability to parse bioinformatics files into Python data structures, including support for the following formats:
  - Blast output – both from standalone and WWW Blast
  - Clustalw
  - FASTA
  - GenBank
  - PubMed and Medline
  - Expasy files, like Enzyme, Prodoc and Prosite
  - SCOP, including 'dom' and 'lin' files
  - Rebase
  - UniGene
  - SwissProt
- Files in the supported formats can be iterated over record by record or indexed and accessed via a dictionary interface.
- Code to deal with popular on-line bioinformatics destinations such as:
  - NCBI – Blast, Entrez and PubMed services
  - Expasy – Prodoc and Prosite entries
- Interfaces to common bioinformatics programs such as:
  - Standalone Blast from NCBI
  - Clustalw alignment program
- A standard sequence class that deals with sequences, ids on sequences, and sequence features.
- Tools for performing common operations on sequences, such as translation, transcription and weight calculations.
- Code to perform classification of data using k Nearest Neighbors, Naive Bayes or Support Vector Machines.
- Code for dealing with alignments, including a standard way to create and deal with substitution matrices.
- Code making it easy to split up parallelizable tasks into separate processes.
- GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.
- Extensive documentation and help with using the modules, including this file, on-line wiki documentation, the web site, and the mailing list.
- Integration with other languages, including the Bioperl and Biojava projects, using the BioCorba interface standard (available with the biopython-corba module).
We hope this gives you plenty of reasons to download and start using Biopython!
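To make the "record by record or indexed" idea from the list above concrete, here is a minimal sketch in plain Python. It is not Biopython's own parser or API: the function names `iter_fasta` and `index_fasta` and the toy data are hypothetical, chosen only to illustrate the two access patterns on a FASTA file.

```python
from io import StringIO
from typing import Dict, Iterator, Optional, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text, one record at a time.

    Hypothetical helper for illustration; not part of Biopython.
    """
    identifier: Optional[str] = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Use the first whitespace-delimited token after ">" as the id.
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def index_fasta(handle: TextIO) -> Dict[str, str]:
    """Build a dictionary mapping identifier -> sequence.

    If an identifier occurs twice, the later record wins.
    """
    return dict(iter_fasta(handle))


# Toy data, invented for this example.
EXAMPLE = (
    ">seq1 first toy record\n"
    "ATGGCC\n"
    "ATTGTA\n"
    ">seq2 second toy record\n"
    "GGGTTTAAA\n"
)

# Access pattern 1: iterate record by record.
records = list(iter_fasta(StringIO(EXAMPLE)))
for identifier, sequence in records:
    print(identifier, len(sequence))

# Access pattern 2: look a record up by its identifier.
index = index_fasta(StringIO(EXAMPLE))
print(index["seq2"])
```

Iteration keeps only one record in memory at a time, which suits large files; the dictionary form trades memory for direct lookup by identifier.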
1.2 Installing Biopython
All of the installation information for Biopython has been separated from
this document to make it easier to keep updated. The instructions cover
installation of Python, Biopython dependencies and Biopython itself.
They are available in PDF
(http://www.biopython.org/docs/install/Installation.pdf)
and HTML formats
(http://www.biopython.org/docs/install/Installation.html).
1.3 FAQ
- I looked in a directory for code, but I couldn't seem to find the code that does something. Where's it hidden?
One thing to know is that we put code in __init__.py files. If you are not used to looking for code in this file this can be confusing. The reason we do this is to make the imports easier for users. For instance, instead of having to do a “repetitive” import like from Bio.GenBank import GenBank, you can just import like from Bio import GenBank.
- What happened to the br_regrtest.py regression tests?
We updated the regression testing framework to use PyUnit, and also to fix newline problems. br_regrtest.py is still there, but almost all of its functionality has been moved (well, copied and pasted) to run_tests.py.
- Why do some of the tests fail when running the regression tests with output like:
Writing: '\012', expected: '\015'
This shouldn't happen any more! We updated the regression testing suite so that it uses PyUnit and we hopefully have fixed newline problems. Please let us know if any tests fail.
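The first answer above, about code living in __init__.py files, can be illustrated with a small self-contained sketch. It builds a throwaway package in a temporary directory; the package name `demo_bio` and the function `describe` are hypothetical and exist only for this example, they are not part of Biopython. The point is that code placed in `GenBank/__init__.py` becomes reachable through the short import form.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout, created only for this demonstration:
    #   demo_bio/__init__.py          (empty)
    #   demo_bio/GenBank/__init__.py  (holds the actual code)
    package_dir = Path(tmp) / "demo_bio" / "GenBank"
    package_dir.mkdir(parents=True)
    (package_dir.parent / "__init__.py").write_text("")
    (package_dir / "__init__.py").write_text(
        "def describe():\n"
        "    return 'defined in GenBank/__init__.py'\n"
    )

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        # The short import form: the subpackage's __init__.py is executed,
        # so its functions are available directly on GenBank.
        from demo_bio import GenBank

        message = GenBank.describe()
        source_file = Path(GenBank.__file__).name
    finally:
        sys.path.remove(tmp)

print(message)
print(source_file)
```

Checking a module's `__file__` attribute, as done here, is also a quick way to find out where code you have imported actually lives.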