blast.pdb {bio3d}    R Documentation

NCBI BLAST Sequence Search

Description

Run NCBI blastp on a given sequence against the PDB, NR, or swissprot sequence database.

Usage

blast.pdb(seq, database = "pdb")

Arguments

seq a single-element or multi-element character vector containing the query sequence.
database a single-element character vector specifying the database against which to search: "pdb" (the default), "nr", or "swissprot".

Details

This function sends HTTP-encoded requests directly to the NCBI web server to run BLASTP, the protein search program of the BLAST software package.
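As a rough illustration of what an HTTP-encoded BLAST request looks like, the sketch below builds a query URL for NCBI's public BLAST URL API (the Blast.cgi endpoint with CMD=Put, PROGRAM, DATABASE and QUERY parameters) and extracts the request ID (RID) from a response. This reflects an assumption about the public NCBI API, not a description of how blast.pdb() itself forms its requests; the helpers blast.put.url() and extract.rid() are hypothetical.

```r
# Hypothetical helper (not part of bio3d): build an NCBI BLAST URL API
# "Put" request for blastp. Assumes the public Blast.cgi endpoint.
blast.put.url <- function(seq, database = "pdb",
                          base = "https://blast.ncbi.nlm.nih.gov/Blast.cgi") {
  params <- c(CMD = "Put", PROGRAM = "blastp",
              DATABASE = database, QUERY = seq)
  # Percent-encode each value so the sequence can be embedded in a URL
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Hypothetical helper: pull the request ID out of the response text,
# assuming it contains a line of the form "RID = XXXX".
extract.rid <- function(response.text) {
  m <- regmatches(response.text,
                  regexpr("RID = [A-Za-z0-9]+", response.text))
  if (length(m) == 0) return(NA_character_)
  sub("RID = ", "", m[1], fixed = TRUE)
}

url <- blast.put.url("MQIFVKTLTG")
# With online access, the URL could then be fetched, e.g. readLines(url)
```

Building the URL and parsing the reply are kept as separate functions so that each step can be checked offline.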

BLAST, one of the most widely used pairwise sequence comparison tools, performs gapped local alignments using a heuristic strategy: it identifies short, nearly exact matches (hits); extends non-overlapping hits in both directions to produce ungapped extended hits, or high-scoring segment pairs (HSPs); and finally extends the highest-scoring HSP in both directions via a gapped alignment (Altschul et al., 1997).
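The first (seeding) step of this strategy can be sketched in R as below: it lists every exact match of a length-w word shared by a query and a subject sequence. This is a simplified stand-in that uses exact words only, with no substitution-matrix neighbourhood words and no extension. find.word.hits() is a hypothetical helper, not part of bio3d.

```r
# Hypothetical helper (not part of bio3d): list every exact match of a
# length-w word between a query and a subject sequence, i.e. the "hits"
# from which BLAST-style extension would start.
find.word.hits <- function(query, subject, w = 3) {
  # All overlapping words of length w in a sequence string
  words <- function(x) {
    chars <- strsplit(x, "")[[1]]
    n <- length(chars) - w + 1
    if (n < 1) return(character(0))
    vapply(seq_len(n),
           function(i) paste(chars[i:(i + w - 1)], collapse = ""),
           character(1))
  }
  qw <- words(query)
  sw <- words(subject)
  # Matrix of word identities; TRUE cells are hits (query pos, subject pos)
  hits <- which(outer(qw, sw, "=="), arr.ind = TRUE)
  data.frame(q.start = hits[, 1], s.start = hits[, 2],
             word = qw[hits[, 1]], stringsAsFactors = FALSE)
}

h <- find.word.hits("ACDEFGH", "WWDEFGYY", w = 3)
```

Here the two hits, DEF and EFG, lie on the same diagonal. Real BLAST would merge them during the ungapped extension step.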

For each pairwise alignment BLAST reports an alignment score (as a bit score, a scaled form of the raw score) along with an E-value that assesses the statistical significance of that score. Note that, unlike the raw score, E-values account for both the substitution matrix and the query and database lengths.
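These quantities are related by the standard Karlin-Altschul statistics, given below in their simplest textbook form (ignoring the edge-effect corrections BLAST applies to the lengths). Here S is the raw score, m the query length, n the total database length, and λ and K parameters set by the scoring system. These symbols are introduced for illustration and are not components returned by blast.pdb().

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}, \qquad
-\ln E = \lambda S - \ln(K\,m\,n)
```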

Here we also return a normalized score (mlog.evalue) that in our experience is easier to handle and store than conventional E-values. In practice, this score is equivalent to minus the natural log of the E-value. Because it is derived from the E-value, this score, unlike the raw score, takes the substitution matrix and the query and database lengths into account, and is thus more readily compared between BLASTP searches.
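The transformation itself is a one-liner. The sketch below assumes that E-values of exactly zero, which NCBI reports for very strong hits, should be capped at a small positive value so that the score stays finite. The helper name to.mlog.evalue() and the min.evalue cap are hypothetical and may differ from what blast.pdb() does internally.

```r
# Hypothetical helper: minus the natural log of an E-value.
# pmax() replaces zeros (and anything below min.evalue) with the cap,
# so -log() never returns Inf.
to.mlog.evalue <- function(evalue, min.evalue = .Machine$double.xmin) {
  -log(pmax(evalue, min.evalue))
}

ev <- c(1e-50, 1e-5, 0.01, 1, 0)
to.mlog.evalue(ev)
```

Smaller (more significant) E-values map to larger scores, so sorting by this score in decreasing order ranks hits from most to least significant.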

Value

A list with seven components:

bitscore a numeric vector containing the bit score of each alignment.
evalue a numeric vector containing the E-value of each alignment.
mlog.evalue a numeric vector containing minus the natural log of the E-value.
gi.id a character vector containing the gi database identifier of each hit.
pdb.id a character vector containing the PDB database identifier of each hit.
hit.tbl a character matrix summarizing the BLAST results for each reported hit.
raw a data frame summarizing the BLAST results; note that multiple hits may appear in the same row.
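Because the result is an ordinary R list, hits can be filtered with base R. The sketch below keeps hits whose mlog.evalue exceeds a user-chosen cutoff and returns them best-first. select.hits() is a hypothetical helper, and the toy input contains made-up values standing in for a real blast.pdb() result.

```r
# Hypothetical helper: keep hits with mlog.evalue above a cutoff and
# return their identifiers and scores, most significant first.
# Only the bitscore, mlog.evalue and pdb.id components are used.
select.hits <- function(b, cutoff) {
  keep <- which(b$mlog.evalue > cutoff)
  keep <- keep[order(b$mlog.evalue[keep], decreasing = TRUE)]
  data.frame(pdb.id = b$pdb.id[keep],
             bitscore = b$bitscore[keep],
             mlog.evalue = b$mlog.evalue[keep],
             stringsAsFactors = FALSE)
}

# Toy stand-in for a blast.pdb() result (made-up values, illustration only)
toy <- list(bitscore = c(500, 120, 60),
            mlog.evalue = c(300, 115, 20),
            pdb.id = c("AAAA_A", "BBBB_A", "CCCC_B"))
r <- select.hits(toy, cutoff = 110)
```

Filtering on mlog.evalue rather than bitscore keeps the cutoff tied to statistical significance, in line with the rationale given for that score.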

Note

Online access is required to query NCBI BLAST services.

Author(s)

Barry Grant

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

‘BLAST’ is the work of Altschul et al.: Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403–410.

Full details of the ‘BLAST’ algorithm, along with download and installation instructions, can be obtained from:
http://www.ncbi.nlm.nih.gov/BLAST/.

See Also

seqaln

Examples

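# Read the structure of PDB entry 1L3R (requires online access)
pdb <- read.pdb("http://www.rcsb.org/pdb/files/1l3r.pdb")
# Build a one-letter sequence string from all C-alpha atoms
seq <- paste(aa321(pdb$atom[pdb$calpha,"resid"]), collapse="")

# Search the PDB (the default database) with blastp
blast <- blast.pdb(seq)

# Plot -log(E-value) and bit score for each hit
par(mfcol=c(2,1), mar=c(4, 4, 1, 2))
plot(blast$mlog.evalue, xlab="Hit No", ylab="-log(evalue)")
plot(blast$bitscore, xlab="Hit No", ylab="bitscore")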

[Package bio3d version 1.0-6 Index]