get.seq {bio3d} | R Documentation |
Downloads FASTA sequence files from the NR or SWISSPROT/UNIPROT databases.
get.seq(ids, outfile = "seqs.fasta", db = "nr")
ids |
A character vector of one or more appropriate database codes/identifiers of the files to be downloaded. |
outfile |
A single element character vector specifying the name of the local file to which sequences will be written. |
db |
A single element character vector specifying the database from which sequences are to be obtained. |
This is a basic function to automate sequence file download from the NR and SWISSPROT/UNIPROT databases.
If all files are successfully downloaded a list object with two components is returned:
ali |
an alignment character matrix with a row per sequence and a column per equivalent amino acid/nucleotide. |
ids |
sequence names as identifiers. |
This is similar to that returned by read.fasta. However, if some files were not successfully downloaded then a vector detailing which ids were not found is returned.
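Because the return value is either a list (success) or a vector of missing ids (partial failure), calling code may want to check which one it received. The following is a minimal sketch of such a check. The helper `handle_get_seq` and the mock objects `ok` and `failed` are hypothetical and not part of bio3d; the mock data only stands in for what get.seq might return.

```r
# Hypothetical helper (not part of bio3d): report on the value returned by get.seq().
# Returns TRUE (invisibly) for a successful download, FALSE otherwise.
handle_get_seq <- function(res) {
  if (is.list(res) && all(c("ali", "ids") %in% names(res))) {
    # Success: a list with components 'ali' and 'ids'
    message("Downloaded ", length(res$ids), " sequence(s)")
    return(invisible(TRUE))
  }
  # Otherwise assume a vector naming the ids that were not found
  message("Not found: ", paste(res, collapse = ", "))
  invisible(FALSE)
}

# Mock return values (made-up data) standing in for real downloads
ok <- list(
  ali = matrix(c("M", "K", "-",
                 "M", "K", "V"), nrow = 2, byrow = TRUE),
  ids = c("seq1", "seq2")
)
failed <- c("badid1")

handle_get_seq(ok)
handle_get_seq(failed)
```

In real use, `res` would be the result of a call such as `get.seq(ids)`, which requires network access.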
For a description of FASTA format see: http://www.ebi.ac.uk/help/formats_frame.html. When reading alignment files, the dash ‘-’ is interpreted as the gap character.
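As a small illustration of the gap convention, the sketch below counts dashes in an alignment character matrix of the kind described for the `ali` component. The matrix here is made-up example data, not output from a real download, and the variable names are illustrative only.

```r
# Made-up alignment matrix: one row per sequence, one column per position
ali <- matrix(c("M", "K", "-", "V",
                "M", "-", "-", "V"),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("seq1", "seq2"), NULL))

# Number of gap characters in each sequence
gaps_per_seq <- rowSums(ali == "-")

# Alignment columns that contain at least one gap
gap_cols <- which(colSums(ali == "-") > 0)

print(gaps_per_seq)
print(gap_cols)
```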
Barry Grant
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
blast.pdb, read.fasta, read.fasta.pdb, get.pdb
##pdb <- read.pdb( get.pdb("5p21", URLonly=TRUE) )
##blast <- blast.pdb( seq.pdb(pdb), database = "swiss" )
##ids <- plot.blast( blast )
##get.seq(ids)