read.fasta {bio3d}    R Documentation

Read FASTA Formatted Sequences

Description

Read aligned or unaligned sequences from a FASTA format file.

Usage

read.fasta(file, rm.dup = TRUE, to.upper = FALSE, to.dash = TRUE)

Arguments

file input sequence file.
rm.dup logical, if TRUE duplicate sequences (with the same names/ids) will be removed.
to.upper logical, if TRUE residues are forced to uppercase.
to.dash logical, if TRUE ‘.’ gap characters are converted to ‘-’ gap characters.
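
As a rough illustration of how these arguments combine, the sketch below writes a small FASTA file to a temporary path and reads it back. The sequences and the names fa and aln are made up for illustration, and the code assumes the bio3d package is installed.

```r
library(bio3d)

# Write a small, made-up FASTA file to a temporary location
fa <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "pqit.lwq",
             ">seq2", "PQITLLWQ"), fa)

# to.upper = TRUE forces residues to uppercase;
# to.dash = TRUE converts '.' gap characters to '-'
aln <- read.fasta(fa, rm.dup = TRUE, to.upper = TRUE, to.dash = TRUE)

# Inspect the resulting alignment matrix
aln$ali
```

With the default to.upper = FALSE, the lowercase residues of seq1 would be kept as written in the file.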

Value

A list with two components:

ali an alignment character matrix with a row per sequence and a column per equivalent amino acid/nucleotide.
ids sequence names as identifiers.
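
To make the shape of the returned value concrete, here is a minimal base R sketch that works on a hand-built stand-in for a read.fasta result. The sequences and the names seqA and seqB are made up; the stand-in only mimics the two components described above and is not produced by bio3d.

```r
# Hand-built stand-in for a read.fasta() result (made-up sequences)
ali <- rbind(seqA = c("P", "Q", "I", "-", "T"),
             seqB = c("P", "Q", "V", "L", "T"))
aln <- list(ali = ali, ids = rownames(ali))

# Number of sequences (rows) and alignment positions (columns)
dim(aln$ali)

# Extract a single sequence by its row name
aln$ali["seqB", ]

# Count gap characters in each alignment column
gaps <- colSums(aln$ali == "-")
gaps
```

Because the alignment is an ordinary character matrix, the usual matrix indexing and column-wise summaries apply directly.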

Note

For a description of FASTA format see: http://www.ebi.ac.uk/help/formats_frame.html. When reading alignment files, the dash ‘-’ is interpreted as the gap character.
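
For readers unfamiliar with the format, the following base R sketch shows what aligned FASTA records look like and how header lines and gap characters can be told apart. It is an illustration only, not the bio3d implementation: the records are made up, and it assumes each sequence occupies a single line.

```r
# Made-up aligned FASTA records; '-' marks a gap
fasta_lines <- c(">seqA", "PQI-T",
                 ">seqB", "PQVLT")

# Header lines start with '>'; the remaining lines hold sequence data
is_header <- grepl("^>", fasta_lines)
ids  <- sub("^>", "", fasta_lines[is_header])
seqs <- fasta_lines[!is_header]

# One row per sequence, one column per alignment position
ali <- do.call(rbind, strsplit(seqs, ""))
rownames(ali) <- ids
ali
```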

Author(s)

Barry Grant

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

See Also

read.fasta.pdb

Examples

# Read alignment
aln <- read.fasta(system.file("examples/hivp_xray.fa", package = "bio3d"))

# Sequence names/ids
aln$id

# Alignment positions 33 to 39
aln$ali[,33:39]

# Sequence d2a4f_b
aln$ali["d2a4f_b",]

# Write out positions 30 to 45 only
#aln$ali <- aln$ali[,30:45]
#write.fasta(aln, file="eg2.fa")


[Package bio3d version 1.0-6 Index]