ide.filter {bio3d} | R Documentation |
Identify and filter subsets of sequences at a given sequence identity cutoff.
ide.filter(aln = NULL, ide = NULL, cutoff = 0.6, verbose = TRUE)
aln |
sequence alignment list, obtained from seqaln or read.fasta, or an alignment character matrix. Not used if ‘ide’ is given. |
ide |
an optional identity matrix obtained from identity. |
cutoff |
a numeric identity cutoff value ranging between 0 and 1. |
verbose |
logical, if TRUE print details of the clustering process. |
This function performs hierarchical cluster analysis on a given sequence identity matrix ‘ide’, or on the identity matrix calculated from a given alignment ‘aln’, to identify a subset of sequences whose identities fall below the given cutoff value ‘cutoff’.
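The clustering idea can be illustrated with base R alone. The following is a minimal sketch, not the bio3d implementation: it assumes complete-linkage clustering on the distance 1 - identity, cuts the tree at 1 - cutoff, and keeps the first member of each cluster as its representative. The toy identity values and the names `groups` and `ind` are made up for illustration.

```r
# Toy symmetric identity matrix for four hypothetical sequences
# (values are invented for illustration only)
ide <- matrix(c(1.0, 0.9, 0.3, 0.2,
                0.9, 1.0, 0.4, 0.3,
                0.3, 0.4, 1.0, 0.8,
                0.2, 0.3, 0.8, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("seq", 1:4), paste0("seq", 1:4)))

cutoff <- 0.6

# Convert identity to a distance and cluster it
# (complete linkage is the default method of hclust)
tree <- hclust(as.dist(1 - ide))

# Cut the tree at distance 1 - cutoff; with complete linkage, all members
# of one cluster are at least 'cutoff' identical to each other
groups <- cutree(tree, h = 1 - cutoff)

# Keep the first member of each cluster as its representative
# (an arbitrary choice made for this sketch)
ind <- which(!duplicated(groups))

# Pairwise identities among the retained sequences
ide[ind, ind]
```

Note that picking the first member per cluster does not by itself guarantee that every pair of representatives falls below the cutoff; it only removes sequences that are grouped with an already retained one.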
Returns a list object with components:
ind |
indices of the sequences below the cutoff value. |
tree |
an object of class "hclust", which describes the tree produced by the clustering process. |
ide |
a numeric matrix with all pairwise identity values. |
Barry Grant
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
read.fasta, seqaln, identity, entropy, consensus
data(kinesin)
attach(kinesin)

ide.mat <- identity(aln)

# Histogram of pairwise identity values
par(mfrow=c(2,1))
hist(ide.mat[upper.tri(ide.mat)], breaks=30, xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")

k <- ide.filter(ide=ide.mat, cutoff=0.6)
ide.cut <- identity(aln$ali[k$ind,])

hist(ide.cut[upper.tri(ide.cut)], breaks=10, xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")

#plot(k$tree, axes = FALSE, ylab="Sequence Identity")
#print(k$ind) # selected