CLOBB3
Clustering sequences on the basis of BLAST similarity
Current version 1.0
The program takes a set of DNA sequences and clusters them into groups that putatively derive from the same gene.
To run, BLASTALL must be in the user's path. The output is a blastable FASTA file named
<cluster_id>EST, where cluster_id is given by the user, which contains a list of sequences with identifiers
<cluster_id>00001 to <cluster_id>99999. The program BLASTs each sequence in turn against the growing database of
clusters, then examines the BLAST report for High-scoring Segment Pairs (HSPs) that show near-identical regions of
sequence similarity (>95% identity over >30 bases; stringency can be controlled by the user). The query sequence is
then allocated to an existing or new cluster depending on the strength of the HSP(s) and the quality of the match in
the rest of the overlap.
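
The allocation step described above can be illustrated with a minimal sketch. This is not the program's own code: the Hsp record, the function names and the tie-breaking rule are hypothetical, and the sketch assumes the HSPs have already been parsed from a BLAST report. It applies only the default thresholds (>95% identity over >30 bases) and the identifier format, and leaves out the check on the quality of the match in the rest of the overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hsp:
    """Simplified, hypothetical record for one HSP taken from a BLAST report."""
    subject_cluster: str      # identifier of the cluster sequence that was hit
    percent_identity: float   # e.g. 98.5
    align_length: int         # length of the aligned region in bases


def passes_threshold(hsp: Hsp, min_identity: float = 95.0, min_length: int = 30) -> bool:
    """True if the HSP exceeds both thresholds (strictly greater, as in '>95%' and '>30')."""
    return hsp.percent_identity > min_identity and hsp.align_length > min_length


def cluster_name(cluster_id: str, number: int) -> str:
    """Build an identifier of the form <cluster_id>00001 .. <cluster_id>99999."""
    if not 1 <= number <= 99999:
        raise ValueError("cluster number must be between 1 and 99999")
    return f"{cluster_id}{number:05d}"


def assign(hsps: List[Hsp], cluster_id: str, next_number: int) -> Tuple[str, bool]:
    """Return (cluster identifier, is_new_cluster) for one query sequence.

    Assumption: when several HSPs qualify, the one with the highest identity
    (then the longest alignment) decides the cluster. The real program also
    weighs the rest of the overlap, which is omitted here.
    """
    qualifying = [h for h in hsps if passes_threshold(h)]
    if qualifying:
        best = max(qualifying, key=lambda h: (h.percent_identity, h.align_length))
        return best.subject_cluster, False
    # No sufficiently strong HSP: the query founds a new cluster.
    return cluster_name(cluster_id, next_number), True


if __name__ == "__main__":
    hits = [Hsp("ABC00001", 98.0, 120), Hsp("ABC00007", 91.0, 300)]
    print(assign(hits, "ABC", 12))
    print(assign([], "ABC", 12))
```

Passing different values for min_identity and min_length to passes_threshold corresponds to the user-controlled stringency mentioned above.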
For further details, please see the User guide.