TaxMan
A Taxonomy Database Manager
Current version 1.1 released on 25 September 2006
TaxMan is a software package that uses freely available bioinformatics tools to allow rapid assembly, storage and analysis of large
aligned DNA and protein sequence datasets for user-defined sets of species and genes. Sequence data are obtained from GenBank-format
files on the basis of annotation and/or sequence similarity. Consensus sequences are built and aligned automatically, and
aligned sequences are stored in a database. By using the stored aligned sequences, large concatenated protein and DNA multiple
sequence alignments can be rapidly generated for subsets of the dataset. Trees resulting from phylogenetic analysis can also be
stored and compared with a reference taxonomy.
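The store-then-concatenate workflow described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not TaxMan's actual schema or code: the table `aligned_seq`, its columns, and the functions `store_alignment` and `concatenate` are hypothetical, and the sequences in the demo are toy data. A species that lacks a gene is padded with `?` (missing data), so every row of the concatenated alignment has the same length.

```python
import sqlite3

# Hypothetical schema: one aligned sequence per (species, gene) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS aligned_seq (
    species TEXT NOT NULL,
    gene    TEXT NOT NULL,
    seq     TEXT NOT NULL,
    PRIMARY KEY (species, gene)
)
"""

def store_alignment(conn, gene, aligned):
    """Store one gene's alignment, given as a dict species -> aligned sequence."""
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError(f"{gene}: aligned sequences differ in length")
    # Re-running a build replaces the old row (caller commits for file-based DBs).
    conn.executemany(
        "INSERT OR REPLACE INTO aligned_seq (species, gene, seq) VALUES (?, ?, ?)",
        [(sp, gene, s) for sp, s in aligned.items()],
    )

def concatenate(conn, species, genes, missing="?"):
    """Return one concatenated row per requested species, genes in the given order."""
    rows = {sp: [] for sp in species}
    for gene in genes:
        found = dict(conn.execute(
            "SELECT species, seq FROM aligned_seq WHERE gene = ?", (gene,)))
        if not found:
            raise KeyError(f"no stored alignment for gene {gene!r}")
        width = len(next(iter(found.values())))  # alignment length for this gene
        for sp in species:
            rows[sp].append(found.get(sp, missing * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_alignment(conn, "geneA", {"sp1": "ACGT-", "sp2": "ACGTT"})
    store_alignment(conn, "geneB", {"sp1": "GG-A", "sp3": "GGCA"})
    for sp, row in concatenate(conn, ["sp1", "sp2"], ["geneA", "geneB"]).items():
        print(sp, row)  # sp1 ACGT-GG-A, then sp2 ACGTT????
```

Because each gene is stored already aligned, building a supermatrix for a different subset of species or genes is only a query and a string join; no realignment is needed.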
Recently, there has been much interest in the use of large, concatenated multiple sequence alignments ('supermatrices') for
phylogenetic analysis. Such datasets have been shown to be useful in resolving difficult phylogenetic questions with a high degree
of confidence. Combining the phylogenetic signal from multiple genes can recover clades that are not recovered by analysis of
any individual gene. Additionally, genes evolving at different rates may offer resolution at different phylogenetic levels.
Large-scale phylogenetic analyses of this type place a heavy burden of sequence acquisition, dataset assembly and dataset storage on the
researcher. Sequences corresponding to the genes of interest must be obtained from public databases and their orthology
established. Where multiple sequences are available for a given gene in a species (as is often the case with EST datasets, for example),
a consensus sequence must be derived. The sequences for each gene must then be aligned before being added to
a concatenated alignment file, which may also contain commands necessary to partition the data. TaxMan aims to reduce this burden by
automatically assembling and storing large aligned sequence datasets, along with metadata and the trees resulting from phylogenetic
analysis. Because TaxMan automates these steps, datasets can be rebuilt rapidly to include new sequence data.
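Two of the manual steps listed above, deriving a consensus from several sequences of the same gene and writing a concatenated alignment with partition commands, can be sketched as follows. This is an illustrative stand-in, not TaxMan's own procedure: the functions `majority_consensus` and `to_nexus` are hypothetical, the consensus uses a simple column-wise majority rule (gaps ignored when voting, ties reported as `N`), and it assumes the input sequences are already aligned to one another. Partitions are written as NEXUS `CHARSET` commands in a `SETS` block, which is one common way of defining partitions in a NEXUS file.

```python
from collections import Counter

def majority_consensus(aligned_seqs, tie="N"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Gaps are ignored when voting; an all-gap column stays '-', and a tie
    between the most frequent residues is reported as `tie` ('N' for DNA).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned (equal length)")
    out = []
    for column in zip(*aligned_seqs):
        counts = Counter(c.upper() for c in column if c != "-")
        if not counts:
            out.append("-")
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            out.append(tie)
        else:
            out.append(ranked[0][0])
    return "".join(out)

def to_nexus(matrix, partitions, datatype="DNA"):
    """Render a concatenated alignment plus CHARSET partitions as NEXUS text.

    matrix: dict taxon -> concatenated aligned sequence
    partitions: list of (gene, length) in concatenation order
    """
    nchar = sum(length for _, length in partitions)
    if any(len(s) != nchar for s in matrix.values()):
        raise ValueError("row length does not match partition lengths")
    lines = ["#NEXUS", "BEGIN DATA;",
             f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
             f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
             "  MATRIX"]
    lines += [f"    {taxon} {seq}" for taxon, seq in matrix.items()]
    lines += ["  ;", "END;", "BEGIN SETS;"]
    start = 1  # NEXUS character positions are 1-based and inclusive
    for gene, length in partitions:
        lines.append(f"  CHARSET {gene} = {start}-{start + length - 1};")
        start += length
    lines.append("END;")
    return "\n".join(lines)

if __name__ == "__main__":
    # Toy data: three aligned EST-derived reads for one gene in one species.
    print(majority_consensus(["ACGT", "ACGA", "AC-A"]))
    print(to_nexus({"sp1": "ACGT-GG-A", "sp2": "ACGTT????"},
                   [("geneA", 5), ("geneB", 4)]))
```

Recording each gene's length at concatenation time is what makes the partition boundaries cheap to regenerate whenever the dataset is rebuilt with new sequences.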