The Molecular Operational Taxonomic Unit or M-OTU

Mark Blaxter, ICAPB, University of Edinburgh

By sequencing an informative segment of DNA, it is possible to define "molecular operational taxonomic units" (M-OTU). To be useful, the segment must be known to be orthologous between species (paralogues would define gene trees rather than organismal trees). It must also encompass enough variability to discriminate between M-OTU at the level useful to the research programme, or between taxa defined in other (for example, biological or morphological) ways.
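Whether a segment is variable enough can be quantified by comparing aligned sequences site by site. The sketch below computes the uncorrected proportion of differing sites (the p-distance) between two specimens. It assumes the sequences are already aligned over the same orthologous segment, and skipping gap ('-') and ambiguous ('N') positions is an illustrative choice, not part of the M-OTU definition.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumes both sequences cover the same orthologous segment and are
    aligned to the same length. Gap ('-') and ambiguous ('N') positions
    are skipped in this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous base calls
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # One difference in ten comparable sites.
    print(p_distance("ACGTACGTAC", "ACGTACGTTC"))  # prints 0.1
```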

In the M-OTU concept, therefore, taxa are identified through sequence identity. Identity in sequence need not correspond to identity of OTU as measured by other (biological or morphological) models: identical sequences could mean "the same taxon" or "there is insufficient variation to define distinct taxa". The same operational problem plagues other (biological or morphological) methods of defining taxa.

Sequence differences between specimens can arise in two ways. Either they are part of the natural within-OTU variation or are due to sequencing (methodological) errors, or they reflect a genuine distinction between taxa.

It is thus necessary (as with other methods, biological or morphological) to base M-OTU distinction on heuristics that take account of known measurement error rates and of the levels of difference perceived to distinguish "useful" OTU.
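A distance cutoff is one simple form such a heuristic could take. The sketch below groups specimens into M-OTU by single-linkage clustering: two specimens share an M-OTU if their sequences differ by at most `cutoff`, directly or through a chain of such specimens. Single linkage, the function name `cluster_motus` and the example data are all hypothetical choices for illustration; the text does not prescribe a clustering method. A cutoff of 0 reduces to grouping by exact sequence identity.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def cluster_motus(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Hypothetical single-linkage clustering of specimens into M-OTU.

    Specimens whose sequences differ by at most `cutoff`, directly or via a
    chain of such specimens, end up in the same group (union-find).
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the group representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical aligned sequences from four specimens.
    specimens = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTTC",
        "s3": "TCGAACGGTC",
        "s4": "TCGAACGGTA",
    }
    groups = cluster_motus(specimens, cutoff=0.1)
    print(sorted(sorted(g) for g in groups))  # prints [['s1', 's2'], ['s3', 's4']]
```

Raising or lowering the cutoff lumps or splits specimens, which is exactly why the cutoff needs to be justified by measured error and variation rates rather than chosen arbitrarily.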

For M-OTU, these measures can be made explicit. For example, the level of variation between taxa within a group can be measured from known, accepted taxa in that group. Repeated resequencing of a single taxon will yield an observational error rate. Comparing the between-taxon difference rate with the within-taxon variation and error rates defines the accuracy and specificity of the M-OTU measurement. Many gene sequences show that different higher taxonomic groups can differ markedly in their background and adaptive substitution rates, and populations of different sizes might be expected to harbour different levels of within-taxon variation (also depending on each population's evolutionary history). It may therefore be necessary to define different heuristics for M-OTU designation depending on the higher taxon studied.
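The comparison described above can be sketched as follows, under stated assumptions. Pairwise distances among repeat sequences of one taxon stand in for within-taxon variation plus error; pairwise distances among accepted taxa of the same higher group give the between-taxon rate. Placing the cutoff midway between the largest within-taxon and smallest between-taxon distance, and reporting "false split" and "false lump" rates, is a hypothetical heuristic chosen for illustration, as are all function names and data here.

```python
from itertools import combinations
from statistics import mean


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def within_distances(replicates: list[str]) -> list[float]:
    """Distances among repeat sequences of one taxon (variation plus error)."""
    return [p_distance(a, b) for a, b in combinations(replicates, 2)]


def between_distances(taxa: dict[str, str]) -> list[float]:
    """Distances between accepted taxa of the same higher group."""
    return [p_distance(taxa[a], taxa[b]) for a, b in combinations(taxa, 2)]


def calibrate_cutoff(within: list[float], between: list[float]) -> dict:
    """Hypothetical heuristic: cutoff midway between the largest within-taxon
    and the smallest between-taxon distance, with the resulting error rates."""
    cutoff = (max(within) + min(between)) / 2
    return {
        "mean_within": mean(within),
        "mean_between": mean(between),
        "cutoff": cutoff,
        # True only if no within-taxon distance reaches any between-taxon one.
        "separable": max(within) < min(between),
        # Same-taxon pairs that the cutoff would split apart.
        "false_split_rate": sum(d > cutoff for d in within) / len(within),
        # Different-taxon pairs that the cutoff would lump together.
        "false_lump_rate": sum(d <= cutoff for d in between) / len(between),
    }


if __name__ == "__main__":
    # Hypothetical data: three reads of one taxon, three accepted taxa.
    replicates = [
        "ACGTACGTACGTACGTACGT",
        "ACGTACGTACGTACGTACGT",
        "ACGTACGTACGTACCTACGT",
    ]
    taxa = {
        "A": "ACGTACGTACGTACGTACGT",
        "B": "ACGAACGTTCGTACGAACGT",
        "C": "TCGTACCTACGTTCGTACGA",
    }
    print(calibrate_cutoff(within_distances(replicates), between_distances(taxa)))
```

Running the same calibration separately for each higher taxon would give the group-specific heuristics the paragraph anticipates.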

The benefits of the M-OTU approach are that data can be obtained from single specimens, often without compromising parallel or subsequent morphological identification; that morphologically indistinguishable taxa can be separated without the need for live material; and that a single technique is applicable to all taxa. Lengthy, specialised training in a particular (sub)group is thus not necessary, and published monographs and keys can be consulted to understand the known biological properties of the identified M-OTU and their close relatives. All stages and morphs of a taxon are amenable to study, as the method depends on genotype, not phenotype.

In addition, M-OTU data (the sequences themselves) are suited to exhaustive and model-driven phylogenetic analyses, which can yield independent and testable hypotheses of OTU interrelatedness.
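As one small illustration of what "model-driven" can mean, the sketch below converts observed p-distances between M-OTU representatives into Jukes-Cantor (JC69) distances, d = -(3/4) ln(1 - (4/3)p), which correct for multiple substitutions at the same site under the simplest nucleotide substitution model. This is an assumed example only: real phylogenetic analyses would use dedicated software and richer models, and the function names and data here are hypothetical.

```python
import math
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor(p: float) -> float:
    """JC69 distance: estimated substitutions per site, correcting the observed
    proportion p for multiple hits (equal base frequencies and rates assumed)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def corrected_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """JC69-corrected distances for every pair of M-OTU representatives."""
    return {
        (a, b): jukes_cantor(p_distance(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }


if __name__ == "__main__":
    # Hypothetical representative sequences for three M-OTU.
    motus = {"x": "ACGTACGTAC", "y": "ACGTACGTTC", "z": "TCGAACGGTC"}
    for pair, d in corrected_matrix(motus).items():
        print(pair, round(d, 4))
```

The corrected distance is always at least as large as the observed p-distance, and the gap widens as sequences diverge, which is why an explicit model matters when comparing more distantly related M-OTU.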