read_tax

usage: read_tax taxid

Parses the NCBI taxonomy dump (available from ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz) and, for a given taxid, saves a list of all child taxids. The script must be run in the same directory as the NCBI taxdump file nodes.dmp.

The script produces two files:

    taxid.all_children            all taxids found below the given taxid
    taxid.all_species_children    only those taxids that have the rank species

Thus:

    read_tax 12345

will produce

    12345.all_children
    12345.all_species_children

read_tax can optionally read a file of custom nodes, my.nodes, if it is available. This allows the user to define groups that are not present in the NCBI taxonomy. The custom nodes must be in the following format, one node per line:

    name_of_node|common_name_of_node|parent_node|child nodes|rank|taxid

For example, the ecdysozoa are defined thus:

    ecdysozoa|moulting animals|1|88770 33310 6231 33467 51516 10190|custom|1000001

and running

    read_tax 1000001

will generate lists of all taxids from the subtrees rooted at any of the listed child nodes (88770 33310 6231 33467 51516 10190, each of which corresponds to a phylum).

Taxid lists generated by this script can be used as the input to the read_gbdump.pl script to extract sets of GenBank records for large groups of organisms, particularly in cases where the desired group does not appear in the NCBI taxonomy.
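
The traversal described above can be sketched in a few lines of Python. This is an illustration only, not the code of read_tax itself. It assumes that nodes.dmp lines begin with the fields taxid, parent taxid and rank, separated by "|", and that the child nodes listed in a custom node are themselves included in the output. The function names and the toy taxids are hypothetical and do not correspond to real NCBI entries.

```python
from collections import defaultdict

# Toy stand-in for nodes.dmp: taxid | parent taxid | rank | ...
SAMPLE_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tphylum\t|\n",
    "11\t|\t10\t|\tgenus\t|\n",
    "12\t|\t11\t|\tspecies\t|\n",
    "20\t|\t1\t|\tphylum\t|\n",
    "21\t|\t20\t|\tspecies\t|\n",
    "30\t|\t1\t|\tphylum\t|\n",
    "31\t|\t30\t|\tspecies\t|\n",
]

# Toy custom node in the my.nodes format described above.
CUSTOM = "toy_group|toy common name|1|10 20|custom|1000001"


def parse_nodes(lines):
    """Return (children, rank) maps built from nodes.dmp-style lines."""
    children = defaultdict(list)
    rank = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue
        taxid, parent, node_rank = int(fields[0]), int(fields[1]), fields[2]
        rank[taxid] = node_rank
        if taxid != parent:  # the root node lists itself as its own parent
            children[parent].append(taxid)
    return children, rank


def parse_custom_node(line):
    """Parse one line: name|common name|parent|child nodes|rank|taxid."""
    _name, _common, _parent, kids, node_rank, taxid = line.strip().split("|")
    return int(taxid), [int(k) for k in kids.split()], node_rank


def descendants(roots, children):
    """Collect every taxid in the subtrees rooted at roots (roots included)."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


children, rank = parse_nodes(SAMPLE_NODES)
taxid, kids, _ = parse_custom_node(CUSTOM)

# Assumption: the listed child nodes are part of the result.
all_children = sorted(descendants(kids, children))
species_children = [t for t in all_children if rank.get(t) == "species"]

print(all_children)      # [10, 11, 12, 20, 21]
print(species_children)  # [12, 21]
```

In this toy tree the third phylum (30) and its species (31) are left out because the custom node does not list them, which is how a custom node restricts the output to a user-defined group.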