formatssu

DESCRIPTION

A small utility script that rewrites the headers in a FASTA format dump of the Silva ribosomal RNA database, optionally removing sequences with unhelpful taxonomic attribution.

The script takes as its input a FASTA file of sequences downloaded from the Silva archive repository:

    http://www.arb-silva.de/no_cache/download/archive/current/Exports/

and rewrites the FASTA sequence headers so that, when the FASTA file is turned into a BLAST database, the headers are in the correct format for use with Taxonerator (see the Taxonerator user guide at http://nematodes.org/bioinformatics/jMOTU/).

USAGE

First, download the file called ssu-parc.fasta.tgz from the above URL and extract it. The extracted FASTA file is the input file when you run formatssu.

formatssu is distributed as a .jar file and runs under Java 1.6. To use it, extract all the files from this .zip into a new folder, and run

    java -jar formatssu.jar

to get usage instructions.

The program takes two arguments: the input filename (-i) and the output filename (-o). Use the -r switch if you want to remove sequences with the annotation 'uncultured'. If this option is used, the program writes the headers of the excluded sequences to a logfile, log.txt.

e.g.

    java -jar formatssu.jar -i /home/myname/ssu-parc.fasta -o /home/myname/ssu_formatted.fasta -r

Remember that in order to use the output file with Taxonerator you must create a BLAST database from it (see the Taxonerator user guide for details).
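
For readers curious about what the -r switch does, the following is a minimal sketch of that kind of filter. It is not the formatssu source code. It assumes that a sequence counts as 'uncultured' when that word appears anywhere in its FASTA header, compared case-insensitively, and it does not attempt the header rewriting that formatssu performs. The class name UnculturedFilter and its filter method are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Locale;

// Illustrative sketch only; not part of formatssu.
class UnculturedFilter {

    /**
     * Copies FASTA records from in to out, skipping every record whose
     * header line contains "uncultured". The case-insensitive substring
     * match is an assumption about how the annotation appears.
     * Headers of skipped records are written to log, one per line.
     * Returns the number of records skipped.
     */
    public static int filter(BufferedReader in, Writer out, Writer log)
            throws IOException {
        int removed = 0;
        boolean keep = true; // applies to the record currently being read
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                // A header line starts a new record; decide its fate here.
                keep = !line.toLowerCase(Locale.ROOT).contains("uncultured");
                if (!keep) {
                    log.write(line);
                    log.write('\n');
                    removed++;
                }
            }
            if (keep) {
                out.write(line);
                out.write('\n');
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println(
                "usage: java UnculturedFilter <input.fasta> <output.fasta>");
            return;
        }
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        Writer out = new BufferedWriter(new FileWriter(args[1]));
        Writer log = new BufferedWriter(new FileWriter("log.txt"));
        try {
            int removed = filter(in, out, log);
            System.out.println("Removed " + removed + " sequences");
        } finally {
            in.close();
            out.close();
            log.close();
        }
    }
}
```

The sketch reads the input line by line rather than loading it into memory, since the Silva exports are large. It uses try/finally instead of try-with-resources so that it compiles under Java 1.6.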