trace2dbEST FAQs
Frequently asked questions (and answers)
What is an EST?
What is dbEST?
Why does EST data need to conform to dbEST standards?
Have you developed other EST processing software?
Why should I use trace2dbest?
Why do I have to use a controlled sequence naming scheme?
What is the naming scheme that I need to use?
My trace files use a different naming scheme, what should I do?
How can I further process my ESTs?
Do I need other software to make this EST software work?
Who developed this software?
How do I download and install this software?
How can I suggest improvements or report bugs?
How can I cite this software?
I get the error "phredpar.dat not found" - why?
I don't think trace2dbest is removing all vector sequence - why?
What should I do if I get the error "fatal error: unable to match primer ID string" in trace2dbest?
trace2dbest says it can't save output files because file already exists - how?
What is an EST?
EST stands for "expressed sequence tag". ESTs are single-pass sequencing reads from cDNA clones. They are a very useful tool for gene discovery, and form the mainstay of many genomic efforts. For a description of ESTs and the philosophy and biology behind their use see:
1. Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O, et al.: Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 1995, 377 (Supplement):3-174.
2. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, et al.: Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 1991, 252:1651-1656.
3. McCombie WR, Adams MD, Kelley JM, FitzGerald MG, Utterback TR, Khan M, Dubnick M, Kerlavage AR, Venter JC, Fields C: Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues. Nature Genetics 1992, 1:124-131.
What is dbEST?
dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. A brief account of the history of human ESTs in GenBank is available (Trends Biochem. Sci. 20:295-6;1995). Also, consult the special "Genome Directory" issue of Nature (vol. 377, issue 6547S, 28 September 1995).
Why does EST data need to conform to dbEST standards?
dbEST is the central public repository of EST sequences. Most journals require that ESTs are deposited in dbEST before or on publication.
Have you developed other EST processing software?
Yes, we have developed an EST processing pipeline that consists of several programs. trace2dbest processes EST sequencing trace files into files ready for submission to dbEST. PartiGene takes sequences (usually ESTs) and creates a database of a non-redundant set of sequence objects (putative genes), which we term a partial genome. A web interface to the partial genome can be created using wwwPartiGene, SimiTri allows visualisation of the relative BLAST-based similarity scores, and the putative genes can be annotated using prot4EST and Annot8r_blast2GO.
Why should I use trace2dbest?
trace2dbest simplifies the process of getting your sequences from the sequencer to the database. With only a few sequences, it's possible to do the job by hand, relying on manual editing and individually tailored responses to possible errors and other issues. When processing a lot of sequences, for example any project with more than 48 individual trace files, it is easier to let a computer do the work. The high-throughput genome sequencing centres have developed a suite of software tools that are easily adapted for use in a low- or medium-throughput setting. What we have done is bundle these together into one package, called trace2dbest. We hope that using trace2dbest will be easy and painless, and that it makes the process of generating and using ESTs exciting and rewarding.
Why do I have to use a controlled sequence naming scheme?
trace2dbest is useful in isolation, but is designed to be used in an integrated set of programs (called PartiGene) that can take EST sequence traces through a series of informatic analyses to produce a partial genome: a database of analysed, annotated sequences. For this suite to function, it needs a consistent naming scheme for all the sequences so that the programs can perform the proper analyses. This consistency allows the software to process files efficiently, extracting information from the file name rather than having to be told by a user what to do.
What is the naming scheme that I need to use?
The naming scheme that trace2dbest uses is one that has been developed for use in many projects, but especially the NERC Environmental Genomics programme. In essence it is a series of tags separated by the underscore ( _ ) character. The first tag is two characters, and indicates the species (or major project identifier), while the second tag indicates the library from which the sequenced clone was derived. The third tag indicates the address of the clone in terms of microtitre plate number and row/column, and the fourth tag indicates the primer used for sequence generation. Thus Am_AW1_03F03 would indicate a sequence trace from a species Am (say Apis mellifera, the honey bee), from library AW1 (say adult worker library number 1), from microtitre plate coordinate 03F03 (the software will interpret this further as plate 03, row F, column 03). Using the naming scheme allows the software and the user to usefully interpret and summarise data in terms of species, library or plate.
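As an illustration only, the tags described above could be pulled apart as in the following Python sketch. This is not the code trace2dbest uses. The regular expression encodes our assumptions: a two-letter species tag, an alphanumeric library tag, a two-digit plate number, a row letter A to H, a two-digit column, and an optional trailing primer tag. The names TraceName and parse_trace_name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TraceName(NamedTuple):
    """Tags extracted from a trace file name (hypothetical container)."""
    species: str
    library: str
    plate: str
    row: str
    column: str
    primer: Optional[str]


# Assumed layout: SS_LIB_PPRCC[_PRIMER]. The exact widths and character
# classes are assumptions for illustration, not taken from the manual.
_NAME_RE = re.compile(
    r"^(?P<species>[A-Za-z]{2})_"
    r"(?P<library>[A-Za-z0-9]+)_"
    r"(?P<plate>\d{2})(?P<row>[A-H])(?P<column>\d{2})"
    r"(?:_(?P<primer>[A-Za-z0-9]+))?$"
)


def parse_trace_name(name: str) -> TraceName:
    """Split a name into its tags, or raise ValueError if it does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not fit the assumed scheme: {name!r}")
    # Group names match the TraceName fields; a missing primer becomes None.
    return TraceName(**match.groupdict())


print(parse_trace_name("Am_AW1_03F03"))
```

For the example name this yields species Am, library AW1, plate 03, row F and column 03, with no primer tag. Consult the trace2dbest manual for the exact form of each tag.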
My trace files use a different naming scheme, what should I do?
Packaged with trace2dbest is a small program called rename_file.pl that can be used to rename your trace files to fit the naming scheme. rename_file.pl is a script that matches a pattern you supply and replaces it with the correct one. It can also transform serially numbered files into files numbered as if from a 96-well plate (so that 001 becomes A01 and 096 becomes H12).
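The serial-number conversion can be sketched as follows. This is a Python illustration, not the code of rename_file.pl. It assumes that wells are counted row by row (A01 to A12, then B01, and so on). The two examples given above (001 and 096) are also consistent with column-by-column counting, so check the manual to see which order rename_file.pl actually uses. The function name serial_to_well is hypothetical.

```python
ROWS = "ABCDEFGH"      # 8 rows on a 96-well plate
COLUMNS_PER_ROW = 12   # 12 columns per row


def serial_to_well(serial: int) -> str:
    """Convert a serial number (1-96) to a well address, assuming row-major order."""
    if not 1 <= serial <= len(ROWS) * COLUMNS_PER_ROW:
        raise ValueError(f"serial number out of range for a 96-well plate: {serial}")
    # Zero-based position split into row index and column index.
    row_index, col_index = divmod(serial - 1, COLUMNS_PER_ROW)
    return f"{ROWS[row_index]}{col_index + 1:02d}"


for serial in (1, 12, 13, 96):
    print(f"{serial:03d} -> {serial_to_well(serial)}")
```

Under the row-major assumption, 013 maps to B01. Under column-major counting it would map to E02.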
How can I further process my ESTs?
After trace2dbest, the next step for EST processing is PartiGene. PartiGene is a menu-driven multi-step pipeline which takes sequences (usually ESTs) and creates a non-redundant set of sequence objects (putative genes), which we term a partial genome. The process consists of the following segments:
1. Downloading of sequences from public databases on a species-specific basis.
2. Clustering on the basis of sequence similarity, using our clustering software (CLOBB), into groups of sequences which putatively derive from the same gene family.
3. Assembly of the clusters into contigs using the public domain software phrap.
4. Simple annotation on the basis of BLAST to local in-house databases.
5. Creation of HTML summary tables.
6. Creation of a local SQL database allowing for the formulation of more complex queries to facilitate data mining.
Do I need other software to make this EST software work?
Yes. You need the sequence chromatographic trace base-calling program phred and the vector sequence matching software cross_match. phred, cross_match and a third useful program, phrap, come as a package available under a free academic licence from the programs' author, Phil Green at the University of Washington. You may also need the sequence similarity search suite BLAST (this is available on BioLinux).
Who developed this EST pipeline software?
The software was first developed by Dr John Parkinson, David Guiliano and Mark Blaxter at Edinburgh University. It was then developed further by Alasdair Anthony, Ralf Schmid and James Wasmuth.
How do I download and install trace2dbest?
You can download the software and manuals from here:
software
trace2dbest is packaged as a tarball and is relatively easy to install on Linux/Unix systems (or on a PC using cygwin). Simply unpack the tarball using 'tar zxvf trace2dbest_x.x.tar.gz' and then copy the perl scripts to wherever you keep executables. Please remember to read the user manual before starting trace2dbest.
How can I suggest improvements or report bugs?
We welcome feedback. Please email us with any comments, suggestions or bug reports.
How can I cite this software?
The reference for trace2dbest (and PartiGene) is "John Parkinson, Alasdair Anthony, James Wasmuth, Ralf Schmid, Ann Hedley and Mark Blaxter.
PartiGene—constructing partial genomes
Bioinformatics Advance Access published on June 12, 2004, DOI 10.1093/bioinformatics/bth101.
Bioinformatics 20: 1398-1404".
I get the error "phredpar.dat not found" - why?
The location of the phredpar.dat file needs to be defined as an environment variable, as described in the phred documentation. For a bash shell, add this line to your .bashrc file:
export PHRED_PARAMETER_FILE=/my/phredpar.dat/file/is/here
For a C-like shell, add this line to your .cshrc file:
setenv PHRED_PARAMETER_FILE "/my/phredpar.dat/file/is/here"
You will need to alter the path to show where your phredpar.dat file really is.
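If you are unsure whether the variable is set correctly in your current shell, a quick check along the following lines may help. This is a minimal Python sketch, not part of trace2dbest or phred. It only tests that PHRED_PARAMETER_FILE is set and points at an existing file. The function name find_phredpar is hypothetical.

```python
import os
from pathlib import Path


def find_phredpar(environ=os.environ) -> Path:
    """Return the path held in PHRED_PARAMETER_FILE, or raise if it is unusable."""
    value = environ.get("PHRED_PARAMETER_FILE")
    if not value:
        raise RuntimeError("PHRED_PARAMETER_FILE is not set in this shell")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(
            f"PHRED_PARAMETER_FILE points at a missing file: {path}"
        )
    return path


if __name__ == "__main__":
    try:
        print(f"phredpar.dat found at {find_phredpar()}")
    except (RuntimeError, FileNotFoundError) as problem:
        print(f"Problem: {problem}")
```

Run it from the same shell in which you intend to start trace2dbest, because the variable is only visible to programs launched from a shell where it has been set.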
I don't think trace2dbest is removing all vector sequence - why?
First, please ensure you have installed the latest version of trace2dbest.
If you are still having problems, you could use the option in trace2dbest to alter the cross_match vector screening parameters.
What should I do if I get the error "fatal error: unable to match primer ID string" in trace2dbest?
This is a phred error message that occurs when phred does not recognise the primer ID in the trace file. To solve the problem, add an appropriate primer ID line (containing chemistry, dye and machine information) to the phredpar.dat file. See the phredpar.dat file, the trace2dbest manual and the phred documentation for more information.
trace2dbest says it can't save output files because file already exists - how?
trace2dbest saves its output files to a directory time-stamped with the current hour and minute. Therefore, if more than one trace2dbest session finishes in the same minute, they will try to write to the same directory. The best solution is to leave at least a minute between trace2dbest sessions.
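The collision can be illustrated with a short Python sketch. The directory name format used here (output_HHMM) is purely hypothetical, and the real name chosen by trace2dbest may differ. The point is only that a name built from hour and minute alone cannot distinguish two sessions that finish within the same minute.

```python
from datetime import datetime


def output_dir_name(finished_at: datetime) -> str:
    """Build a directory name from hour and minute only (hypothetical format)."""
    return finished_at.strftime("output_%H%M")


# Two sessions finishing 45 seconds apart within the same minute,
# and a third finishing in the following minute.
first = datetime(2004, 6, 12, 14, 30, 5)
second = datetime(2004, 6, 12, 14, 30, 50)
third = datetime(2004, 6, 12, 14, 31, 5)

print(output_dir_name(first) == output_dir_name(second))  # same name: collision
print(output_dir_name(first) == output_dir_name(third))   # different names
```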