
Filarial Genome Meeting 14-18 March, 1998
OPENING SESSION
The opening session included welcomes by Professor Reda Ramzy (Ain Shams University), Dr. Sawar (Head of The Research and Training Center on Vectors of Diseases), Dr. Tag E. Dim (Vice President of Ain Shams University Postgraduate Studies and Professor of Schistosomiasis Studies), Dr. Steve Williams (Smith College and Filarial Genome Project Coordinator) and Dr. Boris Dobrokhotov (Manager, Parasite Genome Committee, TDR, WHO).
Boris Dobrokhotov described the WHO-sponsored initiatives in several genomic areas and indicated that, for the filarial genome project, future goals should include:
(1) continued research and development including BAC, cosmid, YAC libraries, novel cDNA libraries, etc.
(2) gene discovery
(3) genomic mapping
(4) database development
(5) advanced gene analysis and "post-genomics"
(6) research and training for endemic labs
Steve Williams reviewed the results of the Filarial Genome Project as of March, 1998, after thanking Mark Blaxter and Reda Ramzy for putting the program together. He mentioned that 130 million people are affected by lymphatic filariasis (106 million by Wuchereria bancrofti) and that 0.25 million are affected by filariasis in Egypt. These diseases are present in 73 countries and 1.1 billion people are at risk. They are the 2nd leading cause of economic disability in the world, affecting subtropical, tropical and non-tropical countries. He reviewed the research groups currently involved in the project and welcomed Tania Supali and her lab at the University of Jakarta in Indonesia as our newest member and partner of the Slatko lab. The laboratories now involved are those of Williams, Slatko, Blaxter, Scott, Jayaraman, Supali and Ramzy. Steve also reviewed the list of other genome projects being funded by WHO, and the WHO's consideration of supplementing existing malaria research in this area.
With respect to the filariasis project, it was of interest to recall that the project was initiated in January 1995 with funding from WHO, and is now additionally funded by the MRC (Blaxter lab) and by the McConnell Clark Foundation for an Onchocerca volvulus EST project (Williams lab). Brugia malayi was chosen at the first meeting because all life cycle stages are available; the feeling was that all the filarial nematodes would be closely related and that Brugia therefore provided the best model organism. Steve reminded us that before the genome project was initiated, the following molecular/cytogenetic information was known about Brugia:
haploid number of 5, XY sex determination system (XY male)
genome size of 100 Mb (equivalent to C. elegans)
75% A+T (C. elegans is 60% A+T)
Hha I repeat (322 bp) repeated 30,000 times, comprising 9.7 Mb or 10% of the genome.
only 50 genes known in the database out of 16,000 possible genes.
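The repeat figures in the list above can be checked with a line of arithmetic. The short sketch below simply multiplies the quoted repeat length by the quoted copy number and compares the result to the quoted genome size; the variable and function names are illustrative only.

```python
# Figures as quoted in the list above.
REPEAT_UNIT_BP = 322            # length of one Hha I repeat
COPY_NUMBER = 30_000            # approximate copies per genome
GENOME_SIZE_BP = 100_000_000    # about 100 Mb


def repeat_fraction(unit_bp, copies, genome_bp):
    """Return (total bp occupied by the repeat, fraction of the genome)."""
    total = unit_bp * copies
    return total, total / genome_bp


total_bp, fraction = repeat_fraction(REPEAT_UNIT_BP, COPY_NUMBER, GENOME_SIZE_BP)
print(f"{total_bp / 1e6:.2f} Mb, {fraction:.1%} of the genome")
```

This gives 9.66 Mb, which rounds to the 9.7 Mb and roughly 10% quoted.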
The original goals of the project included gene discovery, physical mapping, dissemination of genome data and training for the endemic labs. The DNA sequencing is primarily that of ESTs (expressed sequence tags), which are short nucleotide sequences generated from cDNA clones, themselves copies of RNA expressed in cells. These sequences are generally not full length and are error-prone "one pass" sequences, but they are useful for obtaining possible functional information by comparison with previously cloned gene sequences or with cDNAs from other organisms. They also provide a resource for those cloning genes by other approaches, such as immunoscreening. "On-line" searches can aid in the identification of the functions of newly discovered genes.
Currently, 7 cDNA libraries are being analyzed; a recent L3-L4 transition library and other libraries are being made or are under consideration. There are over 13,000 ESTs in our Brugia database. However, they are not all "different" genes: there are about 6,000 unique new genes. C. elegans is now estimated to have about 16,000 genes, so that we are about 35% complete with the gene discovery program.
About half the ESTs have a match (a "hit") in existing databases; half are novel "non-hits". With cDNAs from each stage of development, we are beginning to see genes which are differentially expressed or greatly "up-regulated". Some genes are expressed in L3 that are not expressed elsewhere; this appears to be the case for all stages, as well as for the transition stages (L3-L4, for example). All sequences, as well as libraries, clones, etc., are fully accessible and the sequences are available on the WWW, as well as being published in journals. Web access to the project is via FilGenNet, or by NCBI searches of the dbEST, nucleic acid or protein databases.
We hope to have a primary genome map by 1999 and will soon start the mapping of BAC libraries using ESTs and end sequences as probes. In addition to the above numbers, the project has been a successful cooperative model of worldwide sharing of data. High quality libraries of all stages of development have been generated, new gene discovery has occurred, and many identified clones have been distributed to researchers. Cosmid, BAC and YAC libraries have been constructed, and training workshops including sequencing, mapping and bioinformatics have been held.
The current goals include further development of the FilDB database, more libraries being constructed (including subtraction libraries), physical mapping, more ESTs being sequenced and continued training, as required. We are able to analyze genes of interest ("cool genes") including those which are potential vaccine candidates, such as: thioredoxin peroxidase, male specific genes, proteases and protease inhibitors, etc.
Future goals include the increased cooperation between labs involved in filariasis, further training of endemic labs, sequencing of 30,000 ESTs and identification of 10,000 to 12,000 "unique" genes, mapping of 5000 genes, construction of new libraries, increasing computer resources where needed, further identification and study of potential vaccine candidates and targets, all eventually leading to elimination of the disease.
Mark Blaxter then spoke about the bioinformatics strand of the project. He described the concept and uses of "clusters" and the upcoming FilDB database release. He spoke about his work in Edinburgh to increase the speed and efficiency of "clone to sequencing gel", which is quick and wherein 90% of the clones eventually end up in the database. The process of taking the 13,600 ESTs and placing them into uniquely identified groups, or clusters, was described. In the clustering approach, the ESTs are compared with one another by similarity searching, and matching ESTs are grouped into "clusters". However, some editing is required, as some cDNAs may be incorrect, chimaeric, etc.
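The grouping step described above can be illustrated with a toy example. The sketch below is not the project's clustering pipeline: it assumes a crude similarity measure (number of shared 8-mers) in place of a real similarity search, links any two ESTs that pass a threshold, and reports the connected groups (single-linkage clustering). The sequences, names and thresholds are invented for illustration, and reverse-complement matches are ignored.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Single-linkage clustering of ESTs.

    Two ESTs are linked when they share at least `min_shared` k-mers
    (a stand-in for a real similarity search). Linked ESTs, and anything
    linked to them in turn, end up in the same cluster.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(seq.upper(), k) for name, seq in ests.items()}
    for a, b in combinations(ests, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: three overlapping reads from one transcript, one unrelated read.
transcript = "ATGGCTAGCTTGACCGTAAGGCTTACGATCGGATCCAGTTGCAAGCTTGG"
ests = {
    "est1": transcript[0:25],
    "est2": transcript[15:40],
    "est3": transcript[30:50],
    "est4": "TTTTCCCCGGGGAAAATTTCCCGGGAAA",
}
print(cluster_ests(ests))
```

Note that est1 and est3 do not overlap directly; they fall into one cluster only because each overlaps est2. The same transitive linking is what lets a chimaeric cDNA wrongly join two unrelated clusters, which is one reason manual editing is needed.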
Clusters allow one to look at the sequence identity more accurately. Overlapping ESTs can generate consensus sequences which can be used to correct mistakes and to extend sequences more accurately. One can then use the consensus sequence to search databases to look at protein identification, gene families, generate search strings, search databases, etc. One can compare sequences to databases by sequence patterns (motifs or domains) and can ask about the function of the gene and tie it to the biology and evolution of the parasites. After clustering, only about half of the resultant 6,000 clusters can be identified based upon database "hits" to previously discovered genes. This information will be fed back to the FilDB database which will allow this type of integrated analysis.
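How overlapping ESTs can correct one another is easy to see in a small example. The sketch below assumes the reads in a cluster have already been placed on a common coordinate system (the offsets are given, not computed) and takes a simple majority vote at each column; real assembly software also handles insertions, deletions and quality values, which are ignored here. All data are invented.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from reads given as (offset, sequence) pairs.

    Columns covered by no read (or only by 'N') are reported as 'N'.
    Ties are resolved in favor of the base seen first at that column.
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq.upper()):
            if base != "N":
                columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Hypothetical cluster: the second read carries a one-pass sequencing error
# (G instead of C at consensus position 8, counting from 0).
reads = [
    (0, "ATGGCTAGCT"),
    (4, "CTAGGTTGAC"),
    (6, "AGCTTGACCG"),
]
print(consensus(reads))
```

The consensus is longer than any single read and the isolated error is outvoted by the two reads that agree.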
Proteomics is the next wave of analysis. It is now possible to obtain sequence from small amounts of protein. With the cluster data, it is possible to look at stage-specific, tissue-specific or subcellular-location-specific (e.g., membrane-bound) proteins. For example, secreted proteins from adults released by detergent treatment may, on 2-D gels, produce about 50 protein spots. In Toxocara canis, N-terminal peptide sequencing is being done by Rick Maizels' lab in Edinburgh; many secreted proteins are blocked at the N-terminus and have to be digested with trypsin. Mass spectrometry can determine the molecular weights of the peptides and, from those, the composition of the peptide can be derived using computer programs (such as MOWSE) which have databases of tryptic digest data. In combination with the EST data, proteomics thus allows a further glimpse into the biochemistry of Brugia.
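The idea of matching measured peptide masses against a database of predicted tryptic digests can be sketched in a few lines. This is not MOWSE or its scoring scheme; it is a minimal illustration that assumes the usual trypsin rule (cleave after K or R unless the next residue is P), standard monoisotopic residue masses, complete digestion and no modifications. Protein names and sequences are invented.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein):
    """Cleave after K or R, except before P (zero-width split, Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed, proteins, tol=0.5):
    """For each protein, count observed masses within `tol` Da of a predicted peptide."""
    hits = {}
    for name, seq in proteins.items():
        predicted = [peptide_mass(p) for p in tryptic_peptides(seq)]
        hits[name] = sum(any(abs(m - p) <= tol for p in predicted) for m in observed)
    return hits


# Hypothetical database of translated cluster consensus sequences.
proteins = {"protA": "MKGK", "protB": "AAAR"}
observed = [277.15, 203.13]  # invented "measured" masses
print(match_masses(observed, proteins))
```

Here both observed masses are explained by protA and neither by protB, so protA would be the candidate identification. Translated EST cluster consensus sequences could serve as the search database in the same way.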
With FilDB as a resource, one can link function and gene family location. One can identify a particular structure in the database, or a particular gene family, or genes active on the surface of the gut, etc. One can do database searches using motifs (transmembrane segments, membrane receptors, ion channels, signal peptides, etc.) using "fuzzy" search parameters. One can go after abundantly expressed genes, secreted proteins or target other classes of genes. The ESTs themselves can be targets: one can look at clusters, consensus, functional identification with other databases, stage specificity, alternate splicing, signal peptides, functional targets, proteomics or simply functional identification or numerology.
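A motif search of the kind described can be approximated with regular expressions. The sketch below is not FilDB's search engine. It uses one well-known pattern, the PROSITE N-glycosylation site N-{P}-[ST]-{P}, plus a deliberately crude "hydrophobic stretch" pattern that stands in for a transmembrane-segment search and should not be mistaken for a validated predictor. The test sequence is invented.

```python
import re

# Motif name -> regular expression over one-letter amino acid codes.
MOTIFS = {
    # PROSITE PS00001: N, then anything but P, then S or T, then anything but P.
    "N-glycosylation": r"N[^P][ST][^P]",
    # Illustrative only: 15 or more consecutive hydrophobic residues.
    "hydrophobic stretch": r"[AILMFVW]{15,}",
}


def find_motifs(protein, motifs=MOTIFS):
    """Return (motif name, 1-based start position, matched text) for each hit."""
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(pattern, protein.upper()):
            hits.append((name, match.start() + 1, match.group()))
    return hits


# Hypothetical translated sequence: NGSA matches, NPTA does not (P after N).
print(find_motifs("MKNGSAANPTA"))
```

Loosening a pattern (a wider character class, a shorter minimum length) is one simple way to make such a search "fuzzier", at the cost of more false positives.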
Mark then spoke about interlibrary comparisons using the clustering data. For example, the cluster data reveals that in the L3 ESTs, 70 clusters are both highly expressed and stage specific. He mentioned some problems such as ribosomal RNA, bacteria-like sequences (perhaps the endosymbiotic rickettsia Wolbachia), chimaeric clusters, etc.
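An interlibrary comparison of this kind reduces to counting, for each cluster, how many of its ESTs come from each library. The sketch below assumes such a table is available as a dictionary; the cluster identifiers, counts and the abundance threshold are all invented for illustration and do not reflect the actual cluster set.

```python
def stage_specific(cluster_counts, stage, min_ests=5):
    """Clusters with at least `min_ests` ESTs in `stage` and none in any other library."""
    selected = []
    for cluster, counts in cluster_counts.items():
        in_stage = counts.get(stage, 0)
        elsewhere = sum(n for library, n in counts.items() if library != stage)
        if in_stage >= min_ests and elsewhere == 0:
            selected.append(cluster)
    return sorted(selected)


# Hypothetical table: cluster ID -> {library: number of ESTs}.
cluster_counts = {
    "cluster_0001": {"L3": 12},            # abundant and L3-only
    "cluster_0002": {"L3": 7, "MF": 3},    # abundant but not stage specific
    "cluster_0003": {"MF": 9},             # abundant and MF-only
    "cluster_0004": {"L3": 2},             # L3-only but not abundant
}
print(stage_specific(cluster_counts, "L3"))
print(stage_specific(cluster_counts, "MF"))
```

With sparse EST sampling, absence from a library is weak evidence of absence of expression, so a threshold such as `min_ests` is a judgment call rather than a fixed rule.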
The cluster set data and updates will be in FilDB so that one can ask questions such as "Can I see all clusters in MF and nowhere else?" or "Can I see the abundant ones everywhere?". Cluster data indicates that there are about 80 stage specific abundant ESTs. FilDB is the database which will integrate as much data as possible on sequences, biology and other relevant topics, in the manner of the C. elegans ACeDB. Currently FilDB has several sections of interest, including information on the genome network, pathology and treatment, and a search engine for parasite-specific sequence searching of the datasets (GenBank and the EST database have lots of human data; searches can be parasite or Brugia specific, filarial, all nematode, etc.). This version of FilDB should be up and functional in the next few weeks and it will be accessible on the WWW. A simpler version of the cluster dataset will be available as a standalone database in FileMaker Pro format.