THE FILARIAL GENOME PROJECT

 

Mark Blaxter1*, Jennifer Daub1, Martin Waterfall1, David Guiliano1,2, Steven Williams2, Kunthala Jayaraman3, Reda Ramzy4, Barton Slatko5 and Alan Scott6

 

1 Institute of Cell, Animal and Population Biology, University of Edinburgh, King's Buildings, Edinburgh EH9 3JT UK

2 Clark Science Center, Smith College, Northampton, MA 01063 USA

3 Center for Biotechnology, Anna University, Madras 600025 INDIA

4 Research & Training Center on Vectors of Diseases, Ain Shams University, Abbassia, Cairo 11566 EGYPT

5 New England Biolabs, 32 Tozer Road, Beverly, MA 01915 USA

6 Dept. Molecular Microbiology and Immunology, Johns Hopkins School of Hygiene, 615 North Wolfe St., Baltimore MD 21205 USA


Brugia malayi is a filarial nematode which infects humans. Over 120 million people carry infections of lymphatic filarial nematodes, with outcomes ranging from asymptomatic carriage to chronic elephantiasis (WHO, 1993; Ottesen and Ramachandran, 1995). While most lymphatic filarial infections are caused by the nematode Wuchereria bancrofti, this parasite is impossible to culture in standard lab conditions and is thus not amenable to extensive study. B. malayi, which is closely related to W. bancrofti (Xie, et al., 1994), can however be grown in laboratory animals, and this species has thus been chosen as the model for study (Unnasch, 1994). The WHO and MRC (UK)-sponsored Filarial Genome Project (FGP) (Blaxter, 1995) has the following aims:

1 to produce and provide genome research resources for the filarial community

2 to undertake a program of gene discovery to aid vaccine-development and drug-target identification strategies

3 to train and support endemic country researchers.

The participants in the FGP are listed in Table 1.

THE FILARIAL GENOME

The genome of Brugia malayi is estimated to be about 100 million base pairs (100 Mb) (Maina, et al., 1987). The repetitive DNA content is 17%, and a single satellite repeat, the Hha I repeat, is present in ~30,000 copies (and thus makes up nearly 10% of the genome) (McReynolds, et al., 1986). Other identified repeat DNAs (ribosomal RNAs in the main) make up another 2-3% of the genome. The genome is AT rich (79%) (Rothstein, et al., 1988), and this extreme bias is exacerbated in non-coding (intergenic and intronic) regions. The genome is organised as five chromosomes which cannot be separated on pulsed field gels (Sakaguchi, et al., 1983). The Hha I repeat is organised as 8-10 sets of extensive tandem repeats, and it is tempting to suggest that these define regions on each chromosome. The number of protein-coding genes in B. malayi is not known, but the rhabditid model nematode Caenorhabditis elegans, with a genome of 100 Mb, has ~15,000 genes (Hodgkin, et al., 1995), and we expect B. malayi to be similar.
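The repeat figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the text. Treating "nearly 10%" as an upper bound of 0.10 is an assumption, and the function names are illustrative.

```python
# Figures quoted in the text; the genome size is itself an estimate.
GENOME_BP = 100_000_000        # ~100 Mb
HHA1_COPIES = 30_000           # ~30,000 copies of the Hha I repeat
HHA1_FRACTION = 0.10           # "nearly 10%", used here as an upper bound (assumption)
TOTAL_REPEAT_FRACTION = 0.17   # total repetitive DNA
OTHER_KNOWN = (0.02, 0.03)     # other identified repeats, 2-3%


def implied_repeat_unit_bp(genome_bp, copies, fraction):
    """Upper bound on repeat unit length implied by copy number and genome share."""
    return genome_bp * fraction / copies


def unassigned_repeat_fraction(total, hha1, other_range):
    """Range of repetitive DNA not accounted for by the named repeat classes."""
    low = total - hha1 - other_range[1]
    high = total - hha1 - other_range[0]
    return low, high


unit = implied_repeat_unit_bp(GENOME_BP, HHA1_COPIES, HHA1_FRACTION)
low, high = unassigned_repeat_fraction(TOTAL_REPEAT_FRACTION, HHA1_FRACTION, OTHER_KNOWN)
print(f"implied Hha I unit length: at most ~{unit:.0f} bp")
print(f"repeats not assigned to a named class: {low:.0%} to {high:.0%} of the genome")
```

On these figures the Hha I unit would be a little over 300 bp, and a few percent of the genome would be repetitive DNA that has not yet been assigned to a named class.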

Up to 1994, very few genes had been cloned from filarial nematodes, and only about 60 from B. malayi. These genes were mainly copies of the Hha I repeat and a select set of antigen-encoding genes. Many genes had been identified multiple times because of redundancy in (primarily antibody-based) screening methods. The first major goal of the FGP was therefore to boost this number so as to allow researchers access to a wide range of cloned genes for further study.

NEW, HIGH-QUALITY CDNA LIBRARIES

Seven new, high-quality cDNA libraries have been constructed for the project by the Williams and Scott laboratories (see Figure 1) (Yenbutr and Scott, 1995; Blaxter, et al., 1996). The intention has been to provide access points to the whole of the B. malayi life cycle. This approach allows us to sample from multiple timepoints in the life cycle, and to concentrate on those genes expressed at times thought to be important, such as the mosquito-vertebrate transition, or in the adult. The libraries have been constructed directionally in the lambda vector Zap II, which allows for excision of the insert as a pBluescript-family plasmid, a feature which significantly facilitates downstream analysis. Two kinds of library have been made. Where material is not a significant limitation, conventional techniques (RNase H and DNA polymerase) are used to generate near-full-length transcripts. When material is limiting, we have used a PCR technique to generate guaranteed full-length transcripts. Most nematode messenger RNAs have at their 5' end a trans-spliced exon (the spliced leader or SL) (Blaxter and Liu, 1996). The SL sequence is used in conjunction with oligo(dT) to prime PCR, making large amounts of clonable product from small quantities of starting material (Yenbutr and Scott, 1995; Martin, et al., 1995). The libraries are checked for insert size and redundancy and are then made available through the FGP to anyone interested (see Table 3).

EST SEQUENCING

The goal of gene identification has been achieved through expressed sequence tag (EST) sequencing (McCombie, et al., 1992; Waterston, et al., 1992). Clones are selected at random and sequenced once from the 5' end. These tag sequences are then deposited in the public databases where they are available for all to search (see Table 3) (Blaxter, et al., 1996). The sequencing strategy is outlined in Figure 2. For each library, around 500 EST sequences are generated from plaques picked from the library. If the redundancy of these sequences is low (i.e. there are few repeat sequencings of cDNAs deriving from the same gene) then additional (up to 3000) reactions are performed on unselected plaque-picked clones. If the library is more redundant, then a process of mass excision of the library and screening of gridded plasmid clones is carried out. Only those clones not hybridising to probes made from abundant sequences are further processed for sequencing.
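The per-library decision described above can be sketched as follows. This is a minimal illustration rather than the project's actual procedure: the text gives no numeric cut-off for "low" redundancy, so REDUNDANCY_CUTOFF and min_copies are assumed values, and the function names and cluster labels are hypothetical.

```python
from collections import Counter

PILOT_SIZE = 500          # from the text: around 500 ESTs per library
MAX_EXTRA_READS = 3000    # from the text: up to 3000 additional reactions
REDUNDANCY_CUTOFF = 1.5   # assumption: the text does not give a threshold


def redundancy(cluster_ids):
    """Mean number of ESTs per distinct gene cluster in a sample."""
    if not cluster_ids:
        raise ValueError("no ESTs in sample")
    return len(cluster_ids) / len(set(cluster_ids))


def abundant_clusters(cluster_ids, min_copies=5):
    """Clusters seen often enough to serve as probes when screening gridded clones."""
    counts = Counter(cluster_ids)
    return sorted(c for c, n in counts.items() if n >= min_copies)


def next_step(cluster_ids, cutoff=REDUNDANCY_CUTOFF):
    """Return ("random", []) to keep picking plaques, or ("screen", probes)
    to switch to mass excision and screening of gridded clones."""
    if redundancy(cluster_ids) <= cutoff:
        return "random", []
    return "screen", abundant_clusters(cluster_ids)


# Hypothetical pilot: two abundant transcripts dominate the first 500 reads.
pilot = ["a"] * 200 + ["b"] * 100 + ["g%d" % i for i in range(200)]
print(next_step(pilot))
```

In the second branch, the abundant clusters are the ones from which probes would be made, so that only non-hybridising clones go forward to sequencing.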

The sequences are then analysed further using publicly available resources (see Table 3). The B. malayi ESTs are compared with each other to define clusters (cDNAs deriving from the same gene) and families (cDNAs deriving from related genes) using the search programs BLASTn (nucleotide search sequence vs. nucleotide database) and tBLASTx (translated nucleotide search sequence vs. translated nucleotide database) (Altschul, et al., 1990). A server offering these searches on the B. malayi (and other parasite) dataset has been set up (M. Aslett and M. Blaxter, unpublished; see Table 3).

The ESTs are also compared to sequences from other organisms to define putative functions. The nematode C. elegans is the closest comparator for most filarial genes, and thus special attention is paid to similarities between B. malayi ESTs and the genome and EST sequences from the C. elegans genome project. In particular, tBLASTx and BLASTx (translated nucleotide search sequence vs. protein database) searches are performed. The general databases are also searched (BLASTx).

The clustering and similarity information is then used to annotate the B. malayi "genes", and thus build up a database for viewing and integrating the genome information. For this the C. elegans model is again being followed, and ACeDB, the integrated genome database developed for that project, has been adapted for FGP use (Sulston, et al., 1992). The filarial database "FilDB" currently contains in excess of 80,000 "objects" including over 8,000 sequences and 20,000 bibliographic references. The references were derived from a bibliography project funded by the Edna McConnell Clark Foundation (C. Booth and M. Blaxter, unpublished).

CLUSTER ANALYSIS REVEALS THE SUCCESS OF THE MULTIPLE LIBRARIES APPROACH

Using BLASTn similarity information, the ESTs have been grouped into clusters which we believe define single genes. These clusters have from 1 to over forty members, and allow the accuracy of individual EST sequence reads to be checked. In the case of overlapping clones, the clustering permits the derivation of a consensus sequence for the cluster which can cover the complete cDNA sequence. The clustering algorithms are being further developed for application to other parasite EST datasets.
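One simple way to turn pairwise similarity hits into clusters is single-linkage grouping with a union-find structure, sketched below. This is an illustrative assumption, not a description of the FGP clustering algorithm, which the text does not specify. The input is taken to be a list of EST pairs that have already passed whatever BLASTn threshold is chosen upstream.

```python
def cluster_ests(est_ids, hits):
    """Group ESTs into single-linkage clusters.

    est_ids: list of EST identifiers.
    hits: iterable of (query_id, subject_id) pairs that passed a similarity
          threshold chosen upstream (not specified in the text).
    Returns a list of sorted clusters, largest first.
    """
    parent = {e: e for e in est_ids}

    def find(x):
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        if a not in parent or b not in parent:
            continue  # ignore hits to sequences outside the dataset
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for e in parent:
        clusters.setdefault(find(e), []).append(e)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c[0]))


# Hypothetical identifiers: e1-e3 overlap in a chain, e4 and e5 are singletons.
ests = ["e1", "e2", "e3", "e4", "e5"]
hits = [("e1", "e2"), ("e2", "e3"), ("e4", "e4")]
print(cluster_ests(ests, hits))
```

Single linkage joins e1 and e3 even though they do not hit each other directly. That suits overlapping partial reads of one cDNA, but it can also merge related genes through a shared domain, so the similarity threshold matters.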

The 7,800 ESTs deposited in the public databases thus appear to correspond to 3,500 different transcripts, an overall redundancy of 2.23 clones per cluster (Table 2). Different libraries range in redundancy from 1.3 to 2.0, with the SL-dT PCR libraries being more redundant (despite rounds of cross screening to eliminate highly represented sequences). The success of the multiple library and cross screening approach can be compared to that of the C. elegans EST project, based on 2 libraries, where over 37,000 sequences define only 4,500 genes (Y. Kohara, personal communication).
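The comparison above comes down to a ratio of ESTs to clusters. As a worked check using only the counts quoted in the text (the function name is illustrative):

```python
def clones_per_cluster(n_ests, n_clusters):
    """Overall redundancy: ESTs sequenced per distinct transcript found."""
    return n_ests / n_clusters


fgp = clones_per_cluster(7800, 3500)          # B. malayi, multiple libraries
c_elegans = clones_per_cluster(37000, 4500)   # C. elegans, two libraries

print(f"B. malayi:  {fgp:.2f} ESTs per cluster")
print(f"C. elegans: {c_elegans:.2f} ESTs per cluster")
```

On these figures, each new B. malayi transcript has cost a little over two sequencing reads, against roughly eight per gene in the C. elegans project.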

EST SEQUENCING FROM OTHER FILARIAL NEMATODES

The FGP has also embarked on smaller EST projects from other human-infective filariae (Onchocerca volvulus, Wuchereria bancrofti, and Loa loa) where nematode material for making cDNA libraries, or the libraries themselves, have become available. These sequences have also been deposited in dbEST and form an important resource for cross-species research.

THE FILARIAL GENOME PROJECT RESOURCE CENTER AND THE FILARIAL GENOME NETWORK

The FGP is committed to making the data and resources available to all. This is done through rapid deposition of sequences in the databases, provision of search tools and continuing analysis of the data in the project labs. The results of these analyses (such as the cluster analysis) are made available through the FGP world wide web site and email network (see Table 3). All the libraries and the sequenced clones are available through the WHO-funded Filarial Genome Resource Center in Northampton, MA, USA (see Table 3).

UTILITY OF THE EST DATA

The ESTs include all the previously cloned genes. The remaining 3,450 genes are a gold mine of resources for discovery of new molecules to develop drugs against or test as vaccines. The clustering process can identify genes which are expressed specifically or upregulated significantly in one stage. It can also identify targets which if attacked would compromise all stages. The similarity information derived from database searching can be used in a rational search for new targets. Importantly, the project has also identified many genes (about 40% of the total) for which there are no homologues and thus no identifications. These "novel" genes may turn out to be the most interesting. For example, of the 40 most highly expressed fourth stage larval genes, 15 have no homologues. Why are fourth stage larvae expressing these genes at such high levels? What do they do? Are these functions nematode specific and essential? Any identified clones can be made available for further research within a week. This rapid turnaround as well as the global nature of the analysis (all genes from all stages) promises to serve filarial research well for years to come.
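The use of clustering to flag stage-specific and pan-stage genes can be sketched by counting, for each cluster, how many member ESTs come from each library. This is a rough illustration under stated assumptions: the library names and the min_ests threshold are hypothetical, and EST counts are only a crude proxy for expression, especially for libraries that were cross-screened or sampled to different depths.

```python
from collections import Counter, defaultdict


def stage_profile(est_records):
    """Count ESTs per cluster per library.

    est_records: iterable of (cluster_id, library_name) pairs.
    """
    profile = defaultdict(Counter)
    for cluster_id, library in est_records:
        profile[cluster_id][library] += 1
    return profile


def stage_specific(profile, min_ests=3):
    """Clusters with at least min_ests members, all from a single library."""
    return {cid: next(iter(counts))
            for cid, counts in profile.items()
            if len(counts) == 1 and sum(counts.values()) >= min_ests}


def pan_stage(profile, libraries):
    """Clusters with at least one EST from every listed library."""
    wanted = set(libraries)
    return sorted(cid for cid, counts in profile.items() if wanted <= set(counts))


# Hypothetical records: c1 is seen only in L4, c2 in every library sampled.
records = ([("c1", "L4")] * 4
           + [("c2", "MF"), ("c2", "L3"), ("c2", "L4")]
           + [("c3", "L3")])
profile = stage_profile(records)
print(stage_specific(profile))
print(pan_stage(profile, ["MF", "L3", "L4"]))
```

Singleton clusters such as c3 are left unclassified, since one read says little about where a gene is or is not expressed.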

ACKNOWLEDGEMENTS

We would like to thank the Parasite Genome committee of the WHO and in particular Boris Dobrokhotov for their support, and the UK MRC, the McConnell Clark Foundation and the WHO for financial support. Many colleagues involved in the genome project and from the filarial research community have made significant contributions to the project and we would like to thank them for this. Mark Blaxter is funded by the Darwin Trust.

REFERENCES

WHO (1993). Lymphatic filariasis and onchocerciasis. In (Eds.) "Tropical Disease Research, 11th Programme Report. Progress 1991-1992". WHO-TDR

E. A. Ottesen and C. D. Ramachandran (1995). Lymphatic Filariasis. Infection and disease: Control strategies. Parasitology Today 11, 129-131.

H. Xie, O. Bain and S. A. Williams (1994). Molecular phylogenetic studies on Brugia filariae using Hha-1 repeat sequences. Parasite 1, 255-260.

T. R. Unnasch (1994). The Filarial Genome Project. Parasitology Today 10, 415-417.

M. L. Blaxter (1995). The Filarial Genome Project. Parasitology Today 11, 811-812.

C. V. Maina, A. G. Grandea III, L. T. K. Tuyen, N. Asikin, S. A. Williams and L. A. McReynolds (1987). Dirofilaria immitis: Genomic complexity and characterisation of a structural gene. In A. J. MacInnis (Ed.) "Molecular Paradigms for Eradicating Helminthic Parasites". Alan R. Liss Inc. pp. 193-204.

L. A. McReynolds, S. M. DeSimone and S. A. Williams (1986). Cloning and comparison of repeated DNA sequences from the human filarial parasite Brugia malayi and the animal parasite Brugia pahangi. Proc. Natl. Acad. Sci. USA 83, 797-801.

N. Rothstein, T. J. Stoller and T. V. Rajan (1988). DNA base composition of filarial nematodes. Parasitology 97, 75-79.

Y. Sakaguchi, I. Tada, L. R. Ash and Y. Aoki (1983). Karyotypes of Brugia pahangi and Brugia malayi (Nematoda: Filaroidea). J. Parasitol. 69, 1090-1093.

J. Hodgkin, R. H. A. Plasterk and R. H. Waterston (1995). The nematode Caenorhabditis elegans and its genome. Science 270, 410-414.

P. Yenbutr and A. L. Scott (1995). Molecular cloning of a serine proteinase inhibitor from Brugia malayi. Infect. Immun. 63, 1745-1753.

M. L. Blaxter, N. Raghavan, I. Ghosh, D. Guiliano, W. Lu, S. A. Williams, B. Slatko and A. L. Scott (1996). Genes expressed in Brugia malayi infective third stage larvae. Mol. Biochem. Parasitol. 77, 77-96.

M. L. Blaxter and L. X. Liu (1996). Nematode Spliced Leaders: Function, Evolution and Utility. Int. J. Parasitol. 26, 1025-1033.

S. A. M. Martin, F. J. Thompson and E. Devaney (1995). The construction of spliced leader cDNA libraries from the filarial nematode Brugia pahangi. Mol. Biochem. Parasitol. 70, 241-245.

W. R. McCombie, M. D. Adams, J. M. Kelley, M. G. FitzGerald, T. R. Utterback, M. Khan, M. Dubnick, A. R. Kerlavage, J. C. Venter and C. Fields (1992). Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues. Nature Genet. 1, 124-131.

R. Waterston, C. Martin, M. Craxton, C. Huynh, A. Coulson, L. Hillier, R. Durbin, P. Green, R. Shownkeen, N. Halloran, M. Metzstein, T. Hawkins, R. Wilson, M. Berks, Z. Du, K. Thomas, J. Thierry-Mieg and J. Sulston (1992). A survey of expressed genes in Caenorhabditis elegans. Nature Genet. 1, 114-123.

S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

J. Sulston, Z. Du, K. Thomas, R. Wilson, L. Hillier, R. Staden, N. Halloran, P. Green, J. Thierry-Mieg, L. Qiu, S. Dear, A. Coulson, M. Craxton, R. Durbin, M. Berks, M. Metzstein, T. Hawkins, R. Ainscough and R. Waterston (1992). The C. elegans genome sequencing project: A beginning. Nature 356, 37-41.


TABLE 1: PARTICIPANTS IN THE FILARIAL GENOME PROJECT

TABLE 2: CLUSTER ANALYSIS OF BRUGIA MALAYI EST SEQUENCES

TABLE 3: ACCESS TO FILARIAL GENOME PROJECT DATA AND RESOURCES


FIGURE 1: THE BRUGIA MALAYI LIFE CYCLE AND FILARIAL GENOME PROJECT CDNA LIBRARIES

The Brugia malayi life cycle takes over 120 days and is completed in two hosts. The first stage larva (called a microfilaria or MF) is taken up by a mosquito. It develops to the mammal-infective L3 within the mosquito and is then injected when the intermediate host takes a second blood meal. In the mammal, the parasite migrates to the lymphatics and moults twice to become a dioecious adult. The females release sheathed MF into the bloodstream. The FGP has generated high-quality cDNA libraries from MF, L2, L3, L3-L4, L4, adult male and adult female stages. The names of these libraries are given below the descriptions of the stages from which they were made. The conventional libraries have a "C" designation, the SL-dT ones an "SL" designation.

FIGURE 2: THE FILARIAL GENOME PROJECT EST SEQUENCING STRATEGY

FIGURE 3: THE FILARIAL GENOME PROJECT EST SEQUENCE ANALYSIS STRATEGY