Woods Hole
WHO TDR/NCBI Workshop on Genome Computing for the Parasite Genome Initiatives
13-17 September 1995
Rapporteur: Mark Blaxter (Edinburgh)
All of the WHO-sponsored genome initiatives were represented:
Brugia malayi
- Mark Blaxter (Edinburgh, UK)
- David Guiliano (Northampton, USA)
Schistosoma mansoni
- David Johnston (London, UK)
- Gloria Franco (UFMG, Brasil)
Trypanosoma cruzi
- Wim DeGrave (Fiocruz, Brasil)
- Andrez Ruiz (FundCam, Argentina)
Trypanosoma brucei
- Sara Melville (Cambridge, UK)
- Najib ElSajid (JHU & U. Iowa, USA)
- Howard Cobb (Cambridge, UK)
Leishmania major
- Angela Cruz (USP, Brasil)
- Al Ivens (London, UK)
- Martin Azlett (Cambridge, UK)
Leishmania donovani
- Peter Myler (Seattle, USA)
Invited experts and discussants were:
- Sam Cartinhour (NAL, USA; ACeDB)
- David Landsman (NCBI, USA; Entrez, databases)
- Jim Ostell (NCBI, USA; Entrez, databases)
- Prakash Nadkarni (Yale, USA; 4D/SQL)
- Michael Gottleib (NIH NIAID, USA; NIH funding and policy)
- Bob Hata (WHO, Geneva; TDR)
- Boris Dobrokhotov (WHO, Geneva; TDR)
The meeting was held over 3
days at the Marine Biological Laboratory, Woods Hole, MA, USA. For each
genome project the researchers concerned presented progress to date
in mapping, sequencing and coordinating research on their genome. A
remarkable amount of work had been done, with WHO funding
supplemented with locally raised grants. Over 3000 parasite expressed
sequence tags had been generated from the WHO projects alone, and
another 2000 obtained by associated groups. Despite the relative
youth of the dataset, parasite EST clones were already frequently
requested by researchers. Large insert clone libraries were available
on grids from all the protozoan parasites and Schistosoma. Contig
maps were being built of selected Leishmania and trypanosome
chromosomes and had already revealed hitherto unexpected features of
genome organisation. Schistosome clones were being mapped to
chromosomes by fluorescent in-situ hybridisation. Data and reagents
are being made available to the respective research communities on
request.
Representatives from the NCBI presented the latest information on
global gene databasing and previewed many upcoming enhancements of
the NCBI database services. There was enthusiastic and fruitful
discussion of the needs of the parasite genome community, resulting
in a series of recommendations to NCBI concerning data access and
retrieval. In particular, the ability to search databases by the
accession date was requested and the use of hypertext links from
parasite genome databases and world wide web sites to NCBI databases
was encouraged.
For collation and handling of genome data, several systems were
discussed. It was decided unanimously to recommend that:
- each genome laboratory would continue to handle and process
its genome data with systems already in place. Thus each mapping
laboratory would continue to use software developed for the local
computing environment and with the skills and support available
locally.
- the public sequence databases (NCBI GenBank and dbEST) would
be used as sequence repositories for expressed sequence tag and
genome sequence data generated by the project to ensure thorough
external quality control (at NCBI) and consistency in
presentation. Submission of sequence data to the NCBI would be the
responsibility of the laboratory generating it.
- each genome would develop a unitary genome database for
archiving and presentation of data. This database would be
installable locally in a number of computing environments (UNIX,
DOS, Windows and MacOS), and would have a proven ability to be
made internet-available. A single researcher at the core of the
project would be the database curator and project secretary.
- all the genome projects agreed to use the ACeDB engine for
database development. This was because (1) it meets all the data
handling and presentation needs identified in this meeting; (2) it
is installable under UNIX, Linux on PCs and MacOS (a Windows
version is under development); (3) world wide web and gopher access
across the internet is a proven technology; (4) it is free; (5) the
group has extensive experience of it; and (6) a large global
network of user support is available.
- five ACeDB databases would be constructed:
  - LeishDB: already under development by Al Ivens,
    Martin Azlett and Howard Cobb, Cambridge
  - TrypDB: already under development by Sara Melville,
    Martin Azlett and Howard Cobb, Cambridge
  - Tcruzi: to be developed by Wim Degrave, Brasil, in
    collaboration with Cambridge
  - FilDB: already under development by Mark Blaxter,
    Edinburgh
  - Schistosoma: to be developed by David Johnston,
    London, in collaboration with Mark Blaxter, Edinburgh.
- the researchers underlined above should be the database
curators, and also act as genome initiative "secretaries" for
their organism.
- each genome project would continue to liaise with NCBI in order
to best expedite the flow of genome data into the NCBI database
systems from the unified ACeDB databases.
The unified view of the researchers present was that an extremely
cost-effective approach to meeting the computing needs of the
initiatives would be to employ a single computer specialist to assist
with all the ACeDB database construction. It was noted that the
filarial initiative (Mark Blaxter) had requested funds from the WHO
for a database management person. The meeting enthusiastically
endorsed a proposal that
- a WHO parasite genomes computing person should be employed
- she/he should have the remit of
  - developing and installing ACeDB engines for each database
    in consultation with the current curators
  - developing and implementing data entry protocols and
    scripts for each database in consultation with the current
    curators and the data generating laboratories
  - developing and installing WWW and gopher internet access
    for the databases
  - assisting with local installation of the ACeDB databases in
    the participating labs worldwide
- she/he should be based in Cambridge but should travel to the
other laboratories involved as needed.
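As an illustration of the kind of data entry script envisaged in the remit above, the following is a minimal sketch in Python (not a tool discussed at the meeting) that writes one record in the plain-text .ace style used to load data into ACeDB. The helper names, the clone name and the tag names (Species, Remark) are hypothetical; real tags would have to match the model definitions of the database concerned.

```python
def ace_quote(value):
    """Quote a text value, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


def to_ace(class_name, object_name, tags):
    """Render one object as an .ace-style paragraph.

    tags is a list of (tag, value) pairs. The tag names used here are
    assumptions and must agree with the target database's models.
    """
    lines = [f"{class_name} : {ace_quote(object_name)}"]
    for tag, value in tags:
        lines.append(f"{tag} {ace_quote(value)}")
    # A blank line ends each object paragraph.
    return "\n".join(lines) + "\n\n"


record = to_ace(
    "Sequence",
    "EXAMPLE_EST_001",  # made-up clone name, for illustration only
    [("Species", "Brugia malayi"), ("Remark", "illustrative record only")],
)
print(record, end="")
```

Keeping the formatting in one small function would let each data generating laboratory produce files in a consistent layout for its curator to load.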
Mark Blaxter was asked to prepare a revised version of his proposal
to present to the WHO steering committee.