
Figure: a C. elegans adult hermaphrodite.
An introduction to the genome of the nematode Caenorhabditis elegans
Mark Blaxter, Institute of Evolutionary Biology, University of Edinburgh, UK
The C. elegans genome is spread across six approximately equally sized
chromosomes (five autosomes and one X chromosome) and has been
completely sequenced. The community genome database WormBase has full
information...
The genome size is 100.2 megabases (Mb).
Humans have a genome size of 3,000 Mb.
The baker's yeast (S. cerevisiae) genome is 12 Mb.
Many other nematode genomes are in the same range: Brugia malayi, a
filarial nematode parasite of humans, has a genome of ~95 Mb. However,
Ascaris suum, the pig roundworm, has a larger germ line genome
(>500 Mb), which undergoes somatic diminution (part of the genome is
eliminated from somatic cells).
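
To put these sizes side by side, the following minimal sketch expresses each genome as a multiple of the C. elegans genome. It uses only the figures quoted above; the dictionary and function names are illustrative, the B. malayi value is approximate, and the A. suum value is a lower bound.

```python
# Genome sizes in megabases (Mb), as quoted in the text.
# B. malayi is approximate (~95 Mb); A. suum is a lower bound (>500 Mb).
GENOME_SIZES_MB = {
    "C. elegans": 100.2,
    "H. sapiens": 3000.0,
    "S. cerevisiae": 12.0,
    "B. malayi": 95.0,
    "A. suum (germ line)": 500.0,
}


def relative_size(species, reference="C. elegans"):
    """Return the genome size of `species` as a multiple of `reference`."""
    return GENOME_SIZES_MB[species] / GENOME_SIZES_MB[reference]


if __name__ == "__main__":
    for name in GENOME_SIZES_MB:
        print(f"{name}: {relative_size(name):.2f}x the C. elegans genome")
```

On these figures the human genome is roughly 30 times the size of the C. elegans genome, while the yeast genome is roughly one eighth of it.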
The AT content is about 64% (GC content about 36%).
How was the genome sequenced?
First, a physical map of the genome was constructed. The map is based
mostly on 17,000 cosmid clones of genomic DNA (insert size 35-40 kb).
These clones were "fingerprinted" using restriction enzymes, and the
fingerprints were used to order the clones into overlapping contiguous
sets, or contigs. These cosmid contigs were supplemented by a set of
3,000 yeast artificial chromosome (YAC) clones (insert sizes 100 kb and
above). Because the yeast host tolerates sequences that E. coli will
not, the YAC clones can "bridge" gaps between contigs of cosmids. With
these two resources, contigs covering >95% of all the chromosomes were
assembled. The genome sequence was then generated clone by clone from
these cosmid and YAC clones, supplemented by directed cloning of
'difficult' regions.
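
The idea of grouping clones by shared fingerprints can be illustrated with a toy sketch. It assumes that a fingerprint is a list of restriction fragment sizes and that two clones overlap when they share at least `min_shared` fragments within a size `tolerance`. The clone names, fragment sizes, thresholds, and the union-find grouping are all hypothetical choices made for illustration, not the software or parameters used by the mapping project. Treating the YAC as just another fingerprinted clone is also a simplification, and the sketch only groups clones into contigs without ordering them.

```python
from itertools import combinations


def shared_fragments(fp_a, fp_b, tolerance=2):
    """Count fragments of fp_a that match a distinct fragment of fp_b within `tolerance`."""
    remaining = sorted(fp_b)
    count = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                count += 1
                del remaining[i]  # each fragment of fp_b may be matched only once
                break
    return count


def build_contigs(fingerprints, min_shared=3, tolerance=2):
    """Group clones so that any chain of pairwise overlaps ends up in one contig."""
    parent = {name: name for name in fingerprints}

    def find(name):
        # Follow parent links to the representative clone, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(fingerprints, 2):
        if shared_fragments(fingerprints[a], fingerprints[b], tolerance) >= min_shared:
            parent[find(a)] = find(b)

    contigs = {}
    for name in fingerprints:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical fingerprints (fragment sizes in arbitrary units).
cosmids = {
    "cosmid_A": [120, 340, 560, 800, 1500],
    "cosmid_B": [340, 560, 800, 2100, 2600],
    "cosmid_C": [800, 2100, 2600, 3300, 4000],
    "cosmid_D": [150, 900, 1750, 5200, 6100],
}
# A hypothetical larger clone that overlaps both cosmid_C and cosmid_D.
with_yac = {**cosmids, "yac_1": [150, 900, 1750, 2600, 3300, 4000]}

if __name__ == "__main__":
    print(build_contigs(cosmids))
    print(build_contigs(with_yac))
```

With the cosmids alone, the sketch yields two contigs (A, B and C together, D on its own). Adding the larger clone joins them into one, which mirrors how a YAC can bridge a gap between cosmid contigs.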
The genome is completely sequenced. The sequence was essentially
completed around Christmas 1998, making it the first animal genome to
be completed. Sequencing started on the cosmid clones and moved from
them to the bridging YACs. From the genome sequence, protein-coding and
RNA genes have been identified, and novel features of gene organisation
and chromosomal structure have been discovered.
Gene identification.
There are about 20,000 protein-coding genes. This estimate is based on
the correspondence between genomic DNA sequence and cDNA sequences, and
on the prediction of coding genes from genomic sequence. As an aid to
gene identification, cDNA copies of mRNAs are being "tag sequenced".
Over 200,000 cDNAs have been tag sequenced and >300,000 ESTs deposited.
These "expressed sequence tags" (ESTs) offer a set of snapshots of gene
expression in the nematode, and have identified around half of the
organism's genes. The cDNA data are used in the prediction of genes from
the genome sequence, along with database searches for similarities
between C. elegans genes and those of other organisms such as humans.
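
As a rough back-of-the-envelope check, the quoted genome size and gene count can be combined into an average gene density. The sketch below uses only the figures given in the text (100.2 Mb, about 20,000 genes, and around half of the genes identified by ESTs); the function names are illustrative, and the result is an average spacing across the genome, not a statement about the length of any individual gene.

```python
GENOME_SIZE_MB = 100.2         # genome size quoted in the text
PROTEIN_CODING_GENES = 20_000  # approximate gene count quoted in the text
EST_TAGGED_FRACTION = 0.5      # "around half" of genes identified by ESTs


def average_spacing_kb(genome_size_mb, gene_count):
    """Average stretch of genome per gene, in kilobases (1 Mb = 1,000 kb)."""
    return genome_size_mb * 1000 / gene_count


def genes_per_mb(genome_size_mb, gene_count):
    """Average number of genes per megabase of genome."""
    return gene_count / genome_size_mb


if __name__ == "__main__":
    spacing = average_spacing_kb(GENOME_SIZE_MB, PROTEIN_CODING_GENES)
    density = genes_per_mb(GENOME_SIZE_MB, PROTEIN_CODING_GENES)
    tagged = EST_TAGGED_FRACTION * PROTEIN_CODING_GENES
    print(f"About one gene per {spacing:.1f} kb ({density:.0f} genes per Mb)")
    print(f"Roughly {tagged:.0f} genes identified by ESTs")
```

On these numbers the genome carries, on average, about one protein-coding gene every 5 kb, and ESTs have identified roughly 10,000 of them.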