BaNG - Blaxter Nematode and Neglected Genomics
  The C. elegans genome
     Introduction to the genome of a model nematode
       Mark Blaxter at the Institute of Evolutionary Biology, University of Edinburgh
How was the genome sequenced?
 


C. elegans Genome Numbers

The genome sequence has about 100.2 million (100,200 kb) accredited and verified base pairs.

The completed genome sequence is made up of:

  • 2527 cosmids
  • 113 fosmids
  • 257 YACs
  • 44 long range PCR products


Cosmids

A physical map was constructed from ~17,000 cosmid clones (40 kb inserts) using a two-enzyme fingerprinting technique, and the cosmids were assembled into >600 contigs. The remaining gaps appeared to be nonrandom: given the roughly 6-fold coverage provided by the cosmids, a bridging clone would have been expected for many of them. A directed search of cosmid libraries for bridging clones was unsuccessful, supporting the view that these regions were perhaps "unclonable" in cosmids.
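The "6-fold coverage" argument can be made concrete with the Lander-Waterman model for mapping by fingerprinting random clones. This model is swapped in here for illustration; the page does not say it was used. The sketch below assumes a genome of ~100 Mb and treats `theta`, the minimum overlap fraction that fingerprinting can detect, as a free parameter. It computes the coverage c = NL/G and the expected number of islands N·exp(-c(1-θ)) for a randomly distributed clone library. The function names are hypothetical.

```python
import math


def clone_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Redundancy c = N * L / G: how often, on average, each base is in a clone."""
    return n_clones * insert_kb / genome_kb


def expected_islands(n_clones: int, c: float, theta: float) -> float:
    """Lander-Waterman expected number of islands, N * exp(-c * (1 - theta)).

    theta is the fraction of a clone's length two clones must share before the
    fingerprints reveal the overlap (an assumed value, not given on this page).
    Islands include singletons, so this approximates the number of contigs
    expected from randomly distributed clones.
    """
    return n_clones * math.exp(-c * (1.0 - theta))


if __name__ == "__main__":
    # ~17,000 cosmids of ~40 kb over an assumed ~100 Mb (100,000 kb) genome
    c = clone_coverage(17_000, 40, 100_000)
    print(f"coverage c = {c:.1f}")
    for theta in (0.0, 0.25, 0.5):  # hypothetical detection thresholds
        print(f"theta = {theta:.2f}: ~{expected_islands(17_000, c, theta):.0f} islands")
```

The expected number of islands rises steeply with `theta`. Whether the observed >600 contigs look "nonrandom" therefore depends heavily on the detection threshold assumed. The model also assumes that clones are drawn uniformly from the genome, and unclonable regions violate exactly that assumption.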

Fosmids

A fosmid library was constructed and screened, yielding 113 clones that bridged or extended contigs. Fosmids are maintained at a lower copy number per E. coli cell, so they can more stably carry DNA that is liable to rearrangement.

YACs

The physical map was completed using YAC clones (100 kb to 3 Mb inserts). These covered essentially the whole genome, and 3000 of them were mapped by hybridisation to selected cosmids. When the sequencing project began, the map consisted of only about twenty contigs: the six chromosomes, interrupted by ~14 nonrandom physical gaps.

Long range PCR

For some sequence gaps, long range PCR was used to amplify the bridging fragment. The telomeres, which were cloned independently, were assigned to chromosome ends by long range PCR or by sequence identity.


Minimum tiling path sequencing: clone-by-clone

A minimum tiling path of overlapping cosmids was chosen, with about 25% overlap between each clone and its neighbours. The shotgun phase (using subclones with 2-4 kb inserts) needed about 900 reads per 40 kb cosmid, and the finishing phase usually required a further 100-200 reads. About 20% of the genome sequence comes from YACs.
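Choosing a minimum tiling path is an interval-covering problem. This minimal sketch assumes that each fingerprinted clone has already been placed on its contig as a (name, start, end) interval. A greedy scan repeatedly picks, from the clones that start early enough to overlap the current path by at least `min_overlap` bp, the one that reaches furthest right. The clone layout, the function names and the 450-base read length used with `shotgun_redundancy` are illustrative assumptions, not details from the project.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (clone name, start bp, end bp) on one contig
Clone = Tuple[str, int, int]


def tiling_path(clones: List[Clone], contig_end: int, min_overlap: int) -> List[str]:
    """Greedily choose the fewest clones covering [0, contig_end], each
    overlapping the previously chosen clone by at least min_overlap bp."""
    by_start = sorted(clones, key=lambda clone: clone[1])
    path: List[str] = []
    reach = 0                      # rightmost base covered so far
    best: Optional[Clone] = None   # furthest-reaching eligible clone seen so far
    i = 0
    while reach < contig_end:
        # The first clone must start at the contig start; later clones must
        # start early enough to overlap the current path by min_overlap.
        limit = 0 if not path else reach - min_overlap
        while i < len(by_start) and by_start[i][1] <= limit:
            if best is None or by_start[i][2] > best[2]:
                best = by_start[i]
            i += 1
        if best is None or best[2] <= reach:
            raise ValueError(f"no clone extends the path beyond {reach} bp")
        path.append(best[0])
        reach = best[2]
    return path


def shotgun_redundancy(n_reads: int, read_len_bp: int, insert_bp: int) -> float:
    """Average number of reads covering each base of a shotgunned clone."""
    return n_reads * read_len_bp / insert_bp


if __name__ == "__main__":
    clones = [("A", 0, 40_000), ("B", 5_000, 42_000), ("C", 30_000, 70_000),
              ("D", 35_000, 75_000), ("E", 60_000, 100_000)]
    # Require ~25% of a 40 kb cosmid (10 kb) of overlap between neighbours
    print(tiling_path(clones, 100_000, 10_000))
    print(shotgun_redundancy(900, 450, 40_000))  # 450-base reads are assumed
```

The demo selects clones A, C and E. Greedy selection yields the fewest clones here because, under this overlap constraint, reaching further right never shrinks the set of clones eligible at the next step. With the assumed 450-base reads, 900 reads over a 40 kb cosmid correspond to roughly tenfold shotgun redundancy.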

Colinearity of the sequence with the genome was checked by:

  • restriction enzyme digestion and mapping of cosmids
  • PCR accreditation of contiguated sequence (although PCR failures were much more common than genuine discrepancies)
  • comparison of the overlaps between clones
  • "skimming" overlapping YACs (acquiring single-pass reads to confirm sequence contigs)
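The restriction-digest check can be sketched as an in silico digest whose fragment sizes are compared with those observed on a gel. This minimal version makes several assumptions: a linear insert with the vector sequence ignored, and a single enzyme with a palindromic site. EcoRI (G^AATTC) serves only as an example, and the 5% sizing tolerance is hypothetical. Real gels also lose very small fragments and co-migrating doublets, and this sketch does not model either effect.

```python
from typing import List, Sequence


def digest_fragments(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths from cutting a linear sequence at every copy of `site`.

    cut_offset is the cut position within the site on the top strand
    (1 for EcoRI, G^AATTC). Only the top strand is searched, which is
    sufficient for palindromic sites such as GAATTC.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)  # advance by one so overlapping sites are found
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def digest_agrees(predicted: Sequence[int], observed: Sequence[int],
                  tolerance: float = 0.05) -> bool:
    """True if sorted predicted and observed sizes pair up within a fractional
    tolerance (5% here is an illustrative choice, not a published threshold)."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance * p
               for p, o in zip(sorted(predicted), sorted(observed)))
```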

Gaps

At publication there were three gaps: one large (but less than 450 kb) and two smaller ones. Four segments were present in YACs but had not yet been fully sequenced, and 139 other small segments were unfinished. Some regions of tandem repeat were sized but not sequenced, because the repeat length is longer than the average sequence read length. Some gaps required the construction of microinsert libraries (<500 bases per clone) from the bridging clones, as longer inserts were unclonable.
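Simple arithmetic shows why tandem repeats could be sized but not sequenced. This hypothetical sketch rests on two assumptions. First, a read places a repeat array unambiguously only if it covers the whole array plus some unique flanking sequence (`min_anchor_bp`) on each side. Second, the array's size is known from a sized fragment whose unique flanks have already been sequenced. The function names and numbers are illustrative.

```python
def read_can_span(repeat_bp: int, read_len_bp: int, min_anchor_bp: int) -> bool:
    """A single read can place a repeat array only if it covers the whole array
    plus min_anchor_bp of unique sequence on each side (assumed criterion)."""
    return repeat_bp + 2 * min_anchor_bp <= read_len_bp


def estimate_copy_number(sized_fragment_bp: int, flank_bp: int, unit_bp: int) -> float:
    """Approximate number of repeat units in a sized fragment, after
    subtracting the total length of its already-sequenced unique flanks."""
    if unit_bp <= 0:
        raise ValueError("unit_bp must be positive")
    return (sized_fragment_bp - flank_bp) / unit_bp


if __name__ == "__main__":
    # Hypothetical numbers: a 2 kb array cannot be crossed by 500-base reads
    print(read_can_span(2_000, 500, 50))          # False
    print(estimate_copy_number(5_200, 200, 100))  # 50.0
```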

All the gaps have now been sequenced.

Errors

The sequence error rate, estimated from overlaps between cosmids and from resequencing of clones at the two sequencing centres, is below 1 in 10,000 bases (and is likely to be nearer 1 in 100,000). Comparison of overlapping clones also identified two potential errors in the sequence caused by cloning artefacts.
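Error estimates from overlapping clones rest on counting discrepancies between two independently sequenced copies of the same bases. This minimal sketch assumes that the overlap is already aligned without gaps and that errors are rare and independent, so each mismatch reflects an error in only one copy. The function names are hypothetical. The `phred` helper converts a per-base error rate to the familiar logarithmic quality scale.

```python
import math


def discordance_error_rate(a: str, b: str) -> float:
    """Per-base error estimate from two independently sequenced copies of one overlap.

    Assumes the copies are already aligned without gaps and that errors are
    rare and independent, so each mismatch reflects an error in one copy only:
    the per-copy rate is then about mismatches / (2 * bases compared).
    """
    if not a or len(a) != len(b):
        raise ValueError("expected two non-empty aligned sequences of equal length")
    mismatches = sum(x != y for x, y in zip(a.upper(), b.upper()))
    return mismatches / (2 * len(a))


def phred(error_rate: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(p), for 0 < p <= 1."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(discordance_error_rate("ACGTACGTAC", "ACGTACGTAA"))  # 0.05
    print(phred(1e-4), phred(1e-5))
```

On this scale, 1 error in 10,000 bases is Q40 and 1 in 100,000 is Q50. An overlap with no discrepancies gives a point estimate of zero. In practice, then, the number of bases compared limits how small an error rate can be claimed.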

 

These pages were written by Mark Blaxter and last updated in early 2007.
Contact the www.nematodes.org webmaster if there are problems.