C. elegans Genome Numbers
The genome sequence has 100,200 accredited and verified base pairs.
The completed genome sequence is made up of:
- 2527 cosmids
- 113 fosmids
- 257 YACs
- 44 long range PCR products
Cosmids
A physical map was constructed from ~17,000 cosmid clones of about
40 kb each, using a two-enzyme fingerprinting technique. The cosmids
were assembled into >600 contigs. The remaining gaps appeared to be
nonrandom: with the 6-fold coverage afforded by the cosmids, a clone
should have been recoverable for many of them. A directed search in
cosmid libraries for bridging clones was unsuccessful, supporting the
view that these regions were perhaps "unclonable" in cosmids.
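The reasoning behind "nonrandom" can be illustrated with a small calculation. The sketch below assumes a simple Poisson model of random clone placement and a genome of roughly 100 Mb; both are assumptions made for illustration, not figures or methods taken from the mapping project. Under that model, the fraction of bases not covered by any clone at c-fold coverage is about e^(-c).

```python
import math

def fold_coverage(n_clones, insert_bp, genome_bp):
    """Average number of clones covering each base."""
    return n_clones * insert_bp / genome_bp

def uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases covered by no clone."""
    return math.exp(-coverage)

# Assumed genome size of ~100 Mb (illustrative, not from the text).
raw = fold_coverage(n_clones=17_000, insert_bp=40_000, genome_bp=100_000_000)
print(f"nominal coverage from clone counts: {raw:.1f}-fold")

# Using the 6-fold coverage quoted in the text.
print(f"uncovered fraction at 6-fold: {uncovered_fraction(6):.4%}")
```

At 6-fold coverage the model leaves only about a quarter of one percent of the genome without a clone, so regions that stay uncovered after a directed search are more plausibly hard to clone than merely unlucky.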
Fosmids
A fosmid library was constructed and searched: this yielded 113
clones that bridged or extended contigs. Fosmids have a lower copy
number per E. coli cell than cosmids, and can therefore carry DNA
that is liable to rearrangement more stably.
YACs
The physical map was completed using
YAC clones (100 kb to 3 Mb). These essentially covered the whole
genome, and 3000 were mapped by hybridisation to selected cosmids. At
the initiation of the sequencing project there were only about twenty
contigs (representing the six chromosomes with ~14 nonrandom physical
gaps).
Long range PCR
For some sequence gaps, long range PCR was used to isolate the
bridging fragment. The telomeres, independently cloned, were
allocated to chromosome ends by long range PCR or sequence identity.
Minimum tiling path sequencing: clone-by-clone
A minimum tiling path of overlapping cosmids was chosen, with
about 25% overlap between each clone and its neighbours. The shotgun
phase (on clones with 2-4 kb inserts) involved about 900 reads per 40
kb, and the finishing phase usually required another 100-200 reads.
20% of the genome sequence is from YACs.
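Choosing a minimum tiling path can be viewed as an interval-covering problem. The following is a minimal sketch using a greedy strategy, not the procedure the sequencing centres actually used; the clone names, coordinates, and the `min_overlap` parameter are hypothetical. For 40 kb cosmids, a 25% overlap would correspond to about 10 kb.

```python
def minimum_tiling_path(clones, min_overlap=0):
    """Greedy tiling path through one contig.

    clones: list of (name, start, end) in map coordinates.
    min_overlap: required overlap between neighbouring clones.
    Returns clone names in path order; stops at the first gap.
    """
    if not clones:
        return []
    # Leftmost start first; among ties, the longest clone first.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path = [ordered[0]]
    while True:
        current_end = path[-1][2]
        # Clones that overlap the current one enough and extend past it.
        candidates = [c for c in ordered
                      if c[1] <= current_end - min_overlap and c[2] > current_end]
        if not candidates:
            break
        # Take the candidate that reaches furthest to the right.
        path.append(max(candidates, key=lambda c: c[2]))
    return [c[0] for c in path]

# Hypothetical clones, coordinates in kb.
clones = [("A", 0, 40), ("B", 10, 50), ("C", 30, 70),
          ("D", 35, 60), ("E", 60, 100), ("F", 65, 90)]
print(minimum_tiling_path(clones, min_overlap=5))
```

Picking the furthest-reaching overlapping clone at each step keeps the number of clones sequenced low while still guaranteeing the overlap needed to join neighbouring sequences.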
The sequence colinearity with the genome was checked by:
- restriction enzyme digestion and mapping of cosmids
- PCR accreditation of contiguated sequence (but PCR failure was
much more common than real discrepancy)
- comparison of overlap between clones
- "skimming" overlapping YACs (acquiring single pass reads to
affirm sequence contigs)
Gaps
At publication there were 3 gaps, one large but less than 450 kb,
and two smaller ones. Four segments were present in YACs but had not
yet been fully sequenced. 139 other small segments were unfinished.
Some regions of tandem repeat were sized but not sequenced (as the
repeat length is longer than the average sequence read length). Some
gaps required the construction of microinsert libraries (<500
bases/clone) from the bridging clones, as longer inserts were
unclonable.
All the gaps have now been sequenced.
Errors
The sequence error rate, estimated from overlap between cosmids and
from resequencing of clones in the two sequencing centres, is <1
in 10,000 bases (and is likely to be nearer 1 in 100,000). Two
potential errors in the sequence caused by cloning artefacts were
identified from overlaps between clones.
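An overlap-based error estimate of this kind can be sketched as a mismatch count over independently sequenced copies of the same region. The example below is illustrative only: it assumes the overlapping sequences are already aligned without gaps, the function names and toy sequences are hypothetical, and the 100 Mb genome size is an assumed round figure.

```python
def mismatch_rate(aligned_pairs):
    """Fraction of compared positions that differ between two copies.

    aligned_pairs: iterable of (seq_a, seq_b), equal length, no gaps.
    A mismatch may come from an error in either copy, so this is a
    rough indicator rather than a per-base error rate for one copy.
    """
    mismatches = 0
    compared = 0
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("sequences in a pair must have equal length")
        for x, y in zip(a, b):
            compared += 1
            if x != y:
                mismatches += 1
    return mismatches / compared if compared else 0.0

def expected_errors(genome_bp, one_in):
    """Expected number of errors at a rate of 1 in `one_in` bases."""
    return genome_bp / one_in

# Toy overlaps, not real data.
print(mismatch_rate([("ACGT", "ACGA"), ("GG", "GG")]))

# Assumed ~100 Mb genome at the two rates quoted in the text.
print(expected_errors(100_000_000, 10_000))
print(expected_errors(100_000_000, 100_000))
```

Under these assumptions, the two quoted rates would correspond to on the order of ten thousand and one thousand errors genome-wide, respectively.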