|
Overall patterns in the C. elegans genome
1 Distribution of DNA in intergenic DNA, exons and introns
| intergenic DNA | 47% |
| exonic DNA | 27% |
| intronic DNA | 26% |
2 Distribution of repetitive DNAs
2.7% of the genome consists of simple, short tandem repeats.
3.6% consists of simple, short inverted repeats.
| repeat type | tandem | inverted |
| intergenic DNA | 49% | 55% |
| exonic DNA | <1% | <1% |
| intronic DNA | 51% | 45% |
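The two tables above can be combined to ask whether repeats are over- or under-represented in each compartment relative to that compartment's share of the genome. The following is a minimal sketch of that arithmetic, not part of the original analysis: the names `GENOME_SHARE`, `REPEAT_SHARE` and `enrichment` are hypothetical, and the "<1%" entries are entered as 1.0, so the exonic values are upper bounds only.

```python
# Share of the genome in each compartment (section 1 table), in percent.
GENOME_SHARE = {"intergenic": 47.0, "exonic": 27.0, "intronic": 26.0}

# Share of each simple repeat class found in each compartment (section 2 table),
# in percent. ASSUMPTION: "<1%" is entered as 1.0, i.e. as an upper bound.
REPEAT_SHARE = {
    "tandem": {"intergenic": 49.0, "exonic": 1.0, "intronic": 51.0},
    "inverted": {"intergenic": 55.0, "exonic": 1.0, "intronic": 45.0},
}


def enrichment(repeat_type: str, compartment: str) -> float:
    """Ratio of a compartment's share of a repeat class to its share of the genome.

    A value above 1 means the repeat class is over-represented in that
    compartment; a value below 1 means it is under-represented.
    """
    return REPEAT_SHARE[repeat_type][compartment] / GENOME_SHARE[compartment]


if __name__ == "__main__":
    for repeat_type, shares in REPEAT_SHARE.items():
        for compartment in shares:
            ratio = enrichment(repeat_type, compartment)
            print(f"{repeat_type:9s} {compartment:11s} {ratio:.2f}")
```

Under these assumptions, introns hold roughly twice the share of tandem repeats that their 26% of the genome would predict, while exons are almost free of both repeat classes.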
There are 38 defined dispersed repeat families, many of which
correspond to transposon-like elements. Many transposons (Tc
elements) had already been defined in C. elegans as mutagenic
elements. Many of the dispersed repeat families appear to be relics
of transposon families that are no longer active, including four novel
families in the Tc1/mariner group.
Some individual repeats have strikingly partitioned locations in
the genome, the functional significance of which is unclear.
| repeat family | features |
| CeRep26 | telomeric repeat. Not found in introns |
| CeRep27 | Not found in introns |
| CeRep11 | 712 copies, only one of which is on the X |
There are many instances of duplication of segments of genomic DNA,
some including several expressed genes. Some of these duplications
have diverged enough in sequence to confirm that both copies are
expressed.
3 Distribution of genes across chromosomes
The autosomes had been divided genetically into central clusters
(where recombination appeared to be suppressed) and arms (where
recombination rates were significantly greater). The X chromosome has
a uniform recombination rate.
Analysis of gene density over these genetically defined intervals
reveals that the autosome arms:
have more repeats (particularly some families)
have fewer corresponding ESTs
have fewer genes that have a significant match to non-nematode proteins
have more clusters of closely related genes
This suggests that these regions may be rapidly evolving, and may
be the birthplace of new genes and gene families.
4 Structure of the genes
The current protein-coding gene dataset contains 19,099 genes.
There are over 900 RNA-coding genes.
The average C. elegans gene is ~3 kb long and has 5
introns. Most C. elegans genes have introns. Many introns are
very small (37-70 bases). This class of small introns is distinct in
sequence content from the remaining larger introns (100 bases to 10
kb).
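The two intron size classes described above can be expressed as a simple length-based binning. This is only an illustrative sketch: the function names `classify_intron` and `summarize` are hypothetical, the boundaries are taken directly from the ranges quoted in the text, and lengths falling outside both ranges (for example 71 to 99 bases) are labelled "unclassified" because the text does not assign them to either class.

```python
from collections import Counter

# Length ranges in bases, inclusive, as quoted in the text.
SMALL_INTRON_RANGE = (37, 70)
LARGE_INTRON_RANGE = (100, 10_000)


def classify_intron(length_bp: int) -> str:
    """Assign an intron length to the 'small' or 'large' class from the text.

    ASSUMPTION: lengths outside both quoted ranges are reported as
    'unclassified' rather than forced into either class.
    """
    if length_bp <= 0:
        raise ValueError("intron length must be a positive number of bases")
    if SMALL_INTRON_RANGE[0] <= length_bp <= SMALL_INTRON_RANGE[1]:
        return "small"
    if LARGE_INTRON_RANGE[0] <= length_bp <= LARGE_INTRON_RANGE[1]:
        return "large"
    return "unclassified"


def summarize(lengths_bp):
    """Count how many intron lengths fall into each class."""
    return dict(Counter(classify_intron(n) for n in lengths_bp))


if __name__ == "__main__":
    # Hypothetical intron lengths for a single gene with 5 introns.
    print(summarize([45, 52, 310, 48, 1500]))
```

Length alone does not capture the difference in sequence content between the two classes noted in the text; the binning is only a first step for separating them before any composition analysis.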
Most C. elegans genes are trans-spliced at their 5'
end to a small leader exon, called SL-1. Estimates of the
actual proportion range from 60% to 80%.
About 25% of C. elegans genes are organised as operons,
groups of cotranscribed genes (up to five genes in one putative
operon have been observed). Polycistronic pre-mRNAs are resolved into
individual mRNAs by a trans-splicing process that includes the
addition of a trans-spliced leader exon at the 5' end of the
downstream gene. The SL used for resolution of this downstream
transcript can be SL-1 or one of a family of SL-2-like SLs. About 10%
of C. elegans genes are expected to receive SL-2-like SLs.
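To give the percentages in this section a sense of scale, they can be converted into approximate gene counts using the 19,099 protein-coding genes quoted above. This is a rough sketch only: the helper `genes_from_percent` is a hypothetical name, and the results are back-of-the-envelope figures, since the underlying percentages are themselves estimates.

```python
PROTEIN_CODING_GENES = 19_099  # current protein-coding gene dataset, from the text


def genes_from_percent(percent: int, total: int = PROTEIN_CODING_GENES) -> int:
    """Convert a whole-number percentage into a gene count, rounded half up.

    Integer arithmetic is used so the rounding does not depend on
    floating-point representation.
    """
    return (total * percent + 50) // 100


if __name__ == "__main__":
    low, high = genes_from_percent(60), genes_from_percent(80)
    print(f"trans-spliced genes (60% to 80%): {low} to {high}")
    print(f"genes in operons (about 25%): {genes_from_percent(25)}")
    print(f"genes receiving SL-2-like SLs (about 10%): {genes_from_percent(10)}")
```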
The C. elegans EST dataset (at the time of publication
68,000 ESTs from 40,000 clones corresponding to 7,432 different genes,
now over 100,000 ESTs and 8,500 genes) covers 15% of the predicted
coding DNA. 40% of predicted genes have EST matches. 92% of splice
predictions were exact, and 97% of introns predicted by
GENEFINDER overlapped with those
defined by the EST cDNA sequences. Many instances of alternative
splicing have been defined. There are very few groups of unmapped
cDNAs (<10), and thus the unsequenced regions are unlikely to contain
many coding genes.
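The statement that about 40% of predicted genes have EST matches can be checked against the gene counts quoted above. The sketch below is a consistency check under the assumption that the EST-matched genes are a subset of the 19,099 predicted protein-coding genes; the function name `fraction_with_est` is hypothetical.

```python
PREDICTED_GENES = 19_099          # protein-coding gene dataset, from the text
EST_GENES_AT_PUBLICATION = 7_432  # genes with ESTs at the time of publication
EST_GENES_NOW = 8_500             # later figure quoted in the text
ESTS_AT_PUBLICATION = 68_000
CLONES_AT_PUBLICATION = 40_000


def fraction_with_est(est_genes: int, predicted: int = PREDICTED_GENES) -> float:
    """Fraction of predicted genes that have at least one matching EST.

    ASSUMPTION: every EST-matched gene is one of the predicted genes.
    """
    return est_genes / predicted


if __name__ == "__main__":
    print(f"at publication: {fraction_with_est(EST_GENES_AT_PUBLICATION):.1%}")
    print(f"later figure:   {fraction_with_est(EST_GENES_NOW):.1%}")
    print(f"ESTs per clone: {ESTS_AT_PUBLICATION / CLONES_AT_PUBLICATION:.1f}")
```

Both gene counts give a fraction close to the quoted 40% (a little under with 7,432 genes, a little over with 8,500).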
|