BaNG - Blaxter Nematode and Neglected Genomics
  The C. elegans genome
     Introduction to the genome of a model nematode
       Mark Blaxter at the Institute of Evolutionary Biology, University of Edinburgh

Regulatory pathway prediction

If the binding site of a transcriptional regulatory protein or complex is known with some specificity, it should be possible to scan a genome sequence for occurrences of the target sequence and thus identify genes that may be regulated by the protein or complex.

In C. elegans, the binding sites of only a few DNA-binding regulators have been defined to the required specificity. In their contribution to the C. elegans genome issue of Science, Clarke and Berg analysed sites for TRA-1 and GATA factors. Nearly 3% of the predicted genes in the genome have zinc-binding motifs that may indicate a DNA-binding activity. The network of regulation this implies is complex in the extreme.

TRA-1

tra-1 is involved in regulating the differentiation of the sex-specific tissues of the animal. TRA-1A binds DNA, and a hidden Markov chain model of the consensus for its binding site was used to search the C. elegans genome sequence for potential targets. As a control, a random sequence of the same composition was also analysed. There are 1300 potential sites for TRA-1A binding in the genome but, significantly, there are more genes with multiple upstream TRA-1A binding sites than would be expected by chance. A random distribution of sites would predict no genes with more than 2 sites. There are five genes with 2 or more upstream sites, and within this select group are two genes, lin-31 and mab-3, known to be affected by tra-1 or to be involved in sex determination. The other three genes are obvious candidates for new downstream genes involved in sex determination.
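
The control comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the original analysis: the site model is replaced by a simple regular expression, and the motif GGGTGG is an arbitrary placeholder rather than the TRA-1A binding site model. The function names and the choice to shuffle each upstream region separately are likewise assumptions. Shuffling keeps the base composition of each region while destroying any real sites.

```python
import random
import re

# ARBITRARY placeholder motif (assumption): the real TRA-1A site model is
# not given in the text. The lookahead lets overlapping matches be counted.
PLACEHOLDER_SITE = re.compile(r"(?=GGGTGG)")


def count_sites(seq):
    """Count placeholder-motif matches in one sequence."""
    return sum(1 for _ in PLACEHOLDER_SITE.finditer(seq.upper()))


def shuffled(seq, rng):
    """Return a random permutation of seq, keeping its base composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def genes_with_multiple_sites(upstream_regions, min_sites=2):
    """Count genes whose upstream region has at least min_sites matches."""
    return sum(
        1 for seq in upstream_regions.values() if count_sites(seq) >= min_sites
    )


def compare_with_control(upstream_regions, min_sites=2, trials=100, seed=0):
    """Return (observed count, mean count over shuffled control sets)."""
    rng = random.Random(seed)
    observed = genes_with_multiple_sites(upstream_regions, min_sites)
    control_counts = []
    for _ in range(trials):
        control = {
            name: shuffled(seq, rng) for name, seq in upstream_regions.items()
        }
        control_counts.append(genes_with_multiple_sites(control, min_sites))
    return observed, sum(control_counts) / trials
```

Here upstream_regions is a hypothetical dictionary mapping gene names to upstream sequences. An observed count well above the control mean is the kind of excess that the text describes for genes with multiple upstream sites.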

GATA factor sites

elt-1 encodes a C. elegans GATA-binding factor. There are 200,000 matches to the GATA factor binding site [(A/T)GATA(A/T)] in the genome sequence. There are 17 genes with 7 or more GATA sites upstream: only 2 genes would be expected from a random genome. Among these 17 is elt-1 itself, suggesting that it may autoregulate.
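
Because the GATA site is a short degenerate pattern, a scan of this kind can be illustrated with a regular expression. The sketch below is a minimal example, not the original analysis: it assumes that sites on both strands are counted, that overlapping matches are all reported, and that upstream regions are supplied as a hypothetical dictionary of gene name to sequence. The function names are illustrative only.

```python
import re

# [(A/T)GATA(A/T)] written as a regular expression. The lookahead means
# that overlapping matches are all reported.
GATA_FORWARD = re.compile(r"(?=([AT]GATA[AT]))")
# Reverse complement of the same pattern, for sites on the opposite strand.
GATA_REVERSE = re.compile(r"(?=([AT]TATC[AT]))")


def find_gata_sites(sequence):
    """Return sorted (position, strand) pairs for every GATA-site match."""
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in GATA_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in GATA_REVERSE.finditer(seq)]
    return sorted(hits)


def genes_with_many_sites(upstream_regions, min_sites=7):
    """Return names of genes with at least min_sites matches upstream."""
    return sorted(
        name
        for name, seq in upstream_regions.items()
        if len(find_gata_sites(seq)) >= min_sites
    )
```

The default threshold of 7 mirrors the cut-off quoted above. How far upstream of each gene to search is not stated in the text and is left to the caller.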

Onward to full understanding of regulatory networks

The DNA-binding sites of most regulatory proteins are still too loosely defined for similar studies. However, DNA-chip-based cDNA arrays may allow the identification of genes whose mRNA levels are affected by other genes. Whole-genome expression arrays can be probed with mRNA from wild-type and mutant nematodes. Loss-of-function mutants and transgenically engineered overexpression strains can both be used to define interacting genes.
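
The comparison of wild-type and mutant expression levels can be sketched as a log2 fold-change calculation. This is a minimal illustration under assumptions not stated in the text: expression levels are given as one number per gene, a pseudocount of 1 is added to avoid division by zero, and a two-fold change is used as the threshold. A real array analysis would also need normalisation and replicates.

```python
import math


def log2_fold_changes(wild_type, mutant, pseudocount=1.0):
    """Return log2(mutant / wild type) for genes present in both datasets."""
    return {
        gene: math.log2(
            (mutant[gene] + pseudocount) / (wild_type[gene] + pseudocount)
        )
        for gene in wild_type
        if gene in mutant
    }


def affected_genes(wild_type, mutant, threshold=1.0):
    """Return sorted genes whose absolute log2 fold change >= threshold."""
    changes = log2_fold_changes(wild_type, mutant)
    return sorted(
        gene for gene, change in changes.items() if abs(change) >= threshold
    )
```

Genes returned by affected_genes for a given mutant are candidates for regulation, direct or indirect, by the mutated gene.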

 

These pages were written by Mark Blaxter and last updated in early 2007.