Regulatory pathway prediction
If the binding site of a transcriptional regulatory protein or
complex is known with some specificity, it should be possible to scan
a genome sequence for occurrences of the target sequence and thus
identify genes that may be regulated by the protein or complex.
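A minimal sketch of such a scan, assuming the binding site can be written as a short consensus in IUPAC nucleotide codes and that both strands should be searched. The function names and the test sequence are hypothetical and chosen only for illustration; a real analysis would also map each hit to the nearest downstream gene.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (position, strand) pairs where the consensus matches.

    Positions are 0-based and refer to the leftmost base of the site
    on the forward strand, whichever strand the site lies on.
    """
    seq = seq.upper()
    width = len(consensus)
    body = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + body + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Toy sequence (not real genomic data) with one site on each strand.
print(find_sites("CCAGATAAGGTTATCTCC", "WGATAW"))
```

Searching the reverse complement and converting coordinates back keeps all reported positions on one axis, which makes it simpler to assign hits to upstream regions later.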
In C. elegans the binding sites of few DNA-binding
regulators have been defined to the required specificity. In their
contribution to the C. elegans genome issue of Science, Clarke and
Berg analysed sites for TRA-1 and GATA factors. Nearly 3% of the
predicted genes in the genome have zinc-binding motifs that may
indicate a DNA-binding activity. The network of regulation this
implies is complex in the extreme.
TRA-1
tra-1 is involved in the regulation of the differentiation
of sex-specific tissues of the animal. TRA-1A binds DNA, and a hidden
Markov chain model of the consensus for its binding site was used to
search the C. elegans genome sequence to identify potential
targets. As a control, a random sequence of the same composition was
also analysed. There are 1300 potential sites for TRA-1A binding in
the genome, but, significantly, there are more genes with MULTIPLE
upstream TRA-1A binding sites than would be expected by chance. A
random distribution of sites would predict no genes with more than 2
sites. There are five genes with 2 or more upstream sites, and within
this select group are two genes, lin-31 and mab-3,
known to be affected by tra-1 or to be involved in sex
determination. The other three genes are obvious candidates for new
downstream genes involved in sex determination.
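The comparison against a random control can be sketched as follows. This is an illustration under stated assumptions, not the method used in the analysis above: the motif is a hypothetical placeholder (the TRA-1A model is not given here), the upstream sequences and gene names are toy data, only the forward strand is scanned, and the control is built by shuffling each upstream region so that its base composition is preserved.

```python
import random
import re


def count_sites(seq, pattern):
    """Count matches of a compiled regex in seq."""
    return sum(1 for _ in pattern.finditer(seq))


def genes_with_at_least(upstream, pattern, k):
    """Sorted names of genes whose upstream sequence holds at least k sites."""
    return sorted(g for g, s in upstream.items() if count_sites(s, pattern) >= k)


def shuffled_control(upstream, rng):
    """Shuffle each upstream sequence, preserving its base composition."""
    control = {}
    for gene, seq in upstream.items():
        bases = list(seq)
        rng.shuffle(bases)
        control[gene] = "".join(bases)
    return control


def expected_by_chance(upstream, pattern, k, trials=200, seed=0):
    """Mean number of genes with at least k sites across shuffled controls."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        control = shuffled_control(upstream, rng)
        total += len(genes_with_at_least(control, pattern, k))
    return total / trials


# Hypothetical placeholder motif and toy upstream regions (not real data).
MOTIF = re.compile("(?=ACGTGA)")
UPSTREAM = {
    "geneA": "ACGTGATTTACGTGATTT",
    "geneB": "TTTTACGTGATTTTTTTT",
    "geneC": "TTTTTTTTTTTTTTTTTT",
}

observed = genes_with_at_least(UPSTREAM, MOTIF, 2)
chance = expected_by_chance(UPSTREAM, MOTIF, 2)
print("observed genes with >= 2 sites:", observed)
print("mean count in shuffled controls:", chance)
```

Genes that carry more sites than the shuffled controls typically produce are the ones worth following up as candidate targets.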
GATA factor sites
elt-1 is a C. elegans GATA-binding factor. There are
200,000 matches to the GATA factor binding site [(A/T)GATA(A/T)] in
the genome sequence. There are 17 genes with 7 or more GATA sites
upstream, whereas only 2 genes would be expected from a random genome.
Among these 17 is elt-1 itself, suggesting that it may autoregulate.
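Because the site is short and degenerate, a large number of matches is expected by chance alone, which is why the clustering of sites upstream of individual genes is the more informative signal. A rough sketch of that arithmetic follows, assuming independent bases and a strand-symmetric composition. The uniform base frequencies and the 1 Mb sequence length are illustrative placeholders, not values from the text, and the function names are hypothetical.

```python
def motif_probability(motif_classes, base_freq):
    """Chance that a random position matches, assuming independent bases.

    motif_classes lists, per motif position, the bases allowed there.
    """
    p = 1.0
    for allowed in motif_classes:
        p *= sum(base_freq[b] for b in allowed)
    return p


def expected_matches(motif_classes, base_freq, length, both_strands=True):
    """Expected number of chance matches in a random sequence of this length.

    Doubling for the second strand assumes a strand-symmetric composition
    and a motif that differs from its own reverse complement.
    """
    positions = length - len(motif_classes) + 1
    per_strand = positions * motif_probability(motif_classes, base_freq)
    return 2 * per_strand if both_strands else per_strand


# The site (A/T)GATA(A/T) written as allowed bases per position.
GATA_SITE = ["AT", "G", "A", "T", "A", "AT"]
# Placeholder composition: all four bases equally frequent.
UNIFORM = {base: 0.25 for base in "ACGT"}

print(motif_probability(GATA_SITE, UNIFORM))           # 1/1024
print(round(expected_matches(GATA_SITE, UNIFORM, 1_000_000)))
```

Under these placeholder assumptions a match is expected roughly once per kilobase per strand, so raw genome-wide counts say little on their own.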
Onward to full understanding of regulatory networks
The DNA-binding sites of most regulatory proteins are still too
loosely defined to support similar studies. However, DNA-chip-based
cDNA arrays may allow the identification of genes whose mRNA levels are
affected by other genes. Whole-genome expression arrays can be probed
with mRNA from wild-type and mutant nematodes. Loss-of-function
mutants and transgenically engineered overexpression strains can be
used to define interacting genes.
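One simple way such array data might be compared is by the log2 ratio of mutant to wild-type signal for each gene. The sketch below assumes this approach; the gene names, signal values, pseudocount and two-fold threshold are all hypothetical, and real array data would also need normalisation and replicate measurements.

```python
import math


def log2_ratio(mutant, wild_type, pseudocount=1.0):
    """log2 of mutant over wild-type signal; the pseudocount avoids log(0)."""
    return math.log2((mutant + pseudocount) / (wild_type + pseudocount))


def affected_genes(wild_type, mutant, min_abs_log2=1.0):
    """Genes whose signal changes by at least the given log2 ratio.

    With the default threshold of 1.0 this keeps changes of two-fold or more
    in either direction. Only genes measured in both samples are compared.
    """
    result = {}
    for gene in wild_type.keys() & mutant.keys():
        ratio = log2_ratio(mutant[gene], wild_type[gene])
        if abs(ratio) >= min_abs_log2:
            result[gene] = ratio
    return result


# Toy signal intensities (hypothetical, not measured data).
WILD_TYPE = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
MUTANT = {"geneA": 399.0, "geneB": 51.0, "geneC": 49.0}

for gene, ratio in sorted(affected_genes(WILD_TYPE, MUTANT).items()):
    print(gene, ratio)
```

Genes that rise in a loss-of-function mutant are candidates for repression by the mutated gene, and genes that fall are candidates for activation, although indirect effects cannot be excluded from expression data alone.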