Tech Session 8: Genome Databases
Mark Blaxter
In this tech session, you
will use the genome databases to perform a comparative analysis of a
conserved cluster of genes in three genomes: mouse, Drosophila
(fly), and Caenorhabditis (worm).
[Images: fly, mouse, nematode]
The genes you will be examining are the
HOX genes.
See http://pharyngula.org/index/weblog/comments/a_brief_overview_of_hox_genes/
for a brief introduction to HOX genes, focussing on their role in Drosophila.
These genes contain a homeobox, a DNA sequence that encodes a
DNA-binding protein domain (the homeodomain) found in many transcription factors.
See http://www.biosci.ki.se/groups/tbu/homeo.html
for a general introduction to homeobox genes. The HOX genes are a
special class of metazoan homeobox genes that were first
identified genetically in Drosophila by mutations that caused
a HOMEOTIC TRANSFORMATION of body structures (e.g. antenna into leg
in the Antennapedia mutation). In general, HOX genes control the
formation of body pattern elements during development, particularly
along the nose-tail (anterior-posterior) axis.
In Drosophila, a set of homeotic
genes was found to cluster genetically, and when cloned and
sequenced, all proved to contain a shared 180 bp sequence, the
homeobox, which encodes a 60 amino acid domain. Since their discovery in flies, HOX genes have been
identified in all animals examined, but not in non-animals
(non-Metazoa), suggesting that their roles are central to "being an
animal". Note that homeodomain-containing proteins are found in many eukaryotes, including protozoa, but it is the HOX class of homeodomain genes that is special to animals.
Comparison of these genes shows that
they can be split up into groups of orthologues. In the evolution of
mammals, the cluster has been duplicated twice, and there are four
copies (A, B, C, D); you will look at only cluster A in this
practical.
| Orthologue group | Drosophila melanogaster | Mus musculus (A cluster) | Caenorhabditis elegans |
|---|---|---|---|
| 1 | labial (lab) | Hox-A1 | ceh-13 |
| 2 | proboscipedia (pb) | Hox-A2 | - |
| 3 | zen | Hox-A3 | - |
| 4 | Deformed (Dfd) | Hox-A4 | - |
| 5 | Sex combs reduced (Scr) | Hox-A5 | lin-39 |
| 6 | fushi-tarazu (ftz) | Hox-A6 | mab-5 |
| 7 | antennapedia (Antp) | Hox-A7 | egl-5 |
| 8 | Ultrabithorax (Ubx) | - | - |
| 9 | Abdominal A (abdA) | Hox-A9 | php-3 |
| 10 | Abdominal B (AbdB) | Hox-A10 | nob-1 |
| 11 | - | - | - |
| 12 | - | - | - |
| 13 | - | Hox-A13 | - |
As you will note from the table above,
not all species have the full complement of HOX genes in their clusters:
Caenorhabditis elegans, the nematode, in particular appears to be missing many genes. In mammals,
the full complement of orthologues is present but not in one
cluster. Mammals have FOUR clusters (A, B, C and D [imaginatively]) arising from two whole genome duplications.
The questions you should attempt to
answer are as follows:
1. Which chromosome is the HOX cluster on,
and how big, in kilobases of genomic DNA, is the HOX cluster in each
species?
2. How are the orthologous genes arranged
in each species? Are they in the same order, or in different
orders?
3. Are there features of
the genomic regions in and around the clusters that look unusual? For example: Are other genes present in the
HOX cluster in each species, in between the HOX genes? How is repeat DNA distributed? Are the genes (or exons, or introns...) especially big or small compared to other genes from the same species?
4. Compare and contrast the HOX clusters. What features (apart from the HOX genes themselves) are common to the clusters? What features differ between the species?
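Questions 1 and 2 reduce to simple arithmetic once you have noted each HOX gene's coordinates from the genome browser. The sketch below is a minimal illustration, assuming you record each gene as (name, orthologue group, start, end) on one chromosome, using 1-based inclusive coordinates as Ensembl displays them. The `Gene` type and function names are hypothetical helpers, and the coordinates in the demo are placeholders, not real genome positions.

```python
from __future__ import annotations

from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    group: int  # orthologue group number from the table above
    start: int  # start coordinate in bp (1-based, inclusive)
    end: int    # end coordinate in bp (inclusive)


def cluster_span_kb(genes: list[Gene]) -> float:
    """Kilobases from the leftmost gene start to the rightmost gene end."""
    first = min(g.start for g in genes)
    last = max(g.end for g in genes)
    return (last - first + 1) / 1000


def group_order(genes: list[Gene]) -> list[int]:
    """Orthologue group numbers read left to right along the chromosome."""
    return [g.group for g in sorted(genes, key=lambda g: g.start)]


def out_of_order_pairs(order: list[int]) -> int:
    """Number of gene pairs whose left-to-right order disagrees with
    increasing group number. 0 means fully ascending; n*(n-1)/2 means
    fully reversed (still colinear, just read from the other end)."""
    return sum(
        1
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if order[i] > order[j]
    )


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only: replace them with
    # the values you read off the genome browser.
    genes = [
        Gene("geneB", 2, 10_000, 12_500),
        Gene("geneA", 1, 1_000, 3_000),
        Gene("geneC", 4, 20_000, 21_000),
    ]
    order = group_order(genes)
    print(f"span: {cluster_span_kb(genes):.1f} kb")
    print("group order:", order, "out-of-order pairs:", out_of_order_pairs(order))
```

Counting out-of-order pairs treats a fully reversed cluster as a distinct case from a scrambled one, which matters when a cluster happens to be annotated on the opposite strand.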
For the Drosophila cluster,
use ENSEMBL: http://www.ensembl.org/
If the ENSEMBL site is not responding,
use a mirror site (Germany, Korea, Switzerland).
Go to the "fly" link from the ENSEMBL
top page.
Search for a GENE called "lab" (the
abbreviation for labial) using the search interface.
Read the resulting page: is this the
correct gene (a homeodomain transcription factor)?
Follow the link to the ENSEMBL gene
corresponding to labial. Have a look at the gene's properties
to assure yourself that it is the labial gene. Perhaps follow the
links to FlyBase...
Go to the genomic sequence region
surrounding the labial gene by choosing the link from the
"Genomic location" segment of the first table.
Can you find the other HOX cluster
genes? Set the detailed view of the genomic segment to cover the HOX
genes by typing the sequence coordinates into the "Jump to"
dialog boxes. Are all the HOX genes in this segment of the genome?
How many kilobases do they spread over? Are there other genes
intermixed? What do they do?
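The same question (which genes lie in this segment, and which of them are not HOX genes?) can also be put to Ensembl programmatically. This is a sketch assuming the public Ensembl REST service and its `overlap/region` endpoint with `feature=gene`; the helper names are hypothetical, the network call is left commented out, and the chromosome and coordinates in the demo are placeholders for the values you noted from the labial gene page. Gene names are read from the `external_name` field of each returned feature, falling back to the stable ID when a gene has no name.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def overlap_url(species: str, chrom: str, start: int, end: int) -> str:
    """URL for the Ensembl REST 'overlap/region' endpoint, genes only."""
    query = urllib.parse.urlencode({"feature": "gene"})
    return f"{ENSEMBL_REST}/overlap/region/{species}/{chrom}:{start}-{end}?{query}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def intermixed_genes(features: list[dict], hox_names: set[str]) -> list[str]:
    """Genes in the region that are not in hox_names, left to right.
    Unnamed genes are reported by their Ensembl stable ID."""
    others = sorted(
        (f for f in features if f.get("external_name") not in hox_names),
        key=lambda f: f["start"],
    )
    return [f.get("external_name") or f["id"] for f in others]


if __name__ == "__main__":
    # Placeholder region: substitute the chromosome and coordinates
    # you noted in the browser.
    url = overlap_url("drosophila_melanogaster", "CHROM", 1, 100_000)
    hox = {"lab", "pb", "zen", "Dfd", "Scr", "ftz", "Antp"}
    print(url)
    # When online:
    # print(intermixed_genes(fetch_json(url), hox))
```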
You should find the first seven genes, but
Ubx (Ultrabithorax) is not present. Where is Ubx? Use the search
interface to find Ubx, and see whether the other HOX genes (abdA and AbdB)
are close to it. Is Ubx on the same chromosome as the first seven genes? How far away is it? Are there
other genes in between the HOX genes?
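One way to answer "how far away?", and to see where a cluster is split, is to sort the genes you have noted by position and look for the largest gap between neighbours. A small sketch, assuming (name, start, end) tuples on a single chromosome with 1-based inclusive coordinates; the function names are hypothetical, and the names and coordinates are yours to supply.

```python
from __future__ import annotations


def gaps_between(genes: list[tuple[str, int, int]]) -> list[tuple[str, str, int]]:
    """genes: (name, start, end) on one chromosome, 1-based inclusive.
    Returns (left_gene, right_gene, gap_bp) for each neighbouring pair in
    coordinate order; a negative gap means the two genes overlap."""
    ordered = sorted(genes, key=lambda g: g[1])
    return [
        (left[0], right[0], right[1] - left[2] - 1)
        for left, right in zip(ordered, ordered[1:])
    ]


def largest_gap(genes: list[tuple[str, int, int]]) -> tuple[str, str, int]:
    """The neighbouring pair separated by the most DNA."""
    return max(gaps_between(genes), key=lambda pair: pair[2])
```

A single gap far larger than the others is a hint that the "cluster" is really two groups of genes.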
For the MOUSE HOX cluster, use
ENSEMBL: http://www.ensembl.org/
If the ENSEMBL site is not responding,
use a mirror site (Germany, Korea, Switzerland).
Search for a GENE called "Hoxa1" using
the search interface.
Read the resulting page: is this the
correct gene (a homeodomain transcription factor)?
Follow the link to the ENSEMBL gene
corresponding to Hoxa1. Have a look at the gene's properties
to assure yourself that it is the Hoxa1 gene. Perhaps follow the
links to other databases...
Go to the genomic sequence region
surrounding the Hoxa1 gene by choosing the link from the
"Genomic location" segment of the first table.
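The gene's location can also be retrieved outside the browser. This sketch assumes the public Ensembl REST `lookup/symbol` endpoint, whose JSON record includes `seq_region_name`, `start` and `end`; the helper names are hypothetical and the network call is commented out. `padded_region` builds a chromosome:start-end string covering the gene plus an arbitrary flank on each side, clipped at position 1 (it does not check the chromosome's length).

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """URL for the Ensembl REST 'lookup/symbol' endpoint."""
    return f"{ENSEMBL_REST}/lookup/symbol/{species}/{urllib.parse.quote(symbol)}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def padded_region(record: dict, flank_bp: int) -> str:
    """chromosome:start-end covering the gene plus flank_bp on each side.
    The start is clipped at 1; the end is not checked against chromosome length."""
    start = max(1, record["start"] - flank_bp)
    end = record["end"] + flank_bp
    return f"{record['seq_region_name']}:{start}-{end}"


if __name__ == "__main__":
    print(lookup_symbol_url("mus_musculus", "Hoxa1"))
    # When online (the flank size is arbitrary):
    # record = fetch_json(lookup_symbol_url("mus_musculus", "Hoxa1"))
    # print(padded_region(record, 100_000))
```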
Can you find the other HOX cluster
genes? Set the detailed view of the genomic segment to cover the HOX
genes by typing the sequence coordinates into the "Jump to"
dialog boxes. Are all the HOX genes in this segment of the genome?
How many kilobases do they spread over? Are there other genes
intermixed? What do they do?
ENSEMBL displays genome information from multiple vertebrate species. If you have both time and inclination, look at the HOXA cluster in another vertebrate (platypus, anyone?)....
For the Caenorhabditis HOX
cluster, use WORMBASE: http://www.wormbase.org/
You can also use ENSEMBL:
http://www.ensembl.org/
If the ENSEMBL site is not
responding, use a mirror site (Germany, Korea, Switzerland).
Search for a GENE called "lin-39" using
the search interface.
Read the resulting page: is this the
correct gene (a homeodomain transcription factor)? Have a look at the
gene's properties/phenotypes to assure yourself that it is the lin-39 HOX gene. Perhaps follow the links to other databases...
Follow the link to the sequence
corresponding to lin-39.
Go to the genomic sequence region
surrounding the lin-39 gene by choosing the link titled "Click
Here to Browse". Which chromosome is lin-39 on, and where on the
chromosome?
Can you find the other HOX cluster
genes (their names are in the table above)? Set the "zoom" of the detailed view of the genomic segment to show
100 kb. Are all the HOX genes in this segment of the genome? How many
kilobases do they spread over? Are there other genes intermixed? What
do they do?
If you cannot find the other HOX genes (see the table above)
by browsing the sequence view, use the search
interface to find them. Where are they? How close are they to
lin-39? How many kilobases do they spread over? Are there
other genes intermixed? What do they do? (Note: don't try to count
the genes between mab-5 and php-3; you will see why when you get
there...)
So, how do the three
"clusters" compare?
1. Are the genes clustered in
all three species? (i.e. are they grouped in one region of the
chromosome? How far apart are they?)
2. Are the genes in the same
order in all three species?
3. If there are additional
genes in the clusters, what do they do? Is there anything special about these
functions?
4. Is gene density around the
HOX genes the same as, or different from, the mean gene
density of the whole genome in each species? (One gene per 2.5 kb in C. elegans,
one gene per 10 kb in D. melanogaster and ~ one gene per 70 kb
in M. musculus. Hint: have a look just upstream and just
downstream of the cluster/HOX genes in each species.)
5. Given that the three clusters are believed
to have evolved from a common ancestor, what sorts of
evolutionary events are likely to have taken place to generate the
patterns you have found?
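For question 4, count the genes in a window of known size (the region spanned by the cluster, and equally sized windows just upstream and downstream) and compare the local spacing with the genome-wide figures quoted above. A minimal sketch; the function names are hypothetical, and the gene counts and window sizes are whatever you record from the browser.

```python
from __future__ import annotations

# Genome-wide averages quoted in question 4, in kb of genomic DNA per gene.
GENOME_MEAN_KB_PER_GENE = {
    "C. elegans": 2.5,
    "D. melanogaster": 10.0,
    "M. musculus": 70.0,
}


def kb_per_gene(n_genes: int, window_bp: int) -> float:
    """Average spacing (kb per gene) in a window containing n_genes genes."""
    if n_genes <= 0:
        raise ValueError("window must contain at least one gene")
    return window_bp / 1000 / n_genes


def spacing_vs_genome(species: str, n_genes: int, window_bp: int) -> float:
    """Ratio of local spacing to the genome-wide mean for that species.
    Greater than 1: genes are sparser than average; less than 1: denser."""
    return kb_per_gene(n_genes, window_bp) / GENOME_MEAN_KB_PER_GENE[species]
```

Using a ratio rather than raw spacing lets you compare the three species directly, even though their genome-wide densities differ by more than an order of magnitude.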
Genome Databases
What is a database?
an annotated and ordered set of records that can be consulted
a repository of core information
a substrate for research
usually kept on a computer...
Types of databases
Flat-file databases
like "record cards" or a telephone directory, where each object has a set of attributes, all noted in the same file
useful as a basic start, but:
problems with extensibility and size of files
inability to perform complex queries
no complex relationships
an example is GeneDB (http://www.genedb.org/) (though this will soon be relational...)
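To make the "record card" idea concrete, here is a toy flat file loosely in the style of EMBL flat files (two-letter line codes, "//" ending each entry). The field codes ID, SP and GR and the helper names are illustrative assumptions, not a real database format; the records use gene names from the table in the practical. Note that every query is a linear scan of the whole file, and there is no way to express a relationship between records other than repeating data.

```python
from __future__ import annotations

# Toy flat file: one record per gene, "//" terminates each record.
FLAT_FILE = """\
ID   lab
SP   Drosophila melanogaster
GR   1
//
ID   Hoxa1
SP   Mus musculus
GR   1
//
ID   lin-39
SP   Caenorhabditis elegans
GR   5
//
"""


def parse_records(text: str) -> list[dict[str, str]]:
    """Split the file into records, each a dict of field code -> value."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            records.append(current)
            current = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            current[code] = value.strip()
    if current:  # tolerate a missing final "//"
        records.append(current)
    return records


def find(records: list[dict[str, str]], code: str, value: str) -> list[dict[str, str]]:
    """Every query is a linear scan over every record in the file."""
    return [r for r in records if r.get(code) == value]
```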
Relational databases
data are stored in a set of tables, with each entry in a table having a set of attributes
multiple tables are linked by explicit relationships
this allows many-to-many relationships
and permits rapid and complex querying, and many different views of the data
an example is WormBase (http://www.wormbase.org/)
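The same HOX data can be held relationally. This sketch uses Python's built-in `sqlite3` with an in-memory database; the schema (species, gene, orthologue group, and a `gene_group` link table) and the helper names are hypothetical, chosen to show how a link table expresses a many-to-many relationship: one group contains genes from many species, and a gene could be linked to more than one group. The rows are taken from the orthologue table in the practical.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    species_id INTEGER NOT NULL REFERENCES species(id),
    UNIQUE (name, species_id)
);
CREATE TABLE ortho_group (id INTEGER PRIMARY KEY);
-- Link table: the many-to-many relationship between genes and groups.
CREATE TABLE gene_group (
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    group_id INTEGER NOT NULL REFERENCES ortho_group(id),
    PRIMARY KEY (gene_id, group_id)
);
"""

# (gene, species, orthologue group), from the orthologue table.
ROWS = [
    ("lab", "Drosophila melanogaster", 1),
    ("Hox-A1", "Mus musculus", 1),
    ("ceh-13", "Caenorhabditis elegans", 1),
    ("Scr", "Drosophila melanogaster", 5),
    ("Hox-A5", "Mus musculus", 5),
    ("lin-39", "Caenorhabditis elegans", 5),
]


def build_db(rows) -> sqlite3.Connection:
    """Load (gene, species, group) rows, reusing existing species and genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for gene, species, group in rows:
        conn.execute("INSERT OR IGNORE INTO species (name) VALUES (?)", (species,))
        conn.execute("INSERT OR IGNORE INTO ortho_group (id) VALUES (?)", (group,))
        (species_id,) = conn.execute(
            "SELECT id FROM species WHERE name = ?", (species,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO gene (name, species_id) VALUES (?, ?)",
                     (gene, species_id))
        (gene_id,) = conn.execute(
            "SELECT id FROM gene WHERE name = ? AND species_id = ?",
            (gene, species_id)).fetchone()
        conn.execute("INSERT OR IGNORE INTO gene_group VALUES (?, ?)", (gene_id, group))
    return conn


def members(conn: sqlite3.Connection, group: int) -> list[tuple[str, str]]:
    """All (species, gene) pairs in one orthologue group: a three-table join."""
    return conn.execute(
        """SELECT s.name, g.name
           FROM gene_group AS gg
           JOIN gene AS g ON g.id = gg.gene_id
           JOIN species AS s ON s.id = g.species_id
           WHERE gg.group_id = ?
           ORDER BY s.name""",
        (group,),
    ).fetchall()
```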
World Wide Web access to databases
most databases are "invisible" to the casual user
interaction with the database is via "scripted" World Wide Web pages and Java tools that allow point-and-click interaction with the database engine
a Graphical User Interface, or GUI
the internet can be used to link databases seamlessly, so that queries can be performed on multiple datasets at different sites
The historical problem
many genome databases have "grown up" with their communities
the naming of parts is idiosyncratic
it is difficult to navigate between databases
it is necessary to learn the jargon and acronyms for each genome
... but worth the effort
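The naming problem is visible in this very practical: the same mouse gene appears as Hox-A1, Hoxa1, hoxa1 and hox-A1, and fly genes as both full names and abbreviations. A common remedy is a synonym table plus a normalised comparison key. This is a minimal sketch; the canonical list, synonym map and helper names are illustrative assumptions, and ignoring case and hyphens is deliberately aggressive (on a real gene list it could merge distinct names).

```python
from __future__ import annotations

# Canonical symbols and full-name synonyms used in this practical.
# Illustrative, not exhaustive.
CANONICAL = ["lab", "Ubx", "Antp", "Hoxa1", "lin-39"]
SYNONYMS = {
    "labial": "lab",
    "Ultrabithorax": "Ubx",
    "antennapedia": "Antp",
}


def _key(name: str) -> str:
    """Comparison key: ignore case, hyphens and spaces."""
    return name.casefold().replace("-", "").replace(" ", "")


_LOOKUP = {_key(name): name for name in CANONICAL}
_LOOKUP.update({_key(alias): name for alias, name in SYNONYMS.items()})


def normalise(name: str) -> str | None:
    """Map a spelling to one canonical symbol, or None if unknown."""
    return _LOOKUP.get(_key(name))
```

Unifying resources solve the same problem at scale by storing cross-references and synonyms explicitly rather than relying on string tricks.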
Unifying approaches to link databases
such as ENSEMBL (http://www.ensembl.org/)
or the Comprehensive Microbial Resource (http://www.tigr.org/)
or databases of databases (see http://srs.ebi.ac.uk)