BaNG - Blaxter Nematode and Neglected Genomics

Genomes & Genomics
Mark Blaxter's Teaching WebSite

  at the Institute of Evolutionary Biology, University of Edinburgh

TechSession8: Genome Databases

Mark Blaxter


In this tech session, you will use the genome databases to perform a comparative analysis of a conserved cluster of genes in three genomes: mouse, Drosophila (fly), and Caenorhabditis (worm).


The genes you will be examining are the HOX genes.

See http://pharyngula.org/index/weblog/comments/a_brief_overview_of_hox_genes/ for a brief introduction to HOX genes, focussing on their role in Drosophila.

These genes encode proteins that contain a homeodomain, a DNA-binding domain found in many transcription factors; the DNA sequence that encodes it is called the homeobox. See http://www.biosci.ki.se/groups/tbu/homeo.html for a general introduction to homeobox genes. The HOX genes are a special class of metazoan homeobox genes that were first identified genetically in Drosophila by mutations that caused a HOMEOTIC TRANSFORMATION of body structures (e.g. antenna into leg, the Antennapedia mutation). In general, HOX genes control the formation of body pattern elements during development, particularly along the nose-tail (anterior-posterior) axis.

In Drosophila, a set of homeotic genes was found to cluster genetically, and when cloned and sequenced, all proved to encode a shared 60 amino acid domain, the homeodomain (encoded by the homeobox). Since their discovery in flies, HOX genes have been identified in all animals examined, but not in non-animals (non-Metazoa), suggesting that their roles are central to "being an animal". Note that homeodomain-containing proteins are found in many eukaryotes, including protozoa, but it is the HOX class of homeobox genes that is special to animals.

Comparison of these genes shows that they can be split up into groups of orthologues. In the evolution of mammals, the cluster has been duplicated twice, so there are four copies (A, B, C, D); you will look only at cluster A in this practical.

Orthologue group   Drosophila melanogaster    Mus musculus (A cluster)   Caenorhabditis elegans
1                  labial (lab)               Hox-A1                     ceh-13
2                  proboscipedia (pb)         Hox-A2                     -
3                  zen                        Hox-A3                     -
4                  Deformed (Dfd)             Hox-A4                     -
5                  Sex combs reduced (Scr)    Hox-A5                     lin-39
6                  fushi-tarazu (ftz)         Hox-A6                     mab-5
7                  antennapedia (Antp)        Hox-A7                     egl-5
8                  Ultrabithorax (Ubx)        -                          -
9                  Abdominal A (abdA)         Hox-A9                     php-3
10                 Abdominal B (AbdB)         Hox-A10                    nob-1
11                 -                          -                          -
12                 -                          -                          -
13                 -                          Hox-A13                    -

 

As you will note from the table above, not all species have the full complement of HOX genes in their clusters: Caenorhabditis elegans, the nematode, in particular appears to be missing many genes. In mammals, the full complement of orthologues is present but not in one cluster. Mammals have FOUR clusters (A, B, C and D [imaginatively]) arising from two whole genome duplications.
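If you like to keep notes in code, the table above can be held as a small Python dictionary. This is only a sketch: the names HOX_TABLE, SPECIES and missing_groups are made up for illustration, and the entries are copied directly from the table (with None standing for "-").

```python
# Orthologue table from this practical: group -> (fly, mouse A cluster, worm).
# None marks a "-" entry in the table.
HOX_TABLE = {
    1: ("lab", "Hox-A1", "ceh-13"),
    2: ("pb", "Hox-A2", None),
    3: ("zen", "Hox-A3", None),
    4: ("Dfd", "Hox-A4", None),
    5: ("Scr", "Hox-A5", "lin-39"),
    6: ("ftz", "Hox-A6", "mab-5"),
    7: ("Antp", "Hox-A7", "egl-5"),
    8: ("Ubx", None, None),
    9: ("abdA", "Hox-A9", "php-3"),
    10: ("AbdB", "Hox-A10", "nob-1"),
    11: (None, None, None),
    12: (None, None, None),
    13: (None, "Hox-A13", None),
}

# Column order of the tuples above (hypothetical short labels).
SPECIES = ("fly", "mouse", "worm")


def missing_groups(species):
    """Return the orthologue groups with no gene listed for one species."""
    col = SPECIES.index(species)
    return [group for group, row in sorted(HOX_TABLE.items()) if row[col] is None]
```

Calling missing_groups("worm") lists the groups for which the table shows no nematode gene, which is a quick way to check your reading of the table.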

The questions you should attempt to answer are as follows:

1 Which chromosome is the HOX cluster on, and how big, in kilobases of genomic DNA, is the HOX cluster in each species?

2 How are the orthologous genes arranged in each species? Are they in the same order, or in different orders?

3 Are there features of the genomic regions in and around the clusters that look unusual? For example: Are other genes present in the HOX cluster in each species, in between the HOX genes? What is the pattern of presence of repeat DNA? Are the genes (or exons, or introns...) especially big or small compared to other genes from the same species?

4 Compare and contrast the HOX clusters. What features (apart from the HOX genes themselves) are common to the clusters? What features differ between the different species?
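For question 1, the size of a cluster is the distance from the leftmost base of the first gene to the rightmost base of the last gene. A minimal sketch of that arithmetic is given here; the function name and the coordinates in the example are invented for illustration and are not real HOX gene positions.

```python
def cluster_span_kb(genes):
    """Span of a set of genes on one chromosome, in kilobases.

    genes: list of (name, start_bp, end_bp) tuples. Start and end may be
    given in either order, as genes on the reverse strand sometimes are.
    """
    if not genes:
        raise ValueError("need at least one gene")
    left = min(min(start, end) for _, start, end in genes)
    right = max(max(start, end) for _, start, end in genes)
    # Coordinates are inclusive, hence the + 1.
    return (right - left + 1) / 1000.0


# Invented coordinates, for illustration only.
example = [
    ("geneA", 10_000, 12_500),
    ("geneB", 40_000, 43_000),
    ("geneC", 95_000, 99_999),
]
print(cluster_span_kb(example))
```

Note the inputs you would read off the genome browser are the start and end coordinates shown for each gene.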


For the Drosophila cluster, use ENSEMBL: http://www.ensembl.org/

If the ENSEMBL site is not responding use a mirror site: Germany, Korea, Switzerland

Go to the "fly" link from the ENSEMBL top page.

Search for a GENE called "lab" (the acronym for labial) using the search interface.

Read the resulting page: is this the correct gene (a homeodomain transcription factor)?

Follow the link to the ENSEMBL gene corresponding to labial. Have a look at the gene's properties to assure yourself that it is the labial gene. Perhaps follow the links to FlyBase...

Go to the genomic sequence region surrounding the labial gene by choosing the link from the "Genomic location" segment of the first table.

Can you find the other HOX cluster genes? Set the detailed view of the genomic segment to cover the HOX genes by typing the sequence coordinates into the "Jump to" dialog boxes. Are all the HOX genes in this segment of the genome? How many kilobases do they spread over? Are there other genes intermixed? What do they do?

You should find the first 7 genes, but Ubx (Ultrabithorax) is not present. Where is Ubx? Use the search interface to find Ubx, and see if the other HOX genes (abdA and AbdB) are close. Is it on the same chromosome? How far away? Are there other genes in between the HOX genes?
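The same gene lookup can be scripted rather than clicked through. The sketch below assumes the ENSEMBL REST service at https://rest.ensembl.org and its lookup/symbol endpoint are reachable and serve the species you ask for; the species string "drosophila_melanogaster" and the helper names are assumptions for illustration, and the web pages described above remain the route this practical expects you to use.

```python
import json
import urllib.parse
import urllib.request

# Assumed base address of the ENSEMBL REST service.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """Build the URL that asks the REST service for a gene by its symbol."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return REST_SERVER + path + "?content-type=application/json"


def fetch_gene(species, symbol, timeout=30):
    """Fetch the gene record as a dictionary (needs network access)."""
    with urllib.request.urlopen(lookup_url(species, symbol), timeout=timeout) as resp:
        return json.load(resp)


def describe_location(record):
    """Format chromosome and coordinates from a lookup record."""
    return "{}:{}-{}".format(
        record["seq_region_name"], record["start"], record["end"]
    )
```

With a network connection, describe_location(fetch_gene("drosophila_melanogaster", "lab")) should give the chromosome and coordinates to compare with what the browser shows.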


For the MOUSE HOX cluster, use ENSEMBL: http://www.ensembl.org/

If the ENSEMBL site is not responding use a mirror site: Germany, Korea, Switzerland

Search for a GENE called "Hoxa1" using the search interface.

Read the resulting page: is this the correct gene (a homeodomain transcription factor)?

Follow the link to the ENSEMBL gene corresponding to Hoxa1. Have a look at the gene's properties to assure yourself that it is the Hoxa1 gene. Perhaps follow the links to other databases...

Go to the genomic sequence region surrounding the Hoxa1 gene by choosing the link from the "Genomic location" segment of the first table.

Can you find the other HOX cluster genes? Set the detailed view of the genomic segment to cover the HOX genes by typing the sequence coordinates into the "Jump to" dialog boxes. Are all the HOX genes in this segment of the genome? How many kilobases do they spread over? Are there other genes intermixed? What do they do?
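To decide whether other genes are intermixed, one approach is to note the coordinates of every gene in the view and pick out the non-HOX genes that overlap the stretch between the first and last HOX gene. A minimal sketch follows; the function name, gene names and coordinates are all hypothetical.

```python
def intermixed_genes(genes, hox_names):
    """List non-HOX genes that overlap the extent of the HOX genes.

    genes: list of (name, start_bp, end_bp) with start_bp <= end_bp.
    hox_names: set of the names you consider to be HOX genes.
    """
    hox = [g for g in genes if g[0] in hox_names]
    if not hox:
        return []
    left = min(start for _, start, _ in hox)
    right = max(end for _, _, end in hox)
    # A gene overlaps the cluster if it starts before the cluster ends
    # and ends after the cluster starts.
    return [
        name
        for name, start, end in sorted(genes, key=lambda g: g[1])
        if name not in hox_names and start <= right and end >= left
    ]


# Invented example region, for illustration only.
region = [
    ("up1", 1000, 2000),
    ("hoxX", 5000, 6000),
    ("mid1", 7000, 8000),
    ("hoxY", 9000, 10000),
    ("down1", 12000, 13000),
]
print(intermixed_genes(region, {"hoxX", "hoxY"}))
```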

ENSEMBL displays genome information from multiple vertebrate species. If you have both time and inclination, look at the HOXA cluster in another vertebrate (platypus, anyone?)....


For the Caenorhabditis HOX cluster, use WORMBASE: http://www.wormbase.org/

You can also use ENSEMBL: http://www.ensembl.org/

If the ENSEMBL site is not responding use a mirror site: Germany, Korea, Switzerland

Search for a GENE called "lin-39" using the search interface.

Read the resulting page: is this the correct gene (a homeodomain transcription factor)? Have a look at the gene's properties/phenotypes to assure yourself that it is the lin-39 HOX gene. Perhaps follow the links to other databases...

Follow the link to the sequence corresponding to lin-39.

Go to the genomic sequence region surrounding the lin-39 gene by choosing the link titled "Click Here to Browse". Which chromosome is lin-39 on, and where on the chromosome?

Can you find the other HOX cluster genes (their names are in the table above)? Set the detailed view of the genomic segment "zoom" to show 100 kb. Are all the HOX genes in this segment of the genome? How many kilobases do they spread over? Are there other genes intermixed? What do they do?

If you cannot find the other HOX genes by browsing the sequence view (see the table above) use the search interface to find them. Where are they? How close are they to lin-39? How many kilobases do they spread over? Are there other genes intermixed? What do they do? (Note: don't try to count the genes between mab-5 and php-3. You will see why when you get there...)
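Once you have the positions of the genes, you can ask whether they lie along the chromosome in the same order as their orthologue groups. A sketch of that check is given here, assuming you write down the orthologue group number of each gene in the order you meet it along the chromosome; the function name and the example lists are invented and do not give away the answer for any species.

```python
def order_relationship(groups_along_chromosome):
    """Compare chromosomal order with orthologue-group order.

    groups_along_chromosome: orthologue group numbers listed in the order
    the genes occur along the chromosome.
    """
    seq = list(groups_along_chromosome)
    if seq == sorted(seq):
        return "same order"
    if seq == sorted(seq, reverse=True):
        # Same arrangement, read from the other end of the chromosome.
        return "reverse order"
    return "rearranged"


# Invented examples, for illustration only.
print(order_relationship([1, 2, 3, 5]))
print(order_relationship([9, 7, 6, 5]))
print(order_relationship([1, 6, 5, 9]))
```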


So, how do the three "clusters" compare?

1 Are the genes clustered in all three species? (i.e. are they grouped in one region of the chromosome? How far apart are they?)

2 Are the genes in the same order in all three species?

3 If there are additional genes in the clusters, what do they do? Anything special about these functions?

4 Is gene density around the HOX genes the same as, or different from, the general mean gene density in each species? (One gene per 2.5 kb in C. elegans, one gene per 10 kb in D. melanogaster and approximately one gene per 70 kb in M. musculus. Hint: have a look just upstream and just downstream of the cluster/HOX genes in each species.)

5 Given that it is believed that the three clusters evolved from a common ancestor, what sorts of evolutionary events are likely to have taken place to generate the patterns you have found?
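For question 4, the comparison is simple arithmetic: divide the size of the region you looked at by the number of genes in it, then compare with the genome-wide mean quoted in the question. A minimal sketch, using only the mean values given above; the function names and the example region are invented.

```python
# Genome-wide means quoted in question 4 (kb of DNA per gene).
GENOME_MEAN_KB_PER_GENE = {
    "C. elegans": 2.5,
    "D. melanogaster": 10.0,
    "M. musculus": 70.0,
}


def kb_per_gene(region_kb, n_genes):
    """Average amount of DNA per gene in a region."""
    if n_genes <= 0:
        raise ValueError("region must contain at least one gene")
    return region_kb / n_genes


def density_ratio(species, region_kb, n_genes):
    """Ratio above 1 means the region is more gene-dense than the genome mean."""
    return GENOME_MEAN_KB_PER_GENE[species] / kb_per_gene(region_kb, n_genes)


# Invented example: a 100 kb fly region containing 20 genes.
print(density_ratio("D. melanogaster", 100.0, 20))
```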


 Genome Databases

What is a database

an annotated and ordered set of records that can be consulted

a repository of core information

a substrate for research

usually kept on a computer....

Types of databases

Flat-file Databases
like "record cards" or a telephone directory

where each object has a set of attributes, all noted in the same file

useful as a basic start, but

problems with extensibility and size of files

lack of ability to perform complex queries

lack of complex relationships

an example is GeneDB (http://www.genedb.org/) (though this will soon be relational...)
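To make the flat-file idea concrete, here is a minimal sketch of a record-card style file and a parser for it. The file layout (two-to-four letter tags, records ending in "//"), the gene names and the function name are all invented for illustration; this is not the format used by GeneDB or any real database.

```python
# An invented flat file: every attribute of each record lives in one file.
FLAT_FILE = """\
ID   gene0001
NAME hoxX
CHR  III
//
ID   gene0002
NAME hoxY
CHR  III
//
"""


def parse_flat_file(text):
    """Turn the flat file into a list of dictionaries, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "//":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        elif line.strip():
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    if current:
        records.append(current)
    return records


records = parse_flat_file(FLAT_FILE)
# Any "query" means scanning every record in turn.
print([r["ID"] for r in records if r["CHR"] == "III"])
```

The need to scan the whole file for every question, and the lack of any way to link one record to another, are the limitations noted in the list above.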

Relational Databases

where data is stored in a set of tables, with each entry in the table having some sets of attributes

and multiple tables are linked by explicit relationships

allows many-to-many relationships

this permits rapid and complex querying

and many different modes of data

an example is WormBase (http://www.wormbase.org/)
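The relational idea can be sketched with Python's built-in sqlite3 module. The three tables below (genes, annotation terms, and a linking table that gives the many-to-many relationship) and all the names and rows in them are invented for illustration; this is not the schema of WormBase or any real genome database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a many-to-many relationship."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE annotation (annotation_id INTEGER PRIMARY KEY, term TEXT NOT NULL);
        -- Linking table: one gene can have many terms, one term many genes.
        CREATE TABLE gene_annotation (
            gene_id INTEGER REFERENCES gene(gene_id),
            annotation_id INTEGER REFERENCES annotation(annotation_id),
            PRIMARY KEY (gene_id, annotation_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [(1, "geneA"), (2, "geneB"), (3, "geneC")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [(1, "homeodomain"), (2, "transcription factor")],
    )
    conn.executemany(
        "INSERT INTO gene_annotation VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2), (3, 1)],
    )
    conn.commit()
    return conn


def genes_with_term(conn, term):
    """Query across the linked tables for genes carrying one annotation term."""
    rows = conn.execute(
        """
        SELECT g.name FROM gene g
        JOIN gene_annotation ga ON ga.gene_id = g.gene_id
        JOIN annotation a ON a.annotation_id = ga.annotation_id
        WHERE a.term = ?
        ORDER BY g.name
        """,
        (term,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(genes_with_term(conn, "homeodomain"))
```

The query joins the tables through their explicit relationships, which is what makes complex questions quick to ask.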

World Wide Web access to databases

Most databases are "invisible" to the casual user

interaction with the database is via "scripted" world wide web pages

and Java tools that allow a point-and-click interaction with the database engine

a Graphical User Interface or GUI

The internet can be used to seamlessly link databases so that queries can be performed on multiple datasets in different sites

The historical problem

many genome databases have "grown up" with their communities

the naming of parts is idiosyncratic

difficult to navigate between databases

necessary to learn the jargon and acronyms for each genome

... but worth the effort

Unifying approaches to link databases

such as ENSEMBL

(http://www.ensembl.org/)

or the Comprehensive Microbial Resource

(http://www.tigr.org/)

or databases of databases

(see http://srs.ebi.ac.uk)

the content of these pages is copyright Mark Blaxter and colleagues. Contact the webmaster if there are problems.