
Page 1: Eukaryotic Genomes: From Parasites to Primates (part 2 of 2) Monday, November 3, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

Eukaryotic Genomes: From Parasites to Primates

(part 2 of 2)

Monday, November 3, 2003

Introduction to Bioinformatics
ME:440.714
J. Pevsner

pevsner@jhmi.edu


Copyright notice

Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley.

These images and materials may not be used without permission from the publisher.

Visit http://www.bioinfbook.org


Individual eukaryotic genomes: Introduction

We will next survey eukaryotic genomes. Basic issues are:

-- description of complete sequence of the chromosomes
-- annotation of the DNA to characterize noncoding DNA
-- annotation to identify protein-coding genes
-- chromosome structure
-- comparative genomics analyses
-- molecular evolution
-- relation of genotype to phenotype
-- disease relevance

Page 567


Individual eukaryotic genomes: Introduction

We will explore the eukaryotic tree of Baldauf et al. (2000), moving from the bottom upwards.

Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF (2000). A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290(5493), 972-977.

Page 567


Individual eukaryotic genomes: Protozoans at the base of the tree

Giardia lamblia is a water-borne parasite.
Disease relevance: giardiasis (causes diarrhea)
Distinguishing features: lack of mitochondria, peroxisomes
Genome size: 12 Mb
Chromosomes: 5 (range 0.7 to >3 Mb)
Website: http://www.mbl.edu/Giardia (sequencing in progress)

The genome has just three retrotransposons.
Also, it appears to have a single intron (ferredoxin gene).

Page 570


Individual eukaryotic genomes: trypanosomes and Leishmania

Page 571

Trypanosoma brucei causes sleeping sickness (Africa).
Trypanosoma cruzi causes Chagas’ disease (S. America).
Distinguishing features: transmitted by tsetse flies (T. brucei)
Genome size: 35 Mb (+/- 25% in various isolates)
Chromosomes: 11 (range 1 to >6 Mb); also has intermediate chromosomes and 100 linear minichromosomes
Website: http://parsun1.path.cam.ac.uk

Trypanosomes have kinetoplast DNA (circular rings of mitochondrial DNA; studied by Paul Englund’s lab here).


Individual eukaryotic genomes: trypanosomes and Leishmania

Page 571

Leishmania major causes leishmaniasis.
Genome size: 34 Mb
Chromosomes: 36 (range 0.3 to 2.5 Mb)
Genes: about 9800
Website: http://www.sanger.ac.uk/Projects/L_major/

Leishmania chromosome 1 has 79 protein-coding genes. The first 29 (from the left telomere) are all transcribed from one strand, and the next 50 from the opposite strand.
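The strand organization described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the sequencing project: the function name `strand_runs` and the "-" / "+" labels are assumptions, since the slide does not say which strand comes first.

```python
from itertools import groupby


def strand_runs(strands):
    """Collapse a per-gene strand list into (strand, run_length) tuples."""
    return [(strand, len(list(group))) for strand, group in groupby(strands)]


# Hypothetical encoding of chromosome 1, genes ordered from the left telomere.
# Which strand is labelled "-" or "+" is an arbitrary choice made here.
chr1_strands = ["-"] * 29 + ["+"] * 50

runs = strand_runs(chr1_strands)
print(runs)           # [('-', 29), ('+', 50)]
print(len(runs) - 1)  # number of strand switches: 1
```

A chromosome whose genes alternate strands frequently would give many short runs; the single switch here is what makes the Leishmania arrangement notable.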


Individual eukaryotic genomes: malaria parasite Plasmodium falciparum

Page 573

Plasmodium falciparum causes malaria, killing 2.7 million people each year.

Distinguishing features: Four Plasmodium species infect humans: P. falciparum, P. vivax, P. ovale, P. malariae. The life cycle is extremely complex.

Genome size: 22.8 Mb
Chromosomes: 14 (range 0.6 to 3.3 Mb)
Genes: 5268 (comparable to S. pombe) (1 gene/4300 bp)
Website: http://www.plasmodb.org

P. falciparum has an adenine+thymine (AT) content of 80.6%.
The P. yoelii yoelii genome was also sequenced (infects rats).
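The two summary statistics quoted above (AT content and gene density) are simple to compute. The sketch below is illustrative only: the function names and the toy sequence are made up, and only the genome size and gene count come from the slide.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total


def bp_per_gene(genome_size_bp, gene_count):
    """Average number of base pairs per annotated gene."""
    return genome_size_bp / gene_count


toy_sequence = "ATATTTAAGC"  # made-up 10-base example, not real Plasmodium sequence
print(at_content(toy_sequence))          # 0.8
print(round(bp_per_gene(22.8e6, 5268)))  # 4328, i.e. roughly 1 gene per 4300 bp
```

Ambiguous bases such as N are ignored in the denominator, which is one common convention; other tools may count them differently.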


Individual eukaryotic genomes: malaria parasite Plasmodium falciparum

Page 573

Bioinformatics approaches to Plasmodium falciparum:

-- The apicoplast (relic plastid; fatty acid, isoprene metabolism) is a potential drug target. Apicoplast signal sequences found.
-- Comparative genomics defines some gene functions, identifies genes lacking in closely related species
-- Genes implicated in antigenic variation and immune system evasion can be identified (e.g. 1000 copies of vir)
-- Proteomics applied to four stages of the life cycle (sporozoites, merozoites, trophozoites, gametocytes)
-- Atypical metabolic pathways may be exploited, e.g. use of 1-deoxy-D-xylulose 5-phosphate (DOXP) in isoprene biosynthesis.


Individual eukaryotic genomes: overview of plants

• Plants form a distinct clade in the eukaryotic tree
• All plants are multicellular
• Plants are sessile, and depend on photosynthesis (Epifagus is an exception)
• Plants originated about 1.5 billion years ago (BYA), after eukaryotes had acquired a mitochondrion by endosymbiosis. Plants acquired a plastid (i.e. the chloroplast) over 1 BYA.

Page 575


Figure 16.22
Page 575

After Meyerowitz (2002) and Wang et al. (1999)


Individual eukaryotic genomes: overview of plants

Eudicots (e.g. Arabidopsis) diverged from monocots (e.g. rice) about 200 million years ago (MYA).

Dicots include rosids (Arabidopsis, Glycine max [soybean], M. truncatula) and asterids (e.g. Lycopersicon esculentum [tomato]).

Monocots include cereals (seeds of flowering plants from the grass family).

Page 578


Figure 16.23
Page 577


Individual eukaryotic genomes: Arabidopsis thaliana

Page 578

A. thaliana is a thale cress, sometimes called a weed.
Distinguishing features: Rapid growth rate, extensive genetics. Member of the Brassicaceae (mustard) family. A flowering plant (emerged 200 MYA).

Genome size: 125 Mb (very small for a plant genome). Wheat is 16.5 Gb, barley is 5 Gb.
Chromosomes: 5
Genes: 25,498 (comparable to human)
Website: http://www.arabidopsis.org

-- The entire Arabidopsis genome may have duplicated twice.
-- 24 duplicated segments of > 100 kilobases


Fig. 16.25
Page 580

The TAIR web browser for Arabidopsis


Individual eukaryotic genomes: rice

Page 579

Oryza sativa is rice (subspecies indica, japonica).
Distinguishing features: This crop is a staple for half the world’s population. Four groups generated draft versions.
Genome size: 430 Mb (1/8th of human genome). One of the smallest grass genomes.
Chromosomes: 12
Genes: about 50,000? (more than human)
Website: http://www.usricegenome.org (and other sites)

-- The rice genome displays an unusual gradient in GC content. The mean is 43%. The 5’ end of most genes has a higher GC content than the 3’ end (by 25%). GC-rich regions occur selectively in exons (not introns).
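A GC gradient of this kind can be made visible by splitting a gene into equal slices from the 5’ end to the 3’ end and computing the GC fraction of each slice. The sketch below assumes hypothetical function names and uses a made-up toy sequence, not rice data.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_gradient(seq, n_bins=4):
    """GC fraction of n_bins consecutive slices, ordered 5' to 3'."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Integer bin edges spread any remainder evenly across the slices.
    edges = [i * len(seq) // n_bins for i in range(n_bins + 1)]
    return [gc_content(seq[start:end]) for start, end in zip(edges, edges[1:])]


# Made-up 32-base "gene" whose GC content falls from the 5' end to the 3' end.
toy_gene = "GCGCGCGC" + "GCATGCAT" + "ATATGCAT" + "ATATATAT"
print(gc_gradient(toy_gene))  # [1.0, 0.5, 0.25, 0.0]
```

For a real analysis the sequence would have to be oriented along the coding strand first, so that the first slice really is the 5’ end of the gene.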


Individual eukaryotic genomes: overview of the metazoans

The metazoans are animals including worms, insects, and vertebrates (e.g. fish and primates).

Page 582


Individual eukaryotic genomes: the slime mold Dictyostelium discoideum

Page 582

Dictyostelium discoideum is a slime mold. This forms an outgroup to the metazoans.

Distinguishing features: The remarkable life cycle includes single-cell and multicellular forms.

Genome size: 34 Mb
Chromosomes: 6
Genes: about 11,000
Website: http://dictybase.org

-- The Dicty genome has almost 80% AT content (similar to Plasmodium). Thus a whole-chromosome shotgun strategy was employed.


Individual eukaryotic genomes: the nematode C. elegans

Page 584

C. elegans is a free-living soil nematode.
Distinguishing features: Its genome was the first of a multicellular animal to be sequenced (1998).
Genome size: 97 Mb
Chromosomes: 6
Genes: about 19,000 (spanning 27% of genome)
Website: http://www.wormbase.org

-- Many worm functional genomics projects have been performed, such as microarrays at multiple developmental stages.


Individual eukaryotic genomes: the fruitfly Drosophila

Page 585

Drosophila’s distinguishing features: Short life cycle, varied phenotypes, model organism in genetics.
Genome size: 180 Mb
Chromosomes: 5
Genes: about 13,000 (spanning 27% of genome)
Website: http://www.fruitfly.org

-- At the time, the largest genome to which whole genome shotgun sequencing had been applied.
-- Each genome annotation improves the gene models


Individual eukaryotic genomes: the mosquito Anopheles gambiae

Page 587

A. gambiae was the second insect genome sequenced.
Distinguishing features: It is the malaria parasite vector.
Genome size: 278 Mb (twice the size of Drosophila)
Chromosomes: 3
Genes: about 14,000
Website: http://www.ensembl.org/Anopheles_gambiae/

-- Diverged from Drosophila 250 MYA (average amino acid sequence identity of orthologs is 56%). Compare human and pufferfish (diverged 400 MYA, 61% identity): insect proteins diverge at a faster rate.
-- High degree of genetic variation
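The rate comparison above can be checked with a back-of-the-envelope calculation using only the numbers on the slide. This sketch assumes a naive linear model (percent of differing residues divided by time since divergence) that ignores multiple substitutions at the same site; the function name is hypothetical.

```python
def divergence_per_my(percent_identity, divergence_time_my):
    """Naive rate: percent of residues that differ per million years since divergence."""
    return (100.0 - percent_identity) / divergence_time_my


insect_rate = divergence_per_my(56, 250)      # Anopheles vs. Drosophila
vertebrate_rate = divergence_per_my(61, 400)  # human vs. pufferfish

print(round(insect_rate, 4))                    # 0.176
print(round(vertebrate_rate, 4))                # 0.0975
print(round(insect_rate / vertebrate_rate, 2))  # 1.81
```

Under this crude model the insect orthologs diverge about 1.8 times faster, consistent with the slide's conclusion. A proper analysis would use a substitution model with a correction for repeated changes.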


Individual eukaryotic genomes: the sea squirt Ciona intestinalis

The chordates include vertebrates (fish, amphibians, reptiles, birds, mammals), which have a spinal column.

Some chordates are invertebrates, such as the sea squirt.

Genome size: 160 Mb (20 times smaller than human)
Chromosomes: 14
Genes: 15,852

Significant for our understanding of vertebrate evolution.

Page 587


Individual eukaryotic genomes: the fish Fugu rubripes

Page 588

Fugu is a pufferfish (also called Takifugu rubripes).
Distinguishing features: Diverged from humans 450 MYA; has comparable number of genes in a compact genome.
Genome size: 365 Mb (1/10th human genome)
Genes: about 30,000
Website: http://genome.jgi-psf.org/fugu6/fugu6.info.html

-- Only 2.7% of genome is interspersed repeats (compare 45% in human), based on RepeatMasker.
-- Introns are relatively short. 75% of Fugu introns are <425 base pairs (for human, 75% are <2609 base pairs).
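A statement such as "75% of introns are <425 base pairs" is the fraction of intron lengths falling below a cutoff. The sketch below shows that calculation with a hypothetical function name and made-up lengths; real values would come from a genome annotation.

```python
def fraction_shorter_than(lengths, threshold_bp):
    """Fraction of the given lengths that are strictly below threshold_bp."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(1 for n in lengths if n < threshold_bp) / len(lengths)


# Made-up intron lengths in base pairs, for illustration only.
toy_intron_lengths = [80, 120, 300, 2000]

print(fraction_shorter_than(toy_intron_lengths, 425))   # 0.75
print(fraction_shorter_than(toy_intron_lengths, 2609))  # 1.0
```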


Individual eukaryotic genomes: the mouse Mus musculus

Page 589

M. musculus is the second mammal to have its genome sequenced. Mouse diverged from human 75 MYA.
Distinguishing features: only 300 of 30,000 annotated genes have no human orthologs
Genome size: 2.5 Gb (euchromatic portion) (cf. 2.9 Gb human)
Chromosomes: 20 pairs (19 autosomal pairs plus the sex chromosomes)
Genes: about 30,000
Website: http://www.informatics.jax.org

-- Dozens of mouse-specific expansions occurred, such as the olfactory receptor gene family.
-- 40% of mouse genome can be aligned to human genome at the nucleotide level.


Individual eukaryotic genomes: primates

Page 591

The phylogenetic tree shows that chimpanzee (Pan troglodytes) and bonobo (pygmy chimpanzee, Pan paniscus) are the two species most closely related to humans. These three species diverged from a common ancestor about 5.4 million years ago, based on an analysis of 36 nuclear genes.

Large-scale genome sequencing projects have begun for the chimpanzee. Other genomes under consideration are the rhesus macaque monkey (Macaca mulatta) and the olive baboon (Papio hamadryas anubis).


Perspective and pitfalls

Page 531

One of the broadest goals of biology is to understand the nature of each species: what are its mechanisms of development, metabolism, homeostasis, reproduction, and behavior? Sequencing a genome does not answer these questions directly. After genome annotation, we try to interpret the function of the genome’s constituents in the context of various physiological processes.

The field of bioinformatics needs continued development of algorithms to find genes, repetitive sequences, genome duplications and other features, as well as tools to identify conserved regions. We may then generate and test hypotheses about genome function.