THE HUMAN GENOME SERIES
MAMMALIAN GENES
I. Conservation and Slow Evolution (today)
II. Functional Innovation and Rapid Change (Feb 10)
Your genome!
SLOW (Feb 3), FAST (Feb 10)
Questions
• Are we ‘just’ E. coli, except more so?
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?
Theodosius Dobzhansky (1900-1975)
“Nothing in Biology makes sense except in the light of Evolution”
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant" ("Everything that is true for E. coli is true for the elephant")
Jacques Monod (1972), 1965 Nobel laureate
• Are we ‘just’ E. coli, except more so?
Genes: ~ 30k vs 5.4k
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant"?
Mode of Protein Evolution
• De novo creation
• Gene fusion / fission
• Gene duplication
• Rapid sequence change
• Pseudogenisation
Genomes and Timelines wrt
[Timeline figure: axis marked at 1, 10, 100 and 1000 Mya]
• Archaea 3000 Mya
• Invertebrates 1000 Mya
• Rodents 75 Mya
• Chimpanzee 5 Mya
Hedges, SB. The origin and evolution of model organisms. Nature Reviews Genetics 3, 838-849 (2002)
Sequencing
Assembly
DNA Repeats
Genome Comparison
Gene Prediction
Gene Comparison
Gene Number
• Walter Gilbert [1980s] 100k
• Antequera & Bird [1993] 70-80k
• John Quackenbush et al. (TIGR) [2000] 120k
• Ewing & Green [2000] 30k
• Tetraodon analysis [2001] 35k
• Human Genome Project (public) [2001] ~ 31k
• Human Genome Project (Celera) [2001] 24-40k
• Mouse Genome Project (public) [2002] 25k-30k
• Lee Rowen [2003] 25,947
Complexity & Gene Number?
[Bar charts: gene count for Human, Cress, Fly, Worm and S. pombe (axis 0 to 35,000); a second panel adds Maize (axis 0 to 60,000)]
“Revealed: the secret of human behaviour. Environment, not genes, key to our acts”
“We simply do not have enough genes for this idea of biological determinism to be right. The wonderful diversity of the human species is not hard-wired in our genetic code. Our environments are critical.” J Craig Venter February 10, 2001
Complexity?
• Is ‘culture’ proportional to population size?
• Is the complexity of the WWW proportional to its size?
• Combinatorial argument
• Genetic interactions; alternative splicing; non-genic regulation; post-transcriptional & post-translational modifications
Complexity of Protein Sequences
[Bar chart: counts (axis 0 to 2000) for Human, Fly, Worm and Yeast; series: TM, extra, intra]
Architecture numbers in 4 eukaryotic proteomes
Data generated using SMART
Function
[Tree diagram: cenancestor at the root; internal nodes SP1, SP2 and DP2; leaves A1, B1, C1 and C2]
C1 and C2 are paralogues.
A1 and B1 and (C1 and C2) are orthologues.
Orthologues and Paralogues
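The distinction can be sketched in code: two genes are orthologues if their last common ancestor is a speciation node, and paralogues if it is a duplication node. This is only an illustration. The tree topology is an assumption read off the slide labels (SP1 and SP2 taken as speciation nodes, DP2 as a duplication node), and the nested-tuple encoding and function names are hypothetical.

```python
from itertools import combinations

# Assumed gene tree (not stated explicitly on the slide): speciation SP1
# splits A1 from the rest, speciation SP2 splits B1 from the C lineage,
# and duplication DP2 gives C1 and C2.
# Internal nodes are (label, kind, left, right); leaves are strings.
TREE = ("SP1", "speciation",
        "A1",
        ("SP2", "speciation",
         "B1",
         ("DP2", "duplication", "C1", "C2")))


def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    _, _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, a, b):
    """Classify genes a and b by the kind of their last common ancestor."""
    _, kind, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return relationship(child, a, b)  # both genes lie deeper in the tree
    return "orthologues" if kind == "speciation" else "paralogues"


pairs = {(a, b): relationship(TREE, a, b)
         for a, b in combinations(sorted(leaves(TREE)), 2)}
for pair, kind in pairs.items():
    print(pair, kind)
```

Under the assumed topology only the pair (C1, C2) is reported as paralogues; every other pair is separated by a speciation event.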
Only 1,195 human genes were found that had single orthologues in worm and fly.
Approx 95% of human genes do not have obvious orthologues in fly and worm
Data from Rich Copley and Peer Bork
Extracellular signalling proteins are among the most different between animals
[Figure: Drosophila, Human and C. elegans compared; values shown: 220, 119, 12]
Antifreeze protein type III from Antarctic eel pout (Lycodichthys dearborni)
Few sequence-based findings. For example …
[359 residues]
Human(x):Fly(1):Worm(1)
[Histogram: frequency (axis 0 to 1400) against no. of human paralogues (1 to 11+)]
Are we polyploid?
Richard Copley
Segmental Duplication in the Human Genome
Bailey et al. Science. 2002 297: 1003-7. Am J Hum Genet. 2003 73: 823-34
• The claim: “113 of these genes are widespread among bacteria, but, among eukaryotes, appear to be present only in vertebrates. These genes [may have] entered the vertebrate (or prevertebrate) lineage by horizontal transfer from bacteria.”
Horizontal Gene Transfer?
The coral Acropora millepora shares a surprisingly large number of genes with vertebrates. Curr Biol. 2003 Dec 16; 13(24): 2190-5.
Stanhope et al. Nature 2001 Jun 21; 411(6840): 940-4. “Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates.”
Gene loss is a powerful force in shaping gene repertoire.
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant"?
• 23 of 94 InterPro families: Defense and Immunity, e.g. IL, interferons, defensins
• 17 of 94 InterPro families: Peripheral nervous system, e.g. Leptin, prion, ependymin
• 4 of 94 InterPro families: Bone and cartilage: GLA, LINK, Calcitonin, osteopontin
• 3 of 94 InterPro families: Lactation: Caseins (), somatotropin
• 2 of 94 InterPro families: Vascular homeostasis: Natriuretic peptide, endothelin
• 5 of 94 InterPro families: Dietary homeostasis: Glucagon, bombesin, colipase, gastrin, IGF-BP
• 18 of 94 InterPro families: Other plasma factors: Uteroglobin, FN2, RNase A, GM-CSF etc.
‘New Domains’
Pseudogenes
• Two types: processed and non-processed
• 70% processed vs 30% non-processed
• ~ 20,000
Torrents et al. Genome Res. 2003 13: 2559-67.
SNPs
• Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation.
• They occur at an average density of about 1 per 1000 nucleotides of a genotype.
• Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that are believed to have the highest impact on phenotype.
• Ditto for SNPs in regulatory regions.
Synonymous change: TTA (Leu) → TTG (Leu)
Non-synonymous change: TTA (Leu) → TTT (Phe)
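The two example changes can be checked mechanically by translating each codon with the standard genetic code and comparing the amino acids. A minimal sketch follows; the function name is hypothetical, and it handles single codons only, not whole genes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_before, codon_after):
    """Return (kind, amino acid before, amino acid after) for a codon change."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    kind = "synonymous" if before == after else "non-synonymous"
    return kind, before, after


print(classify_change("TTA", "TTG"))  # Leu (L) -> Leu (L)
print(classify_change("TTA", "TTT"))  # Leu (L) -> Phe (F)
```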
What’s the difference between a mutation and a polymorphism? Frequency!
A frequency value of 1% of the polymorphic allele is usually taken as a threshold between mutation and polymorphism.
An example of a polymorphic variant that disrupts a critical disulphide bond: although this variant (260 Cys→Tyr) in HLA-H protein is strongly associated with hereditary haemochromatosis, its frequency is as high as 6% in Northern Europeans, and up to 14% in Ireland.
From Sunyaev et al. HMG 2001, Vol. 10, No. 6, 591-597
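The 1% convention can be written as a one-line rule, which makes the point of the HLA-H example explicit: a disease-associated variant can still count as a polymorphism. This is a sketch only. Treating a frequency of exactly 1% as a polymorphism is an assumption, and the rare-variant frequency at the end is hypothetical.

```python
POLYMORPHISM_THRESHOLD = 0.01  # the 1% convention quoted above


def classify_variant(allele_frequency, threshold=POLYMORPHISM_THRESHOLD):
    """Label a variant by population frequency alone (not by its effect)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    # Assumption: a frequency exactly at the threshold counts as a polymorphism.
    return "polymorphism" if allele_frequency >= threshold else "mutation"


print(classify_variant(0.06))   # HLA-H 260 Cys->Tyr, Northern Europeans
print(classify_variant(0.14))   # the same variant, Ireland
print(classify_variant(0.001))  # hypothetical rare variant
```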
Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?
After the break …
Comparative Genomics: Humans vs Rodents
Human and mouse c-kit mutations show similar phenotypes. The utility of mouse as a biomedical model for human disease is enhanced when mutations in orthologous genes give similar phenotypes in both organisms. In a visually striking example of this, the same pattern of hypopigmentation is seen in (a) a patient with the piebald trait and (b) a mouse with dominant spotting, both resulting from heterozygous mutations of the c-kit proto-oncogene.
Rodents as models for human disease
• All but a handful of human genes have orthologous counterparts in the mouse and rat genomes.
• In general, disease genes are not under different selective constraints relative to all other genes.
• Rodents are good model organisms for human disease
Mouse equivalents of human disease variants
Hs normal: MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal: MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL
Equivalent disease variants?
– 23 human disease-associated sequence variants whose variant amino acids are normal in the mouse, including:
• Breast Cancer (BRCA1 and BRCA2)
• Cystic Fibrosis (CFTR)
• Type 2D LGMD (SGCA)
• Becker Muscular Dystrophy (DMD)
– These variants are unlikely to be of value in understanding human disease.
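Such sites can be found by scanning a three-way alignment for positions where the human variant residue differs from the human normal residue but matches the mouse. The sketch below uses the three sequences from the "Mouse equivalents of human disease variants" slide and assumes they are already aligned column for column with no gaps; the function name is hypothetical.

```python
# Aligned sequences copied from the slide (assumed gap-free, equal length).
HS_NORMAL = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
HS_VARIANT = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
MM_NORMAL = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"


def variants_normal_in_mouse(hs_normal, hs_variant, mm_normal):
    """List (position, normal residue, variant residue) where the human
    variant residue is the normal mouse residue. Positions are 1-based."""
    if not len(hs_normal) == len(hs_variant) == len(mm_normal):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    columns = zip(hs_normal, hs_variant, mm_normal)
    for pos, (ref, var, mouse) in enumerate(columns, start=1):
        if var != ref and var == mouse:
            hits.append((pos, ref, var))
    return hits


hits = variants_normal_in_mouse(HS_NORMAL, HS_VARIANT, MM_NORMAL)
print(hits)
```

For this alignment the scan reports a single site, position 30, where the human variant Leu (replacing Pro) is the normal mouse residue.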
Mouse vs Human
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents?
More organisms …
more comparisons …
~ 1000 more genes identified…
Guigó, R. et al. PNAS (2003) 100, 1140-1145
Sequence conservation
Figure 25. Sequence conservation between mouse and human genes
Mouse genome paper, Nature 420, 520-562
Slow Evolution
The human spermidine synthase gene (SRM) and its mouse orthologue (Srm). The fifth exon in the mouse gene (green) is interrupted by an intron in the human orthologue.
[Tree diagram: cenancestor at the root; internal nodes SP1, SP2 and DP2; leaves A1, B1, C1 and C2]
C1 and C2 are paralogues.
A1 and B1 and (C1 and C2) are orthologues.
Orthologues and Paralogues
Human and mouse
“local synteny”
“Syntenic” regions contain orthologues!
Human and mouse chromosomes: global orthology
How do we link genomes & genes to evolution?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents?
Domain-regions are more conserved
[Histograms: percentage of sequences per interval (axis 0% to 30%) against percentage identity (20% to 100%); series: full-length proteins, domain-containing regions, domain-free regions]
Mouse-Human Orthologues % Identity
• sites not in domains: 64.4%
• cSNP sites: 67.1%
• all sites: 70.1%
• sites in domains: 88.9%
• disease sites: 90.3%
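Each figure above is a percentage identity computed over a different subset of alignment columns. A minimal sketch of that calculation follows. The toy alignment and the site annotations are hypothetical, and skipping gapped columns is an assumption, not necessarily how the slide's figures were derived.

```python
def percent_identity(seq_a, seq_b, sites=None):
    """Percentage of compared alignment columns at which the residues match.

    sites: optional iterable of 0-based column indices (default: all columns).
    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = range(len(seq_a)) if sites is None else sites
    compared = matches = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * matches / compared


# Hypothetical toy alignment and site classes, for illustration only.
human = "MKTLLVAGDE"
mouse = "MKSLLVSGEE"
domain_sites = [3, 4, 5, 7]        # hypothetical columns inside a domain
other_sites = [0, 1, 2, 6, 8, 9]   # hypothetical columns outside domains

print(percent_identity(human, mouse))                # all sites
print(percent_identity(human, mouse, domain_sites))  # sites in domains
print(percent_identity(human, mouse, other_sites))   # sites not in domains
```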
Little selection at cSNP sites
Significant selection at functional sites
A model of neutral evolution
• KS – the number of synonymous substitutions per synonymous site
• takes advantage of the redundant genetic code
• 4D sites: GCx (ALA), CCx (PRO), TCx (SER), ACx (THR), CGx (ARG), GGx (GLY), CTx (LEU), GTx (VAL)
• “how much would a gene have changed if selection had not acted upon it?”
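The idea behind fourfold degenerate (4D) sites can be sketched as follows: for the eight codon families listed above, any change at the third position leaves the amino acid unchanged, so differences there approximate change in the absence of selection. The code reports the raw proportion of differing 4D sites. That is a simplification and not KS itself, which is estimated over all synonymous sites with a correction for multiple substitutions. The function name and the toy sequences are hypothetical.

```python
# Codon families listed on the slide: the first two bases fix the amino acid,
# so the third position is a fourfold degenerate (4D) site.
FOURFOLD_PREFIXES = {"GC", "CC", "TC", "AC", "CG", "GG", "CT", "GT"}


def fourfold_divergence(cds_a, cds_b):
    """Return (number of shared 4D sites, fraction of them that differ)
    for two aligned, gap-free, in-frame coding sequences."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("need equal-length, in-frame coding sequences")
    sites = differences = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Count only codons where both sequences share the same 4D prefix.
        if codon_a[:2] == codon_b[:2] and codon_a[:2] in FOURFOLD_PREFIXES:
            sites += 1
            differences += codon_a[2] != codon_b[2]
    fraction = differences / sites if sites else float("nan")
    return sites, fraction


# Hypothetical toy sequences: GCT CCA ATG GGA TTA vs GCC CCA ATG GGG TTG.
n_sites, fraction = fourfold_divergence("GCTCCAATGGGATTA", "GCCCCAATGGGGTTG")
print(n_sites, fraction)
```

In the toy example three codons (GCx, CCx, GGx) contribute 4D sites and two of them differ; the TTA/TTG pair is synonymous but is not counted, because TTx is not a fourfold degenerate family.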
Thomas et al., Nature 424, 788-793
Neutral rates vary
see also Hardison et al. Genome Res. 2003 13: 13-26.
Variation in rates of mutation or rates of repair?
• Transcription-associated mutational strand asymmetry (Phil Green et al. Nature Genetics 33: 514-7)
• Associated with transcription-coupled repair processes (Majewski, Am J Human Genet 73, 688-692)
• Genes transcribed at high levels in the germline are, when mutated, repaired more readily than those not transcribed in the germline.
• Majewski estimates that 71%-91% of genes are transcribed in the germline!
Tissue-specific genes’ Ks
[Bar chart: median Ks-value (axis 0 to 0.8)]
Winter et al. Genome Research 14:54-61, 2004
A model for non-neutral evolution
• KA – the number of non-synonymous (amino acid changing) substitutions per non-synonymous site
• What proportion of possible amino acid-changing substitutions has occurred?
KA/KS (dN/dS, ω) ― A model of selective pressure
<< 1 purifying selection
> 1 positive diversifying selection
[Scale: KA/KS from 0.0 (conserving) to 1.0 (diversifying)]
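The reading of the ratio can be sketched as a small function. The slide gives only "<< 1" and "> 1"; the tolerance band around 1 that is labelled "near neutral" here is a hypothetical choice, as are the function names and the example KA and KS values.

```python
def omega(ka, ks):
    """KA/KS ratio; undefined when KS is zero."""
    if ks <= 0:
        raise ValueError("KS must be positive for the ratio to be defined")
    return ka / ks


def interpret_omega(value, tolerance=0.05):
    """Map a KA/KS value to a selection regime.

    tolerance is a hypothetical band around 1 treated as near neutral.
    """
    if value > 1 + tolerance:
        return "positive (diversifying) selection"
    if value < 1 - tolerance:
        return "purifying selection"
    return "near neutral"


# Hypothetical KA and KS values, for illustration only.
print(interpret_omega(omega(0.05, 0.5)))  # ratio well below 1
print(interpret_omega(omega(0.6, 0.4)))   # ratio above 1
```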
[Histogram: percentage of sequences per interval (axis 0% to 25%) against KA/KS (0.00 to 0.70); series: full-length proteins, domain-free regions, domain-containing regions]
Domain-regions are under higher purifying selection
[Plot: percentage of sequences per interval (axis 0% to 100%) against KA/KS (0.00 to 0.50); series: full-length proteins, domain-free regions, domain-containing regions]
Higher purifying pressures in enzymes
Catalytic domains are
• more conserved
• under higher purifying selection
than non-catalytic domains
Selective Pressures vary with cellular compartment
For 521 domain families of known locale, KA/KS values:
• Secreted >> Nuclear > Cytoplasmic
Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from? Next week.
• Do all genes evolve at the same rate? NO.
• Do all tissues & organs evolve at the same rate? NO.
• Where do we fit in the tree of life? Mammals!
• What specifies the differences between us and rodents, or us and chimps? Next week.
• What specifies the elevated complexity of us versus other animals? Unknown.
• Can we understand sequence variation among humans? Hopefully, we will.
• How can gene function contribute to behaviour? Next week.
MRC Functional Genetics Unit, Oxford
Leo Goodstadt, Richard Emes, Eitan Winter, Steve Rice, Scott Beatson, Nick Dickens, Caleb Webber, Michael Elkaim, Jose Duarte
Ensembl (Ewan Birney, Michele Clamp, Abel Ureta-Vidal); Richard Copley (WTCHG, Oxford); Ziheng Yang (UCL);
The Human, Mouse and Rat Genome Sequencing Consortia; UCSC
Bibliography
Human Genome Papers:
Lander et al. Nature (2001) 409, 860-921
Venter et al. Science (2001) 291, 1304-1351.
Mouse Genome Paper:
Waterston et al. Nature (2002) 420, 520-562.
Rat Genome Paper: submitted.
Comparative genomics & evolutionary rates:
Hardison et al. Genome Res. (2003) 13, 13-26.
Adaptive evolution of genomes:
Emes et al. Hum Mol Genet. (2003) 12, 701-9
Wolfe & Li Nat Genet. (2003) 33 Suppl: 255-65