Genome Evolution
Dan Graur
Topics:
Genome Size
Genome Content
Gene Geography
Nucleotide Composition
The entire complement of genetic material carried by an individual is called the genome.
[Figure: the genome divided into genic and non-genic parts (an ad hoc distinction).]
[Figure: genic and non-genic DNA, each either transcribed or untranscribed; the transcribed portion of the genome is the transcriptome.]
[Figure: transcribed DNA further divided into translated and untranslated; the translated portion corresponds to the proteome.]
Genome Size: The Anthropocentric View
atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc
Page 1 out of 1,500,000 (total = ~3.5 billion bp)
No index… No annotation… No explanation… Only a, c, t, & g, ad nauseam…
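The page count comes from a single division. Below is a minimal sketch that assumes the slide's round figures: about 3.5 billion bp printed on 1,500,000 pages. The variable names are illustrative.

```python
GENOME_BP = 3_500_000_000   # haploid genome size used on the slide
PAGES = 1_500_000           # pages in the hypothetical printout

# Average number of letters (bases) printed on each page
letters_per_page = GENOME_BP / PAGES
print(f"{letters_per_page:,.0f} letters per page")
```

That works out to a little over 2,300 letters per page, which is roughly a dense page of unbroken text.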
1-2 trillion cells
2 × 23 = 46 chromosomes
The human haploid genome is 3.5 × 10⁹ bp.
On average, a single human chromosome consists of about 5 cm of DNA. A human cell contains 2-3 meters of DNA. The total length of DNA in one adult human is 2.0 × 10¹³ meters.
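The per-cell and per-chromosome figures follow from the genome size. Below is a minimal sketch. It assumes a rise of about 0.34 nm per base pair for B-form DNA, which is not stated on the slide, and uses the slide's 3.5 × 10⁹ bp haploid genome and 46 chromosomes.

```python
# Assumption: B-form DNA rises about 0.34 nm per base pair.
RISE_M_PER_BP = 0.34e-9
HAPLOID_BP = 3.5e9          # haploid genome size, from the slide
CHROMOSOMES_PER_CELL = 46   # 2 x 23

haploid_length_m = HAPLOID_BP * RISE_M_PER_BP    # one copy of the genome
diploid_length_m = 2 * haploid_length_m          # DNA in one diploid cell
avg_chromosome_cm = diploid_length_m / CHROMOSOMES_PER_CELL * 100

print(f"DNA per diploid cell: {diploid_length_m:.2f} m")
print(f"Average chromosome:   {avg_chromosome_cm:.1f} cm")
```

The result is about 2.4 m per cell and just over 5 cm per chromosome, both consistent with the figures above.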
The marvels of packaging
[Figure: levels of DNA packaging: the DNA double helix (2 nm); nucleosomes, DNA wound around a cluster of histone molecules (11 nm); packed nucleosomes forming the 30-nm fiber; extended chromatin on scaffolding protein (300 nm); condensed chromatin (700 nm); and the condensed chromosome (1400 nm).]
The total length of DNA in an adult human is 2.0 × 10¹³ meters (the equivalent of about 70 round trips from Earth to the Sun, or about 4.5 one-way trips from the Sun to Neptune).
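The astronomical comparisons can be checked with a short calculation. Below is a minimal sketch. The astronomical unit is the IAU-defined value. Neptune's mean distance of about 30.07 AU is an assumed round figure that is not taken from the slide.

```python
AU_M = 1.495978707e11   # astronomical unit (about the mean Earth-Sun distance), metres
NEPTUNE_AU = 30.07      # assumption: Neptune's mean distance from the Sun, in AU
TOTAL_DNA_M = 2.0e13    # total DNA length in one adult human, from the slide

earth_sun_round_trips = TOTAL_DNA_M / (2 * AU_M)
sun_neptune_one_way = TOTAL_DNA_M / (NEPTUNE_AU * AU_M)

print(f"Earth-Sun round trips:     {earth_sun_round_trips:.0f}")
print(f"Sun-Neptune one-way trips: {sun_neptune_one_way:.1f}")
```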
With the right public relations, you can make the genome look big…
… or small.
3.5 billion letters in a four-letter alphabet: an information content on the order of 1 CD
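The "one CD" comparison can be checked by counting bits. Below is a minimal sketch. It treats each base as an independent, equiprobable symbol, which gives an upper bound. Real sequence compresses further because much of it is repetitive.

```python
import math

GENOME_BP = 3.5e9                     # letters in the haploid genome (slide figure)
ALPHABET = 4                          # a, c, g, t
bits_per_base = math.log2(ALPHABET)   # 2 bits for 4 equiprobable letters
total_bytes = GENOME_BP * bits_per_base / 8   # 8 bits per byte

print(f"{total_bytes / 1e6:,.0f} MB")
```

The result is roughly 875 MB. A standard CD holds about 650-700 MB, so the raw sequence is on the order of a single disc.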
Human chromosome 22: 48,000,000 bp (December 1999)
Does the human genome size reflect the fact that we are the pinnacle of creation?
How to lie with statistics
The case of the missing axis.
Measures of genome size
1. Chromosome number
2. DNA length
3. Number of genes
1. Chromosome number
[Figure: chromosome number on a logarithmic scale from 1 to 10,000 (4 orders of magnitude), marking the minimum, mean, and maximum.]
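The "orders of magnitude" on a logarithmic axis are just log₁₀ of the ratio between the largest and smallest values. Below is a minimal sketch using the extremes quoted on the following slides: 1 chromosome in male Myrmecia pilosula and about 1260 in Ophioglossum reticulatum. The function name is illustrative.

```python
import math


def orders_of_magnitude(smallest: float, largest: float) -> float:
    """Number of powers of ten separating two positive values."""
    if smallest <= 0 or largest <= 0:
        raise ValueError("values must be positive")
    return math.log10(largest / smallest)


# Chromosome-number extremes from the slides that follow
print(f"{orders_of_magnitude(1, 1260):.1f} orders of magnitude")
```

The observed range spans a little over three orders of magnitude, so it fits on the four-decade axis.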
Human karyotype = 46 chromosomes
Myrmecia pilosula, males (jumping jack): 1 chromosome
Haplopappus gracilis (yellow spiny daisy): 4 chromosomes
Pisum sativum (pea): 14 chromosomes
Helianthus annuus (sunflower): 34 chromosomes
Felis catus (cat): 38 chromosomes
Homo sapiens: 46 chromosomes
Canis familiaris (dog): 78 chromosomes
Tympanoctomys barrerae (red viscacha rat): 102 chromosomes
Senecio roberti-friesii (Robert & Friesi's groundsel, a member of the daisy family): 90 chromosomes
Compare its relative, the yellow spiny daisy: 4 chromosomes
Lysandra atlantica (Atlantic Adonis blue): 250 chromosomes
Ophioglossum reticulatum (netted adder's-tongue, a fern): ~1260 chromosomes
…and we are only here.
K-value paradox: Complexity does not correlate with chromosome number.
Homo sapiens: 46; Lysandra atlantica: 250; Ophioglossum reticulatum: ~1260
2. DNA length
[Figure: DNA length (bp) on a logarithmic scale from 10⁵ to 10¹² (7 orders of magnitude), marking the smallest, mean, and largest genomes.]
Carsonella ruddii
An endosymbiont of psyllids, which parasitize hackberry. It has one of the smallest known genomes of any cellular organism.
Plasmodium falciparum
The human malaria parasite.
Tetraodon fluviatilis (green-spotted pufferfish)
Miniopterus schreibersii (Schreiber's long-wing bat)
Homo sapiens
Triturus cristatus (great crested newt)
Ophioglossum petiolatum (stalked adder's-tongue, a fern)
Amoeba dubia
200 times more DNA than me?
[Figure: Homo sapiens shown alongside Crepis laciniata, Cuminum cyminum, Blatta orientalis, Papaver tauricola, Uca pugilator, and Salvelinus fontinalis.]
C-value paradox: Complexity does not correlate with genome size.
Homo sapiens: 3.4 × 10⁹ bp
Amoeba dubia: 6.8 × 10¹¹ bp
Allium cepa (onion): 1.5 × 10¹⁰ bp
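Expressing each genome size as a multiple of the human genome makes the paradox concrete. Below is a minimal sketch that uses only the three sizes listed on this slide.

```python
# Genome sizes (bp) as given on the slide
C_VALUES = {
    "Homo sapiens": 3.4e9,
    "Amoeba dubia": 6.8e11,
    "Allium cepa": 1.5e10,
}

human = C_VALUES["Homo sapiens"]
ratios = {species: size / human for species, size in C_VALUES.items()}

for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f}x the human genome")
```

The amoeba comes out at about 200 times the human genome, which matches the "200 times more" remark. The onion comes out at roughly four and a half times.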
3. Number of genes
It is very difficult to estimate accurately the number of protein-coding genes in the genome of eukaryotes.
Reason 1: the large number and large size of the introns.
Reason 2: the low density of genes.
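Both reasons show up in the simplest gene-finding approach: scanning for open reading frames (ORFs). Below is a minimal sketch, not a real gene predictor, with several assumptions:
- It scans only the three forward frames.
- It assumes the coding sequence is uninterrupted, so a gene split by introns is missed or fragmented.
- It relies on a length threshold (`min_codons`, an arbitrary choice here) to separate candidate genes from the short ORFs that arise by chance in non-coding DNA.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Naive forward-strand ORF scan.

    Returns (frame, start, end) for each ATG...stop stretch of at least
    `min_codons` codons (counting ATG, excluding the stop). `end` is
    exclusive and includes the stop codon. Introns are not modelled.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # close it and keep scanning
    return orfs


# Toy sequence: ATG, five AAA codons, then a TAA stop, in frame 2.
toy = "cc" + "atg" + "aaa" * 5 + "taa" + "gg"
print(find_orfs(toy, min_codons=3))
```

On the toy input, the scan reports one ORF in frame 2. With the default threshold it reports nothing, which illustrates how the cutoff trades sensitivity for specificity.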
Example 1: factor-IX gene
Only about 4% of the sequence actually encodes the protein.
Example 2: dystrophin gene
Dystrophin has 79 exons and spans over 2.4 million base pairs of DNA.
Only about 0.3% of the sequence encodes the protein.
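The coding percentages are simple ratios of coding length to genomic span. Below is a minimal sketch with hypothetical helper names. It uses the slide's dystrophin figures to back out how much coding sequence "0.3% of 2.4 Mb" implies.

```python
def coding_fraction(coding_bp: float, gene_span_bp: float) -> float:
    """Fraction of a gene's genomic span that encodes protein."""
    return coding_bp / gene_span_bp


def implied_coding_bp(fraction: float, gene_span_bp: float) -> float:
    """Coding length implied by a quoted fraction and gene span."""
    return fraction * gene_span_bp


# Dystrophin, using the slide's figures: ~0.3% of a ~2.4-Mb gene
print(f"{implied_coding_bp(0.003, 2.4e6):,.0f} bp of coding sequence")
```

Only a few kilobases of coding sequence sit inside millions of base pairs of gene. That is why intron-blind methods struggle.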
atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc
atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacgatggactggatgccccaggaaaaggaaagaggtataaccataaccgttgcaacgaccgcatgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc
1
gatggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggaatacgctacgacgctcaggagcttgacacaag
gtaggggaacctttataatgaaattttcccactacgacgaagtt
2
ttgatggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacaca
aggtaggggaacctttataatgaaattttcccactacgacgaag
3
atggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaagg
taggggaacctttataatgaaattttcccactacgacgaagttc
4
gagagaggtcctataagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgttgaagttgtacgttccatgaaagttctcgacggaatagttttcatattctccgcggttgaaggtgtgcaacctcagtccgaagcaaactggagatgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaaggtagggga
acctttataatgaaattttcccactacgacgaagttccctttga
5
tggcgagagaggtgcctatagagaaattgagaaacataggtatagttgctcacattgacgcgggtaaaactacgactaccgagagaattctctattacacgggtaagacttacaagataggtgaagttcacgaaggtgctgcaacggaaacggggagaggtatcaaataaacataattgacacacccggacacgttgacttctccgtgttattggacgagaaaattccccttctcaggaggctgtgggaattcagggtggataaccccgaagagttccagtcaggtcaacagctcaaagtggaagacgggcggacaggttccaagttccgaggatagccttcataaacaagatggaccgtctgggtgcggatttttacagagtgtttaaggaaatagaagaaaagctaaccataaagcccgttgccattcaaatacccctgggagcggaggaccagtttgaaggtgttatagatctaatggaaatgaaggcaataaggtggctcgaagaaaccctcggagctaaatacgaagtagtagacattcctccagaataccaggaaaaggctcaagaatggcgcgaaaagatgatagaaaccatcgtagaaaccgacgacgagttaatggaaaagtacttagaaggacaggaaatatctatagatgaactaagaaaagctttaagaaaggcaacaatagagagaaagctcgttcccgttctttgcggttctgcattcaagaacaaaggtgttcaaccccttcttgacgcagttatagattacctgccttctcctatagaccttcctcccgttaaggggacaaatcccaagaccggggaagaagaggtcagacacccctctgacgacgaacccttctgcgcttacgcctttaaggttatgtccgacccgtatgccggacaacttacctacatcagagtgttctcaggaacgctaaaagcgggttcttacgtctacaacgcaaccaaggacgaaaagcaaagggctggaagacttcttctcatgcacgcgaactccagagaggaaatacagcaggtttccgcgggtgaaatttgtgcagttgtaggactagacgccgcaacgggtgatactctctgtgatgaaaagcaccccataatccttgaaaagcttgaattccctgaccccgttatatctatggctatagagccaaagaccaagaaggaccaagaaaaactctcacaagttctcaacaagttcatgaaagaggatccaaccttcagggcaacaaccgatcccgaaactggtcagatactcatacacggaatgggtgagctccacctcgaaataatggttgacagaatgaagagggaatacggaattgaagtgaacgtcggtaaaccgcaggttgcttacaaggaaaccatcaggaaaaaggcaattggtgagggtaagttcatcaagcaaactggtggtagagggcagtacggtcacgcgataatcgaaatcgaacccctccccagaggtgcgggatttgaattcatagacgacattcacggaggagttatccccaaagaattcataccctccgttgagaagggtgtaaaggaagctatgcaaaacggaattctcgcaggataccccgttgttgacgttagagttagactctttgacggttcttaccacgaagttgactcttcggacatagcattccaggttgcgggttccttggcattcaaagatgcagccaaaaaggcagatcccgttcttctggaacccataatggaagttgaagtggaaactcccgaaaagtacgtgggtgacgttataggtgaccttaactccagaagaggaaagattatgggaatggaaaacaagggagttataacagtcataaaggctcacgttcccctcgcagagatgttcggatacgctacgacgctcaggagcttgacacaaggtaggggaacctttataatgaaattttcccactacgacgaagttcgctacgacgctcaggagcttgacac
aaggtaggggaacctttataatgaaattttcccactacgacgaa
6
The DNA as text...
nnsnnhumbmjfdiooospfptyrewnzxcmopleprotiuwwrqdngjklsmsnabmjfdioobnmppoewqasdtratyusisosmamgkkpsretwyospfptyrewnzxcmopleprotiuwwrqdngjklsmsnakeytivvkvldtpgppvnvtvkeiskdsayvtweppiidggspiinyvvqkrdaerkswstvttecsktsfrvanleegksyffrvfaeneygigdpgetrdavkasqtpgpvvdlkvrsvsksscsigwkkphsdggsriigyvvdflteenkwqrvmkslslqysakdltegkeytfrvsaenengegtpseitvvarddvvapdldlkglpdlcylakensnfrlkipikgkpapsvswkkgedplatdtrvsvessavnttlivydcqksdagkytitlknvagtkegtisikvvgkpgiptgouqxbzzzzzpikfdevtaeamtlkwappkddggseitnyilekrdsvnnkwvtcasavqkttfrvtrlhegmeytfrvptydumsaenkygvgeglksepivarhpfdvpdappppnivdvrhdsvsltwtdpkktggspitgyhlefkernsllwkranktpirmrdfkvtgltegleyefrvmain1lagvgkpslpsepvvaldpidppgkpevinitrnsvtliwtepkydgghkltgyivekrdlpskswmkanhvnvpecaftvtdlveggkyefrirakntagaisapsestetiickdeyeaptivldptikdgltikagdtivlnaisilgkplpksswskagkdirpsditqitstptssmltikyatrkdageytitatnpfgtkvehvkvtvldvpgppgpveisnvsaekatltwtppledggspiksyilekretsrllwtvvsediqscrhvatkliqgneyifrvsavnhygkgepvqsepvkmvdrfgppgppekpevsnvtkntatvswkrpvddggseitgyhverrekkslrwvraiktpvsdlrckvtglqegstyefrvsaenragigptysappseasdsvlmkdaayppgppsnphvtdttkksaslawgkphydggleitgyvvehqkvgdeawikdttgtalritqfvvpdlqtkekynfrisaindagvgepavipdveiveremapdfeldaelrrtlvvraglsirifvpikgrpapevtonawatwtkdninlknranientesftlliipecnrydtgkfvmtienpagkksgfvnvrvldtpghjiuopzxnllmt
nnsnnhumbmjfdiooospfptyrewnzxcmopleprotiuwwrqdngjklsmsnabmjfdioobnmppoewqasdtratyusisosmamgkkpsretwyospfptyrewnzxcmopleprotiuwwrqdngjklsmsnakeytivvkvldtpgppvnvtvkeiskdsayvtweppiidggspiinyvvqkrdaerkswstvttecsktsfrvanleegksyffrvfaeneygigdpgetrdavkasqtpgpvvdlkvrsvsksscsigwkkphsdggsriigyvvdflteenkwqrvmkslslqysakdltegkeytfrvsaenengegtpseitvvarddvvapdldlkglpdlcylakensnfrlkipikgkpapsvswkkgedplatdtrvsvessavnttlivydcqksdagkytitlknvagtkegtisikvvgkpgiptgouqxbzzzzzpikfdevtaeamtlkwappkddggseitnyilekrdsvnnkwvtcasavqkttfrvtrlhegmeytfrvptydumsaenkygvgeglksepivarhpfdvpdappppnivdvrhdsvsltwtdpkktggspitgyhlefkernsllwkranktpirmrdfkvtgltegleyefrvmain1lagvgkpslpsepvvaldpidppgkpevinitrnsvtliwtepkydgghkltgyivekrdlpskswmkanhvnvpecaftvtdlveggkyefrirakntagaisapsestetiickdeyeaptivldptikdgltikagdtivlnaisilgkplpksswskagkdirpsditqitstpptytssmltikyatrkdageytitatnpfgtkvehvkvtvldvpgppgpveisnvsaekatltwtppledggspiksyilekretsrllwtvvsediqscrhvatklisaqgneyifrvsavnhygkgepvqsepvkmvdrfgppgppekpevsnvtkntatvswkrpvddggseitgyhverrekkslrwvraiktpvsdlrckvtglqegstyefrvsaenragigppseasdsatonawavlmkdaayppgppsnphvtdttkksaslawgkphydggleitgyvvehqkvgdeawikdttgtalritqfvvpdlqtkekynfrisaindagvgepavipdveiveremapdfeldaelrrtlvvraglsirifvpikgrpapevtwtkdninlknranientesftlliipecnrydtgkfvmtienpagkksgfvnvrvldtpghjiuopzxnllm
humpty dumpty sat on a wall...
It is very difficult to estimate accurately the number of protein-coding genes in the genome of eukaryotes.
Reason 1: the large number and large size of the introns.
Reason 2: the low density of genes.
There are gene-dense (urban centers) and gene-poor (deserts) chromosomes: from 23 genes per million base pairs on chromosome 19 (3%) to only 5 genes per million base pairs on chromosome 13 (0.7%).
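The slide does not say what the percentages measure. One reading is that they give the share of each chromosome that is protein-coding, in which case they follow from gene density multiplied by an average coding length. Below is a minimal sketch under that reading. The ~1.3 kb of coding sequence per gene is an assumed round figure chosen for illustration, not a value from the slide.

```python
# Assumption (not from the slide): about 1.3 kb of protein-coding sequence per gene.
CODING_BP_PER_GENE = 1_300


def coding_percent(genes_per_mb: float) -> float:
    """Percent of a chromosome's DNA that is coding, given genes per Mb."""
    return genes_per_mb * CODING_BP_PER_GENE / 1_000_000 * 100


for chrom, density in {"chr19": 23, "chr13": 5}.items():
    print(f"{chrom}: {coding_percent(density):.2f}% coding")
```

Under this assumption, chromosome 19 comes out at about 3% coding and chromosome 13 at about 0.65%. The gene-density ratio between the two is 23 / 5 = 4.6.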
Gene Numbers:
Pre-draft and post-draft predictions.
Two months later
Correction: Nature Genet. 25, 239–240 (2000)
“These improved estimates provide a lower bound of 56,960 and an upper bound of 81,273 genes in the human genome.”
1st draft: 15 February 2001
The gene number game: Genesweep©
July 2000: 165 bets; mean 61,710; lowest 27,462; highest 153,478
July 2001: 281 bets; median 61,302; lowest 27,462; highest 212,278
Finished sequence: 21 October 2004
Ensembl (October 2004): 20,134 protein-coding genes
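Comparing the Genesweep bets with this count shows how far off the pre-sequencing predictions were. Below is a minimal sketch that uses only figures quoted on the slides: the Ensembl October 2004 count and the bet statistics from the Genesweep slide. The function name is illustrative.

```python
ENSEMBL_2004 = 20_134   # protein-coding genes, Ensembl (October 2004)

# Genesweep figures quoted on the earlier slide
bets = {
    "lowest bet": 27_462,
    "median bet (July 2001)": 61_302,
    "mean bet (July 2000)": 61_710,
    "highest bet (July 2001)": 212_278,
}


def overestimate_factor(bet: int, actual: int = ENSEMBL_2004) -> float:
    """How many times larger a prediction was than the annotated count."""
    return bet / actual


for label, bet in bets.items():
    print(f"{label}: {overestimate_factor(bet):.2f}x")
```

Even the lowest bet overshoots by more than a third. The typical bet was about three times too high.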
Genebuild last updated: October 2008
Known protein-coding genes: 21,343
Novel protein-coding genes: 73
Pseudogenes: 9,899
RNA-specifying genes: 5,732
Exons: 297,252
RNA transcripts: 62,877
SNPs: 15,040,632
N-value paradox: Complexity does not correlate with protein-coding gene number.
~25,000 genes; ~25,000 genes; ~60,000 genes
Summary: 3 genomic paradoxes
K-value, C-value, and N-value
Genomic paradox:
Lack of correspondence between measures of genome size and the presumed amount of genetic information “needed” by the organism (its “complexity”).
What is complexity?
959 cells; 1,031 cells
19,000 genes; 13,600 genes; ~10⁸ cells
If humans are indeed the pinnacle of creation, they should have the biggest, largest, tallest, & fattest Texas-size genome.
They don’t, so they aren’t.
The human genome is disappointing…
The human genome is:
• small
• empty
• repetitive
• unoriginal
• inelegant