l14 human genome
Post on 11-May-2015
603 Views
Preview:
TRANSCRIPT
Human Genome Project
Determine the entire sequence of the human genome.
3 billion base pairs
Problem:
It’s really big!
Genome Sequencing
As of 6/ 25/ 04
1128 genome projects:
199 complete (includes 28 eukaryotes)
508 prokaryotic genomes in progress
421 eukaryotic genomes in progress
smallest: archaebacterium Nanoarchaeum equitans 500 kb
Bacillus anthracis (anthrax) 5228 kb
S. cerivisiae (yeast) 12,069 kb
Arabidopsis thaliana 115,428 kb
Drosophila melanogaster (fruit fly) 137,000 kb
Anopheles gambiae (malaria mosquito) 278,000 kb
Oryza sativa (rice) 420,000 kb
Mus musculus (mouse) 2,493,000 kb
Homo sapiens (human) 2,900,000 kb
http:// www. genomesonline. org/
1980 - $10/bp2001 - $0.1/bp
S. cerevisiae200xH. sapiens200xA. dubia
Human Genome Project timeline
E. coli Drosophila
C. elegans
Yeast
NRCRecommends
HGP
U.S.HGP
Begins
1990 1995 2000
Human Gene Map(16,000 genes)
Human Gene Map(30,181 genes)
Goal for Human Genetic Map
Exceeded
Physical MapCovers 98% of
Genome
Pilot Human Sequencing
Begins
Full-Scale Human Sequencing
Begins
Human draft
Phil Hieter
Completion of the genome
GenBank entries
double every 18 months
“Working Draft”
“Complete”4-5 coverage
9x coverage99.99 % acc
Completion of the genome
The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases.Human genome seems to encode only 20,000-25,000 protein-coding genes
International Human Genome Sequencing Consortium.Finishing the euchromatic sequence of the human genome.Nature 2004 Oct 21;431(7011):931-45.
Institutes that produced 85 % of the sequence
1.Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA2. The Sanger Centre, Cambridge, UK3. Washington University Genome Sequencing Center, St. Louis, MI4. US Department of Energy, JGI, Walnut Creek, CA5. Baylor College of Medicine Human Genome Sequencing Center, Houston, TX
Countries: USA, UK, Japan, Germany, China, France
Genome SequencingGenome: 3 Gb
Cut genome into large pieces
Clone into BACs: 100 kb
Order based on sequence features (markers) = mapping
Cut again
Sequence AGAACAGGACGTATGTGGTTGTGGTTTTCTACTCC
CTACTCCTGTGTT
TTGTAAGTGAGAACAAssemble each BAC
…TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT…
Assemble entire sequence
What does the sequence mean?TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCCAGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATGTATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGTCATATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTTTATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTAGGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACATACAAAATAATACCTCAGTGTAACCTCTGTTTGTATTTCCCTTGATTAACTGATGCTGAGCACATCTTCATGTGCTTATTGACCATTAATTAGTCTTATTTGTTAAATGTCTCAAATATTTTATACAGTTTTACATTGTGTTATTCATTTTTTAAAAAATTCATTTTAGGTTATATGTATGTGTGTGTCAAAGTGTGTGTACATCTATTTGATATATGTATGTCTATATATTCTGGATACCATCTCTGTTTCATGCATTGCATATATATTTGCCTATTTAGTGGTTTATCTTTTCATTTTCTTTTGGTATCTTTTCATTAGAAATGTTATTTATTTTGAGTAAGTAACATTTAATATATTCTGTAACATTTAATGAATCATTTTATGTTATGTTTAGTATTAAATTTCTGAAAACATTCTATGTATTCTACTAGAATTGTCATAATTTTATCTTTTATATACATTGATATTTTTATGTCAAATATGTAGGTATGTGATATTATGCACATGGTTTTAATTCAGTTAATTGTTCTTCCAGATGTTTGTACCATTCCAACATCATTTAAATCATTAAATGAAAAGCCTTTCCTTACTAGCTAGCCAGCTTTGAAAATCCATTCATAGGGTTTGTGTTAATATATTTTTGTTCTTTTTTTTCCTTTCTACTGATCTCTTTATATTAATACCTACTGTGGCTTTATATGAAGTCATGGAATAATACGTAGTAAGCCCTCTAACACTGTTCTGTTACTGTTGTTATTGTTTTCTCAGGGTACTTTGAAATATTCGAGATTTTATTATTTTTTAGTAGCCTAGATTTCAAGATTGTTTTGACGATCAATTTTTGAATCAATTGTCAATATTTTTAGTAATAAAATGATGATTTTTGATTGGAAATACATTAAATCTATAAGCCAAATTGGAGATTATTGATATATTAACAAAAATGAGTTTTCCAGTCCATGAATGTATGCACATTATAAAATTCATTCTTAAGTATGTCATTTTTTAAGTTTTAGTTTCAGCAGTATATGTTTGTTACATAGGTAAACTCCTGTCATGGGGGTTAGTTGTACAGGTTATTTTATCATCCAGGCATAAAGCCCAGTACCCAGTAGTTATCTTTTCTGCTCCTCTCCCTCCTGTCACCCTCCACTCTCAAGTAGACCCCAGTTTCTGTTGTTCTCTTCTTTGCATTAATGACTTCTCATCATTTAGATTGCACTTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTTAGTTTGCTAAGGATAACCACCTCCATCTCCATCCATGTTCCCACAAAAGACATGATCTCCTTTTTTATGGCTGCATATTATTCCATGGTATATATGTACCACATTTTCTTTATCCAATCTGTCATTGATGGACATTTAGGTTGTTTCCACATCATTGCCGTTGTAAATACTGCTGCAGTGAATATTCGTGTGTATGTCTTTATGGTAGAATGATTTATATTCCTCTGGGTATATTTCCAAGTAATGGGATGGTTGGGTCAAATGGTAATTCTGCTTTTAGCTTTTTGAGGAATTGCCATATTGCCTTTCACAACGGTTGAACTAATTTATACTCCCAAGAGTGTATAAGTTGTTCCTTTTTCTCTGCAACCTCGACATCACCTGTTATTTATGACTTTTATATAATAGCCATTCTGCTGGTCTGAGATGGTATCTCATTATGATTTTGATTTGCATTTCTCTAATGCTCAGTGATATTGAGCTTGGCTGCATATATGTCTTCTTTTAAAAATATCTGTTCATGTCCTTTGCCTAATTTATAACGGGGTTGTTTGTTTTTCTCTTGTAAATTTGTTTAAGTTCCTTATAGATTCTAGGTATTAAACCTTTTTTCAGAGGCGTGGCTTGCAAATATTTTCTCCCATTCTATAGGTTGTCTGTTTATTCTGTTGATAGTTTCCCTTGCTGTGCAGAAGCTCTTAACTTTAATTAGATCCGACTTGTCAATTTTTGCTTTGGTCGCAATTGCTTTTGATGTTATTGTCGTGAAATCTTTGCTAGTTCTTAGGTCCAGGATGATATTGCCCAAGTTGTCTTCCAGGGCTTTTATAATTTTGGATTTTACATTTAAGTCTTAATATATTTATTAAATTTGTTAGGGTTTCAGGATACAAGGACAATATAGCAGCAAACAATGTAAAAGTAAAATCTGAAAAATAATAGAAAACAGTTTAATTGAACACTTTACCATTATGTAATGCCCTTCTTTGTCTTTCCTGATCTTTGTTGGTTTGAAGTTCAAAAAAGACAAACTTAATGGTACAATAGGTATTGTAGATTTCAGGACTTTCTGTATAAAATATTTTGTATATATGAATAGATCATTTTTTATTTCCAGTCTTTAAACATTTTCTTAACATTTTCTTCTATTGCTTCACTTCACTCGCTAGGACCATCAGGACAGTGTTGAACAGAAATTGTCAGACTGATCATCACAACTTTTTCTAGATTTTAGAAGGAAATTTTTCTTTATTTCAACATAAAGCAGCATGTTAATGCCAAGTTTTAATATGTGTTATCAGATTGAAATTTTTTTGTATATTTCTACATTACCAAGAATTTTTAGCAAGAGTTTTTGTTGAGTTTTAATTTAAAAATCATTTGTTAATTTCATCTGATTTTTTTATTTCTCTTTTTACCTTAAGAGATTAAACTGACTACAGATTGAATATAAACAAACAAACAAACAAACAAAAACTCTAAAATGCTGTGGATCAACACCACTTAGTAATTTGTATACTTGGATTCAATTTGCTGAAATTTTGTTAGACATTTTTGCGTCGATATTTATGAGGGATGTTGATCTGTAAAAGTATTAAAATGCCTTTGACAGATTTTGATAGCAGTGTTATTCTGGCCTAATAAATCAAACTGAGGTATGATCCTTCCTTTTCTATTTCTTAATAGCATTTTTAAAATTGGTGGTTTTTTCCTTCCTTAGTGAAATTTACCAGCAAAGTAACAGGCCTTATATTTCTCTTGTGGAAATATTTTAATTTCAAATTAATGGTATTTTGTTCTTGTAGGGTGGTAATTTTCTCTGTGTTTGGTCTTAATGGACTCTTAGCTGATCACCCAGTTACTCAGCGAGGTCTCTTCACTCTGGAAGAGCTGGAACTCCAGTGTGTTTTAGTGCAGCATGACCACGGGTATTACCGTTCAACATTTAGGCTTTATCAGTGATAACTATTTGTCCTCATGGAGTTTTTGCCGCTGGGCCTACACAGTTTAGGCTTCAGCTTAGAACACATAATGAATTCTTATGCAGATTTCTGCCCACCTTTGACCTTTCATGATTTCCTCTTCTTGGGTAAGCTGCCTTATTAATCTGATACACTTCAGCAGTCCAGAACTACACTCTTTCCCTTCTCTGCTCTTGGAGATGACTCTTTTGTCTGAGATTCACTTTGCTGTGCTGAAAAAGAAAAGTGCTTCAAGGAAGATACCAAGGAAAATCACAGGGCTCATTTATGTATTTCTCTTCTTTCAAGGACTACAGCTTTGTGTTGCCTATGTTCAATTTCTGAAAATAATTAGAGCATATATACTCTGTGTGAGAAGGCAAATCCAGACAGTTAGTTTGTATGACTAGAAGCAGAAGTCTACATGGAGAATTTTACTTAACTGTGTTATAGTTTCTTTAATTATTTCAAGAGTATGTTTAATGTTCCACAGATCTCATTCTATAAATCTTTATCATCTTAGAGCTCTGATACTATTTAGAATTACTATTCCTTCAAATAAGAGATTAGAAACAGGGTTATATTTGGGGTAGGTTGACTTACTTTTCTGGGAACCAAAGCATATTAAATTGACCAGTTTTAACACACTTCTATGTATGCACAAAGATATATATTTACATTCTGCAAAATCATTCTTTCCTTTTTGAATTTGAAAAGGATCTTTGGTATACAGATATTCAATAGCCAGCCTGAAGATTCATTTGAATTCATTTAATGTTTAGATTCACTACATGAAATGATCCAGAAGAGAGTACTCAAATATAAGTATCTATAACGATGGAAATATACATCTCCACTGCCCAAGATGGTAGTCATGAGTCAATATTGATCATGTGAGACGTGGCAAGTGTTACTCAGGGTCTCAATATTTAAATGTATTAAGCTTTAATTAATGTAAATTTGAATTTAGCAAAACATGTATAGCTTGTGGTTACTGTTTTATTCAGTGCCAATATAGAACATTTCCATGATTACAGAAAGTTATCTTAGAATACTCAGTTCTGGACTATTTTATCTGGCTAAATTAAATGTTAAAATATTACAAATTCATCTTCAGGCTGGCTGTTGAATATTTTTATAGCAAAAGTCATTTATAAATTTAAAACTCAAATAATTATCTTTTTCAATATGTAAAATATGTCTTTACATATTCTACTCCCTTCTTACATACATATTCTGATGTAACATAGGTATTCTCTTATTCATGCACACTGAAATGACAACATAAATAATTTTACTAAGTGTCACCATATAAAAAACTTTGAACAAAATCAGATTATATCACTGTGGATATTTCTATTTTGAACTAACTTAGATGATAATTTTAATCTATATCCTAGATGAACTTTAAATCAATAAAATCTCTCAATGGTGTTATAAATCTCAAGCCATTAGCCACTGATTATCCCATTTTTATTCTTTTCATATTAATTTTATTGCCATGTATGAATGCTGTAGCATCCATGTTTAAATACTAGTTAACAAAATGCACTGGCATCAGATACAATAAGGATGAAATGAGATATAATTAGGACTCTGGTAACACACATAAAATTGGAAAGATACCCTGAAATTCAAGCCAAGAAGATATTTATCCAGCTTATTTTATTTTGAGACAGAGTCTTGCTCTCTCACTCAGGCTGGAGTGCAGTGGACCATTCTAGGCTCGCTCCAACCTCTGTCTCCCAAATTGAAGTAATTCTCGTGCCTCAATCTCCCGAGTAGCTGGGATTACAGGCATGTGTCACCAAGCCTGGCTGATTTTTGTAGTTTTAGTAGAGACGGGGTTTCACCATGATGGCCAGGCTGGTCTTGAACTCCTGGCCTCAAGTGACTGGAACACCTCGGCCTCCTAAAGTGCTGGGATTACAGACGAGAGCCACTGAACAGCTTTGATCCAACTTATTTGGATGAATGAGTTACATATTTTACATTAAATCTGTTATTGTGATAATTCTTCATGTTATTTTCCATGTATAGATTTATATATAATGTAATTTTAATTTTTTTTCACCGGAGAGTATAAACAACAATTATTTTATAAACAGGATAATAAAAATAAGACAAAAATTGTTGAAATGTCTTCATTTGACTACTAACTTTTTACATGTTTGTTACTTTGAAGCTGTTATCAATACTTGTGATGTATTACAATTAAGTAAAGATTTAAAGATGCCATTTTTAACTTATTATGACACAAAGTCTATAAATTCTTATATTTTGAGATTTGTATTTAAATAACTTGTGAAATTTAATTTTAAAATAAAATTTCTTCTATGGATTGGTCTTCAATCGAGGCATAAAAAGGAATATAACAGTGTGGCACTATAACTTCTATATTGAATTTCTATATTATTTAACACAATTATAATTTTGCTAATGAATTGTAATGTTTTTAAAAAGCTAGGTGAATTTTATTAAATTCATTACATGGCGATAACACAGAGAAAACATTTTGGGGATTCTTTTAAAATGGTATGTACAAAAGCTTAAAAGTTGTTATGTAGTGGCAGAGATAAAAAAGTAAAACAAAAAAAAGCTTAAAAGTTTGCTTTACTATTTATAGGCTCATAAGTGTAAGTGTGCCAGAAAATGAAAAAGAAAGGAGAGAAATTATAAATAACTGTGTGGAAAACACAGATAAAGCATAAAGATAGAATATAAAGATAGAAGCATTTTAATATGAGGCAGTGATGGCTTTTTGAAGAATCCCAACTAAGGACCTACTTTTAGTTAATAAATAATATGTTTCTAATCCCTATATTGTCCACAGCAACCTTTTTAGGACATGGAGCAGTGACTATGAGTGCCAGAAGGCAAGAGTAGAAGCAATTGTAAAATCATGAACACTAGTTTGTAAAATCCTCACTGAGATATAATATCTGTTTGCCTCTACCTTAGAATTATTAATGTCTTGAGGGCTGGGA
A very small piece of chromosome 21
What’s in a genome?
Genes (i. e., protein coding)
But. . . only <2% of the human genome encodes proteins
Other than protein coding genes, what is there?
• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
• structural sequences (scaffold attachment regions)
Regulatory sequences
• “junk” (including transposons, retroviral insertions, etc.)
Genome overview
Marked variation in distribution of number of features (GC, CpG, repetitions) 20.000-25.000 protein coding genes Proteome is more complex than those of invertebrates Hundreds of genes resulted from horizontal transfer More than 1.4 million SNPs, 10 million SNPs
Application to Medicine and Biology
Disease genes – positional cloning (30 genes already) Paralogues of disease genes (achromatopsia, CNGA3, CNGB3); (971 known disease genes => 286 paralogues) Drug targets – recent compendium = 483 drug targets, 18 new identified; Alzheimer’s disease, -amyloid is generated by processing APP by BACE; BACE2 in obligatory Down’s syndrom region of chromosome 21 Basic biology – bitter taste - new family of G-protein coupled receptors
The next steps
Large scale identification of regulatory regionsSequencing of additional large genomesCompleting the catalogue of human variationSequence-based functional prediction
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome.
Nature 2005 Sep 1;437(7055):69-87.
Thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements. 98,6 % identitity to human genome sequenceDifferences in gene/exon structures
Apparent differences between humans and great apes in the incidence or severity of medically important conditions (excluding differences explained by obvious anatomical differences).
Medical Condition Humans Great Apes
Definite
HIV progression to AIDS Common Very rare
Influenza A symptomatology Moderate to severe Mild
Hepatitis B/C late complications Moderate to severe Mild
P. falciparum malaria Susceptible Resistant
Menopause Universal Rare
Likely
E. coli K99 gastroenteritis Resistant Sensitive?
Alzheimer’s disease pathology Complete Incomplete
Coronary atherosclerosis Common Uncommon
Epithelial cancers Common Rare
top related