Human Genome Sequence and Variability Gabor T. Marth, D.Sc. Department of Biology, Boston College [email protected] Medical Genomics Course – Debrecen, Hungary, May 2006


Page 1: Human Genome Sequence and Variability

Human Genome Sequence

and Variability

Gabor T. Marth, D.Sc.

Department of Biology, Boston College

[email protected]

Medical Genomics Course – Debrecen, Hungary, May 2006

Page 2: Human Genome Sequence and Variability

Lecture overview

1. Genome sequencing strategies, sequencing informatics

2. Genome annotation, functional and structural features in the human genome

3. Genome variability, DNA nucleotide, structural, and epigenetic variations

Page 3: Human Genome Sequence and Variability

1. The human genome sequence

Page 4: Human Genome Sequence and Variability

The nuclear genome (chromosomes)

Page 5: Human Genome Sequence and Variability

The genome sequence

• the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

Page 6: Human Genome Sequence and Variability

Completed genomes

~1 Mb

~100 Mb

>100 Mb

~3,000 Mb

Page 7: Human Genome Sequence and Variability

Main genome sequencing strategies

Clone-based shotgun sequencing (Human Genome Project)

Whole-genome shotgun sequencing (Celera Genomics, Inc.)

Page 8: Human Genome Sequence and Variability

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 9: Human Genome Sequence and Variability

Clone mapping – “sequence ready” map

Page 10: Human Genome Sequence and Variability

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 11: Human Genome Sequence and Variability

Shotgun subclone library construction

BAC primary clone

cloning vector

sequencing vector

subclone insert

Page 12: Human Genome Sequence and Variability

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 13: Human Genome Sequence and Variability

Sequencing

Page 14: Human Genome Sequence and Variability

Robotic automation

Lander et al. Nature 2001

Page 15: Human Genome Sequence and Variability

Base calling

PHRED

base = A, Q = 40
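The quality value on this slide can be read through the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below only illustrates that conversion; the function names are made up for this example and are not part of the PHRED program. Under this definition, Q = 40 corresponds to about 1 expected error in 10,000 base calls.

```python
import math

def phred_to_error_probability(q):
    """Convert a Phred quality value Q into the estimated error probability P."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Convert an estimated error probability P into a Phred quality value Q."""
    return -10 * math.log10(p)

# Illustrative quality values; Q = 40 is the one shown on the slide.
for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q = {q}: estimated error probability = {p:.4%}")
```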

Page 16: Human Genome Sequence and Variability

Vector clipping

Page 17: Human Genome Sequence and Variability

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 18: Human Genome Sequence and Variability

Sequence assembly

PHRAP

Page 19: Human Genome Sequence and Variability

Repetitive DNA may confuse assembly
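A toy illustration of why repeats are a problem: a read that lies entirely inside a repeated element matches every copy of that element equally well, so an assembler cannot tell from the read alone where it belongs. The sequences below are invented for this sketch, and exact string matching stands in for the overlap detection that a real assembler such as PHRAP performs.

```python
def find_all(genome, read):
    """Return every start position at which `read` occurs exactly in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Invented toy genome: unique flanks around two identical copies of a repeat.
left, mid, right = "ACGTTGCA", "TTGGCCAA", "CCATGGTA"
repeat = "TGCATCGGATCCAT"
genome = left + repeat + mid + repeat + right

unique_read = left           # read taken from unique sequence
repeat_read = repeat[2:12]   # read taken from inside the repeat

print("unique read placements:", find_all(genome, unique_read))
print("repeat read placements:", find_all(genome, repeat_read))
```

The unique read has a single placement, while the repeat read has one placement per repeat copy. Resolving such cases needs extra information, for example reads long enough to span the repeat into unique flanking sequence.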

Page 20: Human Genome Sequence and Variability

Sequence completion (finishing)

CONSED, AUTOFINISH

gap

region of low sequence coverage and/or quality
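As a rough sketch of how finishing targets could be listed, the function below scans per-base read depth and reports intervals that fall below a chosen threshold; a threshold of 1 reports gaps (no reads at all). The depth values and thresholds are invented for illustration, and this is not how CONSED or AUTOFINISH are implemented.

```python
def low_coverage_regions(depths, min_depth):
    """Return half-open (start, end) intervals where read depth is below min_depth."""
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos              # a low-coverage run begins here
        elif start is not None:
            regions.append((start, pos)) # the run ended at the previous base
            start = None
    if start is not None:                # run extends to the end of the contig
        regions.append((start, len(depths)))
    return regions

# Invented per-base read depths along a short stretch of a clone.
depths = [5, 6, 4, 1, 0, 0, 2, 7, 8, 3, 2]

print("gaps (depth 0):", low_coverage_regions(depths, 1))
print("low coverage (depth < 3):", low_coverage_regions(depths, 3))
```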

Page 21: Human Genome Sequence and Variability

2. Human genome annotation

Page 22: Human Genome Sequence and Variability

Genome annotation – Goals

protein coding genes

RNA genes

repetitive elements

GC content

Page 23: Human Genome Sequence and Variability

The starting material

AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAGTCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

Page 24: Human Genome Sequence and Variability

Coding genes – ab initio predictions

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

Open Reading Frame = ORF

Start codon

Stop codon

PolyA signal
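The open reading frame on this slide can be recovered with a very small scan: start at an ATG and read codon by codon until the first in-frame stop codon (TAA, TAG or TGA). This is a minimal sketch on the slide's example sequence, with a function name chosen for this example. Real ab initio gene finders use probabilistic models of exons, introns and signals rather than a plain ORF scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(seq):
    """Return (start, end, orf) for the first ATG-to-stop ORF, or None if there is none.

    Coordinates are 0-based and end is exclusive; the stop codon is included.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk in-frame codons downstream of this start codon.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3, seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None

# Example sequence copied from the slide.
slide_sequence = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
print(find_first_orf(slide_sequence))
```

On the slide sequence the scan starts at the first ATG and stops at the in-frame TAG, giving a nine-codon ORF.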

Page 25: Human Genome Sequence and Variability

Ab initio predictions

Gene structure

Page 26: Human Genome Sequence and Variability

Ab initio predictions

…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…

splice donor site splice acceptor site
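Most introns begin with GT at the donor site and end with AG at the acceptor site. The sketch below applies only that dinucleotide rule to the slide's example fragment and lists every GT...AG span as a candidate intron. The function names and the minimum length are assumptions for illustration. Actual gene finders score much longer sequence context around each site, because GT and AG on their own occur by chance very often.

```python
def find_motif(seq, motif):
    """Return all 0-based start positions of `motif` in `seq`."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def candidate_introns(seq, min_length=4):
    """List (start, end, sequence) for every span that starts with GT and ends with AG."""
    donors = find_motif(seq, "GT")
    acceptors = find_motif(seq, "AG")
    introns = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # exclusive end, includes the AG dinucleotide
            if end - d >= min_length:
                introns.append((d, end, seq[d:end]))
    return introns

# Fragment copied from the slide (the flanking ellipses are omitted).
fragment = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"
print(candidate_introns(fragment))
```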

Page 27: Human Genome Sequence and Variability

Ab initio predictions

Genscan, Grail, Genie, GeneFinder, Glimmer, etc.

EST_genome, Sim4, Spidey, EXALIN

Page 28: Human Genome Sequence and Variability

Homology based predictions

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

ACGGAAGTCT

known coding sequence from another organism

GGACTATAAA

expressed sequence

genes predicted by homology

Genomescan, Twinscan, etc.
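The idea on this slide can be sketched as a search for the best placement of a known sequence on the genomic sequence. The example slides each query along the slide's genomic sequence without gaps and counts mismatches. The expressed sequence matches exactly, while the coding sequence from another organism matches with a few substitutions. This ungapped comparison is a deliberate simplification chosen for this example; tools of the kind named above use gapped, spliced alignment and statistical scoring.

```python
def best_ungapped_match(genome, query):
    """Return (start, mismatches, window) for the ungapped placement with fewest mismatches.

    Ties are resolved in favor of the leftmost placement.
    """
    best = None
    for start in range(len(genome) - len(query) + 1):
        window = genome[start:start + len(query)]
        mismatches = sum(1 for g, q in zip(window, query) if g != q)
        if best is None or mismatches < best[1]:
            best = (start, mismatches, window)
    return best

# Sequences copied from the slide.
genome = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
other_organism_cds = "ACGGAAGTCT"   # known coding sequence from another organism
expressed_sequence = "GGACTATAAA"   # expressed sequence from the same genome

print(best_ungapped_match(genome, other_organism_cds))
print(best_ungapped_match(genome, expressed_sequence))
```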

Page 29: Human Genome Sequence and Variability

Consolidation – gene prediction systems

Otto

Ensembl

FgenesH

Genscan

Grail

Genewise

Sim4 dbEst

Page 30: Human Genome Sequence and Variability

ncRNA genes

prediction based on structure (e.g. tRNAs)

for other novel ncRNAs, only homology-based predictions have been successful

Page 31: Human Genome Sequence and Variability

Repeat annotations

Repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library

Page 32: Human Genome Sequence and Variability

The landscape of the human genome

Page 33: Human Genome Sequence and Variability

Gene annotations – # of coding genes

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 34: Human Genome Sequence and Variability

Gene annotations – gene length

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 35: Human Genome Sequence and Variability

Gene annotations – gene function

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 36: Human Genome Sequence and Variability

GC content and coding potential

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 37: Human Genome Sequence and Variability

ncRNAs

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 38: Human Genome Sequence and Variability

Segmental duplications

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 39: Human Genome Sequence and Variability

Repeat elements

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Page 40: Human Genome Sequence and Variability

Genes and repeats

Page 41: Human Genome Sequence and Variability

Physical vs. genetic map (Mb/cM)

0.4 cM 1.3 cM 0.7 cM

0.4 Mb 0.7 Mb 0.3 Mb
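A worked version of the ratio in the slide title, assuming the three genetic distances and the three physical distances refer to the same three intervals in the order listed (the figure that would confirm this pairing is not in the transcript). The variable and function names are chosen for this example.

```python
# Values copied from the slide; pairing them by position is an assumption.
genetic_cm = [0.4, 1.3, 0.7]    # genetic distance per interval, in cM
physical_mb = [0.4, 0.7, 0.3]   # physical distance per interval, in Mb

def mb_per_cm(physical_mb, genetic_cm):
    """Physical distance per unit of genetic distance for each interval."""
    return [mb / cm for mb, cm in zip(physical_mb, genetic_cm)]

for mb, cm, ratio in zip(physical_mb, genetic_cm, mb_per_cm(physical_mb, genetic_cm)):
    print(f"{mb} Mb / {cm} cM = {ratio:.2f} Mb/cM")
```

A lower Mb/cM value means more recombination per unit of physical distance, so under this pairing the ratio is not constant along the chromosome.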

Page 42: Human Genome Sequence and Variability

3. Human genome variability

Page 43: Human Genome Sequence and Variability

DNA sequence variations

• the human genome sequence is about 99.9% identical between any two individuals

• sequence variations make our genetic makeup unique

SNP

• the most abundant human variations are single-nucleotide polymorphisms (SNPs) – 10 million SNPs are currently known
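A back-of-the-envelope check of what these figures imply, using the ~3,000 Mb genome size quoted earlier in the lecture. The function names are chosen for this example and the results are order-of-magnitude estimates only.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3,000 Mb, as quoted earlier in the lecture
SEQUENCE_IDENTITY = 0.999        # 99.9% identity between individuals
KNOWN_SNPS = 10_000_000          # SNPs known at the time of the lecture

def expected_differences(genome_size_bp, identity):
    """Approximate number of positions that differ between two genomes."""
    return round(genome_size_bp * (1 - identity))

def average_spacing_bp(genome_size_bp, n_sites):
    """Average distance between sites if they were spread evenly over the genome."""
    return genome_size_bp / n_sites

differences = expected_differences(GENOME_SIZE_BP, SEQUENCE_IDENTITY)
print(f"expected differences between two genomes: about {differences:,}")
print(f"average spacing of known SNPs: about {average_spacing_bp(GENOME_SIZE_BP, KNOWN_SNPS):.0f} bp")
```

So a 0.1% difference still amounts to roughly three million differing positions between two genomes, and the catalog of known SNPs averages about one site every 300 bp.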

Page 44: Human Genome Sequence and Variability

DNA sequence variations

insertion-deletion (INDEL) polymorphisms

Page 45: Human Genome Sequence and Variability

Structural variations

Speicher & Carter, NRG 2005

Page 46: Human Genome Sequence and Variability

Structural variations

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767

Page 47: Human Genome Sequence and Variability

Detection of structural variants

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767

Page 48: Human Genome Sequence and Variability

Epigenetic changes: chromatin structure

Sproul, NRG 2005

Page 49: Human Genome Sequence and Variability

Epigenetic changes: DNA methylation

Laird, NRC 2003