
1

The Ultimate Genotyping Experiment:

Determination of Human DNA Sequences

Dept. of MCD Biology

Institute for Behavioral Genetics

Center for Adolescent Drug Dependence

University of Colorado


2

Overview

• Why DNA sequencing will become the tool of choice for studying genotype/phenotype relationships in heritable traits

• What current technology makes it work

• How to make whole-genome sequencing affordable for large-scale genetic studies


3

State of the art: association studies via genome-wide association (GWAS)

Strategy: survey 10^5-10^6 well-known single nucleotide polymorphisms (SNPs) in large populations and score each for co-variation with the trait

– “Skims” genetic variation and can allow correlation with the trait of interest

– Covers only 5-10% of the ~10 million common SNPs

– Based on the “common allele, common disease” model
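The per-SNP scoring step can be sketched as follows. This is a minimal illustration, assuming a quantitative trait and an additive model in which the trait is regressed on allele dosage (0, 1 or 2 copies); real pipelines also adjust for covariates such as ancestry and convert the t statistic into a p-value. The function names, SNP IDs and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def snp_association(dosages: Sequence[int], trait: Sequence[float]) -> Tuple[float, float]:
    """Regress a quantitative trait on allele dosage (0, 1 or 2 copies).

    Returns the least-squares slope (effect per allele) and its t statistic.
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need matching dosage/trait vectors with at least 3 people")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage does not vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    # Residual sum of squares around the fitted line, n - 2 degrees of freedom.
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, (math.inf if beta != 0 else 0.0)
    return beta, beta / se


def scan(genotypes: Dict[str, Sequence[int]], trait: Sequence[float]) -> List[Tuple[str, float, float]]:
    """Score every SNP and rank by |t|, strongest co-variation first."""
    results = [(snp, *snp_association(d, trait)) for snp, d in genotypes.items()]
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


trait = [1.0, 2.1, 2.9, 1.1, 2.0, 3.0]
genotypes = {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [1, 0, 1, 0, 1, 0]}
for snp, beta, t in scan(genotypes, trait):
    print(f"{snp}: beta = {beta:.3f}, t = {t:.2f}")
```

Ranking by |t| is the "score for co-variation" in its simplest form; with 10^5-10^6 SNPs the same function is simply run once per marker.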


4

When it is good, it is very, very good

• Hundreds of successful GWAS for several phenotypes (e.g. diabetes, hypertension, asthma, height, obesity)

• Depends a lot on statistical “power”, which grows with the number of people studied

• Also depends critically on the phenotype
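To see how power depends on N, here is a rough sketch using the standard normal approximation for a 1-df additive association test. The SNP effect (0.1% of trait variance) is a hypothetical value, and alpha = 5 × 10^-8 is the conventional genome-wide threshold, not a figure from these slides. Power grows steeply rather than linearly with N: for this effect it is near zero at N = 1,000 and close to 1 at N = 100,000.

```python
from statistics import NormalDist


def gwas_power(n: int, variance_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df additive association test.

    Normal approximation: the z statistic is shifted by sqrt(ncp), where the
    non-centrality ncp = n * q2 / (1 - q2) and q2 is the fraction of trait
    variance explained by the SNP.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided threshold
    shift = (n * variance_explained / (1 - variance_explained)) ** 0.5
    # Probability that |z| exceeds the threshold in either direction.
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)


for n in (1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: power = {gwas_power(n, 0.001):.3f}")
```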


5

Example: Blood lipids

• Analysis by a group led by Gonzalo Abecasis (U. of Michigan)

– Combined 41 samples, >100,000 people genotyped

– Phenotype: fasting lipids (LDL, HDL, triglycerides)

– Medicated individuals excluded

– 2.5 × 10^6 SNPs (typed + imputed); MAF ≥ 1%

• Identified 95 loci associated with lipid levels


6

How good is this? Very, very good.

• OMIM had reported 18 genes affecting lipids

– 15 of them lie within 100 kb of a GWAS hit

– 8 within 10 kb

• In computer simulations with randomized allele positions, on average <1 of these genes fell within 100 kb of a hit, and not one in 10^6 simulations had more than 8.
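The randomization check can be sketched as a small Monte Carlo test. The counts (18 OMIM genes, 95 loci, a 100 kb window, 15 observed) come from the slides; everything else is an assumption for illustration: the genome is treated as a single 3 Gb coordinate, hits are placed uniformly at random, and the gene positions are hypothetical.

```python
import random
from typing import List, Sequence


def genes_near_hits(genes: Sequence[int], hits: Sequence[int], window: int) -> int:
    """Count genes lying within `window` bp of at least one association hit."""
    return sum(1 for g in genes if any(abs(g - h) <= window for h in hits))


def simulate_null(genes: Sequence[int], n_hits: int, genome_length: int,
                  window: int, n_sims: int, seed: int = 0) -> List[int]:
    """Place hits uniformly at random and record how many genes land near one."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        hits = [rng.randrange(genome_length) for _ in range(n_hits)]
        counts.append(genes_near_hits(genes, hits, window))
    return counts


GENOME = 3_000_000_000                      # assumed single linear coordinate
gene_rng = random.Random(42)
known_genes = [gene_rng.randrange(GENOME) for _ in range(18)]  # hypothetical positions

null = simulate_null(known_genes, n_hits=95, genome_length=GENOME,
                     window=100_000, n_sims=1_000)
observed = 15
print("mean genes within 100 kb under the null:", sum(null) / len(null))
print("simulations with at least the observed count:", sum(c >= observed for c in null))
```

Under these assumptions the null mean is well below 1, in line with the slide, and no simulation comes anywhere near 15.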


7

Does it mean anything?

• One GWAS allele (40% frequency) was found in GALNT2 (a glycosylation gene)

• The allele changes HDL-C by only ±1 mg/dl

Teslovich et al., in press


8

Does it mean anything?

• One GWAS allele (40% MAF) was found in GALNT2 (a glycosylation gene)

• The allele changes HDL-C by only ±1 mg/dl

• In mouse:

– Overexpression decreases HDL-C by ~20%

– Knockdown increases HDL-C by ~30%

• So this gene, which had no known role in lipid metabolism, clearly CAN be important


9

But when it is bad, it is horrid

Successful studies account for only a fraction of the genetic influence on phenotypic variance for most behavioral traits, despite high heritability. Why?

• Genes with large effects may be lacking

• Phenotypes may be inappropriately defined

• Insufficient N

• Inability to study rare variants meaningfully


10

Sequencing can do genotyping A LOT better

• Whole-genome sequencing types ALL polymorphisms, rare and common

• GWAS done with sequencing has none of the “missing” data of chip-based methods

• Linkage (and LD) are not required for association, since the causal variant itself is typed; the analysis is more “straightforward”

• May eventually be cheaper per marker


11

Sharpening tools

Efforts to use large-scale DNA sequencing to increase power to detect genes are now being piloted

The 1000 Genomes Project is pointing the way

– Moderate-frequency alleles: low-pass sequencing (low cost per person)

– Rare variants: deep sequencing (high cost)


12

Digression on Sequencing

• Rates of DNA sequence acquisition have gone through the roof

• Accuracy has improved and costs have plummeted

• Most of the progress is due to the availability of the reference human sequence combined with technological advances in short-read sequencing


13

Sequencing is affordable!


14

Sequencing is affordable!


15

Sequencing is affordable!

[Figure: Publications by year using Illumina sequencing methods, 2007-2010]


16

Three prominent technologies

• Illumina - based on a solid-phase PCR approach (similar to the SOLiD system)

• 454 - also based on solid-phase DNA synthesis; highly processive

• PacBio - based on a true single-molecule detection approach; the MOST processive


17

Illumina


18

Illumina


19

Illumina


20

Roche - 454


21

PacBio: a single molecule of DNA at a time

• 75,000 reads simultaneously

• 1,000-10,000 bases per read


22

What is good for what?

Technology | 454 | SOLiD | Illumina | PacBio
Method | DNA Pol synthesis | Ligase | PCR with fluor. dNTPs | DNA Pol, special NTPs
Medium | Beads | Beads | Glass surface | Optical well
Error types | Indels at homopolymers | End errors | End errors | Random indels
Bases/read | 400-1000 | 50 | 100+100 | >1000
Most common use | Metagenomics | Resequencing/de novo | Resequencing/de novo | De novo microbial genomes


23

What can these babies do?

• Illumina, SOLiD and 454 can handle most of these applications:

Tag profiling, small RNA discovery, mRNA-Seq, methylation, targeted resequencing, CNV, DNase I hypersensitivity, metagenomics, ChIP-Seq, ChIA-PET, bacterial sequencing, human genome resequencing, nucleosome mapping, molecular cytogenetics, de novo sequencing


24

What can these babies do?

• Illumina HiSeq 2000 - popular for genotyping

HiSeq 2000 metric | Value
Read length | 2 × 100
Yield per run | 250 Gb
Runs per genome | 1/2
Depth | 54.9x
SNPs | 4,232,886
550k GT coverage | 99.8%
Genotype concordance | 99.3%


25

How are bases called given finite errors?

• One can sequence many times (e.g. 5x coverage)

5’-ACTGGTCGATGCTAGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTGCTAGCTCGACG-3’   Reference Genome
             GCTAGCTGATAGCTAGCTAGCTGATGAGCCCGA
                AGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTG
           ATGCTAGCTGATAGCTAGCTAGCTGATGAGCC
                     ATAGCTAGATAGCTGATGAGCCCGATCGCTGCTAGCTC
               TAGCTGATAGCTAGATAGCTGATGAGCCCGAT
Sequence Reads

Predicted Genotype?


26

How are bases called given finite errors?

5’-ACTGGTCGATGCTAGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTGCTAGCTCGACG-3’   Reference Genome
             GCTAGCTGATAGCTAGCTAGCTGATGAGCCCGA
                AGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTG
           ATGCTAGCTGATAGCTAGCTAGCTGATGAGCC
                     ATAGCTAGATAGCTGATGAGCCCGATCGCTGCTAGCTC
               TAGCTGATAGCTAGATAGCTGATGAGCCCGAT
Sequence Reads (at the variable site, 3 reads show C and 2 show A)

P(reads | A/A, read mapped) = 0.00000098

P(reads | A/C, read mapped) = 0.03125

P(reads | C/C, read mapped) = 0.000097
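These likelihoods can be reproduced with a simple model: each read comes from either chromosome with probability 1/2, and a base is miscalled with probability e. The sketch below assumes e = 0.01 (not stated on the slide, but it reproduces the numbers) and, as a simplification, that a miscall always produces the other allele at this site. With 3 C reads and 2 A reads this gives 0.01^3 × 0.99^2 ≈ 9.8 × 10^-7 for A/A, 0.5^5 = 0.03125 for A/C and 0.99^3 × 0.01^2 ≈ 9.7 × 10^-5 for C/C. Function names are hypothetical.

```python
from typing import Tuple

Genotype = Tuple[str, str]


def base_prob(base: str, allele: str, error: float) -> float:
    """P(read shows `base` | read came from `allele`); a miscall yields the other allele (simplification)."""
    return 1 - error if base == allele else error


def genotype_likelihood(pileup: str, genotype: Genotype, error: float = 0.01) -> float:
    """P(reads | genotype), treating reads as independent draws from either chromosome."""
    a1, a2 = genotype
    likelihood = 1.0
    for base in pileup:
        # Each read samples one of the two chromosomes with probability 1/2.
        likelihood *= 0.5 * base_prob(base, a1, error) + 0.5 * base_prob(base, a2, error)
    return likelihood


pileup = "CCCAA"  # bases observed at the variable site in the five reads
for g in (("A", "A"), ("A", "C"), ("C", "C")):
    print("/".join(g), f"{genotype_likelihood(pileup, g):.2e}")
```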


27

How are bases called

5’-ACTGGTCGATGCTAGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTGCTAGCTCGACG-3’   Reference Genome
             GCTAGCTGATAGCTAGCTAGCTGATGAGCCCGA
                AGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTG
           ATGCTAGCTGATAGCTAGCTAGCTGATGAGCC
                     ATAGCTAGATAGCTGATGAGCCCGATCGCTGCTAGCTC
               TAGCTGATAGCTAGATAGCTGATGAGCCCGAT
Sequence Reads

Individual-based prior: every site has a 1/1000 probability of varying.

Genotype | P(reads given genotype) | Prior | Posterior
A/A | 0.00000098 | 0.00034 | <0.001
A/C | 0.03125 | 0.00066 | 0.175
C/C | 0.000097 | 0.99900 | 0.825
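The posterior column is Bayes' rule applied to the likelihoods and priors on the slide. A minimal sketch (the function name is hypothetical; all numbers are taken from the slide):

```python
from typing import Dict


def posterior(likelihoods: Dict[str, float], priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(G | reads) = P(reads | G) P(G) / sum over G' of P(reads | G') P(G')."""
    joint = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


likelihoods = {"A/A": 0.00000098, "A/C": 0.03125, "C/C": 0.000097}
# Individual-based prior: 1/1000 of sites vary, split roughly 2:1 between A/C and A/A.
individual_prior = {"A/A": 0.00034, "A/C": 0.00066, "C/C": 0.99900}

for g, p in posterior(likelihoods, individual_prior).items():
    print(f"{g}: {p:.3f}")
```

This prints posteriors of about 0.175 for A/C and 0.825 for C/C: the strong prior against variation outweighs two reads showing A.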


28

How are bases called

• Individual-based prior

– Assumes all sites have an equal probability of showing a polymorphism

– Specifically, assumes that about 1/1000 bases differ from the reference

– If reads were error-free and sampling were Poisson …

– … 14x coverage would allow 99.8% genotype accuracy

– … 30x coverage of the genome is needed to allow for errors and clustering
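One simple model that reproduces the 99.8% figure (our assumption about how it was derived): treat read depth at a site as Poisson with mean λ and count a heterozygote as correctly called when both alleles appear at least once among error-free reads. The probability is then 1 + e^(-λ) - 2e^(-λ/2), which is about 0.998 at λ = 14 but only about 0.75 at 4x. The function name is hypothetical.

```python
import math


def prob_het_detected(mean_depth: float, max_depth: int = 500) -> float:
    """P(a heterozygote shows both alleles at least once).

    Assumes error-free reads, Poisson-distributed depth, and each read drawn
    from either chromosome with probability 1/2.
    """
    pmf = math.exp(-mean_depth)  # P(depth = 0); such sites cannot be called
    total = 0.0
    for n in range(1, max_depth + 1):
        pmf *= mean_depth / n                # P(depth = n) from P(depth = n - 1)
        total += pmf * (1 - 2 * 0.5 ** n)    # not all n reads from the same allele
    return total


for depth in (4, 14, 30):
    print(f"{depth}x: {prob_het_detected(depth):.4f}")
```

The gap between 4x and 14x is why low-pass designs lean on population or haplotype information rather than on each individual's reads alone.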


29

What if...

5’-ACTGGTCGATGCTAGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTGCTAGCTCGACG-3’   Reference Genome
             GCTAGCTGATAGCTAGCTAGCTGATGAGCCCGA
                AGCTGATAGCTAGCTAGCTGATGAGCCCGATCGCTG
           ATGCTAGCTGATAGCTAGCTAGCTGATGAGCC
                     ATAGCTAGATAGCTGATGAGCCCGATCGCTGCTAGCTC
               TAGCTGATAGCTAGATAGCTGATGAGCCCGAT
Sequence Reads

Population-based prior: use frequency information from examining others at the same site. In the example above, we estimated P(A) = 0.20.

Genotype | P(reads given genotype) | Prior | Posterior
A/A | 0.00000098 | 0.04 | <0.001
A/C | 0.03125 | 0.32 | 0.994
C/C | 0.000097 | 0.64 | 0.006
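The population-based priors on this slide (0.04, 0.32, 0.64) are exactly the Hardy-Weinberg proportions for P(A) = 0.20. The sketch below assumes Hardy-Weinberg is how the prior was formed and recomputes the posterior from the slide's likelihoods; it gives about 0.994 for A/C and about 0.006 for C/C, so the same five reads now support a confident heterozygote call. Function names are hypothetical.

```python
from typing import Dict


def hwe_prior(freq_a: float) -> Dict[str, float]:
    """Genotype priors from an allele frequency, assuming Hardy-Weinberg proportions."""
    freq_c = 1 - freq_a
    return {"A/A": freq_a ** 2, "A/C": 2 * freq_a * freq_c, "C/C": freq_c ** 2}


def posterior(likelihoods: Dict[str, float], priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over the three possible genotypes."""
    joint = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


likelihoods = {"A/A": 0.00000098, "A/C": 0.03125, "C/C": 0.000097}
population_prior = hwe_prior(0.20)  # 0.04, 0.32, 0.64 as on the slide
for g, p in posterior(likelihoods, population_prior).items():
    print(f"{g}: prior {population_prior[g]:.2f}, posterior {p:.4f}")
```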



31

How are bases called

Population-based prior

• Uses frequency information obtained from examining other individuals

• Calling very rare polymorphisms still requires 20-30x coverage of the genome

• Calling common polymorphisms requires much less data

Haplotype-based prior, or imputation-based analysis

• Compares individuals with similar flanking haplotypes

• Calling very rare polymorphisms still requires 20-30x coverage of the genome

• Can make accurate genotype calls with 2-4x coverage of the genome

• Accuracy improves as more individuals are sequenced


32

What good is using statistics?

Allele frequency | 0.5-1% | 1-2% | 2-5%
400 deep genomes (30x; current cost ~$4,000,000, ~$2,000,000 by the 2nd quarter)
Discovery rate | 100% | 100% | 100%
Het. accuracy | 100% | 100% | 100%
Effective N | 400 | 400 | 400
3000 shallow genomes (4x; current cost ~$4,000,000 -> ~$2,000,000)
Discovery rate | 100% | 100% | 100%
Het. accuracy | 90.4% | 97.3% | 98.8%
Effective N | 2406 | 2758 | 2873

This would cover essentially ALL 10 × 10^6 common SNPs in ~2800 individuals. Current Affymetrix cost: ~$1,000,000 for 1/10 as many common SNPs.


33

Genotyping by sequencing

• Even with current technology, sequencing can be an alternative to chip genotyping

• Costs ~$5K per person for deep coverage

• Costs ~$800 per person for 4x coverage

• Using HapMap plus knowledge about alleles, one can study all SNPs with MAF > 1-2%


34

Does sequencing help?

• Too soon to be sure, but probably so

• Best work so far is in the 1000 Genomes Project

– 2 deeply sequenced trios

– 179 whole genomes sequenced at low coverage

– 8,820 exons deeply sequenced in 697 individuals

Found: 15M SNPs, 1M indels, 20,000 structural variants


35

Some highlights

[Figure: Highlights: reduced diversity extending ~120 kb around genes]


36

Allele frequency spectrum


37

Does sequencing improve association?

Expression QTL example: TIMM22


38

Does sequencing improve association?

Expression QTL example: TIMM22


39

Does sequencing improve association?

Expression QTL example: TIMM22


40

Imputation

• Given detailed allele distribution data, it is possible to “guess” genotypes from HapMap data and neighboring markers

• Accuracy improves with better knowledge of allele distributions

• Allows Affymetrix/Illumina chip data to be converted computationally into more detailed SNP information
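The core idea can be illustrated with a toy sketch on phased haplotypes: find the reference haplotype that best matches the markers typed on the chip, then copy its alleles at the untyped markers. This nearest-neighbour rule is a deliberate simplification; production imputation software instead fits hidden Markov models that treat each target haplotype as a mosaic of reference haplotypes and reports probabilistic dosages. The function, panel and haplotype below are hypothetical.

```python
from typing import List, Optional, Sequence

Haplotype = Sequence[int]  # one allele per marker: 0 = reference, 1 = alternate


def impute(target: Sequence[Optional[int]], panel: Sequence[Haplotype]) -> List[int]:
    """Fill untyped markers (None) from the panel haplotype that best matches the typed ones.

    Mismatches are counted at typed markers only; the closest reference
    haplotype supplies the alleles at the untyped markers.
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap: Haplotype) -> int:
        return sum(hap[i] != target[i] for i in typed)

    best = min(panel, key=mismatches)
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


reference_panel = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
]
chip_haplotype = [1, None, 0, None, 1]  # markers 1 and 3 are not on the chip
print(impute(chip_haplotype, reference_panel))  # [1, 1, 0, 1, 1]
```

The second panel haplotype matches all three typed markers, so its alleles fill the gaps. A larger, better-characterized panel makes such matches more reliable, which is the sense in which imputation improves with better knowledge of allele distributions.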


41

Imputation

Imputation accuracy (r²) by reference panel

Reference panel | Release date | MAF 1-3% | MAF 3-5% | MAF >5%
1000G Pilot (final) | June 2010 | ~0.69 | ~0.77 | ~0.91
280 EUR (draft) | November 2010 | ~0.73 | ~0.78 | ~0.92

• As more samples are sequenced, the ability to impute individual SNPs improves

• As more samples are sequenced, it becomes possible to impute additional markers
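The slide does not define its r², so as an assumption: a common per-marker definition is the squared Pearson correlation between imputed allele dosages and genotypes observed directly (for example by sequencing), summarized across markers within each MAF bin. A minimal sketch of that per-marker metric, with hypothetical data:

```python
from typing import Sequence


def imputation_r2(true_genotypes: Sequence[float], imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 people")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return cov * cov / (var_t * var_d)


truth = [0, 1, 2, 1, 0, 0, 1, 2]                     # hypothetical sequenced genotypes
imputed = [0.1, 0.8, 1.7, 1.2, 0.0, 0.3, 0.9, 1.9]  # hypothetical imputed dosages
print(f"r^2 = {imputation_r2(truth, imputed):.3f}")
```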


42

Status of 1000 Genomes

• 25,487,060 variant sites called on 629 samples

– 7,922,125 sites in dbSNP 129

– 17,564,935 sites not in dbSNP 129

– 98.8% of HapMap III sites rediscovered

– Transition/transversion ratio of 2.21 (vs 2.04 in the Pilot)

• As of November 2010:

– 1103 sequenced samples

– 22.6 Tb of raw sequence data
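The transition/transversion (Ts/Tv) ratio is a simple summary of a call set. Transitions swap a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T); every other change is a transversion. Because each base has one transition partner but two transversion partners, random errors push the ratio toward 0.5, so a higher Ts/Tv among new calls is commonly read as a sign of fewer false positives. A minimal sketch, assuming biallelic single-nucleotide calls given as (ref, alt) pairs; the function names are hypothetical:

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    if ref == alt:
        raise ValueError("reference and alternate alleles must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Count transitions and transversions among (ref, alt) calls and return their ratio."""
    transitions = transversions = 0
    for ref, alt in snvs:
        if is_transition(ref.upper(), alt.upper()):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


calls = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C")]  # hypothetical calls
print(ts_tv_ratio(calls))
```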
