Advancing Personal Genetics with Second Generation Sequencing

28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management Thanks to: Advancing Personal Genetics with Second Generation Sequencing

Upload: marcia-hester

Post on 30-Dec-2015



TRANSCRIPT

Page 1: Advancing Personal Genetics with  Second Generation Sequencing

28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management

Thanks to:

Advancing Personal Genetics with Second Generation Sequencing

Page 2: Advancing Personal Genetics with  Second Generation Sequencing

Context: Personal Genomics Landscape

direct-to-consumer -- hybrid -- research only

REVEAL

Page 3: Advancing Personal Genetics with  Second Generation Sequencing

23andme

Page 4: Advancing Personal Genetics with  Second Generation Sequencing

Over 600 alleles of BRCA1 (Myriad/DNAdirect: sequencing, not chips)

Page 5: Advancing Personal Genetics with  Second Generation Sequencing

PersonalGenomes.org Project Goals

1) Low cost: <$1K : 98% exome (or more)
2) Active subject participation, informed redaction
3) Avoid over-promising de-identification
4) Entrance exam to ensure highly informed consent
5) Multiple samples to ensure consistent IDs
6) Open access (not just researcher subset)
7) Trait questionnaire, stem cell RNA, biome
8) Cells available for personal functional genomics
9) Scalable to 100,000 diverse research subjects

Coriell GM2: 0431, 1070, 1660, 1677, 1687, 1781, 1833, 1846, 1731

• Employers/Insurers > Non-Discrimination Act
• Actionable alleles are rare > all at risk
• Non-actionable alleles > activism

Page 6: Advancing Personal Genetics with  Second Generation Sequencing

3 exponential technologies: 3 to 18 month doubling times

[Figure: log-scale plot, y-axis 1E-4 to 1E+14, x-axis years 1840 to 2020. Series: Daltons synth, Bits/sec, Seq bp/$. Labels: Synthetic chemistry, Computation & Communication, Analytic; urea, B12, tRNA, telegraph, human, Gbp chips.]

Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics. Kurzweil 2002; Moore 1965
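The "3 to 18 month doubling times" quoted on this slide can be turned into fold changes with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes a constant doubling time, which the slide does not claim holds over any particular period.

```python
import math


def fold_improvement(years: float, doubling_months: float) -> float:
    """Fold change after `years`, assuming a fixed doubling time in months."""
    return 2.0 ** (years * 12.0 / doubling_months)


def doubling_time_months(fold: float, years: float) -> float:
    """Doubling time (in months) implied by a `fold` change over `years`."""
    return years * 12.0 / math.log2(fold)


if __name__ == "__main__":
    # The two extremes quoted on the slide, projected over one decade.
    for months in (3, 18):
        gain = fold_improvement(10, months)
        print(f"{months}-month doubling -> {gain:.3g}-fold per decade")
```

Under these assumptions an 18-month doubling time gives roughly a 100-fold gain per decade, while a 3-month doubling time gives about 2^40-fold over the same span.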

Page 7: Advancing Personal Genetics with  Second Generation Sequencing

Chips vs. Gen-2 Sequencing

Chips: Illumina bead-array, Affymetrix

Sequencing: Roche-454, Illumina, ABI-SOLiD, Helicos, Harvard-Danaher Polonator-G007

Chips: 0.02% of the genome; this assumes common DNA variants stay associated with deleterious variants over 50,000 years

Sequencing 98% of the genome accesses the deleterious variants directly

Page 8: Advancing Personal Genetics with  Second Generation Sequencing

Multiplex Cyclic Sequencing by Synthesis

Single instrument, multiple chemistries: polonies on slides or beads

Polymerase -or- Ligase

Illumina, IBS

AB-SOLiD, CGI

[Figure: base calls G, A, C, T]

Shendure, Porreca, et al. 2005 Science

Mitra, et al. 2003 Analyt. Biochem.; 1999 NAR

Page 9: Advancing Personal Genetics with  Second Generation Sequencing

36 to 64 flowcells (+ DNA barcodes)

2 to 4 billion beads

8.5 thick sequence image

Page 10: Advancing Personal Genetics with  Second Generation Sequencing

Open-source hardware, software, wetware: Polonator G.007 (12 TB image > 120 Gbp/run)

Enzyme/oligo kits

Polymerase or Ligase chemistries

$150K including computer & 1 yr service, software, support

Danaher Inc.

Page 11: Advancing Personal Genetics with  Second Generation Sequencing

Effect of improvements on cost

Improvement        Factor  Feature cost/run  Sequencing cost/run  Gb/run  Reagent cost/Gb  Fold decrease
None               1       $1,677            $685                 10      $292
Flowcell volume    5       $1,677            $137                 10      $181             1.4
Useable yield      6       $1,677            $685                 60      $39              6
Instrument speed   2       $3,354            $1,370               20      $236             1
Emulsion sorting   18      $93               $685                 10      $78              3
Readlength 48bp    3.7     $1,677            $2,534               37      $114             2
ALL                        $186              $1,014               444     $2.70            88

Polonator instrument 3 yr amortization:

$150k / 300 runs = $500/run = $50/Gb

$150k / 81 runs = $1850/run = $4.2/Gb

($10 vs. $2000 / Gb for other 2nd gen)
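The arithmetic behind this slide can be checked with a short script. This is a sketch under an assumption the slide does not state: that "Reagent cost/Gb" equals (feature cost/run + sequencing cost/run) divided by Gb/run. The class and function names are invented for this example. With that formula the computed values match the listed ones for every row except "None", where it gives about $236 rather than the listed $292.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One row of the cost table; field names are illustrative."""
    name: str
    feature_cost: float      # "Feature cost/run" in dollars
    sequencing_cost: float   # "Sequencing cost/run" in dollars
    gb_per_run: float        # "Gb/run"

    def reagent_cost_per_gb(self) -> float:
        # Assumed relationship: total reagent cost per run spread over yield.
        return (self.feature_cost + self.sequencing_cost) / self.gb_per_run


ROWS = [
    Run("None", 1677, 685, 10),
    Run("Flowcell volume", 1677, 137, 10),
    Run("Useable yield", 1677, 685, 60),
    Run("Instrument speed", 3354, 1370, 20),
    Run("Emulsion sorting", 93, 685, 10),
    Run("Readlength 48bp", 1677, 2534, 37),
    Run("ALL", 186, 1014, 444),
]


def amortized_cost_per_gb(price: float, runs: int, gb_per_run: float) -> float:
    """Instrument price spread over `runs` runs, expressed per Gb."""
    return price / runs / gb_per_run


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.name:<18} ${row.reagent_cost_per_gb():.2f}/Gb")
    # The two amortization cases from the slide: 300 runs at 10 Gb/run
    # and 81 runs at 444 Gb/run.
    print(f"${amortized_cost_per_gb(150_000, 300, 10):.2f}/Gb")
    print(f"${amortized_cost_per_gb(150_000, 81, 444):.2f}/Gb")
```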

Page 12: Advancing Personal Genetics with  Second Generation Sequencing

Personal genome sequencing options/goals

Technology        Genome  Cost   Raw bp
AB3730            98%     $30M   7x = 42 Gb (3.5x each)
Knome             98%     $350K  15x = 84 Gb
SNP-chip          0.02%   $1K    2 Mbp
PGP coding        1%      $90    30x = 1 Gb
PGP RNA           99%     $20    30x 20K*n = 60 Mb (n=100 cell types for RNA)
-path/resistome   -       $20    rRNA + 20K genes
VDJ-Immunome      -       $20    ?

Page 13: Advancing Personal Genetics with  Second Generation Sequencing

Selective genome sequencing

Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83.

How do we optimize >100K 100-mers?

8 ways to capture alleles from genomic or cDNA

[Figure: panels 1. to 3.; labels: In vitro Paired-end-tags (PET), Gap fill, Cleave & ligate, For rearrangements. Red = Synthetic; Yellow = genome/cDNA]

4. Hybr-select-chip
5. Hybr-select-solution
6. fluidic PCR
7. Multiplex PCR

Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church

Page 14: Advancing Personal Genetics with  Second Generation Sequencing

Circle Capture DNA from Chips

Page 15: Advancing Personal Genetics with  Second Generation Sequencing

Aug 2007: R = 0.53; Jan 2008: R = 0.986

Zhang, Li et al. unpublished

Gap fill

Circle-capture 1% genome

Page 16: Advancing Personal Genetics with  Second Generation Sequencing

Genome to Phenome: Population Variation

[Figure: Genome, Environment, Gene expression, Gene products, Traits; cis and trans effects; alleles GA, TC]

Zhang & Church unpublished

Page 17: Advancing Personal Genetics with  Second Generation Sequencing

Allele-specific expression (ASE)

Combine all cis element variants

Enhancer, promoter, splicing, polyA, termination, transport, decay.

Eliminate environmental & trans-acting variation among individuals.

Allele-specific transcription factor binding

TF ChIP-Seq

Digital RNA allelotyping

[Figure: allele diagrams with labels G, A, TC, GA, TT, GG and a poly-A tail]

Zhang, Li, Church unpublished; Forton et al. Genome Res. 2007
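One way to picture digital RNA allelotyping is as counting sequence reads for each allele at a heterozygous SNP in cDNA, then comparing that split with the one seen in genomic DNA. The sketch below assumes that reading; the slides do not spell out the analysis. All names and read counts are hypothetical, and the exact binomial test is a generic choice made for this example, not necessarily the authors' method.

```python
from math import exp, lgamma, log


def allelic_ratio(count_a: int, count_b: int) -> float:
    """Ratio of read counts for allele A over allele B (for example T/C)."""
    if count_b == 0:
        raise ValueError("allele B has no reads; ratio is undefined")
    return count_a / count_b


def _binom_pmf(k: int, n: int, p: float) -> float:
    # Computed in log space so large read counts do not overflow.
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def imbalance_p_value(count_a: int, count_b: int,
                      expected_fraction_a: float = 0.5) -> float:
    """Two-sided exact binomial p-value for the observed allele split.

    `expected_fraction_a` is the allele-A fraction expected without
    allele-specific expression, e.g. the fraction seen in genomic DNA.
    """
    if not 0.0 < expected_fraction_a < 1.0:
        raise ValueError("expected_fraction_a must lie strictly between 0 and 1")
    n = count_a + count_b
    pmf = [_binom_pmf(k, n, expected_fraction_a) for k in range(n + 1)]
    # Sum every outcome at most as likely as the observed one.
    cutoff = pmf[count_a] * (1.0 + 1e-7)
    return min(1.0, sum(v for v in pmf if v <= cutoff))


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    gdna_t, gdna_c = 480, 520
    cdna_t, cdna_c = 700, 200
    expected = gdna_t / (gdna_t + gdna_c)
    print("cDNA T/C ratio:", round(allelic_ratio(cdna_t, cdna_c), 2))
    print("p-value vs genomic DNA:", imbalance_p_value(cdna_t, cdna_c, expected))
```

Using the genomic DNA split as the null, rather than a fixed 50:50, is meant to absorb capture or amplification bias that affects both alleles' counts regardless of expression.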

Page 18: Advancing Personal Genetics with  Second Generation Sequencing

Samples: Genomic DNA (Lymphocyte); cDNA (Lymphocyte); cDNA (Fibroblast); cDNA (Keratinocyte)

rs1264899, ATP5F1, ATP synthase

T/C = 0.51; T/C = 3.47; T/C = 3.73

Tissue specific & allele specific gene expression confirmatory assays

Kun Zhang & Alice Li

Page 19: Advancing Personal Genetics with  Second Generation Sequencing

25X probe * 72X time = 1800X better efficiency.

Kun Zhang & Billy Li

[Figure panels: Genomic DNA Aug 2007; Genomic DNA Jan 2008; cDNA Jan 2008]

Page 20: Advancing Personal Genetics with  Second Generation Sequencing

Challenge: Multiple cell types from healthy adults

3mm skin sample

Page 21: Advancing Personal Genetics with  Second Generation Sequencing

Complex Traits via Allele-Specific Gene Expression

PGP Physicians Network

Volunteers

Primary fibroblasts

Multiplexed Reprogramming

Induced Stem Cells

Multiplexed Differentiation

Induction of Multiple Gene Sets (not necessarily functional tissues)

mRNA

Sequence tag quantitation

Jay Lee et al. unpublished

Page 22: Advancing Personal Genetics with  Second Generation Sequencing

Induced Pluripotent Stem Cell Generation & Transdifferentiation (Oct4/Sox2/Myc/Klf4)

Retroviral Infection

Tissue Culture on a Mouse Feeder Layer

ES Cell Colony Identification

Clonal Isolation and Propagation

Embryoid Body Induction & Guided Differentiation

Adenoviral Infection

Mixture of differentiated cell types & Guided Differentiation

2 months: Multiple integration sites

1 week: No genomic integration

Yamanaka, Daley, Thomson, Hochedlinger, Jaenisch labs

Lee & Church

Page 23: Advancing Personal Genetics with  Second Generation Sequencing

Multiple cell-types with transdifferentiation

Retroviral Infection

Adenoviral Infection

MyoD

CD34

Collagen

Page 24: Advancing Personal Genetics with  Second Generation Sequencing

Kun Zhang & Fan Liang

Green: phase contrast image

Red: Cy5-labelled Alu probe

Nunc or UCSD

Haplotyping by amplification of single chromosomes or fragments

Page 25: Advancing Personal Genetics with  Second Generation Sequencing

• Ultra-clean conditions for reduction of background amplification + Real-Time monitoring

• Post-amplification chip hybridization distinguishes alleles

• Amplification variation random & easily filled by PCR

• Error rate < 1.7 × 10^-5

Single-cell or Single DNA-fragment (haplotype) sequencing: 5 Mbp

Zhang et al. Nature Biotech 2006

Page 26: Advancing Personal Genetics with  Second Generation Sequencing

Environments of Genomes

VDJ-ome

TRAITS

biome

RNAome

PERSONAL GENOME

Once-in-a-lifetime genome + yearly (to daily) tests

Bio-weather map: Allergens, Microbes, Viruses

Page 27: Advancing Personal Genetics with  Second Generation Sequencing

PGP Resistome: 18 Antibiotics

Dantas, Sommer, Church unpublished

Page 28: Advancing Personal Genetics with  Second Generation Sequencing

Bacteria Subsisting on 18 Antibiotics

Dantas, Sommer, Church, Science 2008

Page 29: Advancing Personal Genetics with  Second Generation Sequencing

Personal genome sequencing options/goals

Technology        Genome  Cost   Raw bp
AB3730            98%     $30M   7x = 42 Gb (3.5x each)
Knome             98%     $350K  15x = 84 Gb
SNP-chip          0.02%   $1K    2 Mbp
PGP coding        1%      $90    30x = 1 Gb
PGP RNA           99%     $20    30x 20K*n = 60 Mb (n=100 cell types for RNA)
-path/resistome   -       $20    rRNA + 20K genes
VDJ-Immunome      -       $20    ?