The Ashkenazi Genome Project. Shai Carmi, Pe’er lab, Columbia University, and The Ashkenazi Genome Consortium (TAGC). Personal Genomes & Medical Genomics, Cold Spring Harbor, NY, November 2012.


Page 1: The Ashkenazi Genome Project

Shai Carmi, Pe’er lab, Columbia University

and The Ashkenazi Genome Consortium (TAGC)

Personal Genomes & Medical Genomics, Cold Spring Harbor, NY

November 2012

Page 2: Recent History of Ashkenazi Jews

• Mediterranean origin (?)
• Ca. 1000: Small communities in northern France and the Rhineland
• Migration east
• Expansion
• ~10M today, mostly in the US and Israel
• Relative isolation

Page 3: Ashkenazi Jewish Genetics

Behar et al., Nature 2010. Bray et al., PNAS 2010. Guha et al., Genome Biology 2012.

300 Jewish individuals; SNP arrays

• Recently, Ashkenazi Jews (AJ) have been shown to be a genetically distinct group.
• AJ are close to Middle-Eastern and Southern European populations.

Price et al., PLoS Genetics 2008. Olshen et al., BMC Genetics 2008. Need et al., Genome Biology 2009. Kopelman et al., BMC Genetics 2009.

[Figure (Atzmon et al., AJHG 2010): groups labeled AJ, Jewish non-AJ, Middle-Eastern, Europeans]

Page 4: Recent Demography & IBD

In small populations, common ancestors are likely recent.

[Figure: individuals A and B]

Page 5: Recent Demography & IBD

In small populations, common ancestors are likely recent.

[Figure: individuals A and B, their common ancestor AB, and a shared segment]

• IBD (identity by descent) is highly informative about recent history!
• IBD is common in AJ (Gusev et al., MBE 2011).

Many long haplotypes are identical by descent.
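A minimal sketch of why small populations imply recent common ancestors and long shared segments, assuming an idealized constant-size Wright-Fisher population (a simplification, not the model used in the talk). Two haplotypes coalesce in any given generation with probability 1/(2N), and a segment inherited through an ancestor g generations back has been cut by crossovers over 2g meioses. The function names and population sizes are hypothetical.

```python
def prob_common_ancestor_within(t: int, n_diploid: int) -> float:
    """P(two haplotypes sampled today share an ancestor within t generations).

    Idealized constant-size Wright-Fisher population of n_diploid individuals:
    two lineages coalesce in each generation with probability 1 / (2 * n_diploid).
    """
    per_generation = 1.0 / (2 * n_diploid)
    return 1.0 - (1.0 - per_generation) ** t


def mean_segment_length_cm(g: int) -> float:
    """Mean length (cM) of a segment inherited from an ancestor g generations ago.

    The two haplotypes are separated by 2 * g meioses, so crossovers break the
    shared segment at rate 2 * g per Morgan: mean length 1 / (2 * g) Morgans.
    """
    return 100.0 / (2 * g)


# Hypothetical sizes: a small and a large population, looking 100 generations back.
for n in (500, 50_000):
    print(n, prob_common_ancestor_within(100, n))
print(mean_segment_length_cm(10), mean_segment_length_cm(100))  # 5.0 0.5
```

An ancestor 10 generations back leaves segments of about 5 cM on average, versus about 0.5 cM for one 100 generations back, which is why long IBD segments point to recent, small ancestral populations.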

Page 6: AJ Genetic History

[Figure (Palamara et al., AJHG 2012): effective size N(t) vs. years ago (2,300; 800; present); labeled sizes 45,000, 270 and 4,300,000; expansion rate ≈34% per generation]

High potential for genetic studies!

[Figure: power of imputation by IBD, as % additional information potential (0-100%) vs. # of sequenced individuals (0-500), for WTCCC, AJ_SCZ and AJUK]
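As a rough consistency check on the figure labels, the sketch below asks how many generations of 34% per-generation growth take a population from 270 to 4,300,000. Reading 270 as the bottleneck size and 4,300,000 as the present-day size is an assumption based on the label layout, and the 25-year generation time is hypothetical. The result is about 33 generations, on the order of 800 years, in line with the "800 years ago" label.

```python
import math

# Figure labels; which label is which population size is an assumption.
N_BOTTLENECK = 270
N_PRESENT = 4_300_000
GROWTH_RATE = 0.34  # "Expansion rate ~34% per generation"
YEARS_PER_GENERATION = 25  # hypothetical generation time


def generations_of_growth(n_start: float, n_end: float, rate: float) -> float:
    """Generations of compound growth at `rate` per generation from n_start to n_end."""
    return math.log(n_end / n_start) / math.log(1.0 + rate)


g = generations_of_growth(N_BOTTLENECK, N_PRESENT, GROWTH_RATE)
print(f"{g:.1f} generations, about {g * YEARS_PER_GENERATION:.0f} years")
```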

Page 7: The Ashkenazi Genome Consortium

Phase I:
• 58 AJ personal genomes (86 under way)
• ~60 years old, healthy controls
• Unrelated, PCA-validated AJ
• Technology: Complete Genomics

Goal:
• Sequence hundreds of healthy AJ to high coverage
  o Use as a reference panel for association studies, imputation, and clinical interpretation
  o Understand population history and functional genetic variation in AJ

Page 8: Quality Control

Property                       Genome           (Exome)
Coverage                       ~55x
Fraction called                96.5±0.003%      (98%)
Fraction with coverage > 20x   92.4±0.018%      (94.9%)
Concordance with SNP array     99.87±0.1%
Ti/Tv ratio                    2.14±0.003       (3.05)

[Figure: Ti/Tv ratio]
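The Ti/Tv ratio (transitions, A<->G and C<->T, over transversions) works as a quality check because random errors are not biased toward transitions: each base has one possible transition and two possible transversions, so pure noise gives a ratio near 0.5 and an excess of errors pulls the genome-wide value down. A minimal sketch, assuming biallelic SNPs given as (ref, alt) pairs with ref different from alt; the concordance helper assumes both platforms' calls are already encoded the same way, here as alt-allele dosages. All function names are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T single-nucleotide changes (case-insensitive)."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv_ratio(snps: list[tuple[str, str]]) -> float:
    """Transitions / transversions among (ref, alt) pairs of biallelic SNPs."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


def concordance(seq_calls: dict[str, int], array_calls: dict[str, int]) -> float:
    """Fraction of sites genotyped on both platforms where the dosages agree."""
    shared = seq_calls.keys() & array_calls.keys()
    agree = sum(seq_calls[s] == array_calls[s] for s in shared)
    return agree / len(shared)


print(ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]))  # 1.5
```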

Page 9: Variant Statistics & Comparison to Europeans

[Figure: TAGC vs. 14 Flemish genomes (Belgium); panels: all SNPs (M), het/hom ratio, and insertions, deletions and MNPs (k)]

Similar results in 13 CG European public genomes.
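A minimal sketch of the per-genome het/hom ratio, assuming that "hom" counts homozygous-alternate genotypes only and that calls arrive as VCF-style diploid genotype strings; both are assumptions, as is the function name. Homozygous-reference and missing calls are skipped.

```python
from collections import Counter


def het_hom_ratio(genotypes: list[str]) -> float:
    """Heterozygous / homozygous-alternate ratio for one diploid genome.

    Genotypes are VCF-style strings such as "0/1", "1|1" or "./.".
    Homozygous-reference ("0/0") and missing calls are ignored.
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or alleles == ["0", "0"]:
            continue
        counts["het" if alleles[0] != alleles[1] else "hom"] += 1
    return counts["het"] / counts["hom"]


print(het_hom_ratio(["0/1", "0/1", "1/1", "0/0", "./.", "1|0", "1|1"]))  # 1.5
```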

Page 10: Comparison to Europeans

• Allele frequency spectrum:
  – No excess singletons.
  – Slight excess of doubletons.
• More novel SNPs in AJ (3.8% vs. 3.1%).

[Figure: singletons and doubletons]
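A minimal sketch of the two statistics on this slide, assuming per-site alternate-allele counts for a sample of n chromosomes (two per diploid genome, e.g. 116 for 58 genomes) and a set of identifiers for previously known variants, standing in for a public database the slide does not name. Since the spectrum depends on sample size, comparing AJ with a smaller European set presumably requires equal-sized samples. Function names are hypothetical.

```python
from collections import Counter


def allele_frequency_spectrum(alt_counts: list[int], n_chromosomes: int) -> Counter:
    """Unfolded spectrum: number of segregating sites per alt-allele count.

    alt_counts[i] is the number of alternate alleles at site i among
    n_chromosomes sampled haplotypes; monomorphic sites are excluded.
    """
    return Counter(c for c in alt_counts if 0 < c < n_chromosomes)


def novel_fraction(snp_ids: list[str], known: set[str]) -> float:
    """Fraction of a genome's SNPs absent from a set of known variant IDs."""
    return sum(1 for s in snp_ids if s not in known) / len(snp_ids)


sfs = allele_frequency_spectrum([1, 1, 2, 3, 0, 116, 1, 2], n_chromosomes=116)
print("singletons:", sfs[1], "doubletons:", sfs[2])  # singletons: 3 doubletons: 2
```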

Page 11: Quality Control (2)

False positive (FP) rate assessment by runs of homozygosity (ROH):
• Assume hets in high-confidence ROH are FP.
• Genome-wide extrapolation: ~20,000 FP per genome.
• QC:
  – Discard putatively low-quality variants
  – Discard Hardy-Weinberg equilibrium (HWE) violations and variants with low call rate

FP after QC: ~5,000 per genome.

[Figure: hets within an ROH; paternal and maternal haplotypes]
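Under the slide's assumption that every heterozygous call inside a high-confidence ROH is a false positive, the false-positive rate per base observed in ROH can be scaled to the whole genome. A minimal sketch on one chromosome's coordinates; the function name, the half-open interval convention, and the premise that ROH are representative of the rest of the genome are illustrative assumptions.

```python
def extrapolate_false_positives(
    het_positions: list[int],
    roh_intervals: list[tuple[int, int]],
    genome_length: int,
) -> float:
    """Genome-wide false-positive het estimate from hets found inside ROH.

    Assumes all hets inside high-confidence ROH are false positives and that
    the same per-base error rate applies elsewhere. Intervals are [start, end).
    """
    roh_bases = sum(end - start for start, end in roh_intervals)
    if roh_bases == 0:
        raise ValueError("no ROH bases; rate undefined")
    hets_in_roh = sum(
        1 for pos in het_positions
        if any(start <= pos < end for start, end in roh_intervals)
    )
    return hets_in_roh * genome_length / roh_bases


# Toy data: 3 hets fall in 200 bp of ROH, scaled to a 1,000 bp "genome".
print(extrapolate_false_positives([5, 15, 150, 250], [(0, 100), (200, 300)], 1000))  # 15.0
```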

Page 12: Applicability to Clinical Genomics

• Variants of unknown significance:
  – Technical false positives
  – True variants without health impact

[Figure: novel variants per sample, total (0-140,000) and non-synonymous (0-600), for all variants, after QC, and not in panel; bars marked "Not in TAGC"]
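The "Not in panel" and "Not in TAGC" bars amount to a set difference: a sample's variants minus those already observed in reference panels. A minimal sketch, assuming variants are keyed by (chromosome, position, ref, alt); the panels passed in (for example a public catalog and TAGC itself) are stand-ins, since the slide does not name the first panel. The function name is hypothetical.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def novel_variants(sample: set[Variant], *reference_panels: set[Variant]) -> set[Variant]:
    """Variants in `sample` that appear in none of the given reference panels."""
    seen = set().union(*reference_panels)
    return sample - seen


sample = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
public = {("1", 100, "A", "G")}  # stand-in for a public panel
tagc = {("2", 50, "G", "A")}     # stand-in for a population-matched panel
print(len(novel_variants(sample, public)), len(novel_variants(sample, public, tagc)))  # 2 1
```

Each added population-matched panel can only shrink the set of novel variants a clinician has to interpret.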

Page 13: Demographic Inference

• Use the allele frequency spectrum and coalescent simulations.
• Assume the demographic model presented earlier.
• Parameters qualitatively similar to those inferred from IBD:
  – Bottleneck of size 500, 35 generations before present (gbp)
  – Pre-bottleneck size 90,000

[Figure: % of sites, log scale from 0.1 to 100]
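Fitting the bottleneck model means comparing the observed allele frequency spectrum with spectra simulated under that model. As a reference point, the sketch below computes the textbook expected spectrum for a constant-size neutral population, in which the expected number of sites with i derived alleles among n chromosomes is proportional to 1/i; recent expansion, for instance, tends to shift sites toward the rare classes. This is only the baseline, not the model or simulator used in the talk, and the function name is hypothetical.

```python
def neutral_sfs_proportions(n: int) -> list[float]:
    """Expected share of segregating sites with i derived alleles, i = 1..n-1.

    Standard neutral model with constant size: E[xi_i] = theta / i, so the
    proportions are (1/i) / sum_{j=1}^{n-1} (1/j).
    """
    harmonic = sum(1.0 / j for j in range(1, n))
    return [(1.0 / i) / harmonic for i in range(1, n)]


props = neutral_sfs_proportions(116)  # e.g. 58 diploid genomes
print(f"singletons {props[0]:.3f}, doubletons {props[1]:.3f}")
```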

Page 14: Summary

• IBD reveals the AJ population bottleneck and expansion, and the potential for genetic studies.
• High-quality genomes sequenced by TAGC indicate utility in a clinical setting.
• The genomes confirm the demography and demonstrate subtle differences from Europeans.
• Ongoing analysis:
  – Imputation power using TAGC vs. 1kG (1000 Genomes) as reference panels
  – Local ancestry inference
  – Functional variants; AJ disease genes
  – Mobile element insertions

Page 15: Thank you!

TAGC consortium members:
Columbia University Computer Science: Itsik Pe’er, Pier Francesco Palamara
Undergrads: Fillan Grady, Ethan Kochav, James Xue
IT: Shlomo Hershkop
Long Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha
Columbia University Medical Center: Lorraine Clark, Xinmin Liu
Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer
Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius
Memorial Sloan Kettering Cancer Center: Ken Offit, Vijai Joseph
Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen
The Hebrew University of Jerusalem: Ariel Darvasi

Funding: Human Frontier Science Program.

VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance
Complete Genomics: Jason Laramie

Page 16: Formal Inference Using IBD

• Assume a population of historical size N(t).
• Total shared segments of length :

[Figure: individuals A and B, their common ancestor AB, and a shared segment]

Palamara et al., AJHG 2012

• Detect IBD in sample → infer history N(t).
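For the simplest case, a constant size N(t) = N, the expected fraction of the genome two haplotypes share in segments longer than u Morgans follows from a standard coalescent argument: the coalescence time g is roughly exponential with rate 1/(2N), and given g the shared segment covering a site has length Gamma(2, 2g), so P(length > u | g) = (1 + 2gu)e^(-2gu). Integrating over g gives (1 + 8Nu)/(1 + 4Nu)^2. This derivation is a hedged illustration under those assumptions, not the formula from Palamara et al.; roughly speaking, inferring a time-varying N(t) means fitting expected sharing of this kind, across length thresholds, to the IBD detected in a sample. The sketch checks the closed form against direct numerical integration; function names are hypothetical.

```python
import math


def frac_shared_longer_than(u: float, n: float) -> float:
    """Closed form (1 + 8Nu) / (1 + 4Nu)^2 for constant diploid size N, u in Morgans."""
    x = 4.0 * n * u
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2


def frac_shared_longer_than_numeric(u: float, n: float, steps: int = 100_000) -> float:
    """Same quantity by integrating over the coalescence time g (in generations)."""
    rate = 1.0 / (2.0 * n)  # coalescence rate of two haplotypes per generation
    g_max = 50.0 / rate  # tail beyond exp(-50) is negligible
    dg = g_max / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * dg  # midpoint rule
        density = rate * math.exp(-rate * g)
        p_long = (1.0 + 2.0 * g * u) * math.exp(-2.0 * g * u)  # Gamma(2, 2g) tail
        total += density * p_long * dg
    return total


# Hypothetical sizes: sharing of segments > 2 cM in a small vs. a large population.
for n in (500, 50_000):
    print(n, frac_shared_longer_than(0.02, n), frac_shared_longer_than_numeric(0.02, n))
```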

Page 17: Data Processing

• CGA tools VCF generator: called sites only.
• Correct multi-nucleotide substitution bug.
• Compress, index, and distribute.
• Generate a high-quality genotype set for population genetic analyses:
  – Remove indels and multi-nucleotide substitutions.
  – Remove low-quality SNPs.
  – Remove multi-allelic SNPs.
  – Remove half-calls.
  – Remove SNPs with high no-call rate.
  – Remove SNPs not in Hardy-Weinberg equilibrium.
  – Remove monomorphic reference SNPs.
  – Remove an inbred individual.
  – Format as PLINK file.
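The SNP-level filters above can be sketched as one predicate over a site record. The record layout, the genotype encoding, the thresholds (5% no-call rate, HWE p < 1e-6) and the choice to drop a whole SNP when any genotype is a half-call are all assumptions for illustration; the slide does not specify them. Hardy-Weinberg equilibrium is tested here with a simple 1-degree-of-freedom chi-square test rather than an exact test. Removing the inbred individual and writing the PLINK file are sample-level and output steps outside this sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    """One variant site (hypothetical record layout)."""
    ref: str
    alts: list[str]
    quality_pass: bool  # stands in for "not low quality"; the real criterion is unspecified
    genotypes: list[str]  # diploid calls: "0/1", "./." (no-call), "0/." (half-call)


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic: test undefined, treat as consistent with HWE
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def keep_site(site: Site, max_no_call_rate: float = 0.05, min_hwe_p: float = 1e-6) -> bool:
    """Apply the SNP filters listed above; thresholds are hypothetical."""
    if len(site.alts) != 1:  # multi-allelic
        return False
    if len(site.ref) != 1 or len(site.alts[0]) != 1:  # indel or multi-nucleotide
        return False
    if not site.quality_pass:  # low-quality SNP
        return False
    if not site.genotypes:
        return False
    calls = [g.replace("|", "/").split("/") for g in site.genotypes]
    if any("." in c and c != [".", "."] for c in calls):  # half-call: drop the SNP
        return False
    called = [c for c in calls if "." not in c]
    if 1 - len(called) / len(calls) > max_no_call_rate:  # high no-call rate
        return False
    dosage = Counter(sum(a != "0" for a in c) for c in called)  # alt alleles per genotype
    if dosage[1] + dosage[2] == 0:  # monomorphic reference
        return False
    return hwe_chi2_pvalue(dosage[0], dosage[1], dosage[2]) >= min_hwe_p
```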

Page 18: Variant Statistics

Statistic               Per genome (exome)
Total SNPs              3.4M (22k)
Novel SNPs              3.7% (4%)
Het/hom ratio           1.64 (1.67)
Insertions              223k (246)
Deletions               237k (218)
Substitutions           83k (374)
Synonymous SNPs         10,525
Non-synonymous SNPs     9,695
Nonsense SNPs           71
Other disrupting        241
CNVs                    336
SVs                     1,486
MEIs                    3,475