Towards the Arabidopsis Haplotype Map using Arrays Justin Borevitz Salk Institute naturalvariation.org

Post on 22-Dec-2015



Talk Outline

• Single Feature Polymorphisms (SFPs)
  – Potential deletions

• Bulk Segregant Mapping
  – Extreme Array Mapping

• Haplotype analysis

• Expression Analysis

• New Arrays

What is Array Genotyping?

• Affymetrix expression GeneChips contain 202,806 unique 25bp oligonucleotides.

• 11 features per probe set for 21,546 genes

• New arrays have even more

• Genomic DNA is randomly labeled with biotin, product ~50bp.

• 3 independent biological replicates compared to the reference strain Col

GeneChip

Potential Deletions

Spatial Correction

Spatial Artifacts

Improved reproducibility

Next: Quantile Normalization
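The slide names quantile normalization as the step that follows spatial correction. The talk does not show how it was done, so this is only a minimal sketch of the standard procedure. It assumes a features × arrays matrix of intensities. The function name and the toy values are illustrative, and ties are broken by order rather than averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (array) of x the same intensity distribution."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                      # per-array sort order
    ranks = np.argsort(order, axis=0)                  # rank of each feature within its array
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)   # reference distribution
    return mean_quantiles[ranks]

# Toy data (made up): 4 features (rows) on 3 arrays (columns).
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.5, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(raw)
```

After normalization each array has the same sorted values, so any remaining differences between arrays reflect the rank of a feature rather than overall array brightness.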

False Discovery and Sensitivity

PM only

SAM threshold: 5% FDR

                   Total    SFPs   nonSFPs   Sensitivity
GeneChip                    3806     89118
Sequence             817     121       696
Polymorphic          340     117       223   34%
Non-polymorphic      477       4       473

Cereon marker accuracy: 100%
False Discovery rate: 3%
Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40

SAM threshold: 18% FDR

                   Total    SFPs   nonSFPs   Sensitivity
GeneChip                   10627     82297
Sequence             817     223       594
Polymorphic          340     195       145   57%
Non-polymorphic      477      28       449

Cereon marker accuracy: 100%
False Discovery rate: 13%
Test for independence of all factors: Chisq = 265.13, df = 1, p-value = 1.309e-59
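The sensitivity, false discovery rate and Chisq values on this slide follow directly from the sequence-verified counts. The sketch below recomputes them as a check. The function and argument names are illustrative. It assumes the reported test is the Pearson chi-square without continuity correction, which is the variant that reproduces 177.34 and 265.13.

```python
import math

def summarize_calls(tp, fn, fp, tn):
    """Sensitivity, false discovery rate and Pearson chi-square for a 2x2 table.

    tp / fn: polymorphic features called SFP / nonSFP
    fp / tn: non-polymorphic features called SFP / nonSFP
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    fdr = fp / (tp + fp)
    # Pearson chi-square without continuity correction (df = 1).
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return sensitivity, fdr, chi2, p_value

strict = summarize_calls(tp=117, fn=223, fp=4, tn=473)     # 5% FDR threshold
relaxed = summarize_calls(tp=195, fn=145, fp=28, tn=449)   # 18% FDR threshold
```

Relaxing the threshold raises sensitivity from 34% to 57% and raises the observed false discovery rate from 3% to 13%.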

3/4 Cvi markers were also confirmed in PHYB

90%   80%   70%
41%   53%   85%

90%   80%   70%
67%   85%   100%

Cereon may be a sequencing error

TIGR match is a match

Chip genotyping of a Recombinant Inbred Line

29kb interval

Discovery: 6 replicates × $500 / 12,000 SFPs = $0.25 per SFP

Typing: 1 replicate × $500 / 12,000 SFPs ≈ $0.042 per SFP
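The per-SFP cost is the array cost times the number of replicates, divided by the number of SFPs scored. A short sketch using the slide's figures (the constant and function names are illustrative):

```python
ARRAY_COST_USD = 500      # price per array hybridization quoted on the slide
SFPS_PER_ARRAY = 12_000   # SFPs scored per experiment

def cost_per_sfp(replicates):
    """Cost in USD of one SFP data point for a given number of replicate arrays."""
    return replicates * ARRAY_COST_USD / SFPS_PER_ARRAY

discovery = cost_per_sfp(6)   # 0.25 USD
typing = cost_per_sfp(1)      # about 0.042 USD
```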

Potential Deletions

>500 potential deletions

45 confirmed by Ler sequence

23 (of 114) transposons

Disease Resistance (R) gene clusters

Single R gene deletions

Genes involved in Secondary metabolism

Unknown genes

Potential Deletions Suggest Candidate Genes

FLOWERING1 QTL

Flowering Time QTL caused by a natural deletion in MAF1

[Figure: MAF1 natural deletion; x-axis: Chr1 (bp)]

Fast Neutron deletions

FKF1 80kb deletion CHR1

cry2 10kb deletion CHR1

Het

Map bibb

100 bibb mutant plants
100 wt mutant plants

bibb mapping

ChipMap

AS1

Bulk segregant mapping using chip hybridization

bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1

BIBB = ASYMMETRIC LEAVES1
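In bulk segregant mapping by chip hybridization, pooled mutant and pooled wild-type plants are hybridized separately, and the causal region is where SFP signals differ most between the pools. The talk does not give the scoring method, so the following is only a sketch under stated assumptions: one chromosome at a time, a simple running mean over neighbouring SFPs, and synthetic data. The function name and the `window` parameter are illustrative.

```python
import numpy as np

def map_bulk_segregant(positions, mutant_pool, wt_pool, window=5):
    """Return the position where the two pools differ most after smoothing.

    positions   : SFP coordinates along one chromosome (bp), sorted
    mutant_pool : hybridization signal of each SFP in the mutant pool
    wt_pool     : hybridization signal of each SFP in the wild-type pool
    window      : number of neighbouring SFPs averaged together (assumed value)
    """
    diff = np.asarray(mutant_pool, float) - np.asarray(wt_pool, float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(diff, kernel, mode="same")   # running mean
    peak = int(np.argmax(np.abs(smoothed)))
    return positions[peak], smoothed

# Synthetic example: 21 SFPs spaced 1 Mb apart; linkage peaks at the 11th marker.
positions = np.arange(21) * 1_000_000
wt_pool = np.zeros(21)
mutant_pool = 1.0 - np.abs(np.arange(21) - 10) / 10.0   # triangular signal
peak_bp, profile = map_bulk_segregant(positions, mutant_pool, wt_pool)
```

Smoothing over neighbouring features trades resolution for robustness, since a single noisy feature can otherwise dominate the peak.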

Sequenced AS1 coding region from bib-1 …found g -> a change that would introduce a stop codon in the MYB domain

[Figure labels: bibb, as1-101; MYB domain; bib-1 W49*; as1-101 Q107*; as1, bibb]

AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64cM

stamenstay (Ler), Sarah Liljegren: mapping confirmed

ein6 een double mutant, Ramlah Nehring: mapping confirmed

Array Haplotyping

• What about Diversity/selection across the genome?

• A genome wide estimate of population genetics parameters: θw, π, Tajima’s D, ρ

• LD decay, Haplotype block size

• Deep population structure?

• Col, Lz, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2
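Three of the parameters listed above (θw, π and Tajima's D) can be computed from a table of 0/1 SFP calls per accession. The sketch below uses the textbook formulas and is not the talk's actual pipeline. It assumes each accession contributes one haplotype, since these are inbred lines. ρ is not covered, and SFP calls are ascertained against Col, so the values are only analogues of sequence-based estimates.

```python
import math

def diversity_stats(haplotypes):
    """Watterson's theta, pi and Tajima's D from 0/1 genotype calls.

    haplotypes: list of equal-length lists, one per accession;
                1 = non-reference (SFP) call, 0 = reference call.
    """
    n = len(haplotypes)
    counts = [sum(col) for col in zip(*haplotypes)]
    seg = [k for k in counts if 0 < k < n]             # segregating sites only
    S = len(seg)
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs         # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    tajima_d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return theta_w, pi, tajima_d

# Made-up calls: 4 accessions, 3 SFPs.
calls = [[0, 0, 0],
         [1, 0, 0],
         [1, 1, 0],
         [1, 1, 1]]
theta_w, pi, tajima_d = diversity_stats(calls)
```

An excess of intermediate-frequency SFPs gives π > θw and a positive D; an excess of rare SFPs gives a negative D.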

Pairwise Correlation between and within replicates

[Figure: correlation matrix; rows and columns are labeled by accession replicate]
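A correlation matrix like this one can be built from a features × arrays intensity matrix, then summarized as the mean correlation within an accession and between accessions. This is a minimal sketch with made-up data; the function name and layout are assumptions.

```python
import numpy as np

def replicate_correlations(signal, labels):
    """Mean Pearson correlation within and between accessions.

    signal : features x arrays matrix of hybridization intensities
    labels : accession name for each array (column)
    """
    corr = np.corrcoef(np.asarray(signal, float), rowvar=False)   # arrays x arrays
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = corr[same & off_diag].mean()    # replicate pairs of the same accession
    between = corr[~same].mean()             # pairs from different accessions
    return corr, within, between

# Toy data (made up): two Col arrays and two Ler arrays, five features each.
signal = np.array([[1.0, 1.1, 3.0, 3.2],
                   [2.0, 2.1, 1.0, 0.9],
                   [3.0, 2.9, 2.0, 2.1],
                   [4.0, 4.2, 4.0, 3.8],
                   [5.0, 5.1, 0.5, 0.4]])
corr, within, between = replicate_correlations(signal, ["Col", "Col", "Ler", "Ler"])
```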

Array Haplotyping

Inbred lines

Low effective recombination due to partial selfing

Extensive LD blocks

Col Ler Cvi Kas Bay Shah Lz Nd

Chromosome 1, ~500kb

Distribution of T-stats

[Figure: histogram of the T statistic (x-axis, bins from (-4,-3.5] to (3,3.5]) against frequency (y-axis, 0 to 8e+04); series: null (permutation) and actual]
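The histogram compares the observed T statistics with a null distribution obtained by permuting array labels. A false discovery rate at a given threshold can be estimated from that comparison. The sketch below uses a plain pooled-variance t statistic rather than the SAM statistic mentioned earlier in the talk, enumerates all relabellings of 3 versus 3 arrays, and runs on simulated data. All names and numbers are assumptions.

```python
import itertools
import numpy as np

def t_stats(data, group_a, group_b):
    """Pooled-variance t statistic per feature (row) for two sets of columns."""
    a, b = data[:, group_a], data[:, group_b]
    na, nb = a.shape[1], b.shape[1]
    pooled = ((na - 1) * a.var(axis=1, ddof=1) +
              (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled * (1 / na + 1 / nb))

def permutation_fdr(data, group_a, group_b, threshold):
    """Estimate the FDR at |t| >= threshold from relabelled (permuted) arrays."""
    actual = np.abs(t_stats(data, group_a, group_b))
    n_called = int((actual >= threshold).sum())
    columns = list(group_a) + list(group_b)
    null_counts = []
    for perm_a in itertools.combinations(columns, len(group_a)):
        if set(perm_a) in (set(group_a), set(group_b)):
            continue                      # skip the real labelling and its mirror
        perm_b = [c for c in columns if c not in perm_a]
        null = np.abs(t_stats(data, list(perm_a), perm_b))
        null_counts.append((null >= threshold).sum())
    expected_false = float(np.mean(null_counts))
    fdr = min(1.0, expected_false / n_called) if n_called else 0.0
    return n_called, fdr

# Simulated data: 220 features on 3 reference and 3 test arrays.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(220, 6))
data[:20, 3:] -= 20.0     # 20 simulated SFPs: weaker hybridization in the test strain
n_called, fdr = permutation_fdr(data, [0, 1, 2], [3, 4, 5], threshold=6.0)
```

Raising the threshold lowers the estimated FDR and also lowers sensitivity, which is the trade-off shown on the False Discovery and Sensitivity slide.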

Not Col, Col, NA, NA duplications

32,427 Calls

208,729

12,250 SFPs

Accession   FDR    Sensitivity   SNP   Total
bay         0.0%   43%           51    563
c24         0.2%   39%           64    580
cvi         0.0%   38%           91    543
est         0.0%   59%           39    548
kas         1.9%   44%           66    577
kendl       3.1%   33%           57    545
ler         0.0%   49%           43    562
lz          0.0%   53%           51    573
mt          0.2%   61%           49    570
nd          0.0%   47%           49    568
shah        0.0%   24%           80    548
sorbo       0.0%   45%           55    526
van         0.2%   29%           92    571
ws2         0.0%   49%           57    514

Sequence confirmation of SFPs

SFPs for reverse genetics

http://naturalvariation.org/sfp

14 Accessions, 30,950 SFPs

Chromosome Wide Diversity

Diversity 50kb windows

Tajima’s D like 50kb windows
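Chromosome-wide diversity in 50kb windows can be obtained by assigning each SFP to a window by position and summing a per-site diversity measure. The sketch below uses per-site pairwise diversity, k(n-k) divided by the number of accession pairs. The talk does not state its exact statistic, so the measure, the function name and the example data are all assumptions.

```python
from collections import defaultdict

def windowed_diversity(positions, allele_counts, n_accessions, window_bp=50_000):
    """Sum per-site pairwise diversity of SFPs within fixed windows.

    positions     : SFP coordinates (bp) on one chromosome
    allele_counts : number of accessions with the non-reference call at each SFP
    n_accessions  : number of accessions typed
    """
    pairs = n_accessions * (n_accessions - 1) / 2
    totals = defaultdict(float)
    for pos, k in zip(positions, allele_counts):
        window_start = (pos // window_bp) * window_bp
        totals[window_start] += k * (n_accessions - k) / pairs
    return dict(sorted(totals.items()))

# Made-up example: four SFPs typed in 14 accessions.
result = windowed_diversity([1_200, 48_000, 51_000, 130_000], [7, 1, 13, 0], 14)
```

Windows with no SFPs are absent from the result; a caller that needs a continuous track would fill them with zero.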

differences may be due to expression or hybridization

PAG1 down regulated in Cvi

PALE GREEN1 knockout has long hypocotyl in red light

RNA DNA

Universal Whole Genome Array

Transcriptome Atlas: expression levels, tissue specificity

Gene Discovery: gene model correction, non-coding/micro-RNA, antisense transcription

Alternative Splicing

Comparative Genome Hybridization (CGH): insertions/deletions

Methylation

Chromatin Immunoprecipitation: ChIP chip

Polymorphism: SFP discovery/genotyping

~35 bp tile, non-repetitive regions, “good” binding oligos, evenly spaced
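The design rule above can be illustrated with a simple tiling loop. This is a hypothetical sketch and not the actual array design procedure. It assumes 25-mer probes, treats lowercase bases as repeat-masked sequence, and uses a GC-content window as a stand-in for "good" binding. The function name, parameters and thresholds are all illustrative.

```python
def tile_probes(sequence, step=35, probe_len=25, gc_range=(0.3, 0.7)):
    """Pick probe start positions roughly every `step` bp.

    Lowercase bases are treated as repeat-masked and skipped; the GC window
    is a stand-in for a real binding-quality score (an assumption).
    """
    starts = []
    pos = 0
    while pos + probe_len <= len(sequence):
        probe = sequence[pos:pos + probe_len]
        gc = sum(base in "GC" for base in probe) / probe_len
        if probe.isupper() and gc_range[0] <= gc <= gc_range[1]:
            starts.append(pos)
            pos += step          # accepted: jump to the next tile
        else:
            pos += 1             # rejected: slide forward one base and retry
    return starts

# Made-up sequence: 40 bp unique, 30 bp masked repeat, 60 bp unique.
seq = "ACGT" * 10 + "acgt" * 7 + "ac" + "ACGT" * 15
starts = tile_probes(seq)
```

Sliding forward one base after a rejection keeps probes as evenly spaced as the masked regions allow.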

ChipViewer: Mapping of transcriptional units of ORFeome

From 2000v At1g09750 (MIPS) to the latest AGI At1g09750

2000 v Annotation (MIPS)

The latest AGI Annotation

Improved Genome Annotation

[Figure: tracks along Chromosome (bp): conservation, SNP, SFP, Transcriptome Atlas, ORFa (start, AAAAA), ORFb, deletion]

Review

• Single Feature Polymorphisms (SFPs) can be used to:

• Identify potential deletions (candidate genes)

• Identify recombination breakpoints

• eXtreme Array Mapping

• Haplotyping

• Diversity/Selection

• Association Mapping

NaturalVariation.org

Syngenta

Hur-Song Chang, Tong Zhu

UC Davis

Julin Maloof

University of Guelph, Canada

Dave Wolyn

Sainsbury Laboratory

Jonathan Jones

Salk

Jon Werner, Todd Mockler, Sarah Liljegren, Ramlah Nehring, Huaming Chen, Joanne Chory, Detlef Weigel, Joseph Ecker

UC San Diego

Charles Berry

Scripps

Sam Hazen, Elizabeth Winzeler
