clustering of somatic mutations to characterize cancer heterogeneity with whole genome sequencing


Upload: amia

Post on 04-Jan-2016


DESCRIPTION

2013 Summit on Translational Bioinformatics


CLUSTERING OF SOMATIC MUTATIONS TO CHARACTERIZE CANCER HETEROGENEITY WITH WHOLE GENOME SEQUENCING

J Becq (1), A Alexa (1), K Cheetham (1), R Grocock (1), Z Kingsbury (1), A Timbs (2), D McBride (1), S Humphray (1), M Ross (1), A Schuh (2) and D Bentley (1)

(1) illumina Cambridge Ltd., Chesterford Research Park, Cambridge, UK
(2) Oxford NIHR Biomedical Research Centre, University of Oxford, Oxford, UK

CANCER HETEROGENEITY

CLINICAL STUDY

DETECTION OF TUMOUR SPECIFIC MUTATIONS

TIME-SERIES ANALYSIS OF SOMATIC SINGLE NUCLEOTIDE VARIANTS

[Figure: reads covering one position with REF base = A. Stage D: 7 reads (A A A T T T T), mutant AF: 4/7 = 0.57. Stage R1: 9 reads (A T T T T T T T T), mutant AF: 8/9 = 0.89.]
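The mutant AF values in the illustration above can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes that any base differing from the reference counts as mutant.

```python
def mutant_allele_frequency(bases, ref_base):
    """Fraction of reads at one position whose base differs from the reference."""
    if not bases:
        raise ValueError("no reads cover this position")
    mutant_reads = sum(1 for base in bases if base != ref_base)
    return mutant_reads / len(bases)

# Read bases at one position, as drawn on the poster (reference base A).
stage_d = list("AAATTTT")      # 3 reference reads, 4 mutant reads
stage_r1 = list("ATTTTTTTT")   # 1 reference read, 8 mutant reads

af_d = mutant_allele_frequency(stage_d, "A")
af_r1 = mutant_allele_frequency(stage_r1, "A")
print(f"Stage D:  {af_d:.2f}")
print(f"Stage R1: {af_r1:.2f}")
```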

1. Select SNVs seen in at least one time-point
2. Filter for good quality (not in copy number aberration region, 15x < coverage < 200x in all but 1, genotype Qscore > 15 in all but 1)
3. Measure mutant Allele Frequency (mutant AF) at each time-point

chr     position     D        R1       P1       R2       preT3
chr1    154543705    0.0000   0.0000   0.2174   0.4255   0.4242
chr2    198266834    0.5000   0.6364   0.3478   0.4091   0.5938
chr3    31107645     0.0303   0.0000   0.1928   0.4146   0.4483
…       …            …        …        …        …        …
chr21   1592215      0.0000   0.0000   0.0476   0.0526   0.2500
chr22   32831696     0.4211   0.5294   0.0538   0.0000   0.0000
chrX    142716811    0.0000   0.0000   0.4737   0.9000   1.0000
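The quality filter in step 2 could be sketched as follows. The poster does not say how "in all but 1" is evaluated, so this sketch assumes it means that at most one time-point may fail each criterion. The function names and the example coverage and Qscore values are hypothetical.

```python
def passes_in_all_but_one(values, predicate):
    """True if at most one value fails the predicate (assumed reading of 'all but 1')."""
    failures = sum(1 for value in values if not predicate(value))
    return failures <= 1

def keep_snv(coverages, qscores, in_cna_region):
    """Apply the three step-2 criteria to one SNV across all time-points."""
    if in_cna_region:
        return False  # SNVs in copy number aberration regions are discarded
    coverage_ok = passes_in_all_but_one(coverages, lambda c: 15 < c < 200)
    qscore_ok = passes_in_all_but_one(qscores, lambda q: q > 15)
    return coverage_ok and qscore_ok

# Hypothetical per-time-point values in the order D, R1, P1, R2, preT3.
one_low_sample = keep_snv([30, 28, 12, 35, 40], [40, 50, 10, 45, 60], False)
two_low_samples = keep_snv([30, 10, 12, 35, 40], [40, 50, 30, 45, 60], False)
in_cna = keep_snv([30, 28, 31, 35, 40], [40, 50, 30, 45, 60], True)
print(one_low_sample, two_low_samples, in_cna)
```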

MUTATION PROFILES OF ALL SOMATIC SNVS

FROM CLUSTERS TO CLONES

Founder mutations

Mutations sensitive to treatment

Mutations resistant to treatment

Emerging mutations after treatment

Cancers are genomically diverse and dynamic entities

Clonal evolution generally selects for increased proliferation and survival, and might lead to invasion, metastasis and therapeutic resistance

Black lines: somatic SNVs with non-synonymous or nonsense consequences

VALIDATION WITH DEEP SEQUENCING

PROPOSED CELL POPULATION

Ultra-deep sequencing (50,000x) of amplicons for all somatic SNVs with non-synonymous consequences:

Target mutation of interest

Amplify by PCR

Sequence on MiSeq instrument

Report accurate mutant allele frequency for each amplicon

Mutations present at all stages, regardless of treatments

Mutations decreasing after Fludarabine + Chlorambucil + Rituximab treatment

Emerging mutations (mutant AF is 0% at early stages)

Expanding mutations (mutant AF is >0% at early stages)

Time-series whole genome sequencing at 30x is sufficient to provide a representation of tumour cell populations / heterogeneity, provided each cell population is >10%

Deep sequencing has confirmed the WGS analysis while providing greater sensitivity as mutations at very low frequencies (~1%) can be detected
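A rough way to see why depth matters at low frequencies is to treat each read as an independent draw, so that the mutant read count is binomial. That model, the function names, and the absence of sequencing error are assumptions of this sketch, not statements from the poster.

```python
from math import sqrt

def expected_mutant_reads(depth, af):
    """Mean number of mutant reads at a given depth and mutant AF."""
    return depth * af

def af_standard_error(depth, af):
    """Standard error of the observed AF under binomial read sampling."""
    return sqrt(af * (1.0 - af) / depth)

low_af = 0.01  # a mutation at ~1% mutant AF
for depth in (30, 50_000):
    reads = expected_mutant_reads(depth, low_af)
    se = af_standard_error(depth, low_af)
    print(f"{depth:>6}x: ~{reads:g} mutant reads expected, AF standard error ~{se:.4f}")
```

Under these assumptions a 1% mutation yields well under one mutant read on average at 30x but about 500 at 50,000x, which is consistent with the poster's point that deep sequencing reaches frequencies that 30x WGS cannot.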

DNA samples were collected from a patient with Chronic Lymphocytic Leukemia at different time-points during his treatment

[Figure: principal component analysis (axes: Component 1, Component 2)]

4. Cluster SNVs with similar mutant AF profiles using k-means
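Step 4 can be illustrated with a small, dependency-free k-means (Lloyd's algorithm) run on the six example rows from the mutant AF table. The poster does not state the number of clusters, the initialisation, or the software used, so k = 3 and the choice of starting centroids below are assumptions made only for this sketch.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Mutant AF profiles (D, R1, P1, R2, preT3) copied from the poster's table.
names = ["chr1", "chr2", "chr3", "chr21", "chr22", "chrX"]
profiles = [
    [0.0000, 0.0000, 0.2174, 0.4255, 0.4242],
    [0.5000, 0.6364, 0.3478, 0.4091, 0.5938],
    [0.0303, 0.0000, 0.1928, 0.4146, 0.4483],
    [0.0000, 0.0000, 0.0476, 0.0526, 0.2500],
    [0.4211, 0.5294, 0.0538, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.4737, 0.9000, 1.0000],
]
# Assumed starting centroids: the chr1, chr2 and chr22 profiles.
labels, centroids = kmeans(profiles, [profiles[0], profiles[1], profiles[4]])
for name, label in zip(names, labels):
    print(name, label)
```

With these starting points the SNVs that rise over time (chr1, chr3, chr21, chrX) fall into one cluster, while the chr2 and chr22 profiles each form their own. On real data the result depends on k and on the initialisation.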

[Figure: clinical timeline of tumour progression. Samples: Germline (GL) and Tumours D, R1, P1, R2, preT3. Stage labels: Diagnosis, Remission, Remission, Relapse, Relapse. Treatments: Chlorambucil, then Fludarabine + Chlorambucil + Rituximab. Outcome: short remission duration, aggressive disease, death despite treatment.]

[Figure: somatic variant calling workflow. Normal BAM and Tumour BAM -> realignment -> Normal realigned reads and Tumour realigned reads -> candidate indel search -> Candidate Indels -> Somatic Caller -> Post-call filtration -> Small Somatic variants.]


There are two late subclones, one present at diagnosis and one emerging after the second line of treatment

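[Figure: cell population proportions at each time-point (D, R1, P1, R2, preT3). Legend: Founder mutations, Early subclone mutations, Late subclone mutations; Non-cancer vs. cancer cells. Percentages printed on the figure, in groups of four that each sum to 100%: 13/3/80/4, 8/3/88/1, 5/47/12/36, 2/89/1/8, 3/95/0/2.]

[Figure: second panel over the same time-points. Percentages printed on the figure: 6%, <1%, 13%, 44%, 5%, 87%, 3%, 94%, 9%.]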

Enumerate possible cells

possible common ancestor (founder clone)

Enumerate most parsimonious phylogenies

Average HET mutant AF x 2, per mutational group (groups were colour-coded in the original)

# of SNVs   D      R1     P1     R2     preT3
1136        96%    99%    64%    92%    98%
686         80%    88%    12%    1%     0%
1241        3%     3%     47%    89%    95%
502         3%     5%     4%     5%     5%
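One way to read the table above: a heterozygous mutation carried by a fraction f of cells has an expected mutant AF of f/2, so doubling the average AF of a mutational group estimates the fraction of cells carrying it. The sketch below turns the three non-noise groups into cell population proportions. The group names and the assumption that the 686-SNV and 1241-SNV groups mark two separate subclones descending from the founder clone are interpretations made for this example.

```python
timepoints = ["D", "R1", "P1", "R2", "preT3"]

# Average HET mutant AF x 2 (percent), copied from the table.
founder = [96, 99, 64, 92, 98]      # 1136 SNVs, present at all stages
decreasing = [80, 88, 12, 1, 0]     # 686 SNVs, fall after treatment
emerging = [3, 3, 47, 89, 95]       # 1241 SNVs, rise after treatment

# Cells without founder mutations are treated as non-cancer (germline) cells.
non_cancer = [100 - f for f in founder]
# Cancer cells carrying only founder mutations, assuming the two subclones
# are disjoint descendants of the founder clone.
founder_only = [f - d - e for f, d, e in zip(founder, decreasing, emerging)]

for i, tp in enumerate(timepoints):
    print(f"{tp:>6}: founder only {founder_only[i]}%, decreasing {decreasing[i]}%, "
          f"emerging {emerging[i]}%, non-cancer {non_cancer[i]}%")
```

Under these assumptions the four proportions sum to 100% at every time-point, and the derived values (for example 13%, 80%, 3% and 4% at D) agree with the percentages printed on the poster's cell population figure.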

At each time-point (x, w, y, z label the cell populations; the right-hand sides were colour-coded mutational groups in the original figure and did not survive extraction):

w + x + y + z = 100% - GL contamination =

+ y + z =

w + y =

Trees containing this subclone are rejected because its frequency is at most 3% (below noise level)
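The poster does not spell out the rules used to enumerate or reject trees. As an illustration only, the sketch below enumerates every way of attaching two subclones beneath the founder clone and keeps the trees in which, at every time-point, the children of a clone never add up to more than the clone itself. This "sum rule" and the group names are assumptions of the sketch, swapped in for the poster's unstated parsimony criterion.

```python
from itertools import product

# Estimated cell fractions (percent) at D, R1, P1, R2, preT3, from the poster's table.
fractions = {
    "founder": [96, 99, 64, 92, 98],
    "decreasing": [80, 88, 12, 1, 0],
    "emerging": [3, 3, 47, 89, 95],
}
root = "founder"
others = ["decreasing", "emerging"]

def is_tree(parent):
    """Every non-root clone must reach the root without revisiting a clone."""
    for node in parent:
        seen = set()
        while node != root:
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = parent[node]
    return True

def obeys_sum_rule(parent, tolerance=0.0):
    """Children may not exceed their parent's fraction at any time-point."""
    for node in fractions:
        children = [c for c, p in parent.items() if p == node]
        for t in range(len(fractions[root])):
            if sum(fractions[c][t] for c in children) > fractions[node][t] + tolerance:
                return False
    return True

valid_trees = []
for parents in product(fractions, repeat=len(others)):
    parent = dict(zip(others, parents))
    if any(child == p for child, p in parent.items()):
        continue  # a clone cannot be its own parent
    if is_tree(parent) and obeys_sum_rule(parent):
        valid_trees.append(parent)

print(valid_trees)
```

With these numbers only the tree in which both subclones descend directly from the founder clone survives: nesting either subclone inside the other fails because each one exceeds the other at some time-point.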

Deep-sequencing also provides accurate proportions

CONCLUSION

Somatic variants