Identification of cancer drivers across tumor types
DESCRIPTION
Thousands of tumor genomes and exomes are being sequenced as part of the International Cancer Genome Consortium (ICGC), The Cancer Genome Atlas (TCGA) and other initiatives. This makes it possible, for the first time, to build a comprehensive picture of the mutations, genes and pathways involved in the cancer phenotype across tumor types. We have developed computational methods that identify signals of positive selection in the pattern of tumor somatic mutations, which point to genes and pathways directly involved in tumor development. We have applied these approaches to 3205 tumors from 12 different cancer types of the TCGA Pan-Cancer project, identifying 291 high-confidence cancer driver genes acting on those tumors (Tamborero et al 2013). We have also developed IntOGen-mutations (http://www.intogen.org/mutations), a novel web platform for cancer genome interpretation, which analyses not only TCGA pan-cancer data but also mutation data from ICGC and other initiatives. The resource allows users to identify driver mutations, genes and pathways acting on more than 6000 tumors originating in 17 different cancer sites, and to analyze newly sequenced tumor genomes. Among the novel cancer drivers identified are chromatin regulatory factors and splicing factors, which are emerging as important genes in cancer development and are regarded as interesting candidates for novel cancer treatment targets. In my talk I will summarize all these recent findings. More info: http://bg.upf.edu/blog/2013/10/my-slides-on-identification-of-cancer-drivers-across-tumor-types/
TRANSCRIPT
Nuria Lopez-Bigas
ICREA Research Professor at Universitat Pompeu Fabra
Barcelona
http://bg.upf.edu
Identification of cancer drivers across tumor types
Moving towards personalized cancer medicine
BRAF is frequently mutated in melanoma (V600E)
Vemurafenib
August 2011
Dibb et al., Nature Reviews Cancer 2004
Davies et al. Nature 2002
2 weeks of Vemurafenib
Personalized medicine / Precision medicine
Cancer Genomics
Nature 502, 306–307. 2013
Normal Cell Tumor Cell
Sequencing
Somatic mutations
Mrs. McDaniel
Sequencing tumor genomes
Which mutations are drivers?
Cancer is an evolutionary process
Yates and Campbell, Nat Rev Genet 2012
How to differentiate drivers from passengers?
[Figure: a long stretch of raw tumor genome sequence]
Find signals of positive selection across tumour re-sequenced genomes
Signals of positive selection
Recurrence
Identify genes mutated more frequently than background mutation rate
MuSiC-SMG / MutSig / MutSigCV
Challenge: Background mutation rate varies across patients and genomic regions
Replication time
Stamatoyannopoulos et al., Nature Genetics 2009; Schuster-Böckler and Lehner, Nature 2012
Chromatin organization
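To make the recurrence idea and its challenge concrete, here is a minimal sketch. It assumes a simple Poisson model of background mutations; this is an illustration only, not the statistical model used by MuSiC-SMG or MutSigCV. The function names and the example rates are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def recurrence_pvalue(n_mutations, gene_length_bp, n_samples, background_rate):
    """P-value for seeing at least n_mutations in a gene.

    background_rate is an assumed per-base, per-sample mutation rate.
    """
    expected = background_rate * gene_length_bp * n_samples
    return poisson_sf(n_mutations, expected)

# Same gene, same 5 observed mutations, two assumed background rates
# (hypothetical values): a low-rate region and a high-rate region.
p_low_background = recurrence_pvalue(5, 1500, 100, 1e-6)
p_high_background = recurrence_pvalue(5, 1500, 100, 2e-5)
print(p_low_background, p_high_background)
```

The same count of mutations looks highly significant under the low assumed rate and unremarkable under the high one, which is why a misestimated regional background rate can produce false positives or false negatives.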
Signals of positive selection
• Based on the consequence of each mutation (e.g. synonymous scores lowest; stop-gain and frameshift indel score highest)
• SIFT, PolyPhen2 (PPH2) and MutationAssessor (MA) scores for missense mutations
How to measure functional impact of mutations?
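The scoring scheme on this slide can be sketched as follows. The numeric scale (0 to 1), the consequence labels and the function name are assumptions made for illustration; the slide only states the ordering of consequence types and that missense mutations are scored with SIFT, PPH2 and MA.

```python
# Assumed fixed scores for consequence types that do not need a predictor.
# The 0-to-1 scale is hypothetical; only the ordering comes from the slide.
FIXED_SCORES = {
    "synonymous": 0.0,        # lowest functional impact
    "stop_gained": 1.0,       # highest functional impact
    "frameshift_indel": 1.0,  # highest functional impact
}

def functional_impact(consequence, missense_score=None):
    """Return a functional impact score in [0, 1] for one mutation.

    missense_score is assumed to be a predictor output (for example from
    SIFT, PPH2 or MA) already rescaled so that higher means more damaging.
    """
    if consequence in FIXED_SCORES:
        return FIXED_SCORES[consequence]
    if consequence == "missense":
        if missense_score is None:
            raise ValueError("missense mutations need a predictor score")
        return min(1.0, max(0.0, missense_score))
    raise ValueError(f"unknown consequence type: {consequence}")

scores = [
    functional_impact("synonymous"),
    functional_impact("missense", 0.7),
    functional_impact("frameshift_indel"),
]
print(scores)
```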
Functional impact bias (FMbias)
Mutation
OncodriveFM
Gonzalez-Perez and Lopez-Bigas. NAR 2012
• It does not depend on background mutation rates
• Only needs list of somatic mutations
• It is computationally cheap
Main Advantages of FM bias approach
Gonzalez-Perez and Lopez-Bigas. NAR 2012
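A minimal sketch of how an FM bias could be tested, assuming a permutation test on the mean functional impact score. This is not the OncodriveFM implementation; the function name, the permutation scheme and the toy scores are assumptions. It does illustrate the advantages listed above: the only input is a list of scored somatic mutations, and no background mutation rate is needed.

```python
import random
from statistics import mean

def fm_bias_pvalue(gene_scores, all_scores, n_permutations=10000, seed=0):
    """Empirical p-value that a gene's mutations have unusually high impact.

    gene_scores: functional impact scores of the mutations in one gene.
    all_scores: scores of all mutations in the dataset (the null pool).
    """
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = sum(
        mean(rng.sample(all_scores, k)) >= observed
        for _ in range(n_permutations)
    )
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy dataset (hypothetical): most mutations have low impact.
pool = [0.1] * 900 + [0.9] * 100
p_biased = fm_bias_pvalue([0.9] * 5, pool, n_permutations=2000)
p_neutral = fm_bias_pvalue([0.1] * 5, pool, n_permutations=2000)
print(p_biased, p_neutral)
```

Because the null distribution comes from the dataset's own mutations, a gene in a highly mutated region is not flagged unless its mutations are also biased towards high impact.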
One example: TCGA Glioblastoma
Gene      FMbias q-value
TP53      8.5E-11
PTEN      8.5E-11
EGFR      8.5E-11
NF1       8.5E-11
RB1       2.5E-9
FKBP9     8.5E-11
ERBB2     1.2E-8
PIK3R1    1.2E-8
PIK3CA    2.3E-4
PIK3C2G   0.002
IDH1      8.5E-11
ZNF708    7.4E-10
FGFR3     3.2E-9
CDKN2A    2.5E-8
ALDH1A3   5.2E-5
PDGFRA    1.5E-6
FGFR1     2.0E-6
MAPK9     2.2E-5
DCN       1.5E-6
PIK3C2A   6.2E-5
CHEK2     1
PSMD13    1
GSTM5     1
[Heatmap legend: MA score per sample (not mutated samples marked) and q-value scale with a 0.05 threshold]
FM / MutSig qvalue
TP53, CBFB, GATA3, MAP3K1
PIK3CA
AKT1
MutSig
MLL, NOTCH2, PCDHA7
OncodriveFM
Banerji et al., Nature 2012 (analysis of 103 breast tumors)
PIK3CA is recurrently mutated in the same residue in breast tumours
Lowly scored by functional impact metrics
[Figure: number of protein affecting mutations (axis 0 to 80) along PIK3CA protein positions, with a peak labeled H1047L at position 1047]
PIK3CA is a false negative of OncodriveFM in some Breast Cancer projects
Signals of positive selection
Mutation clustering
Mutation
OncodriveCLUST
Tamborero et al., Bioinformatics 2013
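[Figure: OncodriveCLUST schematic in steps (I) to (VI) for Gene A and Gene B. Mutations are counted per amino acid position; positions above a threshold (Th) seed clusters (C1 in Gene A; C1 and C2 in Gene B); the gene score is the sum of its cluster scores, S_geneA = S_C1 and S_geneB = S_C1 + S_C2; the gene scores are converted to Z_A and Z_B against the background]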
Background model obtained by calculating the clustering score per gene of the coding-silent mutations
Signals of positive selection: OncodriveCLUST
Tamborero et al., Bioinformatics 2013
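The clustering idea can be sketched as below. This is a simplified illustration, not the OncodriveCLUST algorithm: here a cluster's score is simply assumed to be the fraction of the gene's mutations falling in it, and the parameters `min_count` (standing in for the threshold Th) and `max_gap` are hypothetical. What it keeps from the slides is the structure: clusters seeded at recurrently mutated positions, a gene score that sums the cluster scores, and a Z-score against a background of gene scores computed from coding-silent mutations.

```python
from collections import Counter
from statistics import mean, pstdev

def cluster_scores(positions, min_count=2, max_gap=5):
    """Score each mutation cluster of a gene.

    positions: amino acid position of every protein affecting mutation.
    A position seeds a cluster if it has at least min_count mutations;
    seeds at most max_gap residues apart are merged into one cluster.
    """
    if not positions:
        return []
    counts = Counter(positions)
    seeds = sorted(p for p, c in counts.items() if c >= min_count)
    clusters = []
    for pos in seeds:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    total = len(positions)
    return [sum(counts[p] for p in cluster) / total for cluster in clusters]

def gene_score(positions, **kwargs):
    """Gene score as the sum of its cluster scores."""
    return sum(cluster_scores(positions, **kwargs))

def z_score(score, background_scores):
    """Z-score against background gene scores (assumed to vary)."""
    return (score - mean(background_scores)) / pstdev(background_scores)

# Toy genes (hypothetical positions): one with hotspots, one scattered.
clustered = [1047] * 8 + [545, 545, 542, 10, 300]
scattered = [10, 200, 400, 800]
print(gene_score(clustered), gene_score(scattered))
```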
TP53, CBFB, GATA3, MAP3K1
PIK3CA
AKT1
MutSig
MLL, NOTCH2, PCDHA7
OncodriveFM
ERBB2, PRKCZ, NME5, AKR1C3, RSBN1L
OncodriveCLUST
Banerji et al., Nature 2012 (analysis of 103 breast tumors)
IntOGen-mutations pipeline: to interpret catalogs of cancer somatic mutations
Input data: list of tumor somatic mutations
Analysis pipeline (powered by Wok)
Browser (powered by Onexus)
Christian Perez-Llamas
Workflow Management System
✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)
✓ Identify pathways with FM bias (OncodriveFM)
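As an illustration of the "frequency of mutations per gene" step, here is a minimal sketch. The input layout (pairs of sample identifier and gene symbol) and the function name are assumptions; this is not code from the IntOGen pipeline. A gene's frequency is taken here as the fraction of samples with at least one mutation in it, so several mutations in the same sample count once.

```python
from collections import defaultdict

def mutation_frequency(mutations, n_samples):
    """Fraction of samples carrying at least one mutation in each gene.

    mutations: iterable of (sample_id, gene) pairs.
    n_samples: total number of tumor samples in the project.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        samples_by_gene[gene].add(sample_id)
    return {gene: len(s) / n_samples for gene, s in samples_by_gene.items()}

# Toy project (hypothetical): 4 samples, sample s1 has two TP53 mutations.
toy_mutations = [
    ("s1", "TP53"), ("s1", "TP53"), ("s2", "TP53"),
    ("s3", "PIK3CA"),
]
freqs = mutation_frequency(toy_mutations, n_samples=4)
print(freqs)
```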
Jordi Deu-Pons
Web browser creation
Working version: 41 projects, 17 cancer sites, ~6300 tumours
http://www.intogen.org/mutations
Gonzalez-Perez et al, Nature Methods 2013
Current version: 31 projects, 13 cancer sites, 4623 tumours
Site            Number of projects   Samples
Bladder         1                    98
Brain           3                    491
Breast          6                    1148
Colorectal      2                    229
Head and neck   2                    375
Hematopoietic   3                    395
Kidney          1                    417
Liver           1                    24
Lung            6                    664
Ovary           1                    316
Pancreas        3                    214
Stomach         1                    22
Uterus          1                    230
TOTAL           31                   4623
Projects in current version of IntOGen
Gonzalez-Perez et al, Nature Methods 2013
Combining results across projects
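[Figure: OncodriveFM is run on each project's genes x samples matrix of functional impact (high, low, no mutation), giving a p-value per gene (scale 0 to 1, threshold 0.05); the per-gene results of projects 1 to 4 from cancer site A are then combined into a single result for cancer site A]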
Gonzalez-Perez et al, Nature Methods 2013
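One common way to combine per-project p-values for the same gene is Stouffer's Z-score method, sketched below. The slide only says that results are combined, so the choice of Stouffer's method, the optional weights and the function name are assumptions for illustration rather than a description of the IntOGen implementation.

```python
from math import sqrt
from statistics import NormalDist

def combine_pvalues(pvalues, weights=None, eps=1e-12):
    """Combine one-sided p-values with the (weighted) Stouffer method.

    weights could be, for example, the square root of each project's
    sample count; equal weights are used when none are given.
    """
    norm = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    # Clip to (0, 1) so the inverse CDF is always defined.
    z = [norm.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in pvalues]
    combined_z = sum(w * zi for w, zi in zip(weights, z)) / sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - norm.cdf(combined_z)

# A gene with modest evidence in three projects of the same cancer site
# (hypothetical p-values) becomes clearly significant when combined.
p_site = combine_pvalues([0.01, 0.02, 0.03])
print(p_site)
```

Combining at the level of the cancer site lets genes with weak but consistent signals in several small projects reach significance.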
http://www.intogen.org/mutations
http://www.gitools.org/datasets
Comprehensive view of cancer vulnerability across tumor types
[Figure: mutation frequency across tumor types, color scale from 0.1 to 0.4]
APC in IntOGen-mutations
Search for driver genes and mutations in a breast cancer project
Candidate driver genes in the project, sorted by FMbias
http://www.intogen.org/mutations/analysis
Gonzalez-Perez et al, Nature Methods 2013
IntOGen-mutations pipeline: to interpret catalogs of cancer somatic mutations
The mutational landscape of chromatin regulatory factors (CRFs) across 4623 tumor samples
Gonzalez-Perez et al, Genome Biology 2013
34 out of 184 CRFs show signals of positive selection across 4623 tumors
Gonzalez-Perez et al, Genome Biology 2013
[Figure: heatmap of the 34 driver CRFs (rows) across 13 cancer sites (columns), showing mutation counts per gene and site, colored by mutation frequency (scale 0 to 0.3)]
Cancer sites (number of samples): bladder (98), brain (491), breast (1149), colorectal (229), head & neck (375), hematopoietic (395), kidney (417), liver (24), lung (664), ovary (316), pancreas (214), stomach (22), uterus (230)
Driver CRFs: ARID1A, KMT2C, DNMT3A, KDM6A, PBRM1, NSD1, TET2, SETD2, SMARCA4, KMT2D, CHD4, NCOR1, EP300, KDM5C, ARID2, ATF7IP, ASXL1, MLL, BAZ2A, CHD3, ATRX, ARID1B, MBD1, BAP1, INO80, CHD2, ARID4A, DOT1L, ASH1L, BPTF, RTF1, PHC3, SMARCA2, SETDB1
Mutation frequency of the 34 driver CRFs
SWI/SNF
PRC1
PRC2
ISWI
NuRD/Mi-2
CRFs work as complexes
Gonzalez-Perez et al, Genome Biology 2013
FMbias of CRF complexes
Gonzalez-Perez et al, Genome Biology 2013
Gene      N     Freq
ARID1A    218   0.047
PBRM1     192   0.042
EP400     122   0.026
SMARCA4   111   0.024
ARID1B    86    0.019
ARID2     88    0.019
SMARCA2   69    0.015
SMARCC2   51    0.011
SMARCC1   30    0.006
SMARCB1   36    0.008
DPF2      37    0.008
DPF3      17    0.004
ACTL6A    23    0.005
SMARCD1   22    0.005
SMARCD3   34    0.007
ACTL6B    19    0.004
SMARCE1   12    0.003
DPF1      11    0.002
PHF10     15    0.003
SMARCD2   26    0.006
[Figure: SWI/SNF panels for kidney, lung, uterus, bladder and breast]
SWI/SNF complex
Gonzalez-Perez et al, Genome Biology 2013
[Figure: ratio of mutated CRFs to site-specific drivers (axis ticks 0.2 and 0.4) for Glioblastoma TCGA, Glioblastoma JHU and Pediatric Brain DKFZ; heatmap of MA FIS score (scale -2 to 4.5) for paediatric medulloblastoma, Glioblastoma JHU and Glioblastoma TCGA]
Genes shown: TP53, PTEN, EGFR, NF1, IDH1, RB1, PIK3R1, ATRX, KMT2C, CTNNB1, DDX3X, STAG2, MYH8, SMARCA4, PRDM9, LZTR1, KDM6A, RPL5, WDR90, BPTF, SETD2, EP300, ARID1A, KDM5C, ATF7IP, NCOR1, CHD4, PBRM1, PHC3, BAP1, MBD1, NSD1, CHD2, CHD3
Differences in the relative importance of driver CRFs between cancer types
Gonzalez-Perez et al, Genome Biology 2013
Pan-Cancer Project - The Cancer Genome Atlas
TCGA PanCancer Network, Nature Genetics 2013
TCGA pan-cancer project
12 cancer types - 3205 tumors
Project name   Tumor type                               Number of samples
BLCA           Bladder Urothelial Carcinoma             98
BRCA           Breast invasive carcinoma                762
COADREAD       Colon and Rectum adenocarcinoma          193
GBM            Glioblastoma multiforme                  290
HNSC           Head and Neck squamous cell carcinoma    301
KIRC           Kidney renal clear cell carcinoma        417
LAML           Acute Myeloid Leukemia                   196
LUAD           Lung adenocarcinoma                      228
LUSC           Lung squamous cell carcinoma             174
OV             Ovarian serous cystadenocarcinoma        316
UCEC           Uterine Corpus Endometrioid Carcinoma    230
TOTAL                                                   3205
TCGA PanCancer Network, Nature Genetics 2013
Complementary signals of positive selection
Recurrence: identify genes mutated more frequently than background mutation rate (MuSiC-SMG, MutSigCV)
FM bias: identify genes with a bias towards high functional mutations (OncodriveFM)
CLUST bias: identify genes with a significant regional clustering of mutations (OncodriveCLUST)
ACTIVE bias: identify genes significantly enriched in mutations affecting phosphorylation-associated sites (ActiveDriver)
[Figure: the five methods, labeled MutSigCV (M), OncodriveFM (F), OncodriveCLUST (C), MuSiC-SMG (R) and ActiveDriver (A)]
Using complementary signals helps to obtain a more comprehensive list of cancer drivers
Tamborero et al., Scientific Reports 2013
Genes exhibiting more than one signal are more likely true drivers
Tamborero et al., Scientific Reports 2013
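A minimal sketch of counting how many signals support each gene, so that genes with more than one signal can be prioritized. The threshold of two signals, the function names and the toy gene lists are assumptions; the actual rules used to define the high-confidence list in Tamborero et al. are not reproduced here.

```python
from collections import Counter

def count_signals(candidates_by_method):
    """Number of methods that report each gene as a candidate driver.

    candidates_by_method: dict mapping a method name to its gene list.
    """
    counts = Counter()
    for genes in candidates_by_method.values():
        counts.update(set(genes))  # a method counts once per gene
    return counts

def multi_signal_genes(candidates_by_method, min_signals=2):
    """Genes supported by at least min_signals methods, sorted by name."""
    counts = count_signals(candidates_by_method)
    return sorted(g for g, n in counts.items() if n >= min_signals)

# Toy candidate lists (hypothetical gene names and assignments).
toy_candidates = {
    "MuSiC-SMG": ["GENE_A", "GENE_B"],
    "MutSigCV": ["GENE_A", "GENE_C"],
    "OncodriveFM": ["GENE_A", "GENE_B", "GENE_D"],
    "OncodriveCLUST": ["GENE_E"],
    "ActiveDriver": ["GENE_E", "GENE_E"],
}
supported = multi_signal_genes(toy_candidates)
print(supported)
```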
Pan-cancer and per-project analysis
Tamborero et al., Scientific Reports 2013
291 High-Confidence Cancer Drivers
Tamborero et al., Scientific Reports 2013
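Most driver genes are mutated at low frequency
[Figure: mutation frequency (axis 0.1 to 0.4) of the driver genes across BLCA, BRCA, COADREAD, GBM, HNSC, KIRC, LAML, LUAD, LUSC, OV and UCEC; labeled genes: TP53, PIK3CA, PTEN, APC, CDKN2C, HRAS, SF3B1; annotation: 8 / 3205 (0.002)]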
Tamborero et al., Scientific Reports 2013
Most drivers map to 5 cancer hallmarks
http://www.intogen.org/tcga
Tamborero et al., Scientific Reports 2013
[Legend: BLCA, BRCA, COADREAD, GBM, HNSC, KIRC, LAML, LUAD, LUSC, OV, UCEC]
Some drivers show clear specificity for one tumor type
Tamborero et al., Scientific Reports 2013
Some novel driver genes map to well-known cancer pathways
Novel cancer gene
Established cancer gene
[Figure: PANCANCER histogram of the proportion of samples (axis 0 to 0.20) by number of PAMs in HCDs (bins 0 to 10, 11-15, 16-20, 21-25, 26-30, >30)]
Samples with at least one PAM in HCDs: 3038 (0.95)
Median (IQR) of PAMs in HCDs per sample: 4 (4)
Median (IQR) of PAMs in all genes per sample: 49 (63)
95% of tumors have PAMs in at least one driver
PAMs: protein affecting mutations; HCDs: high-confidence drivers
Tamborero et al., Scientific Reports 2013
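The summary statistics on this slide (fraction of samples with at least one PAM in a driver, plus median and IQR of PAMs per sample) can be computed as in the sketch below. The input format, the function name and the choice of quartile method are assumptions; the toy counts are made up and are not the TCGA data.

```python
from statistics import median, quantiles

def pam_summary(pams_per_sample):
    """Summarize the number of PAMs in driver genes per sample.

    pams_per_sample: list with one PAM count per tumor sample.
    """
    n = len(pams_per_sample)
    with_pam = sum(1 for c in pams_per_sample if c > 0)
    # Quartiles with linear interpolation on the observed data.
    q1, _, q3 = quantiles(pams_per_sample, n=4, method="inclusive")
    return {
        "samples_with_pam": with_pam,
        "fraction_with_pam": with_pam / n,
        "median": median(pams_per_sample),
        "iqr": q3 - q1,
    }

# Toy cohort of 7 samples (hypothetical counts).
summary = pam_summary([0, 2, 4, 4, 6, 8, 10])
print(summary)
```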
[Figure: proportion of samples (axis 0 to 1.00) by number of PAMs in HCDs, per cancer type]
Cancer type   Samples with at least one PAM in HCDs   Median (IQR) PAMs in HCDs   Median (IQR) PAMs in all genes
LAML          165 (0.85)                              2 (3)                       8 (7)
OV            312 (0.99)                              2 (2)                       40 (276)
KIRC          393 (0.94)                              3 (3)                       45 (24)
BRCA          710 (0.93)                              3 (2)                       28 (27)
GBM           272 (0.94)                              4 (3)                       51 (23)
COADREAD      193 (1.0)                               5 (2)                       65 (47)
HNSC          299 (0.99)                              6 (5)                       97 (79)
UCEC          228 (0.99)                              6 (9)                       48 (153)
LUAD          221 (0.98)                              9 (8)                       183 (248)
LUSC          172 (0.99)                              9 (7)                       209 (123)
BLCA          98 (1.0)                                9.5 (7.5)                   160 (157)
Median of 4 PAMs in drivers per sample with variability per cancer type
PAMs: Protein affecting mutations Tamborero et al., Scientific Reports 2013
• Cancer genomics projects aim to unravel the mechanisms of tumorigenesis to advance towards personalized cancer medicine
• To identify cancer driver genes we search for signals of positive selection in the pattern of somatic mutations
• IntOGen-mutations contains results of analysing more than 4500 tumours (6200 in new version) to identify cancer drivers across tumor types
• IntOGen-mutations can analyse newly sequenced tumor genomes to identify likely driver mutations
• 34 chromatin regulatory factors show signals of positive selection in the tumor somatic mutation pattern
• 291 high-confidence cancer driver genes detected in TCGA Pan-Cancer 12 by combining complementary signals of positive selection
Summary
Biomedical Genomics Lab
@bbglab @nlbigas
Christian Perez-Llamas
Jordi Deu-Pons
Michael Schroeder
Carlota Rubio
Nuria Lopez-Bigas
David Tamborero
Abel Gonzalez-Perez
http://bg.upf.edu/blog