Synonymous mutations - from bacterial evolution to somatic changes in human cancer Fran Supek 1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona 2) Division of Electronics, RBI, Zagreb, Croatia XXI Jornades de Biologia Molecular Barcelona, 11.6.2014


Page 1: Synonymous mutations - from bacterial evolution to somatic

Synonymous mutations - from bacterial evolution to somatic changes in human cancer

Fran Supek

1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona

2) Division of Electronics, RBI, Zagreb, Croatia

XXI Jornades de Biologia Molecular

Barcelona, 11.6.2014

Page 2: Synonymous mutations - from bacterial evolution to somatic

synonymous mutations = changes in the gene sequence that don’t alter the protein sequence
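The definition above can be checked mechanically for a single codon change. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1); the function name `is_synonymous` is an illustrative choice, not something from the talk.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def is_synonymous(ref_codon, alt_codon):
    """True if the two codons differ but encode the same amino acid."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    return ref != alt and CODON_TABLE[ref] == CODON_TABLE[alt]

# GAA and GAG both encode glutamate, GAT encodes aspartate.
print(is_synonymous("GAA", "GAG"))  # synonymous change
print(is_synonymous("GAA", "GAT"))  # missense change
```

Organisms that use a different translation table would need a different `AMINO_ACIDS` string.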

Page 3: Synonymous mutations - from bacterial evolution to somatic

Synonymous mutations

• (some) synonymous mutations are subject to evolutionary pressures

• clearly shown for many bacteria and yeasts

• likely also higher Eukarya (but weaker signal)

• how does selection for/against synonymous changes relate to gene function in (a) evolution of bacteria and (b) in carcinogenesis?

(a) adaptation to diverse environments: evolutionary trace across ~1000 bacterial genomes

(b) malignant transformation: somatic mutations in ~4000 human cancers

( plush microbes in photos are from http://www.giantmicrobes.com/ )

Page 4: Synonymous mutations - from bacterial evolution to somatic

• In what way can evolution of synonymous codon preferences be used to systematically infer gene function in bacteria?

• There are other simpler (known) ways to determine gene function from the genome sequences:

• commonly/systematically applied: transfer of annotation via sequence similarity (BLAST, COG, Pfam...)

• >30% of genes end up with no known function annotated. They may not have known homologs, or their homologs may have no experimentally determined function.

• known but less common: genomic context methods, such as phyletic profiling

evolutionary trace across ~1000 bacterial genomes

adaptation to diverse environments

( plush microbes in photos are from http://www.giantmicrobes.com/ )

Page 5: Synonymous mutations - from bacterial evolution to somatic

Phyletic (or phylogenetic) profiling

Pellegrini, Marcotte et al., PNAS (1999)

one genomic context method:

examines presence/absence patterns of homologous genes across species.
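As an illustration of what comparing presence/absence patterns means, here is a minimal sketch that scores two profiles by Jaccard similarity. The gene names and the eight-genome profiles are hypothetical, and Jaccard similarity is only one simple pairwise measure; it is not claimed to be the measure used in the cited work.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles (lists of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Hypothetical profiles across 8 genomes (1 = homolog present, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 1],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 1, 0, 1, 0, 0],
}

for name in ("geneB", "geneC"):
    print("geneA vs", name, profile_similarity(profiles["geneA"], profiles[name]))
```

Genes with similar profiles (here geneA and geneB) are candidates for sharing a function; genes with disjoint profiles are not.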

Page 6: Synonymous mutations - from bacterial evolution to somatic

Kensche et al. (2008) J Royal Soc Interface. ~30 examples of success of phyletic profiling

• by 2008 -> n~=30

• by 2014 -> n~=300 (estimate)

• aim for: N > 3000

Page 7: Synonymous mutations - from bacterial evolution to somatic

Enriching phyletic profiles with information on orthology and paralogy

[Table: rows are groups of orthologs (OMA 1 … OMA 64052); columns are Species 1 … Species 998 and Function. Species cells are 0 (absent) or marked as one of: orthologs in cliques, orthologs outside cliques, paralogs. The Function column holds Gene Ontology terms (GO:001, GO:007 for OMA 1; GO:042 for OMA 64051; GO:003, GO:160 for OMA 64052) or “?” where the function is unknown (OMA 2).]

groups of orthologs from OMA database: Schneider, Dessimoz and Gonnet (2007) Bioinformatics

Skunca et al. PLoS Comp Biology 2013 doi:10.1371/journal.pcbi.1002852

Page 8: Synonymous mutations - from bacterial evolution to somatic

Accuracy of predicting GO categories strongly increases when adding paralogs

[Figure: bubble plots comparing prediction accuracy for profiles from the clique only against profiles with added orthologs (outside clique) and paralogs; bubbles are Gene Ontology categories.]

Page 9: Synonymous mutations - from bacterial evolution to somatic

Supervised machine learning is superior to common approaches based on pairwise distances

[Figure: AUC (area under ROC curve) of predictions based on correlation of profiles, compared with decision trees.]

Decision trees: Schietgat et al. 2010. BMC Bioinfo
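The comparison on this slide is scored by AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC directly from its rank interpretation; the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four genes; label 1 = gene has the GO term.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why it is convenient for comparing methods across many GO categories.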

Page 10: Synonymous mutations - from bacterial evolution to somatic

Experimental validation of predictions made with phyletic profiling

• knockout mutants of E. coli in predicted genes

• three selected GO categories targeted by particular antibiotics:

• ‘response to DNA damage’

• ‘translation’

• ‘peptidoglycan-based cell wall biogenesis’

• predictions: 38 genes with expected precision > 60%

Page 11: Synonymous mutations - from bacterial evolution to somatic

[Figure: survival compared to the wild type (0% to 160%) for w.t. and the knockout mutants dbpA, rhlB, yhbJ, pmbA, rhlE, tldD, yidD, ynbB, envC and murE, under three antibiotics: nalidixic acid (DNA damaging agent), ampicillin (inhibits cell wall synthesis) and kasugamycin (inhibits translation initiation).]

Page 12: Synonymous mutations - from bacterial evolution to somatic

[Figure: survival compared to the wild type (0% to 160%) for w.t. and the knockout mutants dbpA, rhlB, yhbJ, pmbA, rhlE, tldD, yidD, ynbB, envC and murE, under nalidixic acid, ampicillin and kasugamycin.]

Does this gene participate in ‘peptidoglycan-based cell wall biogenesis’ ?

Page 13: Synonymous mutations - from bacterial evolution to somatic

[Figure: survival compared to the wild type (0% to 160%) for w.t. and the knockout mutants dbpA, rhlB, yhbJ, pmbA, rhlE, tldD, yidD, ynbB, envC and murE, under nalidixic acid, ampicillin and kasugamycin.]

Does this gene participate in ‘peptidoglycan-based cell wall biogenesis’ ?

25/38 validated predictions (experimental precision = 66%; theoretically expected = 60%)

our method is useful for prioritizing genes for experimentally determining gene function
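The two precision figures can be put side by side with a short calculation. This sketch uses only the counts on the slide; treating the 38 predictions as independent trials is an assumption made here for illustration, not something stated in the talk.

```python
from math import comb

validated, tested = 25, 38      # counts from the slide
expected_precision = 0.60       # theoretically expected precision from the slide

observed_precision = validated / tested

# Probability of at least `validated` successes if each prediction were
# independently correct with the expected precision (binomial upper tail).
tail_probability = sum(
    comb(tested, k) * expected_precision ** k * (1 - expected_precision) ** (tested - k)
    for k in range(validated, tested + 1)
)

print(f"observed precision: {observed_precision:.1%}")
print(f"P(at least {validated} of {tested} validated) = {tail_probability:.3f}")
```

A tail probability that is not small would indicate that the observed validation rate is in line with the expected precision rather than surprisingly above it.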

Page 14: Synonymous mutations - from bacterial evolution to somatic

http://gorbi.irb.hr/

Page 15: Synonymous mutations - from bacterial evolution to somatic

“We predict Gene Ontology annotations ... for about 1.3 million poorly annotated genes in 998 prokaryotes at a stringent threshold of 90% Precision...”

“...about 19000 of those are highly specific functions.”

published in: Skunca et al. PLoS Comp Biology 2013 doi:10.1371/journal.pcbi.1002852

Page 16: Synonymous mutations - from bacterial evolution to somatic

• Codon usage biases are another useful source of evolutionary information

• ... complementary to gene presence/absence

• ... available from just the genome sequence

• ... with an established biological rationale

Page 17: Synonymous mutations - from bacterial evolution to somatic

tRNA levels and codon usage biases

[Figure: E. coli K-12, tRNA gene counts (proxy for tRNA levels), shown per codon and anticodon.]

Commonly used codons typically correspond to abundant tRNAs, particularly in highly expressed genes.
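Codon usage bias starts from simple codon counts. The sketch below counts codons in a coding sequence and reports how often each synonymous codon is used for one amino acid; the example sequence is made up, and this is a plain frequency table rather than the MILC statistic used later in the talk.

```python
from collections import Counter

def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def relative_usage(counts, synonymous_codons):
    """Fraction of each codon among a given set of synonymous codons."""
    total = sum(counts[c] for c in synonymous_codons)
    return {c: (counts[c] / total if total else 0.0) for c in synonymous_codons}

LEUCINE = ["CTG", "CTT", "CTC", "CTA", "TTA", "TTG"]  # the six leucine codons

# Hypothetical coding sequence fragment: CTG CTG CTG TTA CTG AAA
counts = codon_counts("CTGCTGCTGTTACTGAAA")
print(relative_usage(counts, LEUCINE))
```

Comparing such tables between highly expressed genes and the rest of the genome is the kind of signal the slide refers to.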

Page 18: Synonymous mutations - from bacterial evolution to somatic

Codon biases correlate to gene expression

[Figure: E. coli genome. MILC (ribosomal protein genes) on the x-axis (0.5 to 3.5) against MILC (non-RP genes) on the y-axis (0.5 to 2.5), with points for ribosomal protein genes, other highly expressed genes and the rest of the genome.]

Figure from Supek and Vlahoviček (2005) BMC Bioinformatics doi:10.1186/1471-2105-6-182

Page 19: Synonymous mutations - from bacterial evolution to somatic

• organisms adapt to the environment through changes in translation efficiency?

• Carbone A (2005) J Mol Evol – codon adaptation in metabolic pathways:

Photosynthesis genes in Synechocystis (Bacteria)

Methanogenesis genes in Methanosarcina (Archaea)

Page 20: Synonymous mutations - from bacterial evolution to somatic

An example phenotype: oxygen requirement

• Man & Pilpel (2007) Nat Genet: 9 yeasts

[Figure: codon adaptation (low to high) of TCA cycle and glycolysis genes in aerobic and anaerobic yeasts.]

• Based on these examples, we aimed to systematically link:

• Many environments/phenotypes, with

• evolutionary change in translation efficiency across many gene families

Page 21: Synonymous mutations - from bacterial evolution to somatic

Measuring translation efficiency

Method from Supek et al. (2010) PLoS Genetics doi:10.1371/journal.pgen.1001004

[Figure, classifier scheme: highly expressed genes* vs. all other protein genes; inputs shown are intergenic DNA and codon usage; the classifier predicts a probability for each gene (gene1, gene2, gene3), and the question asked is whether the probability increases after adding codon usage. * ribosome, translation elongation factors, chaperones.]

[Figure, gene labels: non-HE vs. HE; HE genes are 4-20% of genome.]

[Figure, expression levels: microarrays on 19 diverse bacteria; log2 expression ratio (0 to 4) for OCU/non-OCU (from ref. [7]), HE/non-HE and ribosomal proteins/all genes; labelled values 3.9x and 6.0x.]

Page 22: Synonymous mutations - from bacterial evolution to somatic

Correlation vs. causality?

a randomization test to control for confounding phenotypes and phylogeny

[Figure: associations between phenotypes, and also with phylogeny; one example association that passes the randomization test, and one that fails because the association is not unique.]
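The general idea of a randomization test can be sketched as follows. This is a plain label-shuffling permutation test on hypothetical data; it does not reproduce the specific procedure from the talk, which additionally controls for confounding phenotypes and phylogeny.

```python
import random

def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """One-sided permutation test for mean(values | label 1) - mean(values | label 0)."""
    rng = random.Random(seed)

    def mean_difference(current_labels):
        pos = [v for v, l in zip(values, current_labels) if l == 1]
        neg = [v for v, l in zip(values, current_labels) if l == 0]
        return sum(pos) / len(pos) - sum(neg) / len(neg)

    observed = mean_difference(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between values and labels
        if mean_difference(shuffled) >= observed:
            at_least_as_extreme += 1
    # add-one correction so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical data: 1 = gene family is "HE" in that genome; label 1 = aerotolerant genome.
he_status = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
phenotype = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(permutation_p_value(he_status, phenotype))
```

To control for a confounder in this framework, one would shuffle labels only within groups that share the confounding property; that refinement is left out here.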

Page 23: Synonymous mutations - from bacterial evolution to somatic

• 514 aerotolerant vs. 214 aerointolerant: 295 COGs are significantly enriched with HE genes; after control for confounders: 23 COGs

• obligate vs. facultative aerobes: 11 COGs

• thermophiles: 16 COGs

• halophiles: 6 COGs

+ 20 other phenotypes tested

Page 24: Synonymous mutations - from bacterial evolution to somatic

Gene families linked to aerotolerance

all experiments: Anita Kriško lab (Mediterranean Institute for Life Science, Split, Croatia); published as Kriško et al, Genome Biology 2014. doi:10.1186/gb-2014-15-3-r44

[Figure, panel A: survival, normalized to w.t. (0% to 120%), and NAC / no NAC survival ratio (0x to 6x), for 2.5 mM H2O2, 5 mM NAC pretreatment, heat shock and osmotic shock.]

[Figure, panel B: survival, normalized to w.t. (0% to 120%), under heat, oxidative and osmotic stress for w.t. and the mutants clpS, oppA, tig, ssuD, nudF, pnp, typA, mngR, lsrR, yebS, rhlE, yajL, pykF, dtd, eutD, gloB, yfcA, marR, yccX, pncB, ttdB, moaA and dsbB.]

[Figure, panel C: survival (0% to 120%) under osmotic, oxidative and heat stress for w.t. and the mutants yjjB, flgH, cysG, mnmA, nlpE and proX.]

* known antioxidant proteins in E. coli (or homologs in other organisms)

* known to be regulated in response to air or oxidative stress

Annotations: positive control; 2 nonspecific hits

Page 25: Synonymous mutations - from bacterial evolution to somatic

ROS levels in the mutants

[Figure: heatmap (scale 0 to 0.8) with rows fre, sufD, rseC, sodA (positive control), w.t. (wild-type), clpA, recA, napF, lon, ybeQ, yaaU, cysD, ybhJ, gpmM, icd, lpd and yidH, and columns carbonylation increase, DHR-123 increase, CellROX increase, total Fe increase, dipyridyl rescue, NADPH level increase and NADPH rescue.]

ROS are typically not increased (except cysD, yaaU, rseC, and the positive control sodA)

Page 26: Synonymous mutations - from bacterial evolution to somatic

Predicted functional interactions from STRING v9

Gene families whose codon biases are associated to aerobicity/aerotolerance:

Page 27: Synonymous mutations - from bacterial evolution to somatic

Putative mechanisms of oxidative stress resistance

[Figure: heatmap (scale 0 to 0.8) with rows fre, sufD, rseC, sodA, w.t., clpA, recA, napF, lon, ybeQ, yaaU, cysD, ybhJ, gpmM, icd, lpd and yidH, with groups labelled NAD(P)H related, iron-related and unknown, and columns carbonylation increase, DHR-123 increase, CellROX increase, total Fe increase, dipyridyl rescue, NADPH level decrease and exogenous NADPH rescue.]

all experiments: Anita Kriško lab (Mediterranean Institute for Life Science, Split, Croatia); published as Kriško et al, Genome Biology 2014. doi:10.1186/gb-2014-15-3-r44

Page 28: Synonymous mutations - from bacterial evolution to somatic

Other phenotypes: thermophilicity, halophilicity

[Figure: survival, normalized to w.t. (0% to 120%), of knockout mutants under heat, oxidative and osmotic stress (conditions: 2.5 mM H2O2, 5 mM NAC pretreatment, heat shock, osmotic shock). One panel shows w.t. and the mutants clpS, oppA, tig, ssuD, nudF, pnp, typA, mngR, lsrR, yebS, rhlE, yajL, pykF, dtd, eutD, gloB, yfcA, marR, yccX, pncB, ttdB, moaA and dsbB; another shows w.t. and the mutants yjjB, flgH, cysG, mnmA, nlpE and proX.]

Knockout of candidate genes affects heat shock resistance and osmotic shock resistance.

Page 29: Synonymous mutations - from bacterial evolution to somatic

Validation using synthetic genes with introduced suboptimal codons

[Figure: relative frequency (0 to 0.6) against codon distance (MILC) to ribosomal protein genes (0 to 2.5), for ribosomal protein genes and all other E. coli genes; marked are the w.t. and recoded variants of clpS (w.t., 15, 20, 25) and yjjB (w.t., 21, 28, 35).]

[Figure: % survival (0% to 30%) for w.t., ΔclpS, ΔclpS + clpS_w.t., ΔclpS + clpS_15, ΔclpS + clpS_20 and ΔclpS + clpS_25.]

[Figure: % survival (0% to 30%) for w.t., ΔyjjB, ΔyjjB + yjjB_w.t., ΔyjjB + yjjB_21, ΔyjjB + yjjB_28 and ΔyjjB + yjjB_35.]

Stress conditions shown: osmotic shock, heat shock.

all experiments: Anita Kriško lab (Mediterranean Institute for Life Science, Split, Croatia); published as Kriško et al, Genome Biology 2014. doi:10.1186/gb-2014-15-3-r44

Page 30: Synonymous mutations - from bacterial evolution to somatic

Overall:

• 200 links between 187 different COG gene families

- and -

24 diverse phenotypic traits, including

• spore-forming ability

• motility

• pathogenicity to plants or mammals

• affecting certain tissues/organs

• (1000s of predictions at less stringent thresholds)

Page 31: Synonymous mutations - from bacterial evolution to somatic

• Anita Kriško - Mediterranean Institute for Life Sciences (MedILS), Split, Croatia: all experimental work shown

• Nives Škunca - ETH Zurich: phyletic profiling

Page 32: Synonymous mutations - from bacterial evolution to somatic
Page 33: Synonymous mutations - from bacterial evolution to somatic

Cancer

3851 cancer exomes from 11 tissues (>200 samples each)

292,405 missense and 123,193 synonymous somatic mutations

ARE THE SYNONYMOUS MUTATIONS SELECTED FOR IN CARCINOGENESIS?

Page 34: Synonymous mutations - from bacterial evolution to somatic

from Lawrence et al (2013) Nature. Mutation rate varies widely across the genome and correlates with DNA replication time and expression level.

Page 35: Synonymous mutations - from bacterial evolution to somatic

from Schuster-Böckler and Lehner (2012): heterochromatin correlates with SNV rates

Page 36: Synonymous mutations - from bacterial evolution to somatic

Drivers vs. passengers

• many somatic mutations in cancer = „passengers”

• a driver = a gene whose mutation confers a selective advantage; it is recurrently mutated (i.e. more than expected)

1. For missense, could be measured using the dN/dS

2. Intronic rates as a baseline: INVEX test, Hodis et al. (Cell 2012)

3. commonly: find background mutation frequencies for the patient from the entire exome, then see if a gene is above that background
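The "above background" idea can be sketched with a Poisson model: given a background rate, how surprising is the number of mutations seen in one gene? All numbers below (rate, gene length, cohort size, observed count) are hypothetical, and real driver-detection tools use considerably richer background models than this.

```python
import math

def poisson_tail(k_observed, expected):
    """P(X >= k_observed) for X ~ Poisson(expected)."""
    cdf_below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k_observed)
    )
    return max(0.0, 1.0 - cdf_below)

# Hypothetical inputs, for illustration only.
background_rate = 2e-6   # mutations per bp per sample
gene_length_bp = 3000
n_samples = 500
observed_mutations = 12

expected_mutations = background_rate * gene_length_bp * n_samples
p_value = poisson_tail(observed_mutations, expected_mutations)
print(f"expected {expected_mutations:.1f}, observed {observed_mutations}, p = {p_value:.2e}")
```

The choice of baseline (whole exome, introns, neighboring genes, matched gene sets) changes `background_rate`, which is exactly the issue the following slides address.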

Page 37: Synonymous mutations - from bacterial evolution to somatic

[Figure: overview of the gene sets and genomic covariates.

Slide labels: „classical” cancer genes; newly discovered, from cancer genomes.

Cancer Gene Census: oncogenes activated by translocation (217), missense (40) or copy number (12); tumor suppressors, all mechanisms (84).

Recurrently mutated genes (self-reported in literature): 39 oncogenes and 38 tumor suppressors; known cancer genes in Census; others: 336. Venn diagram of missense-activated oncogenes and recurrently mutated oncogenes (from literature), with counts 19, 18 and 21.

Matched sets of noncancer genes: 1517 genes (for oncogenes) and 693 genes (for tumor suppressors); complete set of 13219 noncancer genes.

Principal component plot (correlation to PC1, 30.4 % variance, against correlation to PC2, 24.3 %) of regional mutation rates (carcinoma, non-carcinoma, liver and breast at 1 Mb; pooled and liver at 200 kb), H3K9me3 (1 Mb), GC3, RepliSeq (1 Mb) and mRNA levels (hypothalamus, liver, skeletal & heart muscle, 6 tissues).

Cumulative distribution plots for # mutations per 200 kb (110 cancers, pooled tissues), heterochromatin (H3K9me3 levels in 1 Mb windows), replication timing (RepliSeq signal in 1 Mb windows, early to late) and mRNA levels, avg. of 6 tissues (log2 RPKM). Labelled statistics: D- = 0.224, P = 0.017; D+ = 0.313, P = 0.0004; D- = 0.464, P = 2.4·10^-8; D- = 0.256, P = 0.005; D+ = 0.211, P = 0.026; D- = 0.199, P = 0.043; D+ = 0.215, P = 0.025; D- = 0.185, P = 0.061.]

Page 38: Synonymous mutations - from bacterial evolution to somatic

[Figure: cancer gene sets, matched sets of noncancer genes, and the covariates compared between them: regional mutation rates, heterochromatin (H3K9me3), replication timing (RepliSeq), mRNA levels and GC3.]

Detecting positive selection on synonymous mutations in cancer

• create „matched sets” of genes closely following the oncogenes in:

• regional mutation rates, in 1 Mb and 200 kb windows

• expression levels in different tissues

• Heterochromatin, replication timing

• G+C content

Page 39: Synonymous mutations - from bacterial evolution to somatic

How to find a good set of genes?

A genetic algorithm. An optimization technique that can (relatively) easily handle many criteria at once. Quite efficient. Many parameters.

Operators:

...crossover

...random mutation

Page 40: Synonymous mutations - from bacterial evolution to somatic

[Figure: cumulative distributions of # mutations per 200 kb (110 cancers, pooled tissues), heterochromatin (H3K9me3 levels in 1 Mb windows), replication timing (RepliSeq signal in 1 Mb windows) and mRNA levels (avg. of 6 tissues, log2 RPKM), shown for oncogenes and for tumor suppressors against their matched sets of noncancer genes.]

Distributions of regional mutation rates (1Mb and 200 kb), heterochromatin, etc. in the optimized sets of non-cancer genes closely match the cancer genes. Genetic algorithm tries to minimize the K-S statistic.
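A toy version of this optimization can be sketched as follows: a genetic algorithm with crossover and random mutation that picks a subset of candidate genes whose covariate distribution minimizes the two-sample K-S statistic against the target genes. The single covariate, the population settings and all names are assumptions for illustration; the actual analysis matches several covariates at once.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )

def match_gene_set(target, pool, set_size, generations=200, population=30, seed=0):
    """Evolve a subset of `pool` indices whose values mimic the distribution of `target`."""
    rng = random.Random(seed)

    def fitness(indices):
        return ks_statistic(target, [pool[i] for i in indices])

    def mutate(indices):
        # random mutation: swap one chosen gene for a gene outside the set
        child = list(indices)
        outside = [i for i in range(len(pool)) if i not in indices]
        child[rng.randrange(set_size)] = rng.choice(outside)
        return child

    def crossover(parent_a, parent_b):
        # crossover: draw the child from the union of both parents
        return rng.sample(sorted(set(parent_a) | set(parent_b)), set_size)

    candidates = [rng.sample(range(len(pool)), set_size) for _ in range(population)]
    for _ in range(generations):
        candidates.sort(key=fitness)
        survivors = candidates[: population // 2]  # keep the better half unchanged
        children = [mutate(crossover(*rng.sample(survivors, 2)))
                    for _ in range(population - len(survivors))]
        candidates = survivors + children
    return min(candidates, key=fitness)

# Hypothetical covariate values (e.g. a regional mutation rate) for illustration.
rng = random.Random(1)
cancer_genes = [rng.gauss(1.0, 0.3) for _ in range(20)]
noncancer_pool = [rng.uniform(0.0, 2.0) for _ in range(60)]
chosen = match_gene_set(cancer_genes, noncancer_pool, set_size=20, generations=50)
print(ks_statistic(cancer_genes, [noncancer_pool[i] for i in chosen]))
```

With several covariates, one simple extension would be to minimize the largest (or the sum) of the per-covariate K-S statistics; which combination the original analysis used is not stated on the slide.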

Page 41: Synonymous mutations - from bacterial evolution to somatic

[Figure: cancer gene sets (oncogenes, tumor suppressors) and their matched sets of noncancer genes.]

Expected: the oncogenes and the tumor suppressors are highly enriched with missense mutations (~1.5 - 2.5x).

However, the oncogenes are also enriched with synonymous mutations over their matched sets, ~1.2x.

Page 42: Synonymous mutations - from bacterial evolution to somatic

Introns of oncogenes (from whole-genome sequencing) are not enriched with SNVs, compared to matched sets.

The matched sets method agrees with Invex, and with simply using neighboring genes as a baseline.

Page 43: Synonymous mutations - from bacterial evolution to somatic

Tissue-specific oncogenes are more enriched with synonymous mutations in the corresponding tissue.

This effect is not due to mutation showers/clustered mutations, as the same cancer samples don't tend to contain both a synonymous and a missense mutation in the same gene.

Synonymous enrichment in oncogenes is detectable across cancer types.

Page 44: Synonymous mutations - from bacterial evolution to somatic

Some oncogenes are more highly enriched with synonymous mutations than others, e.g. PDGFRA, EGFR, GATA1, ELN, NTRK1, JAK3, ALK and others (n=16).

The synonymous SNV enrichment in these genes is not paralleled by intronic SNV enrichment.

Page 45: Synonymous mutations - from bacterial evolution to somatic

The synonymous mutations tend to cluster together to a similar extent as the missense mutations in the affected oncogenes. They also (less prominently) cluster with missense mutations.

Page 46: Synonymous mutations - from bacterial evolution to somatic

What could the synonymous mutations do?

• Use of „optimal codons”

• miRNA binding sites

• Secondary structures in mRNA

[Figure panels:

% of synonymous mutations leading to outcome: optimal codon gain, optimal codon loss, no change (n.s.).

mRNA folding free energy around mutated sites (kcal/mol), in 50 nt and 100 nt windows, w.t. mRNA vs. mut. mRNA.

net # of gained miRNA seed sites per syn. mutation, for whole cDNA and for sites w/ phyloP>1.0; 16 oncogenes vs. matched set.

Bins ≤30 nt, 31-70 nt, >70 nt (p < 10^-4; labelled values 1.75, 1.26, 0.45).

EDNRB gene, colorectal cancer: log2 RPKM of exon by exon # in transcript ENST00000334286; 30 random samples w/o point mutations vs. 6 samples w/ synonymous exonic mutations.

% syn. mutations (within 30 nt of splice site) leading to event (enh. gain, enh. loss, sil. gain, sil. loss), for Ke et al. 2012 hexamers, RESCUE-ESE and FAS-hex2 (labelled p values: 0.02, 0.003, 3·10^-4).

Actual synonymous mutations vs. randomized mutation positions in coil and α-helix parts (1st a.a., middle, last a.a.; middle, next to coil only, next to β-sheet).

normalized difference (Glass' delta) between properties of mutated positions in oncogenes vs. matched set (t-test, FDR<10%).]

Page 47: Synonymous mutations - from bacterial evolution to somatic

• Use of „optimal codons”

• miRNA binding sites

• Secondary structures in mRNA

[Figure: % of synonymous mutations leading to optimal codon gain, optimal codon loss or no change; net # of gained miRNA seed sites per syn. mutation; mRNA folding free energy around mutated sites (kcal/mol) for w.t. mRNA vs. mut. mRNA.]

No general effect was detected in any of these cases (although they may still be important in specific examples).

Page 48: Synonymous mutations - from bacterial evolution to somatic

Exonic Splicing Enhancer

~ and ~

Exonic Splicing Silencer

From Cartegni, Chew & Krainer. Nat Rev Genet. 2002 3(4),285-98.

AGAAGA enh
GAAGAT enh
GACGTC enh
GAAGAC enh
....
CTTTTA sil
CTTTAA sil
TAGGTA sil
TAGTAG sil

Page 49: Synonymous mutations - from bacterial evolution to somatic

Synonymous SNVs tend to be closer to splice sites in oncogenes.

They also tend to cause gains of known exonic splicing enhancer motifs, and losses of exonic splicing silencer motifs.
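What "gain of an enhancer motif" or "loss of a silencer motif" means at the sequence level can be sketched by comparing the hexamers that overlap a position before and after a substitution. The motif sets below contain only the eight example hexamers listed on the earlier slide, not a full ESE/ESS catalogue, and the sequences are made up; the sketch also does not check that the substitution is synonymous.

```python
ENHANCERS = {"AGAAGA", "GAAGAT", "GACGTC", "GAAGAC"}  # "enh" examples from the slide
SILENCERS = {"CTTTTA", "CTTTAA", "TAGGTA", "TAGTAG"}  # "sil" examples from the slide

def hexamers_covering(seq, pos):
    """All 6-mers of `seq` that overlap position `pos` (0-based)."""
    start = max(0, pos - 5)
    end = min(pos, len(seq) - 6)
    return {seq[i:i + 6] for i in range(start, end + 1)}

def motif_changes(seq, pos, alt_base):
    """Report motif gains and losses caused by substituting `alt_base` at `pos`."""
    mutated = seq[:pos] + alt_base + seq[pos + 1:]
    before = hexamers_covering(seq, pos)
    after = hexamers_covering(mutated, pos)
    return {
        "enhancer_gain": sorted((after - before) & ENHANCERS),
        "enhancer_loss": sorted((before - after) & ENHANCERS),
        "silencer_gain": sorted((after - before) & SILENCERS),
        "silencer_loss": sorted((before - after) & SILENCERS),
    }

# Hypothetical exon fragments.
print(motif_changes("CCAGACGACC", 5, "A"))  # C>A creates enhancer hexamers
print(motif_changes("GGCTTTTAGG", 4, "C"))  # T>C destroys a silencer hexamer
```

Counting such events over all observed synonymous mutations, and comparing with a matched gene set, gives the kind of enrichment the slide describes.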

Page 50: Synonymous mutations - from bacterial evolution to somatic

They more often affect exons with weaker (noncanonical) splice sites.

The exonic splicing enhancers created may resemble SF2/ASF motifs.

The ESS sites that are lost upon mutation sometimes resemble hnRNP A2/B1, H2 and A1 motifs.

Page 51: Synonymous mutations - from bacterial evolution to somatic

Roughly ½ of the putatively causal synonymous mutations alter splicing, as evidenced by examining RNA-seq data from cancer.

We don't (yet) know what the other ½ is doing. One possibility may be affecting protein folding.

Page 52: Synonymous mutations - from bacterial evolution to somatic

In yeast: Pechmann & Frydman, Nature Struct Mol Biol 2013

[Figure: % of actual synonymous mutations vs. randomized mutation positions in coil and in α-helix parts: 1st a.a., middle, last a.a. (labelled values 1.43, 1.12, 0.79; p=0.05, n.s., n.s.), and α-helix middle, next to coil only, next to β-sheet (labelled values 0.97, 1.01, 2.60; p = 4·10^-5).]

[Schematic: positions N’’, N’, Ncap, Ccap, C’, C’’ around an α-helix and turn.]

[Figure: normalized difference (Glass' delta, -0.3 to 0.2) between mutated sites in oncogenes vs. matched set, FDR<10%, for: relative preference value at C-cap; normalized frequency of turn in all-α class; α-helix indices for α-proteins; relative preference value at N'; relative preference value at N''; normalized frequency of α-helix in all-α class.]

...also in cancer: we observe an enrichment of synonymous mutations at N-termini of alpha-helices, esp. if close to beta-sheets. Suggestive of effects on folding.

Page 53: Synonymous mutations - from bacterial evolution to somatic

TP53 gene has a large excess of synonymous mutations, which are always near splice sites.

We found three examples of recurrent SNVs that inactivate the nearby splice site.

[Figure labels: known; novel; causes a frameshift]

Page 54: Synonymous mutations - from bacterial evolution to somatic

Dosage sensitive oncogenes have many point mutations in their 3' UTRs

Page 55: Synonymous mutations - from bacterial evolution to somatic

Take-home messages:

• oncogenes contain an excess of synonymous mutations in human cancers

• a subset of synonymous mutations target splicing motifs

• 1/5 to 1/2 of the synonymous mutations in oncogenes reported to date are acting as driver mutations

• ~6-8% of all driver mutations due to single nucleotide changes are likely to be synonymous mutations

• TP53 has recurrent synonymous mutations that disrupt splice sites

• an excess of mutations in 3’ UTRs of dosage-sensitive genes

published in: Supek et al. (2014) Cell. http://dx.doi.org/10.1016/j.cell.2014.01.051

Page 57: Synonymous mutations - from bacterial evolution to somatic

Thank you!

Fran Supek

1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona

2) Dept of Electronics, RBI, Zagreb, Croatia

XXI Jornades de Biologia Molecular

Barcelona, 11.6.2014