gosia komor - pacb.com · • proteogenomic pipeline combining rna-seq and lc-ms/ms data for...

Post on 10-Sep-2019


Proteogenomic analysis of alternative splicing: the search for novel biomarkers for colorectal cancer

Gosia Komor


Translational Gastrointestinal Oncology

Collect clinical samples
• (Tumor) tissue
• Blood
• Stool

Collect clinical information

Study tumor biology
• Preclinical models

Perform molecular profiling
• DNA
• RNA
• Protein

Molecular profiling of colon tumors

Translation of molecular knowledge into clinical tests


Colorectal cancer (CRC)

Colorectal cancer is the 2nd most common cancer type in the Netherlands
• Incidence rate of over 15 000 patients per year
• Most patients between 60-79 years old

Colorectal tumor progression (timescale: 20 - 40 years)

Figure adapted from Nature Reviews Cancer 9, 489–499 (2009)


Colorectal cancer has a high cure rate when diagnosed early

5-year survival rate by stage
• Stage I: 94%
• Stage II: 82%
• Stage III: 67%
• Stage IV: 11%


Population wide screening for colorectal cancer implemented in the Netherlands

Fecal immunochemical test (FIT) → FIT+ → Colonoscopy

FIT performance *
• Specificity: ~95%
• Sensitivity CRC: ~79%
• Sensitivity precursor lesions (advanced adenomas): ~27%

* Lee et al., Accuracy of Fecal Immunochemical Tests for Colorectal Cancer, Annals of Internal Medicine, 2014

Clinical need for novel biomarkers
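A rough worked example of why these sensitivity figures point to a clinical need: the sketch below applies the FIT rates quoted on the slide to a hypothetical cohort. The cohort size and lesion counts are assumptions chosen purely for illustration, not data from the talk or from Lee et al., and applying the specificity to everyone without a lesion is a simplification.

```python
# Illustrative arithmetic only. The FIT rates are the approximate values from
# the slide; the cohort composition below is a hypothetical assumption.
FIT_SPECIFICITY = 0.95
FIT_SENSITIVITY_CRC = 0.79
FIT_SENSITIVITY_ADVANCED_ADENOMA = 0.27


def expected_fit_outcomes(n_crc, n_advanced_adenoma, n_no_lesion):
    """Return expected detected and missed counts for one FIT screening round."""
    detected_crc = n_crc * FIT_SENSITIVITY_CRC
    detected_aa = n_advanced_adenoma * FIT_SENSITIVITY_ADVANCED_ADENOMA
    # Simplification: specificity is applied to all people without a lesion.
    false_positives = n_no_lesion * (1 - FIT_SPECIFICITY)
    return {
        "detected_crc": detected_crc,
        "missed_crc": n_crc - detected_crc,
        "detected_advanced_adenoma": detected_aa,
        "missed_advanced_adenoma": n_advanced_adenoma - detected_aa,
        "false_positives": false_positives,
    }


# Hypothetical cohort of 10 000 screened people.
outcomes = expected_fit_outcomes(n_crc=50, n_advanced_adenoma=500, n_no_lesion=9450)
for name, value in outcomes.items():
    print(f"{name}: {value:.1f}")
```

Under these assumed numbers most advanced adenomas go undetected in a single round, which is the gap a complementary biomarker would aim to close.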


AIM: Identify novel biomarkers for CRC screening

Tumor-specific molecular changes accompany tumor progression

DNA alterations, e.g.:
• Mutations (SNVs)
• Copy number aberrations
• Methylation

RNA alterations, e.g.:
• RNA splicing

BCL2L1:
• Bcl-xL – anti-apoptotic
• Bcl-xS – pro-apoptotic

VEGFA:
• pro-angiogenic
• anti-angiogenic

Figure adapted from Sveen et al. Oncogene 2016;35(19):2413-27


AIM: Identify novel biomarkers for CRC screening

Tumor-specific protein isoforms could complement or outperform hemoglobin in CRC screening

pre-mRNA → (splicing) → alternatively spliced mRNA → (translation) → protein isoforms

Design an approach to identify tumor-specific protein variants


With the use of available protein sequence databases, ~50% of mass spectra are still not identified

Figure panels: In silico, Experimental, RNA-seq

Figure adapted from Duncan et al. Nat Biotechnol. 2010;28:659–664.


Experimental design

Down-modulation of splicing machinery to investigate differential splicing in a controlled setting

CRC cell lines: SW480
• siSRSF1
• siSF3B1
• siNonTargeting (siNT)

Measurements
• Proteomics: LC-MS/MS, QExactive
• RNA-seq: Illumina HiSeq, 2x125bp
• PacBio Iso-Seq: RSII, 4 fractions, 0 - 50kb

Proteogenomic pipeline


SPLICIFY – proteogenomic pipeline for differential splice variant identification

Inputs: short RNA-seq reads, reference annotation (or PacBio full-length transcripts), mass spectra, human protein database

RNA level
• Quality and adapter trimming (Trimmomatic)
• Reads mapping (STAR)
• Differential splicing analysis (rMATS)
• Result: differential splice variants on RNA level

Protein level
• 3-frame translation of the differential splice variants into potential protein variants
• Potential protein variants + human protein database = enriched protein database
• Identify MS/MS
• Extract variant peptides
• Differential peptide expression (limma)
• Result: differential protein isoforms
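The 3-frame translation and database-enrichment steps can be sketched roughly as below. This is a minimal illustration, not the SPLICIFY implementation: it assumes the nucleotide sequence of a splice variant has already been extracted on the coding strand, and the function names, the `min_length` cut-off and the FASTA header layout are all hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate seq from offset frame (0, 1 or 2); 'X' marks unknown codons."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq, min_length=7):
    """Return (frame, segment) pairs for stop-free stretches in the 3 forward frames."""
    segments = []
    for frame in range(3):
        for segment in translate(seq, frame).split("*"):
            if len(segment) >= min_length:  # hypothetical length filter
                segments.append((frame, segment))
    return segments


def to_fasta(variant_id, segments):
    """Format segments as FASTA text that could be appended to a protein database."""
    lines = []
    for n, (frame, segment) in enumerate(segments, start=1):
        lines.append(f">{variant_id}|frame{frame + 1}|segment{n}")
        lines.append(segment)
    return "\n".join(lines)


# Toy variant sequence (made up for illustration).
toy_variant = "ATGGCCAAGCTGGAAGTGTTCAAGTAA"
toy_segments = three_frame_translation(toy_variant)
print(to_fasta("toy_variant_SE_1", toy_segments))
```

Appending such entries to a reference human protein FASTA file gives an enriched database in the sense of the slide; how the real pipeline names and filters its entries is not stated in the talk.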


Differential splice variants identified on RNA level

Alternative splicing events
• Skipped exon (SE)
• Alternative 3’ splice site (A3SS)
• Alternative 5’ splice site (A5SS)
• Mutually exclusive exons (MXE)
• Retained intron (RI)

Exon skipping: exclusion vs. inclusion
• exclusion spanning reads
• inclusion spanning reads
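Inclusion and exclusion spanning reads are usually summarised as an inclusion level (percent spliced in, PSI). The sketch below assumes the length-normalised definition used by tools such as rMATS, where each read count is divided by the effective length of its form before taking the ratio. The function name, read counts and effective lengths are hypothetical values for illustration, not results from the talk.

```python
def inclusion_level(inclusion_reads, exclusion_reads,
                    inclusion_eff_len, exclusion_eff_len):
    """Length-normalised inclusion level (PSI) for one splicing event.

    Returns None when no reads support either form.
    """
    inc = inclusion_reads / inclusion_eff_len
    exc = exclusion_reads / exclusion_eff_len
    if inc + exc == 0:
        return None
    return inc / (inc + exc)


# Hypothetical counts for one skipped exon in two conditions.
psi_control = inclusion_level(80, 20, inclusion_eff_len=200, exclusion_eff_len=100)
psi_knockdown = inclusion_level(20, 40, inclusion_eff_len=200, exclusion_eff_len=100)
delta_psi = psi_knockdown - psi_control

print(f"PSI control:   {psi_control:.3f}")
print(f"PSI knockdown: {psi_knockdown:.3f}")
print(f"delta PSI:     {delta_psi:.3f}")
```

A negative delta PSI in this toy case would indicate more exon skipping after knockdown; a differential splicing tool additionally tests whether such a difference is statistically significant across replicates.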


RT-qPCR validation of SPLICIFY results on RNA level

Figure: RNA-seq and RT-qPCR results for OSBPL3 inclusion and OSBPL3 exclusion in siNT vs. siSF3B1 (y-axis: relative expression, 0.0 to 1.2)


RNA to protein translation: isoform-specific peptides

Exon skipping
exclusion: XXXXX------------------------ZZZZZ → XXXXXZZZZZ
inclusion: XXXXX------YYYY → XXXXXYYYY, with YYVV inside the exon

• XXXXXZZZZZ – exclusion specific split peptide
• XXXXXYYYY – inclusion specific split peptide
• YYVV – inclusion specific peptide on target

Retained intron
exclusion: XXXXX--------------ZZZZZ → XXXXXZZZZZ
inclusion: XXXXXAAAA, with AABB inside the intron

• XXXXXZZZZZ – exclusion specific split peptide
• XXXXXAAAA – inclusion specific spanning peptide
• AABB – inclusion specific peptide on target
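The three peptide classes can be told apart by checking which parts of a translated isoform a peptide covers. The sketch below is an assumed, simplified reading of the slide, not SPLICIFY's code: an isoform is given as an ordered list of named amino-acid segments, a peptide lying entirely in the alternative region is "on target", and a peptide covering more than one segment is "split" (across a splice junction) or "spanning" (across the boundary of a retained intron). All names and the toy sequences are hypothetical and reuse the X/Y/Z/A/B letters of the slide.

```python
def classify_peptide(peptide, segments, target=None, retained_intron=False):
    """Classify a peptide against an isoform given as [(name, sequence), ...].

    target is the name of the alternative region (exon or intron), or None
    for the exclusion isoform. Only the first match of the peptide is used.
    """
    protein = "".join(seq for _, seq in segments)
    start = protein.find(peptide)
    if start == -1:
        return "absent"
    end = start + len(peptide)

    covered = []
    offset = 0
    for name, seq in segments:
        seg_start, seg_end = offset, offset + len(seq)
        if start < seg_end and end > seg_start:  # peptide overlaps this segment
            covered.append(name)
        offset = seg_end

    if covered == [target]:
        return "on target"
    if len(covered) > 1:
        return "spanning" if retained_intron else "split"
    return "not isoform-specific"


# Toy isoforms mirroring the notation on the slide.
se_exclusion = [("upstream", "XXXXX"), ("downstream", "ZZZZZ")]
se_inclusion = [("upstream", "XXXXX"), ("exon", "YYYYVV"), ("downstream", "ZZZZZ")]
ri_inclusion = [("upstream", "XXXXX"), ("intron", "AAAABB"), ("downstream", "ZZZZZ")]

print(classify_peptide("XXXXXZZZZZ", se_exclusion))
print(classify_peptide("XXXXXYYYY", se_inclusion, target="exon"))
print(classify_peptide("YYVV", se_inclusion, target="exon"))
print(classify_peptide("XXXXXAAAA", ri_inclusion, target="intron", retained_intron=True))
print(classify_peptide("AABB", ri_inclusion, target="intron", retained_intron=True))
```

In practice the peptides would come from an in silico digest of the enriched database and would also need to be checked against the reference proteome, so that a peptide counts as isoform-specific only if it occurs nowhere else.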


Differential isoforms identified on protein level

Experiment         On target   Spanning peptide   Split peptide
siSF3B1 vs siNT    3278        9                  1794
siSRSF1 vs siNT    217         3                  154

RNA → (translation) → Protein


Quantitative differences on RNA and protein level

Figure tracks: RefSeq Genes; RNA isoforms (RNA-seq); isoform-specific peptides (LC-MS/MS, after translation). Panel labels: inclusion, exclusion; numbered 1, 2, 3.


PacBio full-length transcripts used as annotation to identify novel events

siSF3B1 vs siNT: SPLICIFY with reference annotation vs. SPLICIFY with PacBio full-length transcripts

• Full-length transcripts used as annotation to quantify Illumina reads
• Comparison to the standard SPLICIFY with reference annotation
• Both approaches include Illumina reads for the differential analysis


PacBio Iso-Seq provides a number of novel alternatively spliced events

Differential splicing analysis of siSF3B1 vs siNT

Alternative splicing events
• Skipped exon (SE)
• Alternative 3’ splice site (A3SS)
• Alternative 5’ splice site (A5SS)
• Mutually exclusive exons (MXE)
• Retained intron (RI)


Novel isoforms identified with PacBio Iso-Seq are expressed on protein level

Isoform-specific peptides

Annotation             On target   Spanning peptides   Split peptides
PacBio transcripts     3090        23                  2350
Reference annotation   2518        6                   1964


Novel isoforms identified with PacBio Iso-Seq are expressed on protein level: retained intron

Figure tracks: read coverage in siSF3B1; PacBio transcripts; RNA-seq isoforms; isoform-specific peptides; RefSeq Genes; read coverage in siNT


Novel isoforms identified with PacBio Iso-Seq are expressed on protein level: alternative 3’ splice site

Figure tracks: read coverage in siSF3B1; PacBio transcripts; RNA-seq isoforms; isoform-specific peptides; RefSeq Genes; read coverage in siNT


Conclusions

• Established SPLICIFY
  • Proteogenomic pipeline combining RNA-seq and LC-MS/MS data for differential splice variant identification
  • Confirmation of the splice variants on RNA level
    • RT-qPCR
    • by PacBio full-length transcripts
  • https://github.com/NKI-TGO/SPLICIFY
    • will be available soon
• Novel splicing events identified with PacBio Iso-Seq
  • confirmation on the protein level


Future plans

Samples
• Organoids
• Human tissues: healthy colon tissue, adenomas, CRCs

Profiling
• protein isolation → LC-MS/MS
• mRNA isolation → Illumina RNA-seq

SPLICIFY proteogenomic pipeline

Antibody-based assay for the best candidates
• Stool samples
• FIT samples


Acknowledgements

Department of Pathology, Translational Gastrointestinal Oncology
Annemieke Hiemstra, Anne Bolijn, Marianne Tijssen, Pien Delis-van Diemen, Meike de Wit, Beatriz Carvalho, Remond JA Fijneman, Gerrit A Meijer

Medical Oncology, Oncoproteomics Laboratory
Tim Schelfhorst, Sander Piersma, Thang Pham, Connie R Jimenez

Pacific Biosciences
Bo Han, Elisabeth Tseng, Sarah Kingan, Meredith Ashby

Icahn School of Medicine at Mount Sinai
Robert P Sebra

This research was financially supported by a grant from the Dutch Cancer Society
Grant number: NKI 2013-6025