Sungwon Jung, Ph.D.
Department of Genome Medicine and Science
Gachon University School of Medicine
The 15th Korea-Japan-China Bioinformatics Symposium 2017
Jun 21, 2017
• Understanding biological contexts from pathway-centric perspectives
• Pathway activity quantification based on differential genetic associations
  - Evaluation of Differential DependencY (EDDY)
  - Knowledge-based EDDY (KEDDY)
• Extending pathway activity quantification to single-sample cases
• Individualized pathway signaling quantification and its correlation with drug responses
• Typical steps of gene-centric approaches (with an example using gene expression data)
  - I. Prepare and preprocess gene expression data
  - II. Test each gene for differential expression between conditions
    ▪ Necessary adjustments (e.g., P-value correction for multiple testing)
  - III. Identify differentially expressed genes
  - IV. Based on annotations, investigate which annotations are enriched within the identified list of genes
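The P-value adjustment in step II can be sketched as follows. This is a minimal illustration that assumes the Benjamini-Hochberg procedure as the correction (the slide does not name a specific method); the gene names and P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
gene_p = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}
q = dict(zip(gene_p, benjamini_hochberg(list(gene_p.values()))))

# Step III: keep genes whose adjusted p-value passes the threshold.
significant = [gene for gene, value in q.items() if value <= 0.05]
print(significant)
```

In this toy input, GENE_C and GENE_D have raw P-values below 0.05 but no longer pass after correction, which is the reason the adjustment is listed as necessary.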
[Figure: genes identified at 11q13 (CENPC1, SNORD1A, ADAP1, ICAM4, GPANK1, ZNF43, …) and at 4p16 (ENO1P1, ZNF469, TCEB1P10, ITPKC, CLDN5, TRAJ61, …) are each checked against annotation resources (Gene Ontology, MSigDB, Ingenuity IPA, MetaCore) to obtain the enriched biological functions or pathways for that locus.]
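Annotation enrichment of a gene list is commonly scored with a hypergeometric (one-sided Fisher) test. The sketch below assumes that test; the slides do not say which statistic the listed tools use, and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_universe genes, of which n_annotated carry the annotation."""
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes in total, 200 annotated with one
# function, 100 genes in the identified list, 8 of them annotated.
p = enrichment_p_value(20000, 200, 100, 8)
print(p)
```

With these numbers only about one annotated gene would be expected in the list by chance, so an overlap of 8 gives a very small P-value.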
• No clear interpretation of genetic regulation
  - "Okay, we identified a list of genes, which may be related to biological functions XX, YY, ZZ …"
• Motivation: Can we provide more information on "differential biological functions"?
  - A more direct interpretation may be possible.
• Identifying differential pathways
  - GSEA (Subramanian et al., PNAS 2005; Mootha et al., Nature Genetics 2003)
    ▪ Based on (sort of) differential gene expression
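The running-sum idea behind GSEA can be sketched as follows. This is a simplified, unweighted (Kolmogorov-Smirnov style) enrichment score and omits the expression-based weighting and the permutation-based significance estimate; the gene names and ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes is ordered from most up-regulated to most down-regulated.
    The score is the running-sum value with the largest absolute deviation
    from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"]
es_top = enrichment_score(ranked, {"G1", "G2", "G4"})   # members near the top
es_bottom = enrichment_score(ranked, {"G9", "G10"})     # members at the bottom
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, so the result depends entirely on the expression-based ranking.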
• Differential expression-based approaches assume that genes show similar expression levels within a context.
• However, there can be biological contexts in which genes do not show similar expression levels.
  - Potentially due to other hidden factors
[Figure: Biological contexts without similar gene expression. Phenotype I vs. Phenotype II with up/down expression of Gene A and Gene B; an activation link between Gene A and Gene B and post-transcriptional inhibition by miRNA C; each A-B link is marked as a working or a silenced genetic interaction.]
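A toy illustration of this point: two contexts can share the same expression distribution for Gene B while the A-B association differs. The values below are made up, and Pearson correlation is used here only as a simple stand-in for a "genetic association".

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression of Gene A (identical in both contexts).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Context 1: Gene B follows Gene A (working interaction).
gene_b_working = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
# Context 2: the same Gene B values in a different order (silenced
# interaction), so the mean and variance of Gene B are unchanged.
gene_b_silenced = [3.2, 5.9, 1.9, 1.1, 5.1, 3.8]

r_working = pearson(gene_a, gene_b_working)
r_silenced = pearson(gene_a, gene_b_silenced)
print(round(r_working, 3), round(r_silenced, 3))
```

A per-gene differential expression test sees no difference between the two contexts for Gene B, while the association with Gene A drops from nearly 1 to nearly 0.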
[Figure: EDDY workflow. For a target set of variables V (V1, V2, V3, …) with data from two conditions (C1, C2), a probability density over possible dependency network structures is obtained for each condition, and the divergence between the two is computed. Statistical significance is evaluated by randomly permuting condition labels: the original divergence is compared against the frequency distribution of divergences from permutations (Permutation 1, 2, 3, 4, …) to obtain a p-value.] (Jung and Kim, Nucleic Acids Research 2014)
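The permutation step can be sketched as below. This is not the EDDY implementation: it assumes Jensen-Shannon divergence as the divergence, and it replaces the distribution over dependency network structures with a much simpler stand-in, the empirical distribution of joint binary states of the variables. All data are hypothetical.

```python
import math
import random
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability dicts."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def state_distribution(samples):
    """Empirical distribution over joint states (stand-in for a
    distribution over dependency network structures)."""
    counts = Counter(samples)
    return {state: c / len(samples) for state, c in counts.items()}


def divergence(group1, group2):
    return js_divergence(state_distribution(group1), state_distribution(group2))


def permutation_p_value(group1, group2, n_perm=1000, seed=0):
    """Compare the observed divergence with divergences obtained after
    randomly permuting the condition labels."""
    rng = random.Random(seed)
    observed = divergence(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if divergence(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical binarized (V1, V2) states per sample.
condition_c1 = [(0, 0)] * 10 + [(1, 1)] * 10                          # V1, V2 coupled
condition_c2 = [(0, 0)] * 5 + [(0, 1)] * 5 + [(1, 0)] * 5 + [(1, 1)] * 5  # uncoupled
observed, p_value = permutation_p_value(condition_c1, condition_c2)
print(round(observed, 4), p_value)
```

The cost of this loop grows with the number of permutations and with the cost of each divergence evaluation, which is much higher when the divergence is computed over many candidate network structures.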
• TCGA GBM gene expression data
  - Four subtypes (Classical, Mesenchymal, Neural, and Proneural)
• Identifying subtype-specific pathways with differential genetic associations
  - Testing one subtype vs. the others
  - 2,101 pathway gene sets were tested (from MSigDB)
[Figure labels: "Detects differentially expressed genes"; "Another method considering differential associations"] (Jung and Kim, Nucleic Acids Research 2014)
[Figure: pathway categories shown in each panel]
(A) Cell cycle; Immune system; Signal transduction
(B) Brain function; Immune system; Cell migration & structure; Signal transduction
(C) Apoptosis; Cell cycle; DNA damage repair; Immune system; Signal transduction
(D) Apoptosis; Cell cycle; Cell structure; Immune system; Inflammatory response; Signal transduction; Stimulus response
(Jung and Kim, Nucleic Acids Research 2014)
• One example: ARF pathway from the Classical subtype
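[Figure: ARF pathway. (A) Gene association network among CDKN2A, RB1, POLR1A, POLR1B, POLR1C, POLR1D, ABL1, MYC, TP53, TBX2, E2F1, MDM2, PIK3CG, RAC1, PIK3R1, and TWIST1, with associations marked as Non-Classical, Classical, or Both. (B) Expression panel with a scale from -3 to 3 and the Classical samples marked.]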
(A) Many interaction differences between Classical and non-Classical samples
(B) No clear differential expression of individual genes between Classical and non-Classical samples
  - In the Classical subtype, there is a focal 9p21.3 homozygous deletion targeting CDKN2A (Verhaak et al., Cancer Cell 2010).
  - The RB pathway is almost exclusively affected through CDKN2A deletion.
  - (A) CDKN2A lost many interactions in Classical, while the only activated relationship is with RB1.
(Jung and Kim, Nucleic Acids Research 2014)
• Requires large computational resources
  - For higher sensitivity, more gene association networks must be evaluated.
  - For larger gene sets (pathways), even more gene association networks must be evaluated to reach comparable sensitivity.
  - => Needs heavy computation
  - Example:
    ▪ Testing about 1,500 pathways for four classes with 202 samples took two weeks with about 1,000 HPC cores.
  - "Oh, I need something more handy."
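A back-of-the-envelope reading of the example above, using only the approximate figures quoted on the slide. It assumes one subtype-vs-others test per pathway and class, and that all cores were busy for the full two weeks; both are assumptions.

```python
# Figures quoted on the slide (all approximate).
n_pathways = 1500
n_classes = 4
n_cores = 1000
n_days = 14

# Assumption: one subtype-vs-others test per pathway and class.
n_tests = n_pathways * n_classes
# Assumption: every core is busy for the whole period.
core_hours = n_cores * n_days * 24
core_hours_per_test = core_hours / n_tests

print(n_tests, core_hours, core_hours_per_test)
```

Under these assumptions the run amounts to about 336,000 core-hours, or roughly 56 core-hours per pathway test.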
[Figure: for the genes of the target pathway (V1, V2, V3, …) in one sample, compute the probabilities of pathway activity status (S1, S2, S3, … with probabilities P1, P2, P3, …); these form the pathway activity profile.] (Jung, BMC Medical Informatics and Decision Making 2016)
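One way to picture "probabilities of pathway activity status" for a single sample is shown below. The slides do not describe the underlying model, so this sketch makes a purely illustrative assumption: each status is a hypothetical expression profile, scored with an independent Gaussian likelihood and a uniform prior. The profiles, gene values, and the `sd` parameter are all made up.

```python
import math


def status_probabilities(sample, status_profiles, sd=1.0):
    """Normalized likelihood of each activity status for one sample.

    sample: dict gene -> expression value for one sample.
    status_profiles: dict status -> dict gene -> expected expression.
    Assumes independent Gaussian noise with a shared standard deviation.
    """
    log_lik = {}
    for status, profile in status_profiles.items():
        log_lik[status] = sum(
            -0.5 * ((sample[gene] - mu) / sd) ** 2 for gene, mu in profile.items()
        )
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_lik.values())
    weights = {s: math.exp(v - top) for s, v in log_lik.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Hypothetical activity statuses for a three-gene pathway.
profiles = {
    "S1": {"V1": 0.0, "V2": 0.0, "V3": 0.0},
    "S2": {"V1": 2.0, "V2": 2.0, "V3": 0.0},
    "S3": {"V1": 2.0, "V2": 0.0, "V3": 2.0},
}
one_sample = {"V1": 1.9, "V2": 2.1, "V3": 0.2}

activity_profile = status_probabilities(one_sample, profiles)
print(activity_profile)
```

The resulting vector (P1, P2, P3, …) needs only the one sample and no permutations, which is what makes a per-sample profile cheap compared with a two-condition test.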
[Figure: a Gene x Sample matrix is converted into a Pathway x Sample matrix by computing activity distributions of ALL PATHWAYS for ALL SAMPLES; groups of samples are then identified by clustering the sample pathway activity vectors.] (Jung, BMC Medical Informatics and Decision Making 2016)
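Grouping samples by their pathway activity vectors can be sketched with a small agglomerative clustering routine. The slides do not name the clustering algorithm or distance, so average linkage with Euclidean distance is an assumption here, and the activity values are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(vectors, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage)
    until n_clusters remain. vectors: dict sample -> activity vector."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                mean_dist = sum(dists) / len(dists)
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical columns of a Pathway x Sample matrix (three pathways).
pathway_activity = {
    "S1": [0.9, 0.1, 0.8],
    "S2": [0.8, 0.2, 0.9],
    "S3": [0.1, 0.9, 0.2],
    "S4": [0.2, 0.8, 0.1],
}
groups = average_linkage_clusters(pathway_activity, n_clusters=2)
print(groups)
```

This naive version recomputes all pairwise distances at every merge, which is fine for a few hundred samples but would need a proper library routine for larger cohorts.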
• TCGA melanoma RNA-seq data
[Figure: sample clustering plot with two groups marked; sample indices and axis ticks omitted.]
Group I: Number of samples: 90; Enriched stage: Stage IA; Enriched miRNAs: 32; Enriched mutations: 1,017
Group II: Number of samples: 85; Enriched stage: Stage IIC; Enriched miRNAs: 27; Enriched mutations: 1,349
(Jung, BMC Medical Informatics and Decision Making 2016)
[Figure: Kaplan-Meier estimates of survival functions. x-axis: days of survival (0 to 10,000); y-axis: estimated survival function (0 to 1); censored observations are marked.]
(A) Group I vs. the rest: p-value = 0.0044
(B) Group II vs. the rest: p-value = 0.03294
(C) Group I vs. Group II: p-value = 0.00403
Group I median survival: 12.6 years
Group II median survival: 4.8 years
Hazard ratio of Group I with respect to Group II: 0.328
(Jung, BMC Medical Informatics and Decision Making 2016)
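For reference, the Kaplan-Meier estimate and the median survival time can be computed as in the sketch below. The follow-up times and event flags are hypothetical, and the log-rank p-values and hazard ratio reported above are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per sample (e.g. days).
    events: True if death was observed, False if the sample was censored.
    Returns [(time, survival)] at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every sample sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored samples both leave the risk set.
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # survival never reaches 0.5 within follow-up


# Hypothetical follow-up data for one group of samples.
days = [100, 200, 200, 300, 400, 500]
died = [True, True, False, True, False, True]

curve = kaplan_meier(days, died)
print(curve, median_survival(curve))
```

Censored samples contribute to the number at risk up to their last follow-up time but never lower the curve themselves.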
• One example: DNA fragmentation pathway
  - One of the mechanisms used by immune cells to kill target cells
[Figure: average pathway activities of the two groups (Group I, Group II) across the states Normal and Net1 to Net8; y-axis from 0.104 to 0.116. Two states are annotated "Silenced GZMB - CASP3" and "Silenced GZMB - CASP7".]
• GZMB: Expressed by cytotoxic T lymphocytes and natural killer cells. Activates the caspase cascade.
• CASP3, CASP7: Caspases. Their activation plays a central role in cell apoptosis.
This suggests a restricted immune response in Group II, where caspase cascade activation by GZMB has been silenced.
(Jung, BMC Medical Informatics and Decision Making 2016)
• Genomic data of colorectal cancer
  - 26 samples including primary tumor and liver metastasis
    ▪ With matched normal colon tissues
  - Whole-exome sequencing (WES), RNA-seq
  - DNA somatic mutations and gene expression were evaluated.
• High-throughput drug screening
  - Screening of 65 FDA-approved drugs for every tumor sample
    ▪ Response: IC50
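Relating an individualized pathway activity to drug response can be sketched as a rank correlation across samples. Spearman correlation is an assumption here (the slides do not state which correlation measure was used), and the activity and IC50 values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five tumor samples and one drug.
pathway_activity = [0.10, 0.35, 0.50, 0.80, 0.95]
ic50 = [12.0, 9.5, 4.0, 1.2, 0.3]

rho = spearman(pathway_activity, ic50)
print(rho)
```

A strongly negative value would mean that samples with higher pathway activity tend to have lower IC50, that is, higher sensitivity to the drug. With 65 drugs and many pathways, the resulting correlations would also need multiple-testing correction.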
[Figure: study workflow with colorectal cancer samples (matched normal tissue, primary tumor, liver metastasis), tumor PDX, high-throughput screening, and genomic data (WES, RNA-seq).]
Collaboration with Prof. Yongbeom Cho at Samsung Medical Center in Korea
• Pathway evaluation methods that can investigate individual genetic associations are needed.
• Wide range of applications for individualized pathway analysis
  - Novel disease subtype identification
  - Individualized drug response prediction
• Jung lab
  - Sora Kim, M.S.
  - Hyojung Kim
  - Joigmin Lee
  - Opening for Post-Doc!
• Samsung Medical Center
  - Prof. Yongbeom Cho
  - Hyekyung Hong, Ph.D.
  - Yeosong Lee, Ph.D.
• Gachon Institute of Genome Medicine and Science