COMPUTATIONAL CHALLENGES IN PRECISION MEDICINE AND GENOMICS
GARY BADER WWW.BADERLAB.ORG
GOOGLE WATERLOO, JUNE 9, 2014
PRECISION MEDICINE
• TRADITIONAL MEDICINE, WITH MORE DATA
• DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS
– BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE
• PERSONALIZED, BUT NOT EVERYONE HAS A DIFFERENT DISEASE
NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249
NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN)
Figure: breast cancer types. Noninvasive: lobular carcinoma in situ, ductal carcinoma in situ. Invasive: lobular carcinoma, ductal carcinoma, inflammatory.
IMPROVING PRECISION WITH GENOMICS
• BRCA1/BRCA2 MUTATIONS PREDICT RISK
• COMMERCIAL PROGNOSTIC TESTS BASED ON GENE SIGNATURES
HTTP://THEBIGCANDME.BLOGSPOT.CA/
GENOMICS
• NEW TECHNOLOGY FOR READING/WRITING DNA
• MEASURE OUR GENETIC CODE AND SYSTEM STATE
• LOTS OF VARIABLES – WHOLE GENOME, TRANSCRIPT AND PROTEIN EXPRESSION, SPLICING, CHROMATIN STRUCTURE, MOLECULAR INTERACTION, TRANSCRIPTION FACTOR, METHYLATION, METABOLITE, PATIENT PHENOTYPE
HTTP://WWW.LHSC.ON.CA/
Figure: software analogy. Source code on disk, compiler, load to active memory, running software; the cell's counterparts are a 4-letter code (DNA/RNA bases, e.g. GATGGGATTGGGGTTTTCCCCTCCCAT…) and a 20-letter code (amino acids, e.g. MEEPQSDPSVEPPLSQETFSDLWKLLPEN…).
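The step from the 4-letter DNA code to the 20-letter amino-acid code is the genetic code: each three-base codon maps to one amino acid or to a stop signal. A minimal Python sketch, assuming the standard genetic code and an in-frame DNA coding sequence; the names `CODON_TABLE` and `translate` are illustrative and do not come from any particular library.

```python
from itertools import product

# Standard genetic code (assumed), with codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG
# by varying the first base slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence (4-letter code) into amino acids (20-letter code)."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGAGGAGCCGTAA"))  # prints MEEP
```

Three bases per amino acid is why the protein string is one third the length of its coding DNA.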
A PROTEIN IS A MOLECULAR MACHINE
DNA SEQUENCING
• RECENT MASSIVE BREAKTHROUGH
• CURRENT TECH:
– ~10 HUMAN GENOMES, 1 TB DATA PER 6-DAY RUN
ILLUMINA, GEORGE CHURCH
FEB. 1, 2013: DR. LEE HOOD RECEIVES HIS NATIONAL MEDAL OF SCIENCE FROM PRESIDENT OBAMA AT WHITE HOUSE CEREMONY
MORE BREAKTHROUGHS COMING
WWW.NANOPORETECH.COM
20-NODE INSTALLATION = COMPLETE HUMAN GENOME IN 15 MINUTES
MINION = USB CONNECTION, MINIMAL SAMPLE PREPARATION, $1000 DEVICE + CONSUMABLES
WHERE DOES THE DATA COME FROM?
BARODA, INDIA; TORONTO, CANADA; VERMONT, USA; CAMBRIDGE, UK
MOLECULAR BIOLOGY LABS AROUND THE WORLD
BGI, >160 MACHINES
THE FACTORY
COMPUTING NEEDS: 1 HUMAN GENOME
• ~125 BASE READ LENGTH X MILLIONS OF READS
• >30X COVERAGE
• ALIGNMENT TO REFERENCE GENOME
• COMPUTE VARIANTS (MUTATIONS)
• ANNOTATE VARIANTS
• COMPUTE TIME: UP TO 2 DAYS/GENOME
– OPTIMIZED: 4 HOURS (128 GB RAM, 2 CPUS, SSD, 3.1 GHZ)
• MEDICALLY IMPORTANT TO BE FAST
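Read length and coverage together fix how many reads one genome needs. Using the standard estimate coverage C = N × L / G (N reads of length L over a genome of size G), a quick Python check; the ~3.1-billion-base human genome size is an assumption not stated on the slide.

```python
def reads_needed(genome_size_bp: float, read_length_bp: float, coverage: float) -> float:
    """Number of reads N for a target mean coverage C, from C = N * L / G."""
    return coverage * genome_size_bp / read_length_bp


GENOME_SIZE_BP = 3.1e9   # assumption: approximate haploid human genome size
READ_LENGTH_BP = 125     # read length from the slide
COVERAGE = 30            # target coverage from the slide

n_reads = reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, COVERAGE)
total_bases = n_reads * READ_LENGTH_BP
print(f"{n_reads / 1e6:.0f} million reads, {total_bases / 1e9:.0f} billion sequenced bases")
# prints: 744 million reads, 93 billion sequenced bases
```

Every one of those hundreds of millions of reads must be aligned before variants can be called, which is where the compute time goes.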
THE POWER OF GENOMICS IN MEDICINE
• 7000 RARE MONOGENIC DISEASES
– 50% HAVE A KNOWN GENE RESPONSIBLE
– QUADRUPLED RATE OF IDENTIFICATION SINCE 2012
• BRAIN DOPAMINE-SEROTONIN VESICULAR TRANSPORT DISEASE AND ITS TREATMENT
– TWO YEARS FROM DISEASE DEFINITION TO GENE IDENTIFICATION TO TREATMENT
NAT REV GENET. 2013 OCT;14(10):681-91; N ENGL J MED. 2013 FEB 7;368(6):543-50
NON-INVASIVE PRENATAL TEST
HTTP://WWW.PANORAMATEST.COM/
CANCER GENOMICS
• GERM LINE VS. SOMATIC MUTATIONS
• AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER
• >11,000 TUMOUR GENOMES, 9M MUTATIONS
HUMAN COLORECTAL CARCINOMA
HTTPS://DCC.ICGC.ORG/
COMPUTING CHALLENGES
• EXPONENTIAL DATA GROWTH (>MOORE’S LAW)
– BILLIONS OF GENOMES
– SIZE: >100 GB/HUMAN GENOME (RAW), 4 GB PROCESSED, MBS (JUST MUTATIONS)
• HETEROGENEOUS, NOISY, COMPLEX DATA
– NEEDS BOTH DATA SCIENTISTS AND DOMAIN EXPERTS
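Taking "billions of genomes" as 10^9 purely for illustration, the per-genome sizes above scale as follows (1 EB = 10^9 GB):

```latex
\begin{aligned}
10^{9}\ \text{genomes} \times 100\ \text{GB} &= 10^{11}\ \text{GB} = 100\ \text{EB} \quad \text{(raw)}\\
10^{9}\ \text{genomes} \times 4\ \text{GB} &= 4 \times 10^{9}\ \text{GB} = 4\ \text{EB} \quad \text{(processed)}
\end{aligned}
```

Even after processing cuts each genome 25-fold, the total remains in the exabyte range; keeping only mutations (MBs per genome) is what brings it down by further orders of magnitude.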
COMPUTING WILL TRANSFORM MEDICINE
GOAL: IMPROVE PATIENT OUTCOME
COMPUTATIONAL BIOLOGY
• RESEARCH: USING COMPUTERS TO ANSWER BIOLOGICAL/BIOMEDICAL QUESTIONS
• EXPLORE, INTERPRET AND DISCOVER: SEARCH
• SPEED AND ACCURACY: ALGORITHMS
• PREDICTING FUNCTIONAL MUTATIONS, PATIENT CLASSIFICATION: MACHINE LEARNING
• PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION
• USABLE APPLICATIONS: SOFTWARE ENGINEERING
MedSavant search engine for genetic variants
WWW.MEDSAVANT.COM
Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba, Khushi Chachcha, Sergiu Dumitriu Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno
GENOMIC READS
GENOMIC VARIANTS
MARC FIUME, MIKE BRUDNO
IDENTIFY DISEASE SUBTYPE
• DATA FUSION (NON-LINEAR, MESSAGE PASSING), UNSUPERVISED CLUSTERING
• GLIOBLASTOMA MULTIFORME (N=215)
Figure panels: survival, clustering, speed.
GOLDENBERG, BRUDNO. NATURE METHODS, 2014
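The data-fusion idea can be illustrated with a simplified two-data-type version of similarity network fusion (SNF), which we assume is the cited Nature Methods 2014 method. Each data type gives a patient-similarity network, and each network is repeatedly diffused through its own nearest-neighbour structure using the other network's current state, so that similarities supported by both data types are reinforced. This is a sketch, not the published implementation: the Gaussian kernel width `sigma`, neighbour count `k`, iteration count and toy data are all assumptions, and the final clustering step (e.g. spectral clustering of the fused network) is omitted.

```python
import numpy as np


def affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel similarity between patients (rows of X) for one data type."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))


def full_kernel(W: np.ndarray) -> np.ndarray:
    """Row-normalize: off-diagonal entries of each row sum to 1/2, the diagonal is 1/2."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    P = P / (2 * P.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W: np.ndarray, k: int) -> np.ndarray:
    """Keep only each patient's k most similar other patients, row-normalized."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        neighbours = [j for j in np.argsort(-W[i]) if j != i][:k]
        S[i, neighbours] = W[i, neighbours] / W[i, neighbours].sum()
    return S


def fuse_two(X1, X2, k=2, iterations=10, sigma=1.0):
    """Fuse two patient-similarity networks by cross-diffusion (simplified SNF-style update)."""
    W1, W2 = affinity(X1, sigma), affinity(X2, sigma)
    P1, P2 = full_kernel(W1), full_kernel(W2)
    S1, S2 = local_kernel(W1, k), local_kernel(W2, k)
    for _ in range(iterations):
        # Each network is diffused through its own local neighbourhoods using the
        # other network's current state; both updates use the previous values.
        new1 = S1 @ P2 @ S1.T
        new2 = S2 @ P1 @ S2.T
        P1, P2 = full_kernel((new1 + new1.T) / 2), full_kernel((new2 + new2.T) / 2)
    return (P1 + P2) / 2


# Hypothetical toy data: 6 patients in two groups of 3, one feature per data type.
expression = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]])
methylation = np.array([[1.0], [1.1], [1.3], [8.0], [8.2], [8.1]])
fused = fuse_two(expression, methylation)
print(np.round(fused, 2))
```

In the fused matrix, patients within the same group stay strongly linked while cross-group links shrink toward zero, which is the block structure an unsupervised clustering step would then pick up as subtypes.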
PREDICT TREATMENT RESPONSE
• SUPERVISED MACHINE LEARNING, E.G. RHEUMATOID ARTHRITIS METHOTREXATE RESPONSE
Figure: personal medical network. Patients (A, B) are responders or non-responders to treatment, linked by edges ranging from weakly to highly similar (similar, e.g., in SNP or smoking status); a new patient is placed in the network and predicted to be a non-responder.
SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH
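One simple way to turn a personal medical network into a prediction is a similarity-weighted vote among the new patient's most similar neighbours. A minimal Python sketch, assuming similarity is the fraction of shared features on which two patients agree; this k-nearest-neighbour vote is a stand-in chosen for illustration, not the method used in this work, and the feature names (`rs_hypothetical_1`, `smoker`) and patient records are hypothetical.

```python
from collections import defaultdict


def similarity(a: dict, b: dict) -> float:
    """Fraction of shared features (e.g. a SNP genotype, smoking status) on which two patients agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)


def predict_response(new_patient: dict, patients: list, k: int = 3) -> str:
    """Weighted vote among the k known patients most similar to the new patient."""
    ranked = sorted(patients, key=lambda p: similarity(new_patient, p["features"]), reverse=True)
    votes = defaultdict(float)
    for p in ranked[:k]:
        votes[p["response"]] += similarity(new_patient, p["features"])
    return max(votes, key=votes.get)


# Hypothetical patients with known treatment response.
patients = [
    {"features": {"rs_hypothetical_1": "AA", "smoker": True}, "response": "responder"},
    {"features": {"rs_hypothetical_1": "AA", "smoker": False}, "response": "responder"},
    {"features": {"rs_hypothetical_1": "GG", "smoker": True}, "response": "non-responder"},
    {"features": {"rs_hypothetical_1": "GG", "smoker": True}, "response": "non-responder"},
]
new_patient = {"rs_hypothetical_1": "GG", "smoker": True}
print(predict_response(new_patient, patients))  # prints non-responder
```

Weighting each vote by similarity lets highly similar patients count more than weakly similar ones, mirroring the edge weights in the network.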
EXPLAINING GENOMICS DATA
• SNAPSHOTS OF SYSTEM STATE – E.G. CANCER VS. NORMAL
• EXPLAIN WHY STATES DIFFER – E.G. REGULATOR PERTURBATION
– CAUSAL MODELING
– PRIOR KNOWLEDGE ABOUT MECHANISM: PATHWAYS
WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57
Figure: the genome (Genome++) encodes the cell mechanism, which explains the molecular and physiological phenotype; the environment modulates the mechanism.
THE HUMAN BODY
• A WETWARE COMPUTING SYSTEM
A PROTEIN IS A MOLECULAR MACHINE
1 INTERACTION (EDGE)
HO ET AL. NATURE 415(6868) 2002
LOGIC CIRCUIT (PATHWAY)
HTTP://DISCOVER.NCI.NIH.GOV/KOHNK/INTERACTION_MAPS.HTML
THE CELL
ALAIN VIEL, HARVARD UNIVERSITY, 2007
HTTP://WWW.ENDOSZKOP.COM/
~40 TRILLION CELLS, +TRILLIONS OF MICROBES (PARALLEL PROCESSING)
BIANCONI ET AL. ANN HUM BIOL. 2013 NOV-DEC;40(6):471
Figure: enrichment map of gene-sets (nodes) linked by gene-set overlap (edges). Nodes are colored by whether they are enriched in deletions (FDR 0% to 12.5%), enriched only in known intellectual disability (ID) and/or autism (ASD) disease genes, or both. Main clusters: microtubule, cytoskeleton, cell projection and cell motility, cell proliferation, glycosylation, adhesion, regulation of GTPase, GTPase/Ras signaling, kinase activity/regulation, CNS development, centrosome, nucleolus, cell cycle, regulation of hormone levels, amino acid derivative/amine metabolism, synaptic vesicle maturation, Reelin pathway, LIS1 in neuronal migration and development, cKIT and mTOR pathways, MHC-I. Zoom of CNS development: neuron migration, cell morphogenesis, cell projection organization, brain development, neurite development, CNS neuron differentiation, axonogenesis, cerebral cortex cell migration.
Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.
Figure: for a pathway GSi, patients #1, #2 and #3 each carry a CNV-affected gene in GSi (COUNT = 1), while patient #i does not (COUNT = 0).
• IF WE HAVE AT LEAST ONE CNV AFFECTING AT LEAST ONE GENE IN A CERTAIN PATHWAY GSI, THEN WE HAVE A PERTURBATION POTENTIAL IN THAT PATHWAY
• WE COUNT THE PRESENCE / ABSENCE OF SUCH PERTURBATION POTENTIAL IN PATIENTS
      Patient #1  Patient #2  Patient #3  …  Patient #i  …  Patient #n
GS1   1           1           1           …  0           …  0
GS2   0           0           1           …  1           …  0
GS3   0           0           0           …  0           …  0
DANIELE MERICO
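The gene-set by patient table follows directly from two inputs: the genes in each gene-set and the genes hit by each patient's CNVs. A minimal Python sketch, assuming both are given as sets of gene identifiers; the function name `perturbation_table` and the gene and patient names are hypothetical.

```python
def perturbation_table(gene_sets: dict[str, set[str]],
                       cnv_genes: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Binary gene-set x patient table.

    An entry is 1 when at least one CNV in the patient affects at least one gene
    in the gene-set (a "perturbation potential"), else 0. The number of genes hit
    is deliberately not counted: only presence/absence matters.
    """
    return {
        name: {patient: int(bool(genes & hit)) for patient, hit in cnv_genes.items()}
        for name, genes in gene_sets.items()
    }


# Hypothetical gene-sets and per-patient CNV-affected genes.
gene_sets = {"GS1": {"GENE_A", "GENE_B"}, "GS2": {"GENE_C"}}
cnv_genes = {"patient_1": {"GENE_A"}, "patient_2": {"GENE_B", "GENE_C"}, "patient_3": set()}
table = perturbation_table(gene_sets, cnv_genes)
for name, row in table.items():
    print(name, list(row.values()))
# GS1 [1, 1, 0]
# GS2 [0, 1, 0]
```

Recording presence rather than the number of affected genes keeps large gene-sets from dominating simply because they contain more genes.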
PATHWAY ASSOCIATION TEST
DESCRIPTION:
• THE SIGNIFICANCE OF A GENE-SET IS THEN ASSESSED USING FISHER’S EXACT TEST FOR ASSOCIATION
• A SIGNIFICANT GENE-SET IS AFFECTED BY A PERTURBATION POTENTIAL MORE FREQUENTLY IN CASES THAN IN CONTROLS
• THE FDR IS ESTIMATED BY SHUFFLING THE COLUMNS IN THE ‘GENE-SET BY PATIENT’ COUNT TABLE
             Case         Control
GSi          13           1
Not in GSi   1146 - 13    889 - 1
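The test above can be sketched in Python with only the standard library. `fisher_greater` computes a one-sided Fisher's exact p-value (perturbation more frequent in cases) from the hypergeometric tail, and `permutation_fdr` estimates the FDR at a p-value threshold by shuffling which columns (patients) are cases and which are controls. This is a sketch under stated assumptions: the one-sided alternative, the threshold, the permutation count and all function names are illustrative choices, not taken from the original analysis.

```python
import random
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on [[a, b], [c, d]]:
    a/b = cases/controls with the gene-set perturbed, c/d = cases/controls without.
    Returns P(X >= a) under the hypergeometric null."""
    n_case, n_ctrl = a + c, b + d
    k = a + b  # patients with the gene-set perturbed
    total = comb(n_case + n_ctrl, k)
    tail = sum(comb(n_case, x) * comb(n_ctrl, k - x) for x in range(a, min(k, n_case) + 1))
    return tail / total


def gene_set_pvalue(row: list, is_case: list) -> float:
    """p-value for one row of the binary gene-set x patient table."""
    a = sum(1 for hit, case in zip(row, is_case) if hit and case)
    b = sum(1 for hit, case in zip(row, is_case) if hit and not case)
    n_case = sum(is_case)
    return fisher_greater(a, b, n_case - a, len(is_case) - n_case - b)


def permutation_fdr(table: list, is_case: list, threshold: float = 0.05,
                    n_perm: int = 200, seed: int = 0) -> float:
    """FDR at `threshold`: mean number of gene-sets passing under shuffled
    case/control labels, divided by the number passing with the real labels."""
    rng = random.Random(seed)
    observed = sum(gene_set_pvalue(row, is_case) <= threshold for row in table)
    if observed == 0:
        return 0.0  # no discoveries at this threshold
    labels = list(is_case)
    null_hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # same number of cases, randomly reassigned columns
        null_hits += sum(gene_set_pvalue(row, labels) <= threshold for row in table)
    return min(1.0, null_hits / n_perm / observed)


# The 2 x 2 table from the slide: 13 of 1146 cases vs. 1 of 889 controls perturbed in GSi.
p = fisher_greater(13, 1, 1146 - 13, 889 - 1)
print(f"p = {p:.2g}")
```

Shuffling labels rather than assuming independent tests keeps the correlation between overlapping gene-sets intact in the null distribution.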
BENEFITS OF SYSTEMS THINKING
• IMPROVES STATISTICAL POWER – FEWER TESTS
• MORE REPRODUCIBLE – E.G. GENE EXPRESSION SIGNATURES
• EASIER TO INTERPRET – FAMILIAR CONCEPTS E.G. CELL CYCLE
• IDENTIFIES MECHANISM – CAN EXPLAIN CAUSE
VS. PARTS THINKING
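Part of the power gain from fewer tests can be seen with a Bonferroni correction at a family-wise level of 0.05 (Bonferroni is used here only as the simplest illustration). Assuming, for illustration only, about 20,000 genes versus 1,000 pathways:

```latex
\alpha_{\text{gene}} = \frac{0.05}{20\,000} = 2.5 \times 10^{-6},
\qquad
\alpha_{\text{pathway}} = \frac{0.05}{1\,000} = 5 \times 10^{-5}
```

Each pathway test therefore faces a threshold 20 times less stringent than each gene test.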
Figure: knowledge about the cell mechanism linking the genome (Genome++), the environment and the molecular/physiological phenotype comes from databases, experiments, predictions, literature and experts.
HTTP://PATHWAYCOMMONS.ORG
THE FACTOID PROJECT
MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER
HELPING AUTHORS DIGITIZE THEIR PUBLISHED KNOWLEDGE
HTTP://FACTOID.BADERLAB.ORG/
NETWORK VISUALIZATION AND ANALYSIS
UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF HTTP://CYTOSCAPE.ORG
PATHWAY COMPARISON, LITERATURE MINING, GENE ONTOLOGY ANALYSIS, ACTIVE MODULES, COMPLEX DETECTION, NETWORK MOTIF SEARCH
CYTOSCAPE.JS: HTML5 – TOUCH CYTOSCAPE.GITHUB.COM/CYTOSCAPE.JS/ MAX FRANZ
GENE FUNCTION PREDICTION
HTTP://WWW.GENEMANIA.ORG
QUAID MORRIS (DONNELLY) RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON, MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI, JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI
• GUILT-BY-ASSOCIATION PRINCIPLE
• BIOLOGICAL NETWORKS ARE COMBINED WITH WEIGHTS CHOSEN TO OPTIMIZE PREDICTION ACCURACY
• ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
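Guilt by association over combined networks can be sketched as label propagation: sum the data-source networks with weights, mark the query genes, and let scores spread along strong edges. This simplified Python sketch is in the spirit of GeneMANIA but is not its implementation: the network weights here are fixed by hand (GeneMANIA chooses them to optimize prediction for each query), networks are not normalized, and the function names and 4-gene example are hypothetical.

```python
import numpy as np


def combine_networks(networks, weights):
    """Composite network: weighted sum of gene-by-gene association matrices."""
    return sum(w * net for w, net in zip(weights, networks))


def propagate(W: np.ndarray, query_genes) -> np.ndarray:
    """Score every gene by solving (I + L) f = y, where L = D - W is the graph Laplacian.

    This minimises sum_i (f_i - y_i)^2 + sum over edges w_ij (f_i - f_j)^2, so genes
    strongly linked to the query genes receive high scores (guilt by association).
    """
    n = W.shape[0]
    y = np.zeros(n)
    y[list(query_genes)] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(n) + L, y)


# Hypothetical example: 4 genes, two data sources (e.g. co-expression, physical interaction).
coexpression = np.zeros((4, 4))
coexpression[0, 1] = coexpression[1, 0] = 1.0
interaction = np.zeros((4, 4))
interaction[1, 2] = interaction[2, 1] = 1.0
W = combine_networks([coexpression, interaction], weights=[1.0, 1.0])
scores = propagate(W, query_genes=[0])
print(np.round(scores, 3))
```

With gene 0 as the query, gene 1 (directly linked) scores higher than gene 2 (two steps away), and the unconnected gene 3 scores zero; raising one network's weight would pull scores toward the genes that network links.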
SOCIAL CHALLENGES
• BIOETHICS AND DATA SHARING
• ENGAGING RESEARCHERS
– CROWDSOURCING: TCGA PAN CANCER, DREAM
• ENCOURAGING RESEARCHERS TO EXPLORE UNCHARTED TERRITORY
• NEED FOR QUANTITATIVE THINKING IN BIOLOGY
– NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS DEPARTMENT AT THE UNIVERSITY OF TORONTO
NATURE. 2011 FEB 10;470(7333):163-5 WWW.NATURE.COM/TCGA/
EPENDYMOMA
• 3RD MOST COMMON BRAIN TUMOUR IN CHILDREN
• INCURABLE IN UP TO 45% OF PATIENTS
STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57
Figure panels: gene expression, patient age, overall survival.
EPENDYMOMA GENOMIC ANALYSIS
• EPENDYMOMA BRAIN CANCER: THE MOST COMMON AND MORBID LOCATION IN CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM)
• TWO SUBTYPES BY GENE EXPRESSION: PFA (YOUNG, DISMAL PROGNOSIS) AND PFB (OLDER, EXCELLENT PROGNOSIS)
• WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS; HOWEVER, DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES)
• PFA MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION
STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN NATURE, FEB. 2014
• INHIBITING POLYCOMB REPRESSIVE COMPLEX 2 (PRC2) WITH DZNEP OR GSK343 KILLED PFA CELLS
• NO KNOWN TREATMENT, SO NOW GOING TO CLINICAL TRIAL; COMPASSIONATE USE IN ONE PATIENT
TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA
• 9-YEAR-OLD WITH PF EPENDYMOMA METASTATIC TO LUNG, TREATED WITH AZACYTIDINE (VIDAZA)
Figure panels: 2 months; 3 months; 3 cycles of Vidaza.
MICHAEL TAYLOR
ACKNOWLEDGEMENTS
BADER LAB
DOMAIN INTERACTION TEAM: SHOBHIT JAIN, BRIAN LAW, JÜRI REIMAND, MOHAMED HELMY, ANDREA UETRECHT, MARINA OLHOVSKY
CANCER GENOMICS: FLORENCE CAVALLI, DAVID SHIH, ASHA ROSTAMIANFAR
PRECISION MEDICINE: RON AMMAR, SHIRLEY HUI
PATHWAY AND NETWORK ANALYSIS: RUTH ISSERLIN, IGOR RODCHENKOV, SCOTT ZUYDERDUYN, RUTH WONG, VERONIQUE VOISIN, SHAHEENA BASHIR, KHALID ZUBERI, CHRISTIAN LOPES, JASON MONTOJO, MAX FRANZ, HAROLD RODRIGUEZ
FUNDING
HTTP://BADERLAB.ORG