computational challenges in precision medicine and genomics

Post on 07-May-2015


DESCRIPTION

Genomics is mapping complex data about human biology and promises major medical advances. In particular, genomics is enabling precision medicine: the use of a patient's genome and physiological state to improve therapeutic efficacy and outcome. However, routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with "big data". These data are so large and complex that typical researchers struggle to work with them. Processing and interpreting them correctly requires an understanding of many aspects of experimental biology and medicine. Data size is also an issue: an individual researcher may need to handle tens of terabytes (genomes from a few hundred patients), which is challenging to download and store on a typical workstation. To effectively support precision medicine, scientists from a wide range of disciplines, including computer science, must develop algorithms to improve diagnostics and prognostics, genome interpretation, raw data processing and secure high-performance computing.

TRANSCRIPT

COMPUTATIONAL CHALLENGES IN PRECISION MEDICINE AND GENOMICS

GARY BADER WWW.BADERLAB.ORG

GOOGLE WATERLOO, JUNE 9, 2014

PRECISION MEDICINE

•  TRADITIONAL MEDICINE, WITH MORE DATA

•  DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS

–  BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE

•  PERSONALIZED, BUT NOT EVERYONE HAS A DIFFERENT DISEASE

NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249

NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN)

Breast Cancer

Noninvasive: Lobular Carcinoma In Situ, Ductal Carcinoma In Situ

Invasive: Lobular Carcinoma, Ductal Carcinoma, Inflammatory

IMPROVING PRECISION WITH GENOMICS

•  BRCA1/BRCA2 MUTATIONS PREDICT RISK

•  COMMERCIAL PROGNOSTIC TESTS BASED ON GENE SIGNATURES

HTTP://THEBIGCANDME.BLOGSPOT.CA/

GENOMICS

•  NEW TECHNOLOGY FOR READING/WRITING DNA

•  MEASURE OUR GENETIC CODE AND SYSTEM STATE

•  LOTS OF VARIABLES
– WHOLE GENOME, TRANSCRIPT AND PROTEIN EXPRESSION, SPLICING, CHROMATIN STRUCTURE, MOLECULAR INTERACTION, TRANSCRIPTION FACTOR, METHYLATION, METABOLITE, PATIENT PHENOTYPE


HTTP://WWW.LHSC.ON.CA/  

SOURCE CODE ON DISK

LOAD TO ACTIVE MEMORY

COMPILER

RUNNING SOFTWARE

ACTIVE MEMORY

4 LETTER CODE (DNA/RNA BASES): GATGGGATTGGGGTTTTCCCCTCCCAT…

20 LETTER CODE (AMINO ACIDS): MEEPQSDPSVEPPLSQETFSDLWKLLPEN…

A PROTEIN IS A MOLECULAR MACHINE

DNA SEQUENCING

•  RECENT MASSIVE BREAKTHROUGH

•  CURRENT TECH:
– ~10 HUMAN GENOMES, 1TB DATA/6 DAY RUN

ILLUMINA,  GEORGE  CHURCH  

FEB. 1, 2013: DR. LEE HOOD RECEIVES HIS NATIONAL MEDAL OF SCIENCE FROM PRESIDENT OBAMA AT WHITE HOUSE CEREMONY

MORE BREAKTHROUGHS COMING

WWW.NANOPORETECH.COM

20-NODE INSTALLATION = COMPLETE HUMAN GENOME IN 15 MINUTES

MINION = USB CONNECTION, MINIMAL SAMPLE PREPARATION, $1000 DEVICE + CONSUMABLES

WHERE DOES THE DATA COME FROM?

MOLECULAR BIOLOGY LABS AROUND THE WORLD

BARODA, INDIA

TORONTO, CANADA

VERMONT, USA

CAMBRIDGE, UK

THE FACTORY

BGI, >160 MACHINES

COMPUTING NEEDS: 1 HUMAN GENOME

•  ~125 BASE READ LENGTH X MILLIONS

•  >30X COVERAGE

•  ALIGNMENT TO REFERENCE GENOME

•  COMPUTE VARIANTS (MUTATIONS)

•  ANNOTATE VARIANTS

•  COMPUTE TIME: UP TO 2 DAYS/GENOME
– OPTIMIZED 4 HOURS: 128G/2CPU/SSD, 3.1GHZ

•  MEDICALLY IMPORTANT TO BE FAST
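The read-length and coverage figures above imply a rough read count per genome. The following is a minimal sketch of that arithmetic, assuming a genome length of about 3.1 billion bases (a commonly quoted approximation, not a number from the slides); the function name is hypothetical.

```python
import math

GENOME_SIZE_BASES = 3_100_000_000  # assumption: approximate human genome length
READ_LENGTH = 125                  # from the slide: ~125 base reads
COVERAGE = 30                      # from the slide: >30x coverage

def reads_needed(genome_size, read_length, coverage):
    """Smallest number of reads whose total length reaches coverage * genome_size."""
    return math.ceil(coverage * genome_size / read_length)

n_reads = reads_needed(GENOME_SIZE_BASES, READ_LENGTH, COVERAGE)
print(f"{n_reads:,} reads of {READ_LENGTH} bases for {COVERAGE}x coverage")
```

Under these assumptions the count is in the hundreds of millions of reads, which is why alignment and variant calling dominate the compute time.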

THE POWER OF GENOMICS IN MEDICINE

•  7000 RARE MONOGENIC DISEASES
– 50% HAVE A KNOWN GENE RESPONSIBLE
– QUADRUPLED RATE OF IDENTIFICATION SINCE 2012

•  BRAIN DOPAMINE-SEROTONIN VESICULAR TRANSPORT DISEASE AND ITS TREATMENT
– TWO YEARS FROM DISEASE DEFINITION TO GENE IDENTIFICATION TO TREATMENT

NAT REV GENET. 2013 OCT;14(10):681-91

N ENGL J MED. 2013 FEB 7;368(6):543-50

NON-INVASIVE PRENATAL TEST

HTTP://WWW.PANORAMATEST.COM/

CANCER GENOMICS

•  GERM LINE VS. SOMATIC MUTATIONS

•  AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER

•  >11,000 TUMOUR GENOMES, 9M MUTATIONS

HUMAN COLORECTAL CARCINOMA

HTTPS://DCC.ICGC.ORG/  

COMPUTING CHALLENGES

•  EXPONENTIAL DATA GROWTH (>MOORE’S LAW)
– BILLIONS OF GENOMES
– SIZE: >100GB/HUMAN GENOME, 4GB PROCESSED, MBS (JUST MUTATIONS)

•  HETEROGENEOUS, NOISY, COMPLEX DATA
– DATA SCIENTISTS, DOMAIN EXPERTS

COMPUTING WILL TRANSFORM MEDICINE

GOAL: IMPROVE PATIENT OUTCOME

COMPUTATIONAL BIOLOGY

•  RESEARCH: USING COMPUTERS TO ANSWER BIOLOGICAL/BIOMEDICAL QUESTIONS

•  EXPLORE, INTERPRET AND DISCOVER: SEARCH

•  SPEED AND ACCURACY: ALGORITHMS

•  PREDICTING FUNCTIONAL MUTATIONS, PATIENT CLASSIFICATION: MACHINE LEARNING

•  PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION

•  USABLE APPLICATIONS: SOFTWARE ENGINEERING

MedSavant search engine for genetic variants

WWW.MEDSAVANT.COM

Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba, Khushi Chachcha, Sergiu Dumitriu Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno


GENOMIC READS

GENOMIC VARIANTS

MARC FIUME, MIKE BRUDNO


GLIOBLASTOMA MULTIFORME (N=215)

IDENTIFY DISEASE SUBTYPE

GOLDENBERG, BRUDNO. NATURE METHODS, 2014

SURVIVAL

CLUSTERING

SPEED

DATA FUSION (NON-LINEAR, MESSAGE PASSING), UNSUPERVISED CLUSTERING

PREDICT TREATMENT RESPONSE

•  SUPERVISED MACHINE LEARNING E.G. RHEUMATOID ARTHRITIS METHOTREXATE RESPONSE

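[Figure: Personal Medical Network. Nodes are patients labelled Responder or Non-Responder (response to treatment); edges link patients that are similar (e.g. SNP, smoking status) and range from weakly similar to highly similar. A new patient is placed in the network and predicted to be a Non-Responder.]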

SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH  
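The patient-network idea above (a new patient inherits the response of the patients they most resemble) can be sketched as a similarity-weighted vote. This is an illustration only, not the method used in the work credited above; the similarity function, feature names and patient records are all made up for the example.

```python
def shared_feature_fraction(a, b):
    """Hypothetical similarity: fraction of shared features with equal values."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

def predict_response(new_patient, known_patients, similarity):
    """Sum similarity to each known patient per response label; return the top label."""
    votes = {}
    for features, label in known_patients:
        votes[label] = votes.get(label, 0.0) + similarity(new_patient, features)
    return max(votes, key=votes.get)

# Toy, invented records: (features, observed response to treatment).
known = [
    ({"snp": "A", "smoker": True}, "responder"),
    ({"snp": "A", "smoker": False}, "responder"),
    ({"snp": "B", "smoker": True}, "non-responder"),
    ({"snp": "B", "smoker": False}, "non-responder"),
]
new = {"snp": "B", "smoker": True}
prediction = predict_response(new, known, shared_feature_fraction)
print(prediction)
```

A real classifier would learn which features matter and how to weight them; the fixed, equal weighting here is the simplest possible stand-in.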

EXPLAINING GENOMICS DATA

•  SNAPSHOTS OF SYSTEM STATE
–  E.G. CANCER VS. NORMAL

•  EXPLAIN WHY STATES DIFFER
–  E.G. REGULATOR PERTURBATION
– CAUSAL MODELING
– PRIOR KNOWLEDGE ABOUT MECHANISM: PATHWAYS

WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57

GENOME++ MOLECULAR, PHYSIOLOGICAL

PHENOTYPE

ENCODES EXPLAINS

ENVIRONMENT CELL MECHANISM

THE HUMAN BODY

•  A WETWARE COMPUTING SYSTEM

MODULATES

A PROTEIN IS A MOLECULAR MACHINE

1 INTERACTION (EDGE)

HO ET AL. NATURE 415(6868) 2002

LOGIC CIRCUIT (PATHWAY)

HTTP://DISCOVER.NCI.NIH.GOV/KOHNK/INTERACTION_MAPS.HTML

THE CELL

ALAIN VIEL, HARVARD UNIVERSITY, 2007

HTTP://WWW.ENDOSZKOP.COM/

~40 TRILLION CELLS, +TRILLIONS OF MICROBES (PARALLEL PROCESSING)

BIANCONI ET AL. ANN HUM BIOL. 2013 NOV-DEC;40(6):471

[Figure: network of gene-sets, with a zoom of the CNS development cluster. Labelled clusters include Microtubule Cytoskeleton, Cell Projection & Cell Motility, Cell Proliferation, Glycosylation, Adhesion, Regulation of GTPase, GTPase/Ras Signaling, Kinase Activity/Regulation, CNS Development, Organ Morphogenesis, Cell cycle, Reelin pathway, cKIT pathway, mTor pathway and MHC-I. Legend: node type (gene-set): enriched in deletions (FDR 0% to 12.5%), known disease genes (ID, ASD, both), enriched only in disease genes; edge type (gene-set overlap): from disease genes to enriched gene-sets, between gene-sets enriched in deletions, between sets enriched in deletions and in disease genes or between disease sets only.]

Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.


PATHWAY GSi, CNV-AFFECTED GENE

PATIENT #1: COUNT = 1

PATIENT #2: COUNT = 1

PATIENT #3: COUNT = 1

PATIENT #i: COUNT = 0

•  IF WE HAVE AT LEAST ONE CNV AFFECTING AT LEAST ONE GENE IN A CERTAIN PATHWAY GI, THEN WE HAVE A PERTURBATION POTENTIAL IN THAT PATHWAY

•  WE COUNT THE PRESENCE / ABSENCE OF SUCH PERTURBATION POTENTIAL IN PATIENTS

       Patient #1   Patient #2   Patient #3   …   Patient #i   …   Patient #n
GS1    1            1            1            …   0            …   0
GS2    0            0            1            …   1            …   0
GS3    0            0            0            …   0            …   0
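The presence/absence table above can be built directly from two inputs: the genes in each gene-set and the CNV-affected genes in each patient. The sketch below assumes both are available as Python sets; the gene names, set contents and function name are invented for illustration.

```python
def perturbation_table(gene_sets, patient_cnv_genes):
    """Map each gene-set name to a 0/1 list with one entry per patient.

    An entry is 1 when at least one CNV-affected gene of that patient
    belongs to the gene-set, otherwise 0.
    """
    return {
        name: [int(bool(genes & affected)) for affected in patient_cnv_genes]
        for name, genes in gene_sets.items()
    }

# Toy inputs (hypothetical gene names).
gene_sets = {"GS1": {"G1", "G2"}, "GS2": {"G3"}, "GS3": {"G4"}}
patients = [{"G1"}, {"G2", "G5"}, {"G1", "G3"}, set()]

table = perturbation_table(gene_sets, patients)
for name, row in table.items():
    print(name, row)
```

Each row is then compared between cases and controls, so the per-patient counts are deliberately capped at 1: several hits in one pathway still count as a single perturbation potential.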

DANIELE MERICO  

PATHWAY ASSOCIATION TEST

DESCRIPTION:

• THE SIGNIFICANCE OF A GENE-SET IS THEN ASSESSED USING FISHER’S EXACT TEST FOR ASSOCIATION

• A SIGNIFICANT GENE-SET IS AFFECTED BY A MUTATION POTENTIAL MORE FREQUENTLY IN CASES THAN CONTROLS

• THE FDR IS ESTIMATED BY SHUFFLING THE COLUMNS IN THE ‘GENE-SET BY PATIENT’ COUNT TABLE

              Case        Control
GSi           13          1
Not in GSi    1146 - 13   889 - 1
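As a worked example, the 2x2 table above can be tested with a small standard-library implementation of Fisher's exact test. The slides do not say whether the test was one-sided or two-sided; this sketch assumes the one-sided version (gene-set hit more often in cases), and the function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: in gene-set / not in gene-set. Columns: case / control.
    Returns P(X >= a) for X hypergeometric with all margins fixed.
    """
    in_set = a + b          # patients with the gene-set perturbed
    cases = a + c           # all cases
    total = a + b + c + d   # all patients
    largest = min(in_set, cases)
    favourable = sum(
        comb(cases, k) * comb(total - cases, in_set - k)
        for k in range(a, largest + 1)
    )
    return favourable / comb(total, in_set)

# Counts from the slide: 13 of 1146 cases and 1 of 889 controls hit GSi.
p_value = fisher_exact_greater(13, 1, 1146 - 13, 889 - 1)
print(f"one-sided p = {p_value:.4f}")
```

This p-value is for a single gene-set; because many gene-sets are tested, it still needs the FDR correction described in the bullets above.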


BENEFITS OF SYSTEMS THINKING

•  IMPROVES STATISTICAL POWER – FEWER TESTS

•  MORE REPRODUCIBLE – E.G. GENE EXPRESSION SIGNATURES

•  EASIER TO INTERPRET – FAMILIAR CONCEPTS E.G. CELL CYCLE

•  IDENTIFIES MECHANISM – CAN EXPLAIN CAUSE

VS. PARTS THINKING

DATABASES EXPERIMENTS, PREDICTIONS

LITERATURE EXPERTS

GENOME++ MOLECULAR, PHYSIOLOGICAL

PHENOTYPE

ENCODES EXPLAINS

ENVIRONMENT CELL MECHANISM

MODULATES

HTTP://PATHWAYCOMMONS.ORG

THE FACTOID PROJECT

MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER

HELPING AUTHORS DIGITIZE THEIR PUBLISHED KNOWLEDGE

HTTP://FACTOID.BADERLAB.ORG/

NETWORK VISUALIZATION AND ANALYSIS

UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF HTTP://CYTOSCAPE.ORG

PATHWAY COMPARISON, LITERATURE MINING, GENE ONTOLOGY ANALYSIS, ACTIVE MODULES, COMPLEX DETECTION, NETWORK MOTIF SEARCH

CYTOSCAPE.JS: HTML5 – TOUCH

CYTOSCAPE.GITHUB.COM/CYTOSCAPE.JS/

MAX FRANZ

GENE FUNCTION PREDICTION

HTTP://WWW.GENEMANIA.ORG

QUAID MORRIS (DONNELLY) RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON, MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI, JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI

•  GUILT-BY-ASSOCIATION PRINCIPLE

•  BIOLOGICAL NETWORKS ARE COMBINED INTELLIGENTLY TO OPTIMIZE PREDICTION ACCURACY

•  ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
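The guilt-by-association principle above can be illustrated with a toy scorer: combine several networks into one weighted network, then rank unlabelled genes by how strongly they connect to genes already known to have the function. This is a simplified stand-in, not the GeneMANIA algorithm; the network weights are fixed by hand here rather than optimized, and all gene names and values are invented.

```python
def combine_networks(networks, weights):
    """Weighted sum of edge weights across networks (weights are hand-picked here)."""
    combined = {}
    for net, net_weight in zip(networks, weights):
        for edge, w in net.items():
            key = tuple(sorted(edge))  # treat edges as undirected
            combined[key] = combined.get(key, 0.0) + net_weight * w
    return combined

def guilt_by_association(network, seed_genes):
    """Rank non-seed genes by the total weight of their edges to seed genes."""
    scores = {}
    for (u, v), w in network.items():
        for gene, neighbour in ((u, v), (v, u)):
            if neighbour in seed_genes and gene not in seed_genes:
                scores[gene] = scores.get(gene, 0.0) + w
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy networks over hypothetical genes A-D.
coexpression = {("A", "B"): 1.0, ("B", "C"): 0.5}
physical = {("A", "C"): 1.0, ("C", "D"): 1.0}

combined = combine_networks([coexpression, physical], [0.25, 0.75])
ranking = guilt_by_association(combined, seed_genes={"A"})
print(ranking)
```

Only direct neighbours of the seed genes are scored in this sketch; methods that propagate scores through the whole network can also rank genes such as D that are two steps away.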

SOCIAL CHALLENGES

•  BIOETHICS AND DATA SHARING

•  ENGAGING RESEARCHERS

– CROWDSOURCING: TCGA PAN CANCER, DREAM

•  ENCOURAGING RESEARCHERS TO EXPLORE UNCHARTED TERRITORY

•  NEED FOR QUANTITATIVE THINKING IN BIOLOGY
–  NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS DEPARTMENT AT THE UNIVERSITY OF TORONTO

NATURE. 2011 FEB 10;470(7333):163-5 WWW.NATURE.COM/TCGA/

EPENDYMOMA

•  3RD MOST COMMON BRAIN TUMOUR IN CHILDREN

•  INCURABLE IN UP TO 45% OF PATIENTS

STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57

GENE EXPRESSION, PATIENT AGE, OVERALL SURVIVAL

EPENDYMOMA GENOMIC ANALYSIS

•  EPENDYMOMA BRAIN CANCER - MOST COMMON AND MORBID LOCATION FOR CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM)

•  TWO SUBTYPES BY GENE EXPRESSION: PFA - YOUNG, DISMAL PROGNOSIS; PFB - OLDER, EXCELLENT PROGNOSIS

•  WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS, HOWEVER DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES)

•  PFA MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION

STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN NATURE, FEB. 2014

POLYCOMB REPRESSOR COMPLEX 2
– INHIBITED BY DZNEP AND GSK343
– KILLED PFA CELLS

NO KNOWN TREATMENT, SO NOW GOING TO CLINICAL TRIAL, COMPASSIONATE USE IN ONE PATIENT

2 MONTHS 3 MONTHS 3 CYCLES VIDAZA

9 YO WITH METASTATIC PF EPENDYMOMA TO LUNG TREATED WITH AZACYTIDINE

TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA

MICHAEL TAYLOR

ACKNOWLEDGEMENTS

BADER LAB

DOMAIN INTERACTION TEAM: SHOBHIT JAIN, BRIAN LAW, JÜRI REIMAND, MOHAMED HELMY, ANDREA UETRECHT, MARINA OLHOVSKY

CANCER GENOMICS: FLORENCE CAVALLI, DAVID SHIH, ASHA ROSTAMIANFAR

PRECISION MEDICINE: RON AMMAR, SHIRLEY HUI

FUNDING

HTTP://BADERLAB.ORG

PATHWAY AND NETWORK ANALYSIS: RUTH ISSERLIN, IGOR RODCHENKOV, SCOTT ZUYDERDUYN, RUTH WONG, VERONIQUE VOISIN, SHAHEENA BASHIR, KHALID ZUBERI, CHRISTIAN LOPES, JASON MONTOJO, MAX FRANZ, HAROLD RODRIGUEZ
