Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Page 1: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Harvard Medical School Massachusetts Institute of Technology

Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Hsun-Hsien Chang (1), Jonathan J. Smith (2), Marco F. Ramoni (1)

(1) Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
(2) Department of Mathematics, Massachusetts Institute of Technology

IEEE Workshop on Signal Processing Systems, October 7, 2010

Page 2: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Background

• Genetic information flows from DNA to RNA through transcription.

• Modern microarray technologies are able to assess expression of 50K genes in parallel.

• Gene expression measures RNA abundance in cells and thereby reveals gene activity.

Page 3: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Clinical Applications

• Thanks to falling costs, more samples can be collected in a single study.

• A new clinical application:
– Monitor time-series gene expression in response to drugs, treatments, vaccines, virus infection, etc.

[Figure: gene expression profiles sampled at times T0, T1, T2, T3, T4, T5 for multiple patients in distinct biological conditions.]

Page 4: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Time-Series Gene Expression Analysis

• Since genes interact with each other in cells, an intriguing analysis is to infer gene networks:
– Detailed models (e.g., differential equations).
– Abstract models (e.g., Boolean networks).
– Probabilistic graphical models (e.g., dynamic Bayesian networks).
  • Do not require densely sampled data.
  • Model expression levels by random variables to handle noisy expression measurements and biological variability.
  • Utilize the inferred networks to make predictions.

[Figure legend: gene on; gene off.]

Page 5: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Data Representation by Bayesian Networks

• Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables (i.e., expressions of genes, clinical variables).
– Directed arcs encode conditional probabilities of the target (child) nodes on the source (parent) nodes.

[Figure: example directed acyclic graph over nodes A, B, C, D, E.]

• Dynamic Bayesian networks with arcs indicating temporal dependency.
– Example: variables X and Y at time T modulate variable Z at time T+1.
– The network model can serve as a prediction tool.

[Figure: X_T and Y_T point to Z_T+1; given X_T and Y_T, Z_T+1 is predicted.]
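The X, Y, Z example can be made concrete with a small sketch. It assumes discretized (on/off) expression values and estimates the conditional probability table by simple counting; the function names and the toy samples are hypothetical and are not taken from the slides.

```python
from collections import Counter, defaultdict

def fit_cpt(parent_rows, child_values):
    """Estimate P(child at T+1 | parents at T) by counting co-occurrences."""
    counts = defaultdict(Counter)
    for parents, child in zip(parent_rows, child_values):
        counts[tuple(parents)][child] += 1
    return {
        config: {value: n / sum(c.values()) for value, n in c.items()}
        for config, c in counts.items()
    }

def predict(cpt, parents):
    """Return the most probable child state for an observed parent configuration."""
    dist = cpt[tuple(parents)]
    return max(dist, key=dist.get)

# Hypothetical discretized samples: (X_T, Y_T) per sample and the matching Z_T+1.
xy_at_t = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)]
z_at_t1 = [0, 0, 0, 1, 1, 0]

cpt = fit_cpt(xy_at_t, z_at_t1)
z_pred = predict(cpt, (1, 1))  # most probable Z_T+1 when X_T = 1 and Y_T = 1
```

Counting is the simplest estimator; a Bayesian treatment would add prior pseudo-counts so that unseen parent configurations still yield a distribution.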

Page 6: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Network Inference Engine

• First-order Markov process: data at time T+1 depends only on the preceding time T.

• For a variable at time T+1, search for the set of variables at time T that has the highest likelihood of modulating its value at T+1.

• Step-wise search algorithm.

[Figure: variables at time T (genes A_T, B_T, C_T, ..., N_T and clinical variable V_T) and at time T+1 (genes A_T+1, B_T+1, C_T+1, ..., N_T+1 and clinical variable V_T+1).]
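A minimal sketch of such a step-wise search follows. The slides do not state the scoring function, so this example assumes a Bayesian marginal likelihood with a uniform Dirichlet prior (the K2 metric) on discretized data; the names `k2_log_score` and `stepwise_parents`, the `max_parents` cap, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from math import lgamma

def k2_log_score(child, parents, n_states=2):
    """Log marginal likelihood of `child` given parent columns (uniform Dirichlet prior)."""
    joint = Counter()   # counts of (parent configuration, child state)
    config = Counter()  # counts of each parent configuration
    for i, c in enumerate(child):
        key = tuple(p[i] for p in parents)
        joint[(key, c)] += 1
        config[key] += 1
    score = sum(lgamma(n_states) - lgamma(n + n_states) for n in config.values())
    score += sum(lgamma(n + 1) for n in joint.values())
    return score

def stepwise_parents(child, candidates, max_parents=3):
    """Greedily add the candidate that most improves the score; stop when none does."""
    chosen = []
    best = k2_log_score(child, [])
    while len(chosen) < max_parents:
        scores = {
            name: k2_log_score(child, [candidates[p] for p in chosen + [name]])
            for name in candidates if name not in chosen
        }
        if not scores:
            break
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break
        chosen.append(name)
        best = scores[name]
    return chosen, best

# Hypothetical on/off data: three variables at time T, one target at time T+1.
candidates = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
child = [0, 0, 0, 0, 0, 0, 1, 1]  # on only when both A and B were on at time T

chosen, score = stepwise_parents(child, candidates)
```

On this toy data the search selects A and B and rejects C, because adding C splits the samples further without explaining the target any better.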

Page 7: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Inference of Whole Dynamic Gene Network

• Infer a transition network between every pair of consecutive times.

[Figure: variables A, B, C, ..., N, V at times T, T+1, and T+2, linked by one transition network per pair of consecutive times.]

Page 8: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Parallelize Learning Individual Transition Nets

[Figure: the transition networks T → T+1 and T+1 → T+2 over variables A, B, C, ..., N, V, shown separately and as the combined network over times T, T+1, and T+2.]

Page 9: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Parallelize Parent Searching of Individual Variables

[Figure: transition network from the variables A_T, B_T, C_T, ..., N_T, V_T to the variables A_T+1, B_T+1, C_T+1, ..., N_T+1, V_T+1.]
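Because each variable at time T+1 chooses its parents independently of the others, the searches can be dispatched to a worker pool. The sketch below is an assumption about how this could be arranged, not the authors' implementation: `best_single_parent` is a toy stand-in for a real parent search, and a thread pool is used only to keep the example self-contained (a CPU-bound search would more likely use processes).

```python
from concurrent.futures import ThreadPoolExecutor

def best_single_parent(child, candidates):
    """Toy stand-in for a parent search: pick the time-T variable that agrees most often."""
    def agreement(name):
        return sum(p == c for p, c in zip(candidates[name], child))
    return max(candidates, key=agreement)

def learn_transition_network(data_t, data_t1, search=best_single_parent, workers=4):
    """Run one independent parent search per time-(T+1) variable, in parallel."""
    names = list(data_t1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: search(data_t1[n], data_t), names)
        return dict(zip(names, results))

# Hypothetical on/off data for two genes (A, B) and a clinical variable (V).
data_t = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "V": [1, 1, 0, 0]}
data_t1 = {"A": [0, 1, 0, 1], "B": [1, 1, 0, 0], "V": [0, 0, 1, 1]}

network = learn_transition_network(data_t, data_t1)
```

The same pattern applies one level up: each pair of consecutive times has its own transition network, so those can be learned as independent jobs as well.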

Page 10: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Step-by-Step Prediction

[Figure: given data at time T, the variables at T+1 are predicted; given data at time T+1, the variables at T+2 are predicted.]

Page 11: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Forecasting by Initial Data

[Figure: given data at time T only, the variables at T+1 are predicted, and the variables at T+2 are then predicted from the predicted T+1 values.]
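The difference between step-by-step prediction and forecasting can be sketched as follows. Each transition network is represented here by a hypothetical function from the state at one time to the state at the next; the two toy transitions and the measured values are invented for illustration.

```python
def step_by_step_prediction(transitions, observed):
    """Predict each time point from the observed data at the preceding time point."""
    return [f(x) for f, x in zip(transitions, observed)]

def forecast(transitions, initial):
    """Predict all later time points from the initial data, feeding predictions forward."""
    states, current = [], initial
    for f in transitions:
        current = f(current)
        states.append(current)
    return states

# Hypothetical transition rules, one per pair of consecutive times.
def t0_to_t1(s):
    return {"A": s["A"], "V": s["A"]}

def t1_to_t2(s):
    return {"A": s["V"], "V": 1 - s["A"]}

transitions = [t0_to_t1, t1_to_t2]
measured_t0 = {"A": 1, "V": 0}
measured_t1 = {"A": 0, "V": 1}  # differs from what t0_to_t1 predicts for A

predicted = step_by_step_prediction(transitions, [measured_t0, measured_t1])
forecasted = forecast(transitions, measured_t0)
```

Both schemes agree at T+1, but at T+2 the forecast inherits any error made at T+1, which is consistent with forecasting being the harder of the two tasks.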

Page 12: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Clinical Study: HIV Viral Load Tracking

• The global AIDS epidemic is one of the greatest threats to human health, causing 2 million deaths every year.
• Viral load (i.e., virus density in blood) is:
– associated with clinical outcomes.
– an indicator of which treatment physicians should provide.
• With a tool to predict/forecast the viral load trajectory, physicians could foresee how patients progress to AIDS and could allocate the best treatments upfront.
• Data: Fourteen (12 Africans, 2 Americans) untreated adult patients during acute infection.

[Figure: viral load and gene expression measured at time points Enroll, 1, 2, 4, 12, 24.]

Page 13: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Dynamic Gene Network of HIV Viral Load

Page 14: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Page 15: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Accuracy of HIV Viral Load Tracking

• Prediction accuracy:

                              Fitted Validation   Cross Validation
                              (Accuracy)          (Robustness)
Dynamic Gene Network          97.8%               95.8%
Viral Load Auto-Regression    90.1%               89.5%

• Forecasting accuracy:

                              Fitted Validation   Cross Validation
                              (Accuracy)          (Robustness)
Dynamic Gene Network          92.9%               91.8%
Viral Load Auto-Regression    88.7%               87.0%

Page 16: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

30 Genes Dynamically Interact with Viral Load

AMY1A: amylase, alpha 1A; salivary
OTOF: otoferlin
TNFAIP6: tumor necrosis factor, alpha-induced protein 6
KIR2DL3: killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3
NBPF14: neuroblastoma breakpoint family, member 14
OSBP2: oxysterol binding protein 2
IRF7: interferon regulatory factor 7
CFD: complement factor D (adipsin)
HLA-DQA1: major histocompatibility complex, class II, DQ alpha 1
HLA-DRB1: major histocompatibility complex, class II, DR beta 1
RPS23: ribosomal protein S23
GPR56: G protein-coupled receptor 56
IFI44L: interferon-induced protein 44-like
CCL23: chemokine (C-C motif) ligand 23
KLRC2: killer cell lectin-like receptor subfamily C, member 2
IFIT3: interferon-induced protein with tetratricopeptide repeats 3
SOS1: son of sevenless homolog 1 (Drosophila)
G1P2: interferon, alpha-inducible protein (clone IFI-15K)
LOC652775: similar to Ig kappa chain V-V region L7 precursor
CCL3L1: chemokine (C-C motif) ligand 3-like 1
MBP: myelin basic protein
S100P: S100 calcium binding protein P
IFITM3: interferon induced transmembrane protein 3 (1-8U)
MX1: myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
HERC5: HECT domain and RLD 5
NME4: non-metastatic cells 4, protein expressed in
HLA-DQB1: major histocompatibility complex, class II, DQ beta 1
LOC653157: similar to iduronate 2-sulfatase precursor (alpha-L-iduronate sulfate sulfatase) (idursulfase)
LOC643313: similar to hypothetical protein LOC284701
RSAD2: radical S-adenosyl methionine domain containing 2

Page 17: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Conclusions

• A Bayesian network framework to infer dynamic gene networks from time-series gene expression microarrays:
– Does not require densely sampled microarray data.
– Able to handle noise and biological variability.
– Temporal dependency is captured by a first-order Markov process.
– The optimal network model is achieved by a parallelized search algorithm.

• Application to HIV viral load tracking shows how our method can be used in clinical studies:
– Our network model tracks viral load trajectories with higher accuracy than the viral load auto-regressive model.
– Our model provides candidate gene targets for drug/vaccine development.

Page 18: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Acknowledgements

Supported by Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI067854-06:

• National Institute of Allergy and Infectious Diseases (NIAID)
• National Institutes of Health (NIH)
• Division of AIDS (DAIDS)
• U.S. Department of Health and Human Services (HHS)

Page 19: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Stationary Network Inference

• All networks between pairs of times are identical.

[Figure: the transition networks T → T+1, T+1 → T+2, and T+2 → T+3 over variables A, B, C, ..., N and viral load VL share one structure, shown alongside the chained network over times T, T+1, and T+2.]
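The contrast with the nonstationary approach can be sketched in a few lines. Here `fit` is a hypothetical stand-in for any transition-model learner, and the toy learner (mean change in a single measurement) is chosen only to keep the example short; neither comes from the slides.

```python
def transition_pairs(series):
    """Split a time series into consecutive (T, T+1) pairs."""
    return list(zip(series[:-1], series[1:]))

def fit_nonstationary(series, fit):
    """Fit a separate transition model for each pair of consecutive times."""
    return [fit([pair]) for pair in transition_pairs(series)]

def fit_stationary(series, fit):
    """Fit one transition model on all pairs pooled and reuse it at every step."""
    pairs = transition_pairs(series)
    model = fit(pairs)
    return [model] * len(pairs)

def mean_change(pairs):
    """Toy learner: average change from one time point to the next."""
    return sum(after - before for before, after in pairs) / len(pairs)

# Hypothetical measurements of one variable at times T, T+1, T+2.
series = [1.0, 3.0, 4.0]

per_step_models = fit_nonstationary(series, mean_change)
shared_model = fit_stationary(series, mean_change)
```

The stationary fit pools more data per model but cannot represent dynamics that change over time, while the nonstationary fit does the opposite.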

Page 20: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Page 21: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Pathway: Immune Response (16/30 genes, p < 10^-6)

[The slide repeats the 30-gene list from page 16; the marking of the 16 pathway genes did not survive extraction.]

Page 22: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Pathway: Antiviral Defense (8/30 genes, p < 10^-3)

[The slide repeats the descriptions of the 30 genes from page 16; the marking of the 8 pathway genes did not survive extraction.]

Page 23: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Pathway: Inflammatory Response (5/30 genes, p < 0.05)

[The slide repeats the descriptions of the 30 genes from page 16; the marking of the 5 pathway genes did not survive extraction.]

Page 24: Inferring Nonstationary Gene Networks from Temporal Gene Expression Data

Interferon Family Dominates

[The slide repeats the descriptions of the 30 genes from page 16, marked by the number of pathways each gene belongs to. Legend: 3 pathways; 2 pathways; 1 pathway. The per-gene marking did not survive extraction.]