Comparative Study of Granger Causality Algorithm for Gene Regulatory Network

INVESTIGATION OF IMAGE PROCESSING ALGORITHMS FOR MEDICAL APPLICATION. Zhafir Aglna Tijani, U1120208F. A final year project presentation in partial fulfilment of the requirements for the degree of Bachelor of Engineering.

Upload: zhafir-aglna-tijani

Post on 31-Jul-2015


Category:

Engineering



TRANSCRIPT

Page 1: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


INVESTIGATION OF IMAGE PROCESSING ALGORITHMS FOR MEDICAL APPLICATION

Zhafir Aglna Tijani, U1120208F

A final year project presentation in partial fulfilment of the requirements for the degree of Bachelor of Engineering

Page 2: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Outline

Background and Theory
Implementation
Results and Discussion
Conclusion

Page 3: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Outline

Background and Theory
• Problems and Objectives
• Gene Regulatory Network
• Granger Causality
• 3 Methods of Granger Causality
• Project Focus
Implementation
Results and Discussion
Conclusion

Page 4: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


“It is more pragmatic to cure the cause of disease at its source than to handle the actual diseases”

Gene

Page 5: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


• The interaction between genes is called a Gene Regulatory Network (GRN)
• Discovering this network remains challenging because of its complexity
• Efficient computational tools are required

Objective: To find an effective and efficient means of discovering unknown Gene Regulatory Networks

Page 6: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Modelling of GRN

• Nodes and Edges
  • Depicting the relations between genes
• Obtained from DNA Microarray data
• Prominent Method: Granger Causality

http://img.medicalxpress.com/newman/gfx/news/hires/2013/1-novelnoninva.jpg

Page 7: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Granger Causality

• Method for Time Series Analysis
• Uses a Vector Autoregression (VAR) model to calculate causality from time series data

Granger (1969)

[Diagram: Time Series A, Time Series B]

$$U_t = \sum_{k=1}^{p} A_k U_{t-k} + \varepsilon_t$$

$$F_{Y \to X} \equiv \ln \frac{|\Sigma'_{xx}|}{|\Sigma_{xx}|}$$

Page 8: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Granger Causality

“If past values of A and B together can predict the future value of B better than past values of B alone, then time series A Granger-causes time series B”

Granger (1969)

[Diagram: Time Series A, Time Series B]
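The definition above can be illustrated with a minimal sketch. It is not the MVGC, Lasso, or Copula implementation used in the project: it fits two intercept-free autoregressions by ordinary least squares (assuming zero-mean series) and reports the log-ratio of their residual variances. The function names, the lag order `p`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def residual_variance(rows, targets):
    """Fit targets ~ rows by ordinary least squares and return the residual variance."""
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    residuals = [y - sum(c * x for c, x in zip(beta, r)) for r, y in zip(rows, targets)]
    return sum(e * e for e in residuals) / len(residuals)


def granger_causality(a, b, p=1):
    """Return ln(var_reduced / var_full) for the direction A -> B with lag order p."""
    times = range(p, len(b))
    targets = [b[t] for t in times]
    # Reduced model: B predicted from its own past only.
    reduced = [[b[t - k] for k in range(1, p + 1)] for t in times]
    # Full model: B predicted from the past of both B and A.
    full = [own + [a[t - k] for k in range(1, p + 1)] for t, own in zip(times, reduced)]
    return math.log(residual_variance(reduced, targets) / residual_variance(full, targets))


# Hypothetical toy data: A drives B with a one-step delay, B does not drive A.
rng = random.Random(0)
a, b = [0.0], [0.0]
for _ in range(2000):
    a.append(0.5 * a[-1] + rng.gauss(0.0, 1.0))
    b.append(0.5 * b[-1] + 0.8 * a[-2] + rng.gauss(0.0, 1.0))

f_ab = granger_causality(a, b)  # clearly positive: A helps predict B
f_ba = granger_causality(b, a)  # close to zero: B does not help predict A
print(f_ab, f_ba)
```

A value near zero means the extra series adds almost no predictive power. In practice a statistical test or a threshold decides whether a link is reported.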

Page 9: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


3 Methods of Implementing Granger Causality

• MVGC: Barnett et al. (2013)
• Lasso: Arnold et al. (2007)
• Copula: Liu and Bahadori (2012)

“These 3 methods have been implemented independently, but have never been compared under the same conditions.”

Page 10: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Main Focus of the Project

• Comparative Study of Algorithms
• Focus on the Performance of the 3 Algorithms
• Finding Strengths and Weaknesses
• Using Control Variables and Performance Metrics

Page 11: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Outline

Background and Theory
Implementation
• Control Variables
• Causality Graph and Matrix
• Edge Analysis
• Performance Metrics
• Data for Analysis
Results and Discussion
Conclusion

Page 12: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Implementation

Program Flow: Time Series Input → GC Algorithm → Causality Matrix and Graph → Edge Analysis → Data for Discussion

• Implementation using MATLAB 2010b
• Based on existing toolboxes:
  • MVGC Toolbox (Barnett, 2013)
  • Lasso Granger
  • Copula Granger (Liu and Bahadori, 2012)
  • GLMnet

Page 13: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Implementation

Control Variables: Synthetic Time Series Dataset

• Based on a set of equations
• Linear time series dataset
• Generated by specifying the number of time points
• Advantages:
  • Provides a ground truth network: the actual causality of the time series
  • The ground truth can be compared with the algorithm output to measure the performance of the algorithms
• 2 types of dataset: 3-VAR and 5-VAR time series
• 8 different numbers of time points: 200, 400, 800, 1200, 1600, 2400, 3200, 4000
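A minimal sketch of how such a synthetic dataset could be generated is given below. The slides do not list the equations, so the lag-1 coefficients in `COEFFS` are hypothetical, as are the function names. The ground truth uses the assumed convention that entry (i, j) = 1 means variable j drives variable i.

```python
import random

# Hypothetical lag-1 coefficients for a 3-variable linear system.
# COEFFS[i][j] is the weight of variable j at time t-1 on variable i at time t.
COEFFS = [
    [0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.4, 0.5],
]

# The eight dataset lengths listed on the slide.
TIME_POINTS = [200, 400, 800, 1200, 1600, 2400, 3200, 4000]


def generate_var_series(coeffs, n_points, seed=None):
    """Simulate x_t = coeffs * x_{t-1} + Gaussian noise; return a list of n_points rows."""
    rng = random.Random(seed)
    n_vars = len(coeffs)
    state = [0.0] * n_vars
    series = []
    for _ in range(n_points):
        state = [
            sum(coeffs[i][j] * state[j] for j in range(n_vars)) + rng.gauss(0.0, 1.0)
            for i in range(n_vars)
        ]
        series.append(state)
    return series


def ground_truth(coeffs):
    """Binary causality matrix: 1 where variable j influences a different variable i."""
    n = len(coeffs)
    return [[1 if i != j and coeffs[i][j] != 0 else 0 for j in range(n)] for i in range(n)]


datasets = {n: generate_var_series(COEFFS, n, seed=n) for n in TIME_POINTS}
truth = ground_truth(COEFFS)
print(truth)
```

Because the coefficients are chosen by hand, the true links are known in advance, which is what makes the comparison against each algorithm's output possible.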

Page 14: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


3 Granger Causality Algorithms

Page 15: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Causality Matrix

• 1 represents: a link exists between the variables
• 0 represents: no link exists

0 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 1
1 0 0 1 0

• The output of a GC algorithm is the Causality Matrix
• It depicts Granger causality between time series

Page 16: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Edge Analysis

• The results of each algorithm are converted to binary values by masking with a threshold of 0.0001
• Edge Analysis is a method to measure the performance of an algorithm by comparing its output with the benchmark
• Benchmark = Ground Truth

Ground Truth:
0 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 1
1 0 0 1 0

Lasso Method:
0 0 1 0 1
1 0 1 1 1
1 1 1 0 1
0 1 0 1 1
1 1 1 0 1
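The binary masking step might look like the sketch below. The slides give only the threshold value, so the comparison rule (absolute value strictly greater than the threshold) and the raw scores are assumptions.

```python
THRESHOLD = 0.0001  # threshold value stated on the slide


def binarize(scores, threshold=THRESHOLD):
    """Map each raw causality score to 1 (link) or 0 (no link).

    Assumption: a link is kept when the absolute score exceeds the threshold.
    """
    return [[1 if abs(value) > threshold else 0 for value in row] for row in scores]


# Hypothetical raw scores from a GC algorithm (not taken from the slides).
raw_scores = [
    [0.0, 0.00005, 0.12],
    [0.3, 0.0, 0.0],
    [0.00009, 0.02, 0.0],
]

mask = binarize(raw_scores)
print(mask)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```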

Page 17: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Edge Analysis

• Uses parameters from the Confusion Matrix: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN)

Ground Truth:
0 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 1
1 0 0 1 0

Lasso Method:
0 0 1 0 1
1 0 1 1 1
1 1 1 0 1
0 1 0 1 1
1 1 1 0 1

For the above example:
• TP: 4
• TN: 6
• FP: 13
• FN: 2
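The counts for this example can be reproduced with a short sketch. The two matrices are copied from the slide; the function name is an illustrative choice. Counting all 25 entries, diagonal included, is what yields the slide's figures.

```python
# Matrices copied from the slide.
GROUND_TRUTH = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
]
LASSO_OUTPUT = [
    [0, 0, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]


def confusion_counts(truth, predicted):
    """Compare two binary matrices entry by entry and tally TP, TN, FP, FN."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth_row, predicted_row in zip(truth, predicted):
        for t, p in zip(truth_row, predicted_row):
            if t == 1 and p == 1:
                counts["TP"] += 1  # link exists and was detected
            elif t == 0 and p == 0:
                counts["TN"] += 1  # no link and none reported
            elif t == 0 and p == 1:
                counts["FP"] += 1  # spurious link reported
            else:
                counts["FN"] += 1  # real link missed
    return counts


counts = confusion_counts(GROUND_TRUTH, LASSO_OUTPUT)
print(counts)  # {'TP': 4, 'TN': 6, 'FP': 13, 'FN': 2}
```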

Page 18: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


7 Performance Metrics

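$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{False Positive Rate} = \frac{FP}{TN + FP}$$

$$\text{False Discovery Rate} = \frac{FP}{TP + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{F1 Score} = \frac{2TP}{2TP + FP + FN}$$

• Calculated from the values of TP, TN, FP, and FN
• Used in past research on similar topics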
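As a worked illustration, the seven metrics can be computed from the four counts as sketched below. The counts used are those of the Edge Analysis example (TP = 4, TN = 6, FP = 13, FN = 2). The helper names and the choice to return 0.0 for an empty denominator are assumptions.

```python
def ratio(numerator, denominator):
    """Divide safely; returning 0.0 for an empty denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def performance_metrics(tp, tn, fp, fn):
    """Compute the seven metrics listed on the slide from confusion-matrix counts."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "false_positive_rate": ratio(fp, tn + fp),
        "false_discovery_rate": ratio(fp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f1_score": ratio(2 * tp, 2 * tp + fp + fn),
    }


metrics = performance_metrics(tp=4, tn=6, fp=13, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 10/25 = 0.4 and the F1 score is 8/23, roughly 0.348. Specificity and false positive rate sum to 1, as do precision and false discovery rate.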

Page 19: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Data for Analysis

• The result of Granger Causality depends on the generated time series
• A few samples were not sufficient, since the generated time series was different each time
• The experiment was repeated 2000 times
• The mean value of each performance metric is the basis for the comparative study

Lasso, 1st Iteration:
0 0 1 0 1
1 0 1 1 1
1 1 1 0 1
0 1 0 1 1
1 1 1 0 1

Lasso, 2nd Iteration:
0 0 1 0 0
0 0 1 1 0
1 1 1 0 1
0 1 0 1 1
0 1 0 1 1
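The averaging step can be sketched with the two Lasso iterations shown above, scored against the ground truth matrix from the Edge Analysis example. Only accuracy is averaged here, over two iterations rather than 2000, and the function name is an illustrative choice.

```python
from statistics import mean

# Ground truth from the Edge Analysis example.
GROUND_TRUTH = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
]

# The two Lasso outputs shown on the slide.
LASSO_ITERATIONS = [
    [
        [0, 0, 1, 0, 1],
        [1, 0, 1, 1, 1],
        [1, 1, 1, 0, 1],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 0, 1],
    ],
    [
        [0, 0, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [1, 1, 1, 0, 1],
        [0, 1, 0, 1, 1],
        [0, 1, 0, 1, 1],
    ],
]


def accuracy(truth, predicted):
    """Fraction of matrix entries on which the prediction matches the ground truth."""
    pairs = [(t, p) for t_row, p_row in zip(truth, predicted) for t, p in zip(t_row, p_row)]
    return sum(1 for t, p in pairs if t == p) / len(pairs)


scores = [accuracy(GROUND_TRUTH, output) for output in LASSO_ITERATIONS]
mean_accuracy = mean(scores)
print(scores, mean_accuracy)
```

The two iterations score 0.4 and 0.48, so a single run can be misleading. Averaging over many runs smooths out this variation.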

Page 20: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Outline

Background and Theory
Implementation
Results and Discussion
• Performance Metrics Scores
• Specific Results
  • 5-VAR Accuracy
  • 3-VAR and 5-VAR F1 Score
• Overall Score Results
Conclusion

Page 21: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Scores of Metrics

• Bar charts represent the score of each performance metric for the 3 methods
• X axis: Number of Time Points
• Y axis: Score of Metric
• 7 Performance Metrics
• 2 Scenarios: 3-VAR and 5-VAR

[Figure: VAR5 F1 Score. Bar chart comparing MVGC, LASSO, and COPULA; x-axis: Number of Time Points (200, 400, 800, 1200, 1600, 2400, 3200, 4000); y-axis: Score]

Page 22: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


[Figures: six bar charts for the 5-VAR scenario, each comparing MVGC, LASSO, and COPULA; x-axis: Number of Time Points (200, 400, 800, 1200, 1600, 2400, 3200, 4000); y-axis: Score. Panels: VAR5 Specificity, VAR5 Sensitivity, VAR5 Precision, VAR5 False Positive Rate, VAR5 False Discovery Rate, VAR5 Accuracy]

Page 23: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


5-VAR Accuracy

[Figure: VAR5 Accuracy. Bar chart comparing MVGC, LASSO, and COPULA; x-axis: Number of Time Points (200 to 4000); y-axis: Score]

• Accuracy
  • Proportion of true results among the total links available
• MVGC
  • Increases as the number of time points increases
  • Score range was small (around 0.1)
• Lasso
  • Increases as the number of time points increases
  • Two extreme scores, wide score range
• Copula
  • Performs best at around 400 time points
  • Poor performance at higher numbers of time points

Page 24: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network

3-VAR and 5-VAR F1 Score

• F1 Score
  • Statistical measure based on the harmonic mean of Precision and Recall
• MVGC
  • Consistent pattern, increases as the number of time points increases
• Lasso
  • Contrasting patterns between the 3-VAR and 5-VAR scenarios
  • Heavily affected by the number of variables
• Copula
  • Unique pattern
  • Has a certain point or range where performance is optimized

[Figure: VAR3 F1 Score. Bar chart comparing MVGC, LASSO, and COPULA; x-axis: Number of Time Points (200 to 4000); y-axis: Score]

[Figure: VAR5 F1 Score. Bar chart comparing MVGC, LASSO, and COPULA; x-axis: Number of Time Points (200 to 4000); y-axis: Score]

Page 25: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Overall Performance

3-Variable Time Series

Metric               | Best Performance | Average Performance | Worst Performance
Sensitivity          | Lasso            | Copula              | MVGC
Specificity          | MVGC             | Lasso               | Copula
Precision            | MVGC             | Lasso               | Copula
False Positive Rate  | MVGC             | Lasso               | Copula
False Discovery Rate | MVGC             | Lasso               | Copula
Accuracy             | MVGC             | Lasso               | Copula
F1 Score             | MVGC             | Lasso               | Copula

• Overall performance is based on the average score across all time points
• MVGC outperforms the other two methods in the 3-VAR scenario
• Lasso scores were good at high numbers of time points
• Copula has a certain range in which its scores were high (around 200 – 800 time points), but outside that range its scores were lower than those of the other methods

Page 26: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


5-Variable Time Series

Metric               | Best Performance | Average Performance | Worst Performance
Sensitivity          | MVGC             | Copula              | Lasso
Specificity          | Lasso            | Copula              | MVGC
Precision            | MVGC             | Copula              | Lasso
False Positive Rate  | Lasso            | Copula              | MVGC
False Discovery Rate | MVGC             | Copula              | Lasso
Accuracy             | Copula           | MVGC                | Lasso
F1 Score             | MVGC             | Copula              | Lasso

• MVGC shows consistency in both the 5-VAR and 3-VAR scenarios
• Copula provides the best accuracy compared to the other methods, especially at 200 – 800 time points
• Lasso scores are the highest at high numbers of time points, but its scores at low numbers of time points were low

Page 27: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Outline

Background and Theory
Implementation
Results and Discussion
Conclusion
• Conclusion
• Future Work

Page 28: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Conclusion

• The 3 methods of GC (MVGC, Lasso, and Copula) can be compared using 7 performance metrics
• MVGC performs consistently under most conditions
• Lasso has advantages in handling high numbers of time points
• Copula has a certain range of time points in which its performance is optimized
• Even though the overall scores favour MVGC over the other methods, the results are still conditional

Page 29: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Suggestions for Future Work

• Granger Causality algorithms for non-linear data
  • Non-linear data provides a better representation of a Gene Regulatory Network
• Application to real datasets
  • Granger Causality analysis may be applied to real datasets
• Other algorithms for GRN (Dynamic Bayesian Network)
  • DBN is another prominent method in this topic

Page 30: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Thank You

Page 31: Comparative Study of Granger Causality Algorithm for Gene Regulatory Network


Q & A