
Page 1: Direct Sparse Structural Change Detection in Markov …liu/papers/ECML_TALK.pdf

Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation

Song Liu¹, John Quinn², Michael Gutmann³, Taiji Suzuki¹ and Masashi Sugiyama¹

"Tempora mutantur, nos et mutamur in illis."

"Times change, and we change with time."

Latin phrase.

SL is supported by the JST PRESTO program and the JSPS fellowship, JQ is supported by the JST PRESTO program, and MS is supported by the JST CREST program. MUG is supported by the Finnish Centre-of-Excellence in Computational Inference Research COIN (251170). TS was partially supported by MEXT Kakenhi 25730013, and the Aihara Project, the FIRST program from JSPS, initiated by CSTP.

1. Tokyo Institute of Technology, Japan

2. Makerere University, Uganda

3. University of Helsinki and HIIT, Finland.


Page 2

Interactions Everywhere

Examples of interactions:

Genes regulate each other via a gene network.

Synonyms tend to co-occur in the same text corpus.

Brain EEG signals may be synchronized in a certain pattern.

However, such interactions may be changing!

(Wikipedia)

Page 3

Structural Change Detection

Interactions between features may change.

e.g. Some genes related to sleep may be activated, but only in the evening.

“apple” may co-occur with “banana” quite often in a cookbook, but not in IT news.

[Figure: the change of brain signal correlation at two different experiment intervals (Williamson et al., 2012).]

$\boldsymbol{x} = (x^{(1)}, \ldots, x^{(d)})^\top$

Page 4

Outline

1. Introduction
2. Related Works
   1. Gaussian Markov Networks (GMNs)
   2. Estimating Sparse GMN: Graphical Lasso (Glasso)
   3. Glasso: Pros and Cons
   4. Detecting Changes via Fused-lasso (Flasso)
   5. Nonparanormal Extension
   6. Generalized Log-linear Model
   7. The Normalization Issue
3. Proposed Approach
4. Experiments
5. Conclusion

Page 5

Gaussian Markov Networks (GMNs)

The interactions between random variables can be modelled by Markov Networks.

Markov Networks (MNs) are undirected Graphical Models. The simplest example of MN is a Gaussian MN:
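For reference, a zero-mean Gaussian MN is commonly written as below. This is the standard textbook form, given here as an assumption about what the slide displayed; $\boldsymbol{\Theta}$ is the inverse covariance matrix, and the edge between nodes $u$ and $v$ is absent exactly when $\Theta_{u,v} = 0$.

```latex
p(\boldsymbol{x}; \boldsymbol{\Theta})
  = \frac{\det(\boldsymbol{\Theta})^{1/2}}{(2\pi)^{d/2}}
    \exp\!\left( -\tfrac{1}{2}\, \boldsymbol{x}^\top \boldsymbol{\Theta}\, \boldsymbol{x} \right)
```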

We can visualize the above MN using an undirected graph.

$\boldsymbol{x} = (x^{(1)}, x^{(2)}, \ldots, x^{(d)})$; $\boldsymbol{\Theta}$ is the inverse covariance matrix.

[Figure: undirected graph with nodes 1 to 6]

Page 6

Estimating Sparse GMN: Graphical Lasso (Glasso)

Recall, we would like to detect changes between MNs.

Changes can be found once the structures of two separate GMNs are known to us.

Estimating sparse GMNs can be done via Graphical Lasso (Glasso).

Here $S$ denotes the sample covariance matrix.

Idea: run Glasso twice, then take the parameter difference.

[Figure: two undirected graphs, each with nodes 1 to 6]

Tibshirani, JRSS 1996; Friedman et al., Biostatistics 2008
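A sketch of the two-step idea, using the standard Glasso objective (Friedman et al., 2008). The symbols $S^P$, $S^Q$ for the two sample covariance matrices and $\hat{\boldsymbol{\Theta}}^P$, $\hat{\boldsymbol{\Theta}}^Q$ for the two estimates are notation assumed here; $\lambda_P$ and $\lambda_Q$ are the two regularization parameters.

```latex
\begin{aligned}
\hat{\boldsymbol{\Theta}}^P
  &= \arg\max_{\boldsymbol{\Theta} \succ 0}\;
     \log\det\boldsymbol{\Theta} - \mathrm{tr}(S^P \boldsymbol{\Theta})
     - \lambda_P \|\boldsymbol{\Theta}\|_1, \\
\hat{\boldsymbol{\Theta}}^Q
  &= \arg\max_{\boldsymbol{\Theta} \succ 0}\;
     \log\det\boldsymbol{\Theta} - \mathrm{tr}(S^Q \boldsymbol{\Theta})
     - \lambda_Q \|\boldsymbol{\Theta}\|_1, \\
\text{estimated change}
  &= \hat{\boldsymbol{\Theta}}^P - \hat{\boldsymbol{\Theta}}^Q .
\end{aligned}
```

Each network is penalized on its own, so the sparsity of the difference is only controlled indirectly through $\lambda_P$ and $\lambda_Q$.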

Page 7

Glasso: Pros and Cons

Pros:

Statistical properties are well studied.

Off-the-shelf software can be used.

Cons:

Cannot detect high-order correlation.

Does not work if 𝑝 or 𝑞 is dense.

Not clear how to choose 𝜆𝑃 or 𝜆𝑄.

[Diagram: P (sparse), Q (sparse), change (sparse)]

Can we penalize change directly?

Page 8

Detecting Changes via Fused-lasso (Flasso)

We can impose sparsity directly on the change in parameters, using Fused-lasso (Tibshirani et al., 2005).

Consider the following objective:
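One common way to write such an objective for two Gaussian MNs is sketched below. The exact form shown on the slide is not recoverable, so the notation ($\boldsymbol{\Theta}^P$, $\boldsymbol{\Theta}^Q$, $S^P$, $S^Q$, $\lambda_1$, $\lambda_2$) is an assumption; the last term is the fused-lasso penalty on the difference.

```latex
\max_{\boldsymbol{\Theta}^P,\, \boldsymbol{\Theta}^Q \succ 0}\;
  \log\det\boldsymbol{\Theta}^P - \mathrm{tr}(S^P \boldsymbol{\Theta}^P)
  + \log\det\boldsymbol{\Theta}^Q - \mathrm{tr}(S^Q \boldsymbol{\Theta}^Q)
  - \lambda_1 \left( \|\boldsymbol{\Theta}^P\|_1 + \|\boldsymbol{\Theta}^Q\|_1 \right)
  - \lambda_2 \|\boldsymbol{\Theta}^P - \boldsymbol{\Theta}^Q\|_1
```

In this form, a small $\lambda_1$ puts little sparsity pressure on the individual networks, while $\lambda_2$ controls the sparsity of the change.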

A similar approach for Gaussian structural change was proposed, using pseudo-likelihood (Zhang & Wang, UAI 2010).

We don’t have to assume 𝑝 or 𝑞 is sparse.

Sparsity control is much easier than with Glasso.

Gaussianity is still assumed.

Page 9

Nonparanormal (NPN) Extension

We may assume data are Gaussian after NPN transform.

More flexible than Gaussian methods, still tractable.

However, the NPN extension is still restrictive.

$f_k$: monotone, differentiable function

$\boldsymbol{x} = (x^{(1)}, x^{(2)}, \ldots, x^{(d)})$

$\boldsymbol{f}(\boldsymbol{x}) = (f_1(x^{(1)}), f_2(x^{(2)}), \ldots, f_d(x^{(d)}))$

Liu et al., JMLR 2009
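A minimal sketch of what an NPN-style transform can look like in practice. It assumes each $f_k$ is estimated from ranks as $\Phi^{-1}(\text{rank}/(n+1))$, which is a simple stand-in and not necessarily the estimator of Liu et al.; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def npn_transform(column):
    """Map one feature column to normal scores via its ranks.

    Assumption: f_k is approximated by Phi^{-1}(rank / (n + 1));
    ties are broken by input order.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def npn_transform_all(samples):
    """Apply the transform to every feature of a list of d-dimensional samples."""
    columns = list(zip(*samples))
    transformed = [npn_transform(list(col)) for col in columns]
    return [list(row) for row in zip(*transformed)]


# Usage: a skewed one-dimensional feature becomes symmetric around zero.
skewed = [[math.exp(v / 2.0)] for v in range(-5, 6)]
gaussianized = npn_transform_all(skewed)
```

Because each $f_k$ is monotone, the ordering of every feature is preserved; only the marginal shape changes, after which Gaussian methods such as Glasso or Flasso can be applied.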

Page 10

Generalized Log-linear Model

Pairwise Markov Network
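A sketch of the pairwise model, written in a form consistent with the feature maps listed on this slide. The indexing $u \ge v$ over pairs of variables and the parameter blocks $\boldsymbol{\theta}_{u,v}$ are notation assumed here.

```latex
p(\boldsymbol{x}; \boldsymbol{\theta})
  = \frac{1}{Z(\boldsymbol{\theta})}
    \exp\!\Big( \sum_{u \ge v} \boldsymbol{\theta}_{u,v}^\top
      \boldsymbol{f}(x^{(u)}, x^{(v)}) \Big),
\qquad
Z(\boldsymbol{\theta})
  = \int \exp\!\Big( \sum_{u \ge v} \boldsymbol{\theta}_{u,v}^\top
      \boldsymbol{f}(x^{(u)}, x^{(v)}) \Big)\, \mathrm{d}\boldsymbol{x}
```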

$\boldsymbol{f}: \mathbb{R}^2 \to \mathbb{R}^b$ are feature vectors.

Gaussian: $f_{\mathrm{gau}}(x, y) = xy$

Nonparanormal: $f_{\mathrm{npn}}(x, y) = f(x) f(y)$

Polynomial: $\boldsymbol{f}_{\mathrm{poly}}(x, y) = [x^k, y^k, x^{k-1}y, \ldots, x, y, 1]$

The normalization term $Z(\boldsymbol{\theta})$ is generally intractable. Gaussian or NPN models are exceptions.

Page 11

The Normalization Issue

However, for a generalized Markov Network, there is no closed-form for 𝑍(𝜽).

Importance sampling can be used to approximate 𝑍(𝜽).

Can adapt Glasso & Flasso for non-Gaussian data.

May result in a high-variance estimator, depending on the choice of 𝑝inst. How to choose 𝑝inst is not clear.
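A toy sketch of the importance sampling approximation, assuming a one-dimensional unnormalized model $\exp(-x^2/2)$ (true $Z = \sqrt{2\pi}$) and a zero-mean Gaussian as 𝑝inst. The model, the function names, and the two proposal widths are illustrative assumptions, not taken from the talk.

```python
import math
import random


def unnormalized_density(x):
    """Toy unnormalized model exp(-x^2 / 2); its true Z is sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x)


def estimate_z(n_samples, inst_std, seed=0):
    """Importance-sampling estimate of Z using a zero-mean Gaussian p_inst."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, inst_std)
        p_inst = math.exp(-0.5 * (x / inst_std) ** 2) / (
            inst_std * math.sqrt(2.0 * math.pi)
        )
        # Each sample contributes the weight (unnormalized model) / p_inst.
        total += unnormalized_density(x) / p_inst
    return total / n_samples


z_true = math.sqrt(2.0 * math.pi)
z_wide = estimate_z(20000, inst_std=2.0)    # p_inst covers the model's tails
z_narrow = estimate_z(20000, inst_std=0.3)  # p_inst much narrower than the model
```

With the wide proposal the estimate lands close to `z_true`. With the narrow one the importance weights have infinite variance in this toy setting, so `z_narrow` is unreliable, which illustrates why the choice of 𝑝inst matters.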

Page 12

Outline

1. Introduction
2. Problem Formulation
3. Related Works
4. Proposed Approach
   1. Modelling Changes Directly
   2. Density Ratio Estimation
   3. Sparsity Inducing Norm
   4. The Dual Formulation
5. Experiments
6. Conclusion

Page 13

Modeling Changes Directly

Recall, our interest is the change between the two MNs.

The ratio of two MNs naturally incorporates the parameter difference!

So, model the ratio directly!

Page 14

Modeling Changes Directly

We model density ratio instead of density function:

The normalization term is:

The normalization term ensures that the ratio model is properly normalized. It is computed by sample average approximation, which also works when the integral has no closed form!
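A sketch of the ratio model, its normalization term, and the sample average approximation. The notation is an assumption chosen to match the factors $\boldsymbol{\beta}_{u,v}$ used later in the talk: $\boldsymbol{f}$ is the pairwise feature map, $\boldsymbol{x}_{Q,j}$ are the $n_Q$ samples from $Q$, and $\boldsymbol{\beta}_{u,v}$ plays the role of the parameter difference between $P$ and $Q$.

```latex
\begin{aligned}
r(\boldsymbol{x}; \boldsymbol{\beta})
  &= \frac{1}{N(\boldsymbol{\beta})}
     \exp\!\Big( \sum_{u \ge v} \boldsymbol{\beta}_{u,v}^\top
       \boldsymbol{f}(x^{(u)}, x^{(v)}) \Big), \\
N(\boldsymbol{\beta})
  &= \int q(\boldsymbol{x})
     \exp\!\Big( \sum_{u \ge v} \boldsymbol{\beta}_{u,v}^\top
       \boldsymbol{f}(x^{(u)}, x^{(v)}) \Big)\, \mathrm{d}\boldsymbol{x}
   \;\approx\; \frac{1}{n_Q} \sum_{j=1}^{n_Q}
     \exp\!\Big( \sum_{u \ge v} \boldsymbol{\beta}_{u,v}^\top
       \boldsymbol{f}(x_{Q,j}^{(u)}, x_{Q,j}^{(v)}) \Big), \\
\int q(\boldsymbol{x})\, r(\boldsymbol{x}; \boldsymbol{\beta})\, \mathrm{d}\boldsymbol{x}
  &= 1 .
\end{aligned}
```

The integral is an expectation under $q$, so it can be replaced by an average over samples from $Q$ without choosing a separate instrumental distribution.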

Page 15

Estimating Density Ratio

Kullback-Leibler Importance Estimation Procedure (KLIEP):

Unconstrained convex optimization!

Sugiyama et al., NIPS 2007

Tsuboi et al., JIP 2009
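A minimal sketch of KLIEP for a log-linear ratio model, assuming a one-dimensional toy problem with the single feature $f(x) = x$ and plain gradient ascent. The data, step size, and function names are illustrative assumptions; the talk's model uses pairwise features over $d$ variables.

```python
import math
import random


def kliep_objective_and_grad(beta, xs_p, xs_q):
    """KLIEP log-likelihood and gradient for the ratio model exp(beta * x) / N(beta)."""
    # N(beta) is the sample average over Q, computed with a max shift for stability.
    scores = [beta * x for x in xs_q]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    log_norm = shift + math.log(total / len(xs_q))
    mean_p = sum(xs_p) / len(xs_p)
    objective = beta * mean_p - log_norm
    # Gradient: mean feature under P minus the ratio-weighted mean feature under Q.
    weighted_mean_q = sum(e * x for e, x in zip(exps, xs_q)) / total
    return objective, mean_p - weighted_mean_q


def fit_kliep(xs_p, xs_q, step=0.5, n_iters=200):
    """Gradient ascent on the concave KLIEP objective, starting from beta = 0."""
    beta = 0.0
    for _ in range(n_iters):
        _, grad = kliep_objective_and_grad(beta, xs_p, xs_q)
        beta += step * grad
    return beta


rng = random.Random(0)
xs_p = [rng.gauss(0.5, 1.0) for _ in range(2000)]  # samples from P
xs_q = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # samples from Q
beta_hat = fit_kliep(xs_p, xs_q)
```

For these two unit-variance Gaussians the true log-ratio is $0.5x - 0.125$, so `beta_hat` should come out near 0.5. Neither density is estimated on its own: only the ratio parameter is fitted.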

Page 16

Sparsity Inducing Norm

Imposing sparsity constraints on each factor 𝜷𝑢,𝑣 is equivalent to imposing sparsity on changes.

The penalty combines an L2 regularizer and a group lasso regularizer (elastic net).

So finally, we can obtain a 𝜷 with group sparsity!
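A small sketch of how a group lasso penalty produces group sparsity, using block soft-thresholding (the proximal operator of the penalty). The talk does not say which solver is used, so the proximal step, the dictionary layout keyed by edge $(u, v)$, and the function names are assumptions for illustration.

```python
import math


def group_lasso_penalty(groups):
    """Sum of Euclidean norms, one per factor beta_{u,v}."""
    return sum(math.sqrt(sum(b * b for b in g)) for g in groups.values())


def group_soft_threshold(groups, threshold):
    """Proximal operator of threshold * group-lasso penalty (block soft-thresholding)."""
    result = {}
    for edge, g in groups.items():
        norm = math.sqrt(sum(b * b for b in g))
        # A whole factor is zeroed when its norm does not exceed the threshold.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        result[edge] = [scale * b for b in g]
    return result


beta = {(1, 2): [3.0, 4.0], (1, 3): [0.3, -0.4], (2, 3): [0.0, 0.0]}
shrunk = group_soft_threshold(beta, threshold=1.0)
changed_edges = [edge for edge, g in shrunk.items() if any(b != 0.0 for b in g)]
```

Here only the factor for edge (1, 2) survives, so only that edge is reported as changed. All entries of a factor are kept or dropped together, which is what links group sparsity in 𝜷 to sparsity in the set of changed edges.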

Page 17

The Dual Formulation

[Figure: computational time versus dimension (40 to 80), comparing the primal and dual formulations]

When dimensionality is high, the dual formulation is preferred.

Optimize 𝜶 on the probability simplex.

(In the longer version.)

Page 18

Outline

1. Introduction
2. Problem Formulation
3. Related Works
4. Proposed Approach
5. Experiments
   1. Numerical Experiments
   2. Real-world Application
6. Conclusion

Page 19

Gaussian Distribution (𝑛 = 100, 𝑑 = 40)

Start from a 40-dimensional GMN with random correlations.

Randomly drop 15 edges. Precision and recall curves are averaged over 20 runs.

[Figures: regularization paths (axis label 𝜷𝑢,𝑣) and P-R curve]
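A sketch of how precision and recall for detected changes can be computed. It assumes each edge gets a score (for example the norm of its estimated factor 𝜷𝑢,𝑣) and that the curve is traced by sweeping a threshold; the talk does not state how its curves were produced, and the toy scores below are made up for illustration.

```python
def precision_recall(detected_edges, true_changed_edges):
    """Precision and recall of detected changed edges (edges are unordered pairs)."""
    detected = {tuple(sorted(e)) for e in detected_edges}
    truth = {tuple(sorted(e)) for e in true_changed_edges}
    hits = len(detected & truth)
    # Convention (an assumption): an empty set counts as perfect precision/recall.
    precision = hits / len(detected) if detected else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


def pr_curve(edge_scores, true_changed_edges, thresholds):
    """One (precision, recall) point per threshold on the edge score."""
    points = []
    for t in thresholds:
        detected = [e for e, s in edge_scores.items() if s > t]
        points.append(precision_recall(detected, true_changed_edges))
    return points


scores = {(1, 2): 0.9, (2, 3): 0.6, (1, 3): 0.2, (3, 4): 0.05}
truth = [(2, 1), (3, 2), (3, 4)]
curve = pr_curve(scores, truth, thresholds=[0.5, 0.1, 0.0])
```

Lowering the threshold admits more edges, which typically raises recall and lowers precision; averaging such curves over repeated runs gives a summary like the one described on this slide.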

Page 20

Gaussian Distribution (𝑛 = 50, 𝑑 = 40)

Start from a 40-dimensional GMN with random correlations.

Randomly drop 15 edges. Precision and recall curves are averaged over 20 runs.

[Figures: regularization paths (axis label 𝜷𝑢,𝑣) and P-R curve]

Page 21

Diamond Distribution (n = 5000, d = 9)

[Equation: density of the diamond distribution]

Samples are drawn by slice sampling.

Only the methods with the correct model perform well.

[Figures: regularization path (axis label 𝜷𝑢,𝑣) and P-R curve]
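A minimal sketch of slice sampling for a one-dimensional unnormalized density, using the usual stepping-out and shrinkage steps. The target below is a standard Gaussian chosen only for illustration, since the diamond density itself is not recoverable here; for a 9-dimensional target, one option (an assumption) is to apply such an update to each coordinate in turn.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Draw the slice level uniformly below the current density (log scale).
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Step out: place a bracket of size `width` around x, then widen it
        # until both ends fall outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width
        # Shrink: draw from the bracket, shrinking it towards x on rejection.
        while True:
            candidate = left + (right - left) * rng.random()
            if log_density(candidate) > log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples


# Usage: only an unnormalized log-density is needed, here -x^2 / 2.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

The sampler never needs the normalization constant, which is why it suits models whose $Z(\boldsymbol{\theta})$ has no closed form.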

Page 22

Real-world Applications

• Gene Network
  • Detecting changes from the original network to the modified network.

• Twitter Messages
  • Samples are the frequencies of 10 related keywords over time.
  • Detecting the change of co-occurrences of keywords before and after a certain event.

source: Wikipedia

Page 23

Gene Network

The gene regulatory network is modified manually. 50 samples are collected before (𝑃) and after (𝑄) the change.

A polynomial kernel is used for 𝒇.

𝜆1 is chosen by hold-out cross validation.

Only KLIEP, Flasso and IS-Flasso are compared.

[Figure: gene networks P and Q]

Page 24

Gene Network (n = 50, d = 13)

[Figures: regularization paths for KLIEP, Flasso and IS-Flasso (axis labels 𝜷𝑢,𝑣 and |𝜷𝑢,𝑣|), and P-R curve]

Page 25

Twitter Keywords

We choose the Deepwater Horizon oil spill as the target event.

[Figure: timeline along the time axis with windows labelled Q, P, P, P; labels "3 weeks" and "~4.17"]

Page 26

Twitter Keywords From 7.26-9.14

[Figures: results for KLIEP and Flasso]

Page 27

Outline

1. Introduction
2. Problem Formulation
3. Related Works
4. Proposed Approach
5. Experiments
6. Conclusion

Page 28

Conclusion

Learning sparse changes in two Markov Networks, directly! By density ratio estimation.

Two advantages compared to conventional methods:

Higher accuracy: thanks to the direct modelling nature.

Wider applicability: P and Q are not limited to discrete, Gaussian, or NPN models.