
Graph Mining Applications to Machine Learning Problems

Max Planck Institute for Biological Cybernetics

Koji Tsuda


Graphs…

Graph Structures in Biology

[Figure: examples of graph-structured data: compounds (a molecular graph of C, O and H atoms), DNA sequences (A C G C), RNA secondary structures (base pairs such as CG and UA), and texts in literature (e.g., "Amitriptyline inhibits adenosine uptake")]

Substructure Representation

- Each graph is represented as a 0/1 vector of pattern indicators
- Huge dimensionality! Graph mining is needed for selecting features
- Better than paths (the features used by marginalized graph kernels)
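To make the 0/1 representation concrete, here is a minimal sketch assuming a hypothetical toy encoding of labeled graphs (a vertex-label dictionary plus a set of undirected edges). The helpers `make_graph`, `contains`, and `indicator_vector` are illustrative names, and the brute-force backtracking matcher stands in for the far more efficient machinery of real graph miners.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy encoding: vertex id -> label, plus a set of undirected edges.
Graph = Tuple[Dict[int, str], FrozenSet[FrozenSet[int]]]


def make_graph(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Graph:
    return labels, frozenset(frozenset(e) for e in edges)


def contains(graph: Graph, pattern: Graph) -> bool:
    """Brute-force check that `pattern` occurs in `graph`: an injective,
    label-preserving vertex mapping under which every pattern edge is a graph edge."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    order = list(p_labels)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        for v, label in g_labels.items():
            if label != p_labels[u] or v in mapping.values():
                continue
            # Pattern edges from u to already-mapped vertices must exist in the graph.
            if all(frozenset((v, mapping[w])) in g_edges
                   for w in mapping if frozenset((u, w)) in p_edges):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def indicator_vector(graph: Graph, patterns: List[Graph]) -> List[int]:
    """0/1 vector with one entry per pattern: 1 if the pattern occurs in the graph."""
    return [int(contains(graph, p)) for p in patterns]


# Toy molecule C-C-O and four candidate patterns (illustrative only).
mol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
patterns = [
    make_graph({0: "C", 1: "O"}, [(0, 1)]),   # C-O bond
    make_graph({0: "C", 1: "C"}, [(0, 1)]),   # C-C bond
    make_graph({0: "O", 1: "O"}, [(0, 1)]),   # O-O bond
    make_graph({0: "Cl"}, []),                # a chlorine atom
]
print(indicator_vector(mol, patterns))  # [1, 1, 0, 0]
```

With all possible patterns as columns this vector becomes astronomically long, which is why the slides turn to graph mining to select the columns.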

Overview

- Quick review of graph mining
- EM-based clustering algorithm: mixture model with L1 feature selection
- Graph Boosting: supervised regression for QSAR analysis; linear programming meets graph mining


Quick Review of Graph Mining

Graph Mining

- Analysis of graph databases
- Find all patterns satisfying predetermined conditions, e.g., frequent substructure mining
- Combinatorial and exhaustive
- Recently developed: AGM (Inokuchi et al., 2000), gSpan (Yan and Han, 2002), Gaston (2004)

Graph Mining

Frequent Substructure Mining
- Enumerate all patterns that occur in at least m graphs
- $x_{ik} \in \{0, 1\}$: indicator of pattern k in graph i
- Support(k) $= \sum_i x_{ik}$: number of graphs in which pattern k occurs
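As a small illustration of the definitions, the sketch below computes Support(k) as a column sum of an explicit indicator matrix and keeps the patterns with support at least m. The matrix is toy data, and `supports` / `frequent_patterns` are hypothetical helper names; a real miner never materializes this matrix, since it enumerates patterns directly.

```python
from typing import List


def supports(X: List[List[int]]) -> List[int]:
    """Support(k) = sum_i x_ik: number of graphs (rows) containing pattern k (column)."""
    return [sum(col) for col in zip(*X)]


def frequent_patterns(X: List[List[int]], m: int) -> List[int]:
    """Indices of patterns that occur in at least m graphs."""
    return [k for k, s in enumerate(supports(X)) if s >= m]


# Rows: graphs i; columns: patterns k (toy 0/1 indicators, not real data).
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
print(supports(X))              # [3, 2, 1, 1]
print(frequent_patterns(X, 2))  # [0, 1]
```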

gSpan (Yan and Han, 2002)

- Efficient frequent substructure mining method
- DFS code: efficient detection of isomorphic patterns
- We extend gSpan for our work

Enumeration on Tree-shaped Search Space

- Each node has a pattern
- Generate nodes from the root: add an edge at each step

Tree Pruning

- Anti-monotonicity: adding an edge can never increase the support, so every descendant of g has support at most support(g)
- If support(g) < m, stop exploring! The descendants of g are not generated
- Support(g): number of graphs in which pattern g occurs
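The enumeration-with-pruning scheme can be sketched compactly if itemsets stand in for subgraphs. This is a simplification, not gSpan: each child extends its parent by one item instead of one edge, and appending items only in sorted order plays the role of gSpan's minimum DFS codes, so every pattern is generated exactly once. The function name `mine_frequent` and the toy database are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pattern = FrozenSet[str]


def mine_frequent(db: List[Pattern], m: int) -> List[Tuple[Pattern, int]]:
    """Depth-first enumeration on a tree-shaped search space with anti-monotone pruning.

    Itemsets stand in for subgraphs: each child extends its parent by one item
    (gSpan extends by one edge), and items are only appended in sorted order so
    each pattern is generated once (gSpan uses minimum DFS codes for this).
    """
    items = sorted(set().union(*db))
    found: List[Tuple[Pattern, int]] = []

    def support(pattern: Pattern) -> int:
        # Number of database entries (graphs) containing the pattern.
        return sum(1 for t in db if pattern <= t)

    def grow(pattern: Pattern, start: int) -> None:
        for idx in range(start, len(items)):
            child = pattern | {items[idx]}
            s = support(child)
            if s < m:
                continue  # anti-monotonicity: no descendant of child can reach support m
            found.append((child, s))
            grow(child, idx + 1)

    grow(frozenset(), 0)
    return found


db = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BD")]
for pattern, s in mine_frequent(db, m=2):
    print(sorted(pattern), s)
```

Subtrees below infrequent nodes such as {A, B, C} or {D} are never visited, which is exactly the saving the pruning rule provides.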

Discriminative patterns: Weighted Substructure Mining

- Each graph i has a weight w_i: w_i > 0 for the positive class, w_i < 0 for the negative class
- Weighted substructure mining searches for patterns with a large frequency difference between the classes, i.e., a large $\left| \sum_i w_i x_{ik} \right|$
- This score is not anti-monotonic: use a bound for pruning instead
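One standard bound works as follows: any extension of a pattern occurs in a subset of the graphs containing it, so its weighted score lies between the sum of the negative weights and the sum of the positive weights over those graphs. The branch-and-bound sketch below uses that bound, again with itemsets standing in for subgraphs. The function name and toy data are hypothetical, and the exact bound used in the slides' method may differ.

```python
from typing import FrozenSet, List, Optional, Tuple

Pattern = FrozenSet[str]


def best_weighted_pattern(db: List[Pattern], w: List[float]) -> Tuple[Optional[Pattern], float]:
    """Branch-and-bound search for the pattern maximizing |sum_i w_i x_i(pattern)|.

    Itemsets again stand in for subgraphs. The score is not anti-monotonic, but
    any extension of a pattern occurs in a subset of its graphs, so its score lies
    between the sum of the negative and the sum of the positive weights of those graphs.
    """
    items = sorted(set().union(*db))
    best_pattern: Optional[Pattern] = None
    best_score = 0.0

    def grow(pattern: Pattern, covered: List[int], start: int) -> None:
        nonlocal best_pattern, best_score
        for idx in range(start, len(items)):
            child = pattern | {items[idx]}
            cov = [i for i in covered if child <= db[i]]  # graphs containing child
            score = abs(sum(w[i] for i in cov))
            if score > best_score:
                best_pattern, best_score = child, score
            pos = sum(w[i] for i in cov if w[i] > 0)
            neg = -sum(w[i] for i in cov if w[i] < 0)
            if max(pos, neg) > best_score:  # otherwise no extension can do better
                grow(child, cov, idx + 1)

    grow(frozenset(), list(range(len(db))), 0)
    return best_pattern, best_score


db = [frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("C")]
w = [1.0, 1.0, -1.0, -1.0]  # first two graphs positive, last two negative
print(best_weighted_pattern(db, w))
```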

Multiclass version

- Multiple weight vectors, one per class c: $w^{(c)}_i > 0$ if graph i belongs to class c, $w^{(c)}_i < 0$ otherwise
- Search for patterns overrepresented in a class
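A minimal sketch of the multiclass scoring, assuming a hypothetical weighting of +1/n_c inside class c and -1/(n - n_c) outside it, so that each score is the pattern's frequency in class c minus its frequency elsewhere. The slide does not give the actual weight values. Patterns are taken from a precomputed indicator matrix; in practice the search over patterns would be done by bounded weighted mining, as on the previous slide. At least two classes are required.

```python
from typing import List, Tuple


def class_weights(labels: List[int], c: int) -> List[float]:
    """Hypothetical weighting: +1/n_c inside class c, -1/(n - n_c) outside it."""
    n_c = sum(1 for y in labels if y == c)
    n_rest = len(labels) - n_c
    return [1.0 / n_c if y == c else -1.0 / n_rest for y in labels]


def most_overrepresented(X: List[List[int]], labels: List[int]) -> Tuple[int, int, float]:
    """(class, pattern, score) maximizing sum_i w_i^(c) x_ik over classes c and patterns k."""
    best = (-1, -1, float("-inf"))
    for c in sorted(set(labels)):
        w = class_weights(labels, c)
        for k in range(len(X[0])):
            score = sum(w_i * row[k] for w_i, row in zip(w, X))
            if score > best[2]:
                best = (c, k, score)
    return best


labels = [0, 0, 1, 1, 2, 2]
X = [[1, 0, 0],   # toy 0/1 pattern indicators, one row per graph
     [1, 0, 1],
     [0, 1, 0],
     [0, 1, 1],
     [0, 0, 1],
     [1, 0, 1]]
print(most_overrepresented(X, labels))  # (1, 1, 1.0): pattern 1 occurs only in class 1
```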

EM-based clustering of graphs

Tsuda, K. and T. Kudo: Clustering Graphs by Weighted Substructure Mining. International Conference on Machine Learning (ICML), 953-960, 2006

EM-based graph clustering

- Motivation:
  - Learning a mixture model in the feature space of patterns
  - Basis for more complex probabilistic inference
- L1 regularization & graph mining
- E-step -> Mining -> M-step

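Probabilistic Model: Binomial Mixture

$p(\mathbf{x}) = \sum_{l=1}^{L} \alpha_l \, p(\mathbf{x} \mid \boldsymbol{\theta}_l)$

Each component: $p(\mathbf{x} \mid \boldsymbol{\theta}_l) = \prod_{k=1}^{d} \theta_{lk}^{x_k} (1 - \theta_{lk})^{1 - x_k}$

- $\alpha_l$: mixing weight for cluster l
- $\mathbf{x}$: feature vector of a graph (entries 0 or 1)
- $\boldsymbol{\theta}_l$: parameter vector for cluster l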

Function to minimize

- L1-regularized (negative) log likelihood
- Baseline constant for each pattern: the ML parameter estimate using a single binomial distribution
- In the solution, most parameters are exactly equal to these constants
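Only the verbal description of the objective survived, so the formula below is a reconstruction consistent with it rather than a copy of the slide. It assumes n graphs with 0/1 pattern vectors x_i, L clusters with mixing weights α_l and parameters θ_lk, baseline constants θ̄_k from a single binomial distribution, and a regularization constant λ; these symbol names are our choice.

```latex
F(\boldsymbol{\alpha}, \boldsymbol{\theta})
  = -\sum_{i=1}^{n} \log \sum_{l=1}^{L} \alpha_l
      \prod_{k=1}^{d} \theta_{lk}^{x_{ik}} \left(1 - \theta_{lk}\right)^{1 - x_{ik}}
    + \lambda \sum_{l=1}^{L} \sum_{k=1}^{d} \left| \theta_{lk} - \bar{\theta}_{k} \right|,
\qquad
\bar{\theta}_{k} = \frac{1}{n} \sum_{i=1}^{n} x_{ik}
```

Because the penalty is an L1 norm centered at θ̄_k, a sufficiently large λ pins most θ_lk exactly to θ̄_k, which is the sparsity the slide describes.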

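E-step

- Active pattern: a pattern whose parameter differs from its baseline constant in at least one cluster
- An inactive pattern has the same probability in every cluster, so its factor cancels when the cluster posteriors are normalized
- Hence the E-step can be computed only with the active patterns (computable!)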
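The cancellation argument can be checked numerically. This sketch computes binomial-mixture posteriors once with all patterns and once with only the active ones (patterns whose parameter differs from the baseline in some cluster); the two agree up to rounding. All parameter values are toy numbers and `responsibilities` is a hypothetical helper.

```python
import math
from typing import List


def responsibilities(x: List[int], alpha: List[float],
                     theta: List[List[float]], ks: List[int]) -> List[float]:
    """Cluster posteriors p(l | x) of a binomial mixture, using only pattern indices in ks."""
    log_joint = []
    for a_l, th_l in zip(alpha, theta):
        s = math.log(a_l)
        for k in ks:
            s += math.log(th_l[k]) if x[k] else math.log(1.0 - th_l[k])
        log_joint.append(s)
    top = max(log_joint)  # subtract the maximum for numerical stability
    e = [math.exp(v - top) for v in log_joint]
    z = sum(e)
    return [v / z for v in e]


baseline = [0.5, 0.2, 0.7, 0.4]          # per-pattern baseline constants (toy values)
theta = [[0.9, 0.2, 0.7, 0.4],           # first cluster: only pattern 0 deviates
         [0.1, 0.2, 0.7, 0.4]]           # second cluster
alpha = [0.5, 0.5]
active = [k for k in range(len(baseline))
          if any(th[k] != baseline[k] for th in theta)]  # -> [0]

x = [1, 0, 1, 1]
full = responsibilities(x, alpha, theta, list(range(len(x))))
reduced = responsibilities(x, alpha, theta, active)
print(active, full, reduced)  # both posteriors are approximately [0.9, 0.1]
```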

M-step

- Putative cluster assignments are given by the E-step
- Each parameter is solved separately
- Use graph mining to find the active patterns, then solve only for them

Solution

The solution for each parameter is expressed through two quantities:
- the occurrence probability of the pattern in a cluster
- the overall occurrence probability of the pattern

Important Observation

For an active pattern k, the occurrence probability in a graph cluster is significantly different from the average (overall) occurrence probability.
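The two quantities behind this observation are easy to compute from E-step responsibilities. The sketch below does so and flags patterns whose per-cluster probability deviates from the overall one by more than `tau`. That threshold is a hypothetical stand-in for illustration, not the M-step condition from the slides, which also involves the regularization constant. Data and helper names are toy assumptions.

```python
from typing import List, Tuple


def occurrence_probabilities(X: List[List[int]], R: List[List[float]]
                             ) -> Tuple[List[List[float]], List[float]]:
    """Per-cluster occurrence probabilities sum_i r_il x_ik / sum_i r_il and
    overall occurrence probabilities (1/n) sum_i x_ik, from 0/1 indicators X
    and E-step responsibilities R (one row per graph, one column per cluster)."""
    n, d, n_clusters = len(X), len(X[0]), len(R[0])
    overall = [sum(row[k] for row in X) / n for k in range(d)]
    per_cluster = []
    for l in range(n_clusters):
        mass = sum(r[l] for r in R)
        per_cluster.append(
            [sum(r[l] * row[k] for r, row in zip(R, X)) / mass for k in range(d)])
    return per_cluster, overall


def deviating_patterns(X: List[List[int]], R: List[List[float]], tau: float) -> List[int]:
    """Patterns whose probability in some cluster differs from the overall one
    by more than tau (a hypothetical threshold used only for illustration)."""
    per_cluster, overall = occurrence_probabilities(X, R)
    return [k for k in range(len(overall))
            if any(abs(mu[k] - overall[k]) > tau for mu in per_cluster)]


X = [[1, 1], [1, 0], [0, 1], [0, 0]]                  # 4 graphs, 2 patterns (toy data)
R = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hard assignments for clarity
print(deviating_patterns(X, R, tau=0.25))
```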

Mining for Active Patterns

- The objective F is rewritten in a form in which the active patterns can be found by graph mining
- Active patterns are found by the multiclass version of weighted substructure mining!

Experiments: RNA graphs

- Each stem is a node
- Secondary structure computed by RNAfold
- 0/1 vertex label (self loop or not)

Clustering RNA graphs

- Three Rfam families:
  - Intron GP I (Int, 30 graphs)
  - SSU rRNA 5 (SSU, 50 graphs)
  - RNase bact a (RNase, 50 graphs)
- Three bipartition problems
- Results evaluated by ROC scores (area under the ROC curve)


Examples of RNA Graphs


ROC Scores

Number of Patterns and Computation Time


Found Patterns

Summary (EM)

- Probabilistic clustering based on the substructure representation
- Inference helped by graph mining
- Many possible extensions:
  - Naïve Bayes
  - Graph PCA, LFD, CCA
  - Semi-supervised learning

Applications in biology?


Graph Boosting

Saigo, H., T. Kadowaki and K. Tsuda: A Linear Programming Approach for Molecular QSAR analysis. International Workshop on Mining and Learning with Graphs, 85-96, 2006

Graph Regression Problem

- Known as the QSAR (Quantitative Structure-Activity Relationship) problem in chemical informatics
- Given a graph, predict a real value
- Typically, features (descriptors) are given

QSAR with conventional descriptors

| #atoms | #bonds | #rings | … | Activity |
|---|---|---|---|---|
| 22 | 25 | | | 3 |
| 20 | 21 | | | 1.2 |
| 23 | 24 | | | 0.77 |
| 11 | 11 | | | -3.52 |
| 21 | 22 | | | -4 |

Motivation of Graph Boosting

- Descriptors are not always available
- New features are obtained as informative patterns (i.e., subgraphs)
- Greedy pattern discovery by boosting + gSpan
- Linear programming (LP) boosting reduces the number of graph mining calls
- Accurate prediction & interpretable results

Molecule as a labeled graph

[Figure: a molecule drawn as a labeled graph, with atom symbols (C, O) as vertex labels]

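QSAR with patterns

| Pattern 1 | Pattern 2 | Pattern 3 | … | Activity |
|---|---|---|---|---|
| 1 | 1 | 1 | | 3 |
| -1 | 1 | -1 | | 1.2 |
| -1 | 1 | -1 | | 0.77 |
| -1 | 1 | -1 | | -3.52 |
| 1 | 1 | -1 | | -4 |

[Figure: patterns 1, 2, 3, … are small substructures built from C, O and Cl atoms; each entry indicates whether the molecule contains the pattern (1) or not (-1)]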

Sparse regression in a very high dimensional space

- G: all possible patterns (intractably large)
- x: |G|-dimensional feature vector for a molecule
- Linear regression: $f(\mathbf{x}) = \sum_{j=1}^{d} \alpha_j x_j$
- Use an L1 regularizer to obtain a sparse α, which selects a tractable number of patterns

Problem formulation

We introduce the ε-insensitive loss and an L1 regularizer.

- m: number of training graphs
- d = |G|
- ξ+, ξ-: slack variables
- ε: parameter
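A plausible form of this LP, written as a sketch rather than copied from the slide: it combines the ε-insensitive loss with the L1 regularizer, and assumes real-valued targets y_i, pattern features x_ij, and a trade-off constant C for the slacks, which the slide does not list. Writing α_j = α_j^+ - α_j^- with both parts nonnegative turns |α_j| into a linear term.

```latex
\begin{aligned}
\min_{\boldsymbol{\alpha}, \boldsymbol{\xi}^{+}, \boldsymbol{\xi}^{-}} \quad
  & \sum_{j=1}^{d} |\alpha_j| + C \sum_{i=1}^{m} \left( \xi_i^{+} + \xi_i^{-} \right) \\
\text{s.t.} \quad
  & \sum_{j=1}^{d} \alpha_j x_{ij} - y_i \le \varepsilon + \xi_i^{+}, \quad i = 1, \dots, m, \\
  & y_i - \sum_{j=1}^{d} \alpha_j x_{ij} \le \varepsilon + \xi_i^{-}, \quad i = 1, \dots, m, \\
  & \xi_i^{+}, \xi_i^{-} \ge 0, \quad i = 1, \dots, m.
\end{aligned}
```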

Dual LP

- Primal: huge number of weight variables
- Dual: huge number of constraints

LP1-Dual
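Dualizing an ε-insensitive, L1-regularized primal (assuming a slack penalty C, targets y_i and features x_ij; derived here rather than copied from the slide) gives the form below. There is one constraint per pattern j, which is why the dual has a huge number of constraints, and that constraint matches the one restated on the slide about finding the most violated constraint.

```latex
\begin{aligned}
\max_{\mathbf{u}} \quad
  & \sum_{i=1}^{m} y_i u_i - \varepsilon \sum_{i=1}^{m} |u_i| \\
\text{s.t.} \quad
  & -1 \le \sum_{i=1}^{m} u_i x_{ij} \le 1, \quad j = 1, \dots, d, \\
  & -C \le u_i \le C, \quad i = 1, \dots, m.
\end{aligned}
```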

Column Generation Algorithm for LP Boost (Demiriz et al., 2002)

- Start from the dual with no constraints
- Add the most violated constraint each time
- Guaranteed to converge

[Figure: the constraint matrix, with the used part highlighted]

Finding the most violated constraint

- Constraint for a pattern j (shown again): $-1 \le \sum_{i=1}^{m} u_i x_{ij} \le 1$
- Finding the most violated one: $j^{*} = \arg\max_{j} \left| \sum_{i=1}^{m} u_i x_{ij} \right|$
- Searched by weighted substructure mining
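The pricing step can be sketched on an explicit pattern matrix. The code scores every column by |Σ_i u_i x_ij| and reports whether the best one violates its constraint. It assumes ±1 pattern features, as in the "QSAR with patterns" table; the matrix, the dual vector u, and the helper name are toy assumptions. In graph boosting the columns are never enumerated like this: the argmax is found by weighted substructure mining with weights u_i.

```python
from typing import List, Tuple


def most_violated(X: List[List[int]], u: List[float], tol: float = 1e-9) -> Tuple[int, float, bool]:
    """Return (j*, score, violated) where score_j = |sum_i u_i x_ij|.

    The pattern constraint -1 <= sum_i u_i x_ij <= 1 is violated when score_j > 1.
    """
    scores = [abs(sum(u_i * row[j] for u_i, row in zip(u, X))) for j in range(len(X[0]))]
    j_star = max(range(len(scores)), key=scores.__getitem__)
    return j_star, scores[j_star], scores[j_star] > 1.0 + tol


X = [[1, 1, 1],      # +1/-1 pattern indicators for 4 training graphs (toy data)
     [-1, 1, -1],
     [-1, 1, -1],
     [1, 1, -1]]
u = [0.6, -0.5, 0.3, 0.4]  # current dual solution (toy values)
print(most_violated(X, u))  # pattern 0, score about 1.2, violated
```

If `violated` is false for the best pattern, every pattern constraint holds and the column generation loop can stop.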

Algorithm Overview

- Iteration:
  - Find a new pattern by graph mining with weight u
  - If all constraints are satisfied, break
  - Add a new constraint
  - Update u by solving LP1-Dual
- Return: convert the dual solution to obtain the primal solution α

Speed-up by adding multiple patterns (multiple pricing)

- So far, only the most violated pattern, $\arg\max_{j} \left| \sum_{i=1}^{m} u_i x_{ij} \right|$, is chosen at each iteration
- Instead, mine and include the top k patterns at each iteration
- This reduces the number of mining calls
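Multiple pricing can be sketched the same way: rank patterns by |Σ_i u_i x_ij|, take up to k of them, and keep only those that actually violate their constraint. This uses Python's `heapq.nlargest` on an explicit toy matrix of ±1 features; in the method itself a top-k weighted substructure search would produce these patterns, and the helper name is hypothetical.

```python
import heapq
from typing import List


def top_k_violated(X: List[List[int]], u: List[float], k: int, tol: float = 1e-9) -> List[int]:
    """Indices of up to k patterns with the largest |sum_i u_i x_ij| that also exceed 1."""
    scores = {j: abs(sum(u_i * row[j] for u_i, row in zip(u, X))) for j in range(len(X[0]))}
    best = heapq.nlargest(k, scores, key=scores.__getitem__)
    return [j for j in best if scores[j] > 1.0 + tol]


X = [[1, 1, -1, 1],   # +1/-1 pattern indicators for 3 training graphs (toy data)
     [-1, 1, 1, 1],
     [1, -1, 1, 1]]
u = [0.9, -0.8, 0.5]  # current dual solution (toy values)
print(top_k_violated(X, u, k=3))  # [0, 2]: pattern 3 ranks third but satisfies its constraint
```

Adding several violated constraints per round means fewer rounds, and hence fewer calls to the expensive mining step.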


Speed-up by multiple pricing

Clearly negative data

| #atoms | #bonds | #rings | … | Activity |
|---|---|---|---|---|
| 22 | 25 | | | 3 |
| 20 | 21 | | | 1.2 |
| 23 | 24 | | | 0.77 |
| 11 | 11 | | | -3.52 |
| 21 | 22 | | | -4 |
| 22 | 20 | | | -10000 |
| 23 | 19 | | | -10000 |

Inclusion of clearly negative data

LP2-Primal adds the clearly negative data, whose predicted values are bounded above by z:
- l: number of clearly negative data
- z: predetermined upper bound
- ξ': slack variable
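A minimal sketch of what the extra LP2 terms measure, assuming that each clearly negative graph k contributes a constraint f(x_k) ≤ z + ξ'_k with ξ'_k ≥ 0. For a fixed sparse α, the smallest feasible slacks are max(0, f(x_k) - z). In the actual LP these slacks are penalized in the objective (the weight is not shown on the slide) and optimized jointly with α. The data and helper names are hypothetical.

```python
from typing import Dict, List


def predict(alpha: Dict[int, float], x: List[int]) -> float:
    """Sparse linear model f(x) = sum_j alpha_j x_j over the selected patterns only."""
    return sum(a_j * x[j] for j, a_j in alpha.items())


def negative_slacks(alpha: Dict[int, float], X_neg: List[List[int]], z: float) -> List[float]:
    """Smallest slacks xi'_k >= 0 satisfying f(x_k) <= z + xi'_k for each clearly negative graph."""
    return [max(0.0, predict(alpha, x) - z) for x in X_neg]


alpha = {0: 0.5, 2: -0.25}                      # two selected patterns (toy weights)
X_neg = [[1, 1, -1], [1, -1, 1], [-1, 1, 1]]    # +1/-1 features of clearly negative graphs
print(negative_slacks(alpha, X_neg, z=-0.5))    # [1.25, 0.75, 0.0]
```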

Experiments

- Data from the Endocrine Disruptors Knowledge Base: 59 compounds labeled by a real number and 61 compounds labeled by a large negative number
- The label (target) is the log-transformed relative proliferative potency, log(RPP), normalized between -1 and 1
- Comparison with:
  - Marginalized graph kernel + ridge regression
  - Marginalized graph kernel + kNN regression

Results with or without clearly negative data

[Figure: results of LP2 (with clearly negative data) and LP1 (without)]

Extracted patterns

The extracted patterns are interpretable, in contrast to the features expressed implicitly by the marginalized graph kernel.

Summary (Graph Boosting)

- Graph Boosting simultaneously generates patterns and learns their weights
- Finite convergence by column generation
- Potentially interpretable by chemists
- Flexible constraints and speed-up by LP

Concluding Remarks

- Using graph mining as a part of machine learning algorithms
- Weights are essential: please include weights when you implement your item-set/tree/graph mining algorithms
- Make it available on the web! Then ML researchers can use it