
Graph Summarization in Annotated Data using Probabilistic Soft Logic

Alex Memory¹, Angelika Kimmig¹,², Stephen H. Bach¹, Louiqa Raschid¹ and Lise Getoor¹

¹University of Maryland, ²KU Leuven

Sunday, November 11, 12


Agenda

• Annotation Graphs and Graph Summarization

• Heuristics for Graph Summarization

• Probabilistic Soft Logic

• Evaluation and Results


Annotated Data: the Arabidopsis thaliana Genome

Plant Ontology, Gene Ontology

Hu et al. Nature Reviews Cancer 7, 23–34 (January 2007) | doi:10.1038/nrc2036

Xu Wang, Yangyang Bian, Kai Cheng, Li-Fei Gu, Mingliang Ye, Hanfa Zou, Jun-Xian He: A large-scale protein phosphorylation analysis reveals novel phosphorylation motifs and phosphoregulatory networks in Arabidopsis. Journal of Proteomics, Oct 2012.

Anna Strumillo, http://www.fotopedia.com/items/6nf9pniglhbor-oaPbkJEnrX8

The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815 (14 December 2000) doi:10.1038/35048692


The Arabidopsis Information Resource

http://www.arabidopsis.org/cgi-bin/bulk/go/getgo.pl

Graph Summarization Example 1

Thor, A., Anderson, P., Raschid, L., Navlakha, S., Saha, B., Khuller, S., Zhang, X.N.: Link prediction for annotation graphs using graph summarization. In: International Semantic Web Conference (ISWC). (2011)

Graph Summarization Example 2

Heuristics for Graph Summarization

• Similarity

• Annotation Links

• Neighbor Sets


Similarity Heuristic

Nodes in the same cluster should be similar

Annotation Link Heuristic

Shared neighbor ➔ same cluster

Neighbor not shared ➔ different clusters


Neighbor Sets Heuristic

Similar set of neighbors ➔ same cluster

[Figure: five nodes with neighbor sets {a, b, c}, {a, b, c}, {a, b, c, d}, {d, e} and {d, e}]

Affinity Propagation

• one exemplar per cluster

• every node has to choose an exemplar

Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315 (2007) 972–976

Probabilistic Reasoning on Relational Data

• Fuzzy Logic [Lee, 1972]

• Undirected Graphical Models, FOL template language [Richardson and Domingos, 2006]

• Probabilistic Ontologies [Costa and Laskey, 2006]

Lee, R.C.T.: Fuzzy logic and the resolution principle. J. ACM 19(1) (1972) 109–119

Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1-2) (2006) 107–136

Costa, P.C.G., Laskey, K.B.: PR-OWL: A framework for probabilistic ontologies. Frontiers in Artificial Intelligence and Applications 150 (2006) 237

Probabilistic Soft Logic (PSL)

• Declarative language for probabilistic reasoning

• First order rules

• Soft truth values

• Similarities

• Set similarity

• Efficient inference

• http://psl.umiacs.umd.edu/

Input Database + Rules ➔ Probability Distribution over Interpretations

Broecheler, M., Mihalkova, L., Getoor, L.: Probabilistic similarity logic. In: Conference on Uncertainty in Artificial Intelligence (UAI). (2010)

Graphs and Clusters in PSL

[Figure: two drawings of the annotation graph over the nodes PHOT1, CRY2, cauline leaf, shoot apex, vacuole, stomatal movement and response to water deprivation]

link(sa,p1) = 1
link(sa,c2) = 1
link(cl,p1) = 1
...
link(c2,v) = 1
link(c2,rw) = 1

similar(sa,cl) = .9
similar(p1,c2) = .5
similar(sm,v) = .1
...

exemplar(sa,cl)
exemplar(cl,cl)
exemplar(p1,p1)
exemplar(c2,p1)
exemplar(sm,v)
exemplar(v,v)
exemplar(rw,rw)
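One way to read the exemplar atoms on this slide: every node names one exemplar, and nodes that name the same exemplar form a cluster. The following is a minimal Python sketch of that reading, not part of PSL. The helper name and the data layout are illustrative assumptions, the expansions of the abbreviations are inferred from the node labels, and only a subset of the slide's atoms is used.

```python
from collections import defaultdict

# (node, exemplar) pairs for a subset of the exemplar atoms on the slide.
# Assumed abbreviations: sa = shoot apex, cl = cauline leaf, p1 = PHOT1,
# c2 = CRY2, sm = stomatal movement, v = vacuole.
EXEMPLAR_ATOMS = [
    ("sa", "cl"), ("cl", "cl"),
    ("p1", "p1"), ("c2", "p1"),
    ("sm", "v"), ("v", "v"),
]


def clusters_from_exemplars(atoms):
    """Group nodes by the exemplar they chose (hypothetical helper)."""
    clusters = defaultdict(set)
    for node, exemplar in atoms:
        clusters[exemplar].add(node)
    return dict(clusters)


CLUSTERS = clusters_from_exemplars(EXEMPLAR_ATOMS)
```

Each key of `CLUSTERS` is an exemplar and each value is the set of nodes assigned to it, so an exemplar always appears in its own cluster.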

GS Heuristics in PSL

• Similarity Heuristic

1 : exemplar(A,B) -> similar(A,B)

GS Heuristics in PSL

• Annotation Link Heuristic

1 : link(A,B) & link(C,B) & exemplar(A,D) -> exemplar(C,D)

1 : link(A,B) & exemplar(A,C) & exemplar(D,C) -> link(D,B)

GS Heuristics in PSL

• Neighbor Sets Heuristic

1 : sameNeighbors({A.link}, {C.link}) & exemplar(A,B) -> exemplar(C,B)
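The slide does not say which set similarity sameNeighbors computes. As an illustration only, the sketch below uses Jaccard overlap between two neighbor sets, which is an assumption and may differ from what PSL uses. The node names and neighbor sets are hypothetical.

```python
def jaccard(left, right):
    """Jaccard overlap of two neighbor sets, a value in [0, 1]."""
    left, right = set(left), set(right)
    if not left and not right:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(left & right) / len(left | right)


# Hypothetical neighbor sets for three nodes.
NEIGHBORS = {
    "n1": {"a", "b", "c"},
    "n2": {"a", "b", "c", "d"},
    "n3": {"d", "e"},
}

SIM_N1_N2 = jaccard(NEIGHBORS["n1"], NEIGHBORS["n2"])  # 3 shared out of 4 in total
SIM_N1_N3 = jaccard(NEIGHBORS["n1"], NEIGHBORS["n3"])  # nothing shared
```

A value like `SIM_N1_N2` would act as the soft truth value of the sameNeighbors atom in the rule body, so nodes with strongly overlapping neighbor sets are pushed toward the same exemplar.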

GS Heuristics in PSL

• Affinity Propagation

1000 : exemplar(A,B) -> exemplar(B,B)

Functional: exemplar
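The two affinity propagation constraints can be checked on a candidate assignment. The sketch below reads "Functional: exemplar" as requiring that, for each node, the truth values of its exemplar atoms sum to 1. That reading is an interpretation, consistent with "every node has to choose an exemplar". The function names and the example assignments are hypothetical.

```python
import math


def is_functional(exemplar, tol=1e-9):
    """True if, for every node, its exemplar truth values sum to 1."""
    totals = {}
    for (node, _), value in exemplar.items():
        totals[node] = totals.get(node, 0.0) + value
    return all(math.isclose(t, 1.0, abs_tol=tol) for t in totals.values())


def exemplar_rule_penalty(exemplar, weight=1000.0):
    """Weighted violation of exemplar(A,B) -> exemplar(B,B) over all groundings."""
    total = 0.0
    for (_, b), body in exemplar.items():
        head = exemplar.get((b, b), 0.0)  # missing atoms are treated as false
        total += weight * max(0.0, body - head)
    return total


# Hypothetical assignments: keys are (node, exemplar), values are truth values.
SOFT = {("x", "y"): 0.7, ("x", "z"): 0.3, ("y", "y"): 1.0, ("z", "z"): 1.0}
BAD = {("x", "y"): 1.0, ("y", "z"): 1.0, ("z", "z"): 1.0}
```

In `BAD`, node x picks y although y is not its own exemplar, so the rule with weight 1000 is violated by the full distance of 1.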

GS Heuristics in PSL

• Similarity Heuristic

1 : exemplar(A,B) -> similar(A,B)

• Annotation Link Heuristic

1 : link(A,B) & link(C,B) & exemplar(A,D) -> exemplar(C,D)

1 : link(A,B) & exemplar(A,C) & exemplar(D,C) -> link(D,B)

• Neighbor Sets Heuristic

1 : sameNeighbors({A.link}, {C.link}) & exemplar(A,B) -> exemplar(C,B)

• Affinity Propagation

1000 : exemplar(A,B) -> exemplar(B,B)

Functional: exemplar

Distance to Satisfaction

rule satisfied ⇔ truth value of body ≤ truth value of head

in1(a,b) -> out1(a,b)

body   head   distance to satisfaction
0.5    0.6    0
0.9    0.6    0.3
0.7    0.6    0.1
1.0    0.0    1.0

Distance to Satisfaction

rule satisfied ⇔ truth value of body ≤ truth value of head

out1(a,b) -> in1(a,b)

body   head   distance to satisfaction
0.5    0.6    0
0.9    0.6    0.3
0.7    0.6    0.1
1.0    0.0    1.0
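The rows on these two slides follow from one formula: a rule is satisfied when the body's truth value does not exceed the head's, and otherwise the distance is the gap between them. A minimal Python sketch of that formula follows. The function name is illustrative, and rounding is used only to hide floating-point noise.

```python
def distance_to_satisfaction(body, head):
    """max(0, body - head): zero whenever body <= head."""
    return max(0.0, body - head)


# (body, head) truth values taken from the rows on the slide.
ROWS = [(0.5, 0.6), (0.9, 0.6), (0.7, 0.6), (1.0, 0.0)]
DISTANCES = [round(distance_to_satisfaction(b, h), 10) for b, h in ROWS]
```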

Probabilistic Model

\[ f(I) = \frac{1}{Z} \exp\Big[ -\sum_{r \in R} \lambda_r \, (d_r(I))^p \Big] \]

\[ Z = \int_I \exp\Big[ -\sum_{r \in R} \lambda_r \, (d_r(I))^p \Big] \]

• $R$: set of rule groundings

• $\lambda_r$: rule's weight

• $I$: interpretation

• $d_r(I)$: rule's distance to satisfaction given $I$

• $p \in \{1, 2\}$

• $Z$: normalization constant
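To make the density concrete, the sketch below evaluates its unnormalized part, the exponential without the 1/Z factor, for a given list of ground rules. It is an illustration under stated assumptions: the function name, the (weight, distance) input format and the example numbers are hypothetical, and the normalization constant is not computed.

```python
import math


def unnormalized_density(groundings, p=1):
    """exp(-sum_r weight_r * distance_r ** p) for one interpretation.

    groundings: iterable of (weight, distance) pairs, one per ground rule.
    """
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    energy = sum(weight * distance ** p for weight, distance in groundings)
    return math.exp(-energy)


# Hypothetical ground rules as (weight, distance to satisfaction) pairs.
SATISFIED = [(1.0, 0.0), (1000.0, 0.0)]
VIOLATED = [(1.0, 0.3), (1000.0, 0.0)]
```

An interpretation that satisfies every ground rule gets the maximum value of 1. With p = 2 a small violation such as 0.3 is penalized less than with p = 1, because the distance is squared.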

MPE Inference in PSL

• Convex optimization problem

• Exact inference in polynomial time by transformation to second order cone program

• New solver based on consensus optimization, linear time in practice [Bach et al, NIPS 12]

Bach, S., Broecheler, M., Getoor, L., O'Leary, D.: Scaling MPE inference for constrained continuous Markov random fields with consensus optimization. In: Advances in Neural Information Processing Systems (NIPS). (2012)
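MPE inference looks for the interpretation with the highest density, which is the one with the lowest weighted sum of distances to satisfaction. The toy sketch below finds it for a single unknown atom by brute-force grid search. The grid search is a stand-in for illustration only and is neither the second order cone formulation nor the consensus optimization solver cited on the slide. The two ground rules, their weights and the step count are hypothetical.

```python
def energy(x):
    """Weighted distances to satisfaction (p = 1) for two hypothetical ground rules.

    x is the truth value of one unknown atom, for example exemplar(c2,p1).
    Rule 1, weight 1: exemplar(c2,p1) -> similar(p1,c2), with similar(p1,c2) = 0.5.
    Rule 2, weight 2: a body with truth value 1.0 -> exemplar(c2,p1).
    """
    rule1 = 1.0 * max(0.0, x - 0.5)
    rule2 = 2.0 * max(0.0, 1.0 - x)
    return rule1 + rule2


def mpe_by_grid(steps=100):
    """Return the grid point in [0, 1] with the lowest energy."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=energy)


MPE_VALUE = mpe_by_grid()
```

Each term of the energy is a hinge function of x, so the sum is convex. That convexity is what allows exact solvers to replace the grid search used here.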


Evaluation

Input Graph

[Figure: input graph with annotation links, GO taxonomic distance, PO taxonomic distance and gene sequence similarity]

Leave-one-out Link Prediction

1. Leave one out

2. Cluster

3. Predict missing link using clusters
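The three steps can be sketched as a loop. The slide does not spell out how a missing link is predicted from the clusters, so the scoring used here, the share of a gene's cluster mates that carry a term, is an assumption. The clustering step is replaced by a fixed stand-in function, and all names and data are hypothetical.

```python
def predict_scores(gene, clusters, links, candidate_terms):
    """Score each candidate term by the share of the gene's cluster mates linked to it."""
    mates = next((c - {gene} for c in clusters if gene in c), set())
    scores = {}
    for term in candidate_terms:
        linked = sum(1 for mate in mates if (mate, term) in links)
        scores[term] = linked / len(mates) if mates else 0.0
    return scores


def leave_one_out(links, cluster_fn, terms):
    """Yield (held_out_link, scores) for every link in turn."""
    for held_out in sorted(links):
        remaining = links - {held_out}        # 1. leave one out
        clusters = cluster_fn(remaining)      # 2. cluster
        gene, _ = held_out
        candidates = [t for t in terms if (gene, t) not in remaining]
        # 3. predict the missing link using the clusters
        yield held_out, predict_scores(gene, clusters, remaining, candidates)


# Hypothetical toy data: (gene, term) annotation links.
LINKS = {("g1", "t1"), ("g1", "t2"), ("g2", "t1"), ("g2", "t2")}
TERMS = ["t1", "t2", "t3"]


def fixed_clusters(_links):
    """Stand-in for the clustering step: always puts g1 and g2 together."""
    return [{"g1", "g2"}]


RESULTS = dict(leave_one_out(LINKS, fixed_clusters, TERMS))
```

For the held-out link (g1, t1), the cluster mate g2 still carries t1, so t1 receives the top score among the candidates.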

Arabidopsis thaliana Data Sets

            DS1   DS2   DS3
Genes        10    10    18
PO Terms     53    48    40
GO Terms     44    31    19
PO-Gene     255   255   218
GO-Gene     157   157    92

Results from Combining Heuristics

[Figure: bar chart of Mean Average Precision (axis from 0 to 0.5) on DS1, DS2 and DS3 for LINK1, LINK2, SIM1, SIM2 and SETS, annotated with Random Guess: ~0.004 and the Baseline System]

        Cluster                       Link Prediction
LINK1   Similarity, Shared Neigh.     Similarity, Shared Neigh.
LINK2   Similarity, Neigh. ¬Shared    Similarity, Neigh. ¬Shared
SIM1    Similarity                    Similarity, Shared Neigh.
SIM2    Similarity                    Similarity, Neigh. ¬Shared
SETS    Similarity, Neigh. Sets       Similarity, Neigh. ¬Shared

Baseline System: Thor, A., Anderson, P., Raschid, L., Navlakha, S., Saha, B., Khuller, S., Zhang, X.N.: Link prediction for annotation graphs using graph summarization. In: International Semantic Web Conference (ISWC). (2011)
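The chart reports mean average precision. The slides do not show how it was computed, so the sketch below uses the standard information retrieval definition as an assumption: average the precision at each rank where a relevant item appears, then average over queries. The rankings are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of average_precision over (ranked, relevant) pairs."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical rankings for two held-out links, one relevant item each.
QUERIES = [
    (["t1", "t3", "t2"], {"t1"}),  # held-out term ranked first: AP = 1
    (["t3", "t2", "t1"], {"t2"}),  # held-out term ranked second: AP = 1/2
]
MAP = mean_average_precision(QUERIES)
```

With a single relevant item per query, as in a leave-one-out setting, average precision reduces to the reciprocal of the rank at which the held-out link appears.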


Conclusion

• Simple heuristics → competitive results

• Probabilistic Soft Logic for general reasoning over uncertainty


Thank you!

http://psl.umiacs.umd.edu

Backup


Distance to Satisfaction

\[ d_r(I) = \max\{0,\; I(r_{\mathrm{body}}) - I(r_{\mathrm{head}})\} \]

using Łukasiewicz norms:

\[ I(v_1 \wedge v_2) = \max\{0,\; I(v_1) + I(v_2) - 1\} \]

\[ I(v_1 \vee v_2) = \min\{I(v_1) + I(v_2),\; 1\} \]

\[ I(\neg v_1) = 1 - I(v_1) \]

1 : link(a,b) & exemplar(a,a) & exemplar(c,a) -> link(c,b)

link(a,b)   exemplar(a,a)   exemplar(c,a)   link(c,b)   body                          d
1           0.9             0.8             0           max{0, 1+0.9+0.8−2} = 0.7     0.7
1           0.5             0.3             0           max{0, 1+0.5+0.3−2} = 0       0
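The two worked groundings can be reproduced with a few lines of Python. This is a minimal sketch of the Łukasiewicz operators as written on the slide. The function names are illustrative, the body is folded pairwise (which agrees with the n-ary form max{0, sum − (n − 1)}), and rounding only hides floating-point noise.

```python
from functools import reduce


def l_and(x, y):
    """Lukasiewicz conjunction."""
    return max(0.0, x + y - 1.0)


def l_or(x, y):
    """Lukasiewicz disjunction."""
    return min(x + y, 1.0)


def l_not(x):
    """Lukasiewicz negation."""
    return 1.0 - x


def rule_distance(body_atoms, head):
    """Distance to satisfaction of a ground rule with a conjunctive body."""
    body = reduce(l_and, body_atoms)
    return max(0.0, body - head)


# The two groundings of link(a,b) & exemplar(a,a) & exemplar(c,a) -> link(c,b).
D_FIRST = round(rule_distance([1.0, 0.9, 0.8], 0.0), 10)
D_SECOND = round(rule_distance([1.0, 0.5, 0.3], 0.0), 10)
```

In the second grounding the body atoms are too weak for the conjunction to exceed 0, so the rule is satisfied even though the head is false.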