Efficient Multi-View Maintenance in the Social Semantic Web

Upload: matthias-broecheler

Post on 06-Jul-2015

DESCRIPTION

This poster describes an efficient approach to maintaining multiple views on large, evolving social networks. Abstract: The Social Semantic Web (SSW) refers to the mix of RDF data in web content and the social network data associated with those who posted that content. Applications that monitor the SSW are becoming increasingly popular. For instance, marketers want to look for semantic patterns in the content of tweets and Facebook posts relating to their products. Such applications allow multiple users to specify patterns of interest and monitor them in real time as new data is added to the web or to a social network. In this paper, we develop the concept of SSW view servers, from which all of these types of applications can be monitored simultaneously. The patterns of interest are views. We show that a given set of views can be compiled in multiple possible ways to take advantage of common substructures, and define the concept of an optimal merge. We develop a very fast MultiView algorithm that scalably and efficiently maintains multiple subgraph views. We show that our algorithm is correct, study its complexity, and experimentally demonstrate that it can scalably handle updates to hundreds of views on real-world SSW databases with up to 540M edges.

TRANSCRIPT

Efficient Multi-View Maintenance in the Social Semantic Web

Matthias Broecheler, Andrea Pugliese, and VS Subrahmanian

Views on Social Networks

•  Query = subgraph matching query
•  View = query whose answer set is maintained as the social network database is updated

Multiple Views

•  On large social networks, multiple views are often maintained concurrently.
•  Maintaining multiple views is very expensive, in particular for rapidly changing databases.
•  For example, Twitter sees over 340 million tweets per day.

Merging Views

IDEA: View queries very often have overlapping subgraph structures (bold arcs). If we overlay the different view queries so that these shared substructures are matched jointly rather than independently, we can save a lot of time.

•  Social network updates are edge insertions (or removals), so graph merging has to center on the inserted edge type.
•  We developed a subgraph matching algorithm that processes merged view queries efficiently.
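To make the notion of a maintained view concrete, here is a minimal Python sketch of a single view that is updated when an edge is inserted. It is an illustration under stated assumptions, not the poster's algorithm: edges are assumed to be (source, label, target) triples, variables are strings starting with "?", matching allows two variables to bind to the same node, and the names `bind`, `extend` and `View` are hypothetical. The search is seeded at each query edge whose label matches the inserted edge, which mirrors the point that maintenance centers on the inserted edge type.

```python
def is_var(term):
    return term.startswith("?")

def bind(pattern, edge, binding):
    """Extend `binding` so that query edge `pattern` maps to data `edge`, or return None."""
    (ps, pl, pd), (es, el, ed) = pattern, edge
    if pl != el:
        return None
    new = dict(binding)
    for p, e in ((ps, es), (pd, ed)):
        if is_var(p):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p, e) != e:
                return None
        elif p != e:
            # A constant must match the data node exactly.
            return None
    return new

def extend(patterns, edges, binding):
    """Backtracking search that matches the remaining query edges against the data."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for edge in edges:
        new = bind(first, edge, binding)
        if new is not None:
            yield from extend(rest, edges, new)

class View:
    """A query whose answer set is kept up to date under edge insertions."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.answers = set()

    def on_insert(self, edges, new_edge):
        """Add the answers that use `new_edge`; `edges` must already contain it."""
        for i, pattern in enumerate(self.patterns):
            seed = bind(pattern, new_edge, {})
            if seed is None:
                continue  # this query edge cannot map to the inserted edge
            rest = self.patterns[:i] + self.patterns[i + 1:]
            for b in extend(rest, edges, seed):
                self.answers.add(frozenset(b.items()))

# Usage: the view gains an answer only once the last required edge arrives.
edges = {("alice", "follows", "bob"), ("bob", "tweet", "m1")}
view = View([("?p", "follows", "?e"),
             ("?e", "tweet", "?m"),
             ("?m", "topic", "Health Care")])
new_edge = ("m1", "topic", "Health Care")
edges.add(new_edge)
view.on_insert(edges, new_edge)
```

Maintaining each view this way repeats the same search for every view that shares a substructure, which is the cost that merging views aims to avoid.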

Merge Optimality

•  There are many possible ways to merge query graphs. We want high connected overlap, which yields the most savings at update time.
•  Define the merged view score as the sum of edge overlaps.
•  Finding the optimal merged view with respect to the merged view score is NP-hard.
•  Our greedy view merging algorithm finds near-optimal merged views in practice.
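The poster does not spell out the score formula, so the following Python sketch shows only one plausible reading of "sum of edge overlaps": each merged edge contributes the number of queries mapped onto it, minus one. The function names, the variable renamings and the example queries are all assumptions made for illustration, and the sketch ignores the requirement that the overlap be connected.

```python
from collections import Counter

def rename(query, mapping):
    """Rewrite a query's variables into merged-view variables."""
    return {(mapping.get(s, s), label, mapping.get(d, d)) for s, label, d in query}

def merged_view_score(queries, mappings):
    """Sum, over merged edges, of how many additional queries share that edge."""
    counts = Counter()
    for query, mapping in zip(queries, mappings):
        counts.update(rename(query, mapping))
    return sum(n - 1 for n in counts.values())

# Two hypothetical view queries that share a "follows" edge.
q1 = {("?person", "follows", "?expert"), ("?expert", "tweet", "?msg")}
q2 = {("?a", "follows", "?b"), ("?b", "publish", "?doc")}

# Candidate merge 1: overlay the two "follows" edges on the same merged variables.
aligned = [{"?person": "?v1", "?expert": "?v2", "?msg": "?v3"},
           {"?a": "?v1", "?b": "?v2", "?doc": "?v4"}]
# Candidate merge 2: keep the queries side by side with no shared edge.
disjoint = [{"?person": "?v1", "?expert": "?v2", "?msg": "?v3"},
            {"?a": "?v5", "?b": "?v6", "?doc": "?v4"}]

score_aligned = merged_view_score([q1, q2], aligned)
score_disjoint = merged_view_score([q1, q2], disjoint)
```

Under this reading the aligned merge scores higher than the disjoint one, so a merging procedure that prefers higher scores would choose the overlay that lets the shared "follows" edge be matched once for both views.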

Experiments

•  We compared our merged multi-view maintenance algorithm against standard independent view maintenance.
•  6 real-world social network datasets with up to 540 million edges.
•  12,000 randomly generated queries with varying degrees of overlap; results averaged over 750 trials.
•  All algorithms were implemented in Java on top of the COSI graph database middleware.

Performance improvement of the Multi-View Maintenance algorithm on 6 different social networks

Applications

Maintaining multiple views jointly as a merged view leads to significant improvements:

•  477% faster than standard view maintenance

Applications include:

•  Monitoring social networks
•  Fraud, security applications, alerts
•  Business Analytics
•  Knowledge Discovery
•  Caching frequently asked queries

[Figures: Example Merge and Optimal Merge. Two merged query graphs over variables ?v3 to ?v16, ?a1 and ?a2 and the topic constants Health Care and Business Analytics, with edge labels topic, references, publish, tweet, associated, follows, expert and comments. Legend: edges mapped by 1, 2 and 3; edges mapped only by 1; edges mapped only by 2; edges mapped only by 3.]

[Figure: Multi-View Maintenance Performance. Axes: Outperforming (70% to 100%) and Improvement (0% to 900%). Datasets: Physics, Enron, Youtube, Flickr, LiveJournal, Orkut.]

[Figure: example view queries over variables ?person, ?expert, ?other, ?author, ?article, ?article1, ?article2, ?msg, ?msg1, ?msg2 and ?doc and the topic constants Health Care and Business Analytics, with edge labels topic, references, publish, tweet, associated, follows, expert and comments.]