![Page 1: Bidirectional Expansion for Keyword Search on Graph Databases Varun Kacholia Shashank Pandit Soumen Chakrabarti S. Sudarshan](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649dd25503460f94ac917c/html5/thumbnails/1.jpg)
Bidirectional Expansion for Keyword Search
on Graph Databases
http://www.cse.iitb.ac.in/banks/
Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S. Sudarshan, Rushi Desai, Hrishikesh Karambelkar
Keyword Search on Graph Representation of Data
- Keyword search on relational, XML, HTML, etc. data
  - BANKS, Discover, DBXplorer, XRank, etc.
- Need to find a (closely) connected set of nodes that together match all given keywords
- Focus of our work
  - Search algorithms to find connections between nodes
Outline
- Data, Query and Response Models
- Backward Search Algorithm
- Bidirectional Search Algorithm
- Experiments
- Related Work
- Conclusions
Graph Data Model
- Data modeled as a directed weighted graph: BANKS [ICDE’02]
- Can model relational, XML, HTML, etc. data
- E.g., DBLP database
  - Node = tuple
  - Edge = foreign key reference
[Figure: fragment of the DBLP graph with author nodes (Sudarshan, Prasan Roy, Soumen), paper nodes (“Multi-Query Optimization”, “BANKS: Keyword search…”) and writes nodes.]
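The tuple-to-node and foreign-key-to-edge mapping can be sketched in a few lines of Python. This is only an illustration: the toy tables, the keys (`a1`, `p1`, `w1`, ...), the helper names `build_graph` and `keyword_nodes`, and the unit edge weights are all assumptions, not the weighting BANKS actually uses.

```python
from collections import defaultdict

# Hypothetical toy tables: (table, key) -> searchable text of the tuple.
tuples = {
    ("author", "a1"): "Sudarshan",
    ("author", "a2"): "Prasan Roy",
    ("paper", "p1"): "Multi-Query Optimization",
    ("writes", "w1"): "",
    ("writes", "w2"): "",
}

# Foreign key references: (referencing tuple, referenced tuple).
foreign_keys = [
    (("writes", "w1"), ("author", "a1")),
    (("writes", "w1"), ("paper", "p1")),
    (("writes", "w2"), ("author", "a2")),
    (("writes", "w2"), ("paper", "p1")),
]

def build_graph(foreign_keys, weight=1.0):
    """Return (out_edges, in_edges) adjacency lists; unit weights are an assumption."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for src, dst in foreign_keys:
        out_edges[src].append((dst, weight))
        in_edges[dst].append((src, weight))
    return out_edges, in_edges

def keyword_nodes(tuples, keyword):
    """Nodes whose text contains the keyword (case-insensitive substring match)."""
    kw = keyword.lower()
    return {node for node, text in tuples.items() if kw in text.lower()}

out_edges, in_edges = build_graph(foreign_keys)
roy_nodes = keyword_nodes(tuples, "roy")
```

Keeping both `out_edges` and `in_edges` is a convenience for search algorithms that traverse edges in either direction.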
Graph Data Model (2)
- E.g., XML data

    <proceedings>
      <paper id="1">
        <title>Databases</title>
      </paper>
      <paper id="2">
        <title>Keyword Search</title>
        <cite ref="1">Databases</cite>
      </paper>
    </proceedings>

[Figure: graph for the XML above, with nodes proceedings, paper (@id = 1), paper (@id = 2), title, title and cite.]
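One way to turn the XML example into graph edges is sketched below with Python's standard `xml.etree.ElementTree`. The function name `xml_to_graph`, the edge labels `"contains"` and `"ref"`, and the choice to direct containment edges from parent to child are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<proceedings>
  <paper id="1"><title>Databases</title></paper>
  <paper id="2"><title>Keyword Search</title><cite ref="1">Databases</cite></paper>
</proceedings>"""

def xml_to_graph(xml_text):
    """Return (labels, edges): one node per element, numbered in document order."""
    root = ET.fromstring(xml_text)
    elems = list(root.iter())
    node_of = {id(e): n for n, e in enumerate(elems)}
    labels = [(e.tag, (e.text or "").strip()) for e in elems]
    edges = []
    # Containment edges: parent element -> child element.
    for parent in elems:
        for child in parent:
            edges.append((node_of[id(parent)], node_of[id(child)], "contains"))
    # Reference edges: element with ref="X" -> element with id="X".
    by_id = {e.get("id"): node_of[id(e)] for e in elems if e.get("id") is not None}
    for e in elems:
        ref = e.get("ref")
        if ref in by_id:
            edges.append((node_of[id(e)], by_id[ref], "ref"))
    return labels, edges

labels, edges = xml_to_graph(XML)
```

The `ref` edge from the cite element to paper 1 is what makes the result a graph rather than a tree.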
Response Model
- Response: minimal, rooted tree connecting keyword nodes
  - Undirected: Discover, DBXplorer
  - Directed: BANKS
- E.g., query “Sudarshan Roy”
[Figure: answer tree containing the paper node “Multi-Query Optimization”, two writes nodes, and the author nodes Sudarshan and Prasan Roy.]
Response Ranking
- Edge score = EA
  - Smaller tree => higher score
  - E.g., BANKS: EA = 1 / (Σ edge weights)
- Node score = NA
  - Measure of authority of nodes in tree
  - E.g., BANKS: NA = Σ (leaf and root node authorities)
- Overall score = f(EA, NA)
  - E.g., BANKS: f(EA, NA) = EA · NA
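As a worked illustration of the ranking formulas, the sketch below scores two hypothetical answer trees with the same nodes but different sizes. It takes the product form of f at face value; the function names and the authority values are made up for the example.

```python
def edge_score(edge_weights):
    """EA = 1 / (sum of edge weights); assumes the tree has at least one edge."""
    total = sum(edge_weights)
    if total <= 0:
        raise ValueError("tree needs at least one positively weighted edge")
    return 1.0 / total

def node_score(root_authority, leaf_authorities):
    """NA = sum of the authorities of the root and the leaf nodes."""
    return root_authority + sum(leaf_authorities)

def overall_score(edge_weights, root_authority, leaf_authorities):
    """f(EA, NA) = EA * NA, the simple product form."""
    return edge_score(edge_weights) * node_score(root_authority, leaf_authorities)

# Same root and leaves, but the second tree has twice the total edge weight.
small_tree = overall_score([1, 1], 0.5, [0.25, 0.25])
large_tree = overall_score([1, 1, 1, 1], 0.5, [0.25, 0.25])
```

The smaller tree scores higher, which matches the "smaller tree => higher score" intent of the edge score.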
Finding Answer Trees
- Backward Expanding Search: BANKS [ICDE’02]
- Intuition: travel backwards from keyword nodes till you hit a common node
- Query: sudarshan roy
[Figure: author nodes Sudarshan and Prasan Roy, writes nodes, and the paper node “Multi-Query Optimization”.]
Backward Search: Algorithm
- Run concurrent single source shortest path iterators from each node matching a keyword
  - Traverse the graph edges in reverse direction
  - Output next nearest node on each get-next() call
- Do best-first search across iterators
  - Output node if in the intersection of sets of nodes reached from each keyword
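The steps above can be sketched as follows. This is a minimal illustration, not the BANKS implementation: the per-node Dijkstra iterators share one global heap, which has the same effect as always advancing the iterator whose next node is nearest. The function name, the `in_edges` layout, and the example graph (edges directed paper -> writes -> author) are assumptions.

```python
import heapq
from collections import defaultdict

INF = float("inf")

def backward_search(in_edges, keyword_sets):
    """in_edges[v] = [(u, w)] for each edge u -> v; keyword_sets = one node set per keyword.
    Yields (root, distances) when a node has been reached from every keyword."""
    n = len(keyword_sets)
    heap, keyword_of = [], []
    # One shortest-path iterator per keyword node.
    for i, nodes in enumerate(keyword_sets):
        for s in sorted(nodes):
            heapq.heappush(heap, (0.0, len(keyword_of), s))
            keyword_of.append(i)
    settled = [set() for _ in keyword_of]
    best = defaultdict(lambda: [INF] * n)   # node -> nearest distance per keyword
    emitted = set()
    while heap:
        d, it, v = heapq.heappop(heap)       # best-first across all iterators
        if v in settled[it]:
            continue
        settled[it].add(v)
        k = keyword_of[it]
        best[v][k] = min(best[v][k], d)
        if v not in emitted and all(x < INF for x in best[v]):
            emitted.add(v)
            yield v, list(best[v])
        for u, w in in_edges.get(v, []):     # follow edges in reverse direction
            if u not in settled[it]:
                heapq.heappush(heap, (d + w, it, u))

# Hypothetical graph with edges mqo -> w1 -> sudarshan and mqo -> w2 -> roy.
example_in_edges = {
    "sudarshan": [("w1", 1)], "roy": [("w2", 1)],
    "w1": [("mqo", 1)], "w2": [("mqo", 1)],
}
roots = list(backward_search(example_in_edges, [{"sudarshan"}, {"roy"}]))
```

In this example the only node reached from both keywords is the paper node, at distance 2 from each.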
Backward Search: Limitations
- Wasteful exploration of graph:
  - Frequently occurring keywords
  - “Hub” nodes in the graph (high in-degree)
[Figure: example for the query “Shashank Sudarshan Database”, with nodes labeled Database, Shashank and Sudarshan; schema legend: author, paper, writes.]
Bidirectional Search: Motivation
Bidir Search: Intuition
- First cut solution:
  - Don’t go backward if keyword matches many nodes
  - Don’t go backward if node points to a hub
  - Instead explore forward from other keywords
Bidir Search: Example
[Figure: example for the query “Shashank Sudarshan Database”, with nodes labeled Database, Shashank and Sudarshan; schema legend: author, paper, writes.]
Bidir Search: Issues
- What should threshold for not expanding be?
  - Our solution: prioritize expansion of nodes based on spreading activation to penalize frequent keywords and bushy trees
- How to manage exploration in both directions?
Bidir Search: Spreading Activation
- Spreading Activation
  - Node with highest activation explored first
  - Every node given an initial activation
- Gives low activation to frequently occurring keywords
[Figure: keyword “John” matching five nodes, each with initial activation 1/5.]
Bidir Search: Spreading Activation
- Spreading Activation
  - Node with highest activation explored first
  - Activation spread to neighbors (μ = 0.3)
- Gives low activation to neighbors of hubs
[Figure: a node with activation 1/5 and four neighbors with activation 0, connected by edges labeled 1; after spreading, the node holds 0.3 x 1/5 and each neighbor holds 0.7 x 1/5 x 1/4.]
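The numbers in the two spreading-activation figures can be reproduced with a short sketch. It assumes, reading the figure, that a node keeps a fraction μ of its activation and passes the remaining 1 - μ to its neighbors. Splitting in inverse proportion to edge weight and adding incoming activation to whatever a neighbor already holds are further assumptions; with unit weights the split is the equal 1/4 share shown in the figure.

```python
def initial_activation(matching_nodes):
    """Each of the nodes matching a keyword starts with 1 / (number of matches)."""
    share = 1.0 / len(matching_nodes)
    return {node: share for node in matching_nodes}

def spread(activation, node, neighbors, mu=0.3):
    """neighbors = [(neighbor, edge_weight)]. The node keeps mu * a; the rest is
    divided among neighbors in inverse proportion to edge weight (assumed rule)."""
    if not neighbors:
        return activation
    a = activation[node]
    activation[node] = mu * a
    total_inverse = sum(1.0 / w for _, w in neighbors)
    for nbr, w in neighbors:
        gain = (1 - mu) * a * (1.0 / w) / total_inverse
        activation[nbr] = activation.get(nbr, 0.0) + gain
    return activation

# Keyword "John" matches five nodes; one of them has four neighbors.
act = initial_activation(["john1", "john2", "john3", "john4", "john5"])
spread(act, "john1", [("n1", 1), ("n2", 1), ("n3", 1), ("n4", 1)])
```

A frequent keyword starts each match with a small share, and a hub dilutes that share further across its many neighbors, so both are explored late.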
Bidir Search: Iterators
- How to manage exploration in both directions?
  - Single backward iterator + single forward iterator w/ suitable data structures
  - E.g., to keep track of parents of nodes
- Details in full paper
[Figure: example graph between keyword nodes “A” and “B”, numbered 1 to 7; each node is labeled [Dist from “A”, Dist from “B”], with labels such as [0,∞], [1,∞], [2,∞], [∞,1], [∞,0] and [∞,∞] that are updated as the search proceeds.]
Bidir Search: Algorithm
- Activate matching nodes; insert into backward iterator
- while (iterators are not empty)
  - Choose iterator for expansion in best-first manner
  - Explore node with highest activation
  - Spread activation to neighbors
  - Update path weights (and other data structures)
  - Propagate values to ancestors if necessary
  - Insert nodes explored in the backward direction into the forward iterator /* for future forward exploration */
- Stop when top-k results are produced
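The control flow of the loop can be illustrated with a heavily simplified sketch. It is not the algorithm from the full paper: it keeps one activation value per node instead of one per keyword, scans the frontiers for the maximum instead of using priority queues, assumes positive edge weights, and returns answer roots in discovery order rather than score order. All names and the spreading rule (node keeps `mu`, rest split by inverse edge weight) are assumptions.

```python
INF = float("inf")

def bidirectional_search(out_edges, in_edges, keyword_sets, k=1, mu=0.3):
    """out_edges[u] = [(v, w)], in_edges[v] = [(u, w)] for each edge u -> v, w > 0."""
    n = len(keyword_sets)
    dist, act, parents = {}, {}, {}
    q_in, q_out = set(), set()           # backward and forward frontiers
    done_in, done_out = set(), set()
    answers = []

    def touch(u):
        dist.setdefault(u, [INF] * n)
        act.setdefault(u, 0.0)

    def check(u):                         # u is a root once it reaches every keyword
        if u not in answers and all(d < INF for d in dist[u]):
            answers.append(u)

    def relax(parent, child, w):          # update path weights, propagate to ancestors
        parents.setdefault(child, set()).add((parent, w))
        stack = [(parent, child, w)]
        while stack:
            p, c, pw = stack.pop()
            improved = False
            for i in range(n):
                if dist[c][i] + pw < dist[p][i]:
                    dist[p][i] = dist[c][i] + pw
                    improved = True
            if improved:
                check(p)
                stack.extend((g, p, gw) for g, gw in parents.get(p, ()))

    def spread(u, nbrs):
        if not nbrs:
            return
        a, total = act[u], sum(1.0 / w for _, w in nbrs)
        act[u] = mu * a
        for x, w in nbrs:
            touch(x)
            act[x] += (1 - mu) * a * (1.0 / w) / total

    for i, nodes in enumerate(keyword_sets):   # activate matching nodes
        for s in nodes:
            touch(s)
            dist[s][i] = 0.0
            act[s] += 1.0 / len(nodes)
            q_in.add(s)
    for s in sorted(q_in):
        check(s)

    rank = lambda u: (act[u], u)               # highest activation first, name breaks ties
    while (q_in or q_out) and len(answers) < k:
        b_in = max(q_in, key=rank) if q_in else None
        b_out = max(q_out, key=rank) if q_out else None
        if b_out is None or (b_in is not None and act[b_in] >= act[b_out]):
            v = b_in                           # backward step over incoming edges
            q_in.remove(v)
            done_in.add(v)
            nbrs = in_edges.get(v, [])
            spread(v, nbrs)
            for u, w in nbrs:
                relax(u, v, w)
                if u not in done_in:
                    q_in.add(u)
            if v not in done_out:
                q_out.add(v)                   # for future forward exploration
        else:
            u = b_out                          # forward step over outgoing edges
            q_out.remove(u)
            done_out.add(u)
            nbrs = out_edges.get(u, [])
            spread(u, nbrs)
            for v, w in nbrs:
                relax(u, v, w)
                if v not in done_out:
                    q_out.add(v)
    return [(r, dist[r]) for r in answers[:k]]
```

For instance, with a single edge `a -> b`, keyword 1 matching only `a` and keyword 2 matching `b` plus nine unrelated nodes, the low activation of the frequent keyword makes the sketch expand forward from `a` and report `a` as the root without expanding any of the ten matches backward.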
Bidir Search: top-k results
- Results need not be generated “in-order”
- Naïve solution
  - Store results in an intermediate heap
  - Output top k results after mk total results have been generated (m ~ 10)
- Can do better
  - Compute upper bound on score of next result; output answers with a higher score
  - Similar to NRA algorithm (Fagin et al., PODS’01)
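The naïve buffering strategy is easy to sketch. The function name `naive_top_k` and the `(score, answer)` tuple layout are assumptions; `heapq.nlargest` stands in for the intermediate heap.

```python
import heapq

def naive_top_k(result_stream, k, m=10):
    """Buffer results as they are generated (in any order) and return the k
    best once m * k results have been seen or the stream is exhausted."""
    buffered = []
    for score, answer in result_stream:
        buffered.append((score, answer))
        if len(buffered) >= m * k:
            break
    return heapq.nlargest(k, buffered, key=lambda r: r[0])

# Hypothetical out-of-order results; m = 2 keeps the example small.
stream = [(0.2, "t1"), (0.9, "t2"), (0.5, "t3"), (0.7, "t4"), (0.99, "t5")]
top2 = naive_top_k(stream, k=2, m=2)
```

Here the search stops after four results, so the later, better answer `t5` is never seen; that is the price of the naïve cut-off and the reason a score bound can do better.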
Experiments
- Datasets
  - DBLP, IMDB: ~ 2 million nodes, 9 million edges
  - US Patent DB: ~ 4 million nodes, 15 million edges
- Workload
  - Keywords randomly picked from results of SQL join statements
- Search algorithms
  - MI-Bkwd: original backward search
    - Iterator for every node matching a keyword
  - SI-Bkwd: backward search with single backward iterator
  - Bidirec: bidirectional search
- Time taken/nodes explored
  - Measured when 10th answer is generated (or last answer if #answers < 10)
- Origin size
  - #nodes matched by keywords in the query
Experiments (2)
- MI-Bkwd versus SI-Bkwd
  - SI-Bkwd gain increases with origin size, # keywords
Experiments (3)
- SI-Bkwd versus Bidirec
  - Bidirec gain increases with origin size, # keywords
Experiments (4)
- Precision/Recall experiments
  - Relevant answers are well-defined; can be generated through SQL statements
- Both MI-Backward and Bidirectional show similar performance
  - Recall ~ 100%
  - Precision ~ 100% at near full recall
  - Few irrelevant answers produced before generating all relevant answers
- Bidirectional runs faster, yet minimal loss of relevant results!
Experiments (5)
- Comparison with Sparse: Hristidis et al. [VLDB’03]
  - Generate join expressions leading to query results
  - Use DB-provided scores for ranking tuples and aggregate them to rank answer trees
  - For top-k results: automatically determine required number of join expressions
- Sparse-LB
  - Manually generate required join expressions
  - Sparse needs to do at least this much (and usually a lot more!)
- Bidirectional versus Sparse-LB
  - Bidirectional outperforms by a factor of ~ 3 (esp. when #joins is large)
Experiments (6)
- SI-Bkwd versus Bidirec: by origin size
  - Bidirec gains more with unbalanced origin sizes
A = (T,S,S,S)
B = (M,M,M,M)
C = (M,L,L,L)
D = (M,M,L,L)
E = (T,L,L,L)
F = (T,S,M,L)
G = (T,M,L,L)
H = (T,T,T,L)
Discussion
- Bidirectional search as dynamic per-tuple join ordering
  - Related work in this area: Eddies
- Bidirectional search
  - Schema-less
  - Prioritization based on activation instead of selectivity
  - Generate answers in relevance order
Related Work
- Keyword querying on relational data: Discover (UCSD), DBXplorer (Microsoft)
  - Use SQL generation, without in-memory data structures
  - Issues: generate join plans, re-use common sub-expressions, etc.
- Keyword querying on XML
  - XRank (Cornell), Schema-Free XQuery (Michigan), …
  - Tree model is too limited
- ObjectRank
Conclusions
- Graph model
  - Convenient common denominator representation
  - Schema-free querying leads to graph search
- Purely backward strategy inadequate
- Bidirectional search with spreading activation performs much better
  - Dynamically choose join order on per-tuple basis
Thank You!
Questions??
Future of Keyword Search in DBs
- Next generation of intelligent search will require context information
  - E.g. search email, files, calendar, ..
  - Information integration will be important
  - Graph structured data will be a key component
- Is there a killer app?
  - Deep web?
- Display of answers
  - Users don’t want to see schema details
  - Can we leverage off existing (Web) apps?
BANKS Future Work
- Applications of BANKS
  - Soumen Chakrabarti, Sunita Sarawagi and students
    - Exploit BANKS to integrate different sources of data
    - Extract information, infer soft links
  - BANKS for personal information management
    - SPIN: Search Personal Information Networks
- Ongoing/future work on BANKS:
  - More sysadmin/user control on ranking
    - One size does not fit all
    - BANKS provides infrastructure
  - Characterize bidirectional search better
    - And find other applications
  - Security
Bidir Search: top-k results (2)
- Compute upper bound on score of next result; output answers with a higher score
- Computing the bound
  - m_i = minimum path length explored backward from keyword i
  - unseen answer node: 1/(m_1 + m_2 + … + m_n)
  - visited answer node: suppose reached from first x keywords with distance d_i
    - 1/[(d_1 + d_2 + … + d_x) + (m_{x+1} + m_{x+2} + … + m_n)]
  - combine this with max node prestige
- We simply use 1/(m_1 + m_2 + … + m_n)!
  - Experiments show no significant loss in using this heuristic
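The two bound formulas and the output rule can be written out as a small sketch. The function names and the `(score, answer)` tuple layout are assumptions, node prestige is left out, and a zero denominator (nothing explored yet) is treated as an infinite bound so that no answer is released too early.

```python
def unseen_bound(m):
    """Bound for a not-yet-seen answer node: 1 / (m_1 + ... + m_n)."""
    total = sum(m)
    return float("inf") if total == 0 else 1.0 / total

def visited_bound(d, m):
    """Bound for a node already reached from the first x = len(d) keywords:
    known distances d_1..d_x plus the minima m_{x+1}..m_n for the rest."""
    total = sum(d) + sum(m[len(d):])
    return float("inf") if total == 0 else 1.0 / total

def releasable(buffered, m):
    """Buffered (score, answer) pairs that beat the simple unseen-node bound, best first."""
    bound = unseen_bound(m)
    return sorted((r for r in buffered if r[0] > bound), reverse=True)

m = [2, 3]                                   # minimum backward path length per keyword
ready = releasable([(0.5, "t1"), (0.1, "t2"), (0.25, "t3")], m)
```

With `m = [2, 3]` the bound is 0.2, so `t1` and `t3` can be output immediately while `t2` stays buffered.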