![Page 1: Algorithmics and Applications of Tree and Graph Searching Dennis Shasha, Jason T. L. Wang, and Rosalba Giugno Presenters: Jerod Watson & Christan Grant](https://reader030.vdocuments.mx/reader030/viewer/2022032523/56649d8d5503460f94a75ba7/html5/thumbnails/1.jpg)
Algorithmics and Applications of Tree and Graph Searching
Dennis Shasha, Jason T. L. Wang, and Rosalba Giugno
Presenters: Jerod Watson & Christan Grant
Introduction
Searching in Trees: Approximate Containment Queries, Path-Only Searches, Extension to Trees
Searching in Graphs: Keygraph Searching in Graph DBs, GraphGrep, Subgraph Matching
Conclusion
Introduction
Modern search engines use keyword-based queries and offer impressive speed.
Several research efforts have attempted to generalize keyword search to keytree and keygraph searching.
XQuery
AQUA Query
A query is expressed as a tree pattern, termed a “query tree”.
The DB can be represented as a single tree or as a set of trees.
Each tree can be ordered or unordered.
Queries are often concerned with the parent-child, ancestor-descendant, or path relationships among nodes.
Query matching can be expressed as a containment mapping from the query tree to a data tree.
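To make the containment idea concrete, here is a minimal Python sketch for unordered trees. It assumes one plausible reading of containment: distinct query nodes map to distinct data nodes with equal labels, and each query parent-child edge maps to a data parent-child edge. The nested-tuple tree format and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a node is (label, list of child nodes).
Tree = Tuple[str, list]


def embeds_at(q: Tree, d: Tree) -> bool:
    """True if query q maps onto data node d, preserving labels and
    parent-child edges, with distinct query siblings mapped to distinct
    data children (unordered semantics)."""
    q_label, q_kids = q
    d_label, d_kids = d
    if q_label != d_label:
        return False
    return _assign(q_kids, d_kids, set())


def _assign(q_kids: List[Tree], d_kids: List[Tree], used: set) -> bool:
    # Backtracking search for an injective assignment of query children.
    if not q_kids:
        return True
    first, rest = q_kids[0], q_kids[1:]
    for i, dk in enumerate(d_kids):
        if i not in used and embeds_at(first, dk):
            if _assign(rest, d_kids, used | {i}):
                return True
    return False


def contains(d: Tree, q: Tree) -> bool:
    """True if q embeds at some node of data tree d."""
    if embeds_at(q, d):
        return True
    return any(contains(child, q) for child in d[1])


data = ("A", [("B", [("D", [])]), ("C", [("B", []), ("E", [])])])
print(contains(data, ("C", [("E", []), ("B", [])])))  # True: child order ignored
print(contains(data, ("B", [("E", [])])))             # False: no B has a child E
```

Backtracking over sibling assignments is exponential in the worst case. It is enough for small illustrative trees, and a bipartite matching would be the usual replacement at scale.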
The query tree may contain fixed-length don’t cares (FLDCs), e.g., “?”.
The query tree may contain variable-length don’t cares (VLDCs), e.g., “*”.
This class of queries is referred to as approximate containment (AC) queries.
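The effect of the two kinds of don’t cares on a single root-to-leaf path can be sketched as below. This is a hypothetical matcher that assumes “?” matches exactly one node and “*” matches zero or more consecutive nodes, with the whole path matched end to end. The exact semantics in the original system may differ.

```python
from functools import lru_cache
from typing import Sequence


def path_matches(query: Sequence[str], path: Sequence[str]) -> bool:
    """Match a query label path against a data label path.

    Assumed semantics: '?' matches exactly one node (FLDC), '*' matches
    zero or more consecutive nodes (VLDC); any other query label must
    equal the data label at that position.
    """
    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        if i == len(query):
            return j == len(path)
        q = query[i]
        if q == "*":
            # Either the VLDC matches nothing more, or it absorbs one more node.
            return go(i + 1, j) or (j < len(path) and go(i, j + 1))
        if j == len(path):
            return False
        if q == "?" or q == path[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)


print(path_matches(["John", "?", "Bob"], ["John", "Mary", "Bob"]))          # True
print(path_matches(["John", "?", "Bob"], ["John", "Bob"]))                  # False
print(path_matches(["John", "Mary", "*"], ["John", "Mary", "Ann", "Bob"]))  # True
```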
Path-Only Searches
Many AC queries are concerned with paths only, e.g., “Find the descendants of Mary, who is a child of John.”
XISS is an indexing and querying system designed to support regular path expressions.
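XISS answers ancestor-descendant questions without traversing the tree by numbering the nodes. The sketch below uses a simplified <order, size> scheme: a dense preorder rank plus a descendant count. This is an assumption standing in for XISS's extended-preorder numbering, which also leaves gaps for future insertions. The dict-of-children tree format is hypothetical.

```python
from typing import Dict, List, Tuple


def number_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign each node (order, size): its preorder rank and its number of descendants."""
    labels: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> int:
        nonlocal counter
        order = counter
        counter += 1
        size = 0
        for child in children.get(node, []):
            size += 1 + visit(child)  # the child plus all of its descendants
        labels[node] = (order, size)
        return size

    visit(root)
    return labels


def is_ancestor(labels: Dict[str, Tuple[int, int]], a: str, d: str) -> bool:
    """a is a proper ancestor of d iff d's rank falls inside a's subtree range."""
    order_a, size_a = labels[a]
    order_d, _ = labels[d]
    return order_a < order_d <= order_a + size_a


# Hypothetical family tree for the example query above.
family = {"John": ["Mary", "Tom"], "Mary": ["Ann", "Bob"]}
labels = number_tree(family, "John")
print(sorted(n for n in labels if is_ancestor(labels, "Mary", n)))  # ['Ann', 'Bob']
```

Each ancestor test is two integer comparisons. The cost is renumbering on insertion, which is why gap-leaving schemes exist.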
Extension to Trees
Pathfix algorithm
Phase 1: Encode each root-to-leaf path of every data tree into a suffix array over the DB.
Phase 2: Compare the query tree Q with each data tree D in the DB, allowing a difference of DIFF.
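A simplified version of the two phases can be sketched in Python under these assumptions:

- Trees are nested (label, children) tuples.
- A query path counts as present if it occurs as a contiguous downward run inside some root-to-leaf path.
- DIFF is the number of query paths allowed to be missing.

This only filters candidate trees. It does not check how the matched paths fit together.

```python
from bisect import bisect_left


def leaf_paths(tree, prefix=()):
    """All root-to-leaf label paths of a (label, children) tree, as tuples."""
    label, kids = tree
    path = prefix + (label,)
    if not kids:
        return [path]
    return [p for k in kids for p in leaf_paths(k, path)]


def build_suffix_array(db):
    """Phase 1: every suffix of every root-to-leaf path, sorted, tagged with its tree id."""
    entries = []
    for tree_id, tree in enumerate(db):
        for path in leaf_paths(tree):
            for start in range(len(path)):
                entries.append((path[start:], tree_id))
    entries.sort()
    return entries


def trees_containing(sa, query_path):
    """Ids of trees having a suffix that begins with query_path (binary search)."""
    qp = tuple(query_path)
    found = set()
    i = bisect_left(sa, (qp,))  # first entry whose suffix is >= qp
    while i < len(sa) and sa[i][0][: len(qp)] == qp:
        found.add(sa[i][1])
        i += 1
    return found


def pathfix_filter(sa, n_trees, query_tree, diff):
    """Phase 2 (simplified): keep trees missing at most `diff` query paths."""
    missing = [0] * n_trees
    for qp in leaf_paths(query_tree):
        hits = trees_containing(sa, qp)
        for t in range(n_trees):
            if t not in hits:
                missing[t] += 1
    return [t for t in range(n_trees) if missing[t] <= diff]


db = [
    ("A", [("B", [("C", [])]), ("D", [])]),
    ("A", [("B", []), ("E", [])]),
]
sa = build_suffix_array(db)
query = ("A", [("B", [("C", [])]), ("D", [])])
print(pathfix_filter(sa, len(db), query, diff=0))  # [0]
print(pathfix_filter(sa, len(db), query, diff=2))  # [0, 1]
```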
Handling Don’t Cares
1. Partition the query at its don’t cares into connected subtrees that contain no don’t cares.
2. Match each of those don’t-care-free subtrees against the data trees in the DB.
3. For matched subtrees that belong to the same data tree, determine whether they combine to match the query, based on the matching semantics of the don’t cares.
Filtering
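The three steps can be illustrated on a single label path. In this hypothetical sketch, the query is cut at “?” (exactly one node) and “*” (zero or more nodes) into don’t-care-free segments. Each segment is matched on its own, and the matches are then combined using gap bounds derived from the don’t cares. ATreeGrep works on subtrees; restricting to paths is a simplification.

```python
def partition(query):
    """Step 1: split a query path at don't cares.

    Returns (segments, gaps). gaps[k] = (lo, hi) bounds how many data nodes
    may precede segment k (gaps[0] counts from the path start, gaps[-1] runs
    to the path end); hi is None when a '*' makes the gap unbounded.
    """
    segments, gaps, current = [], [], []
    lo, unbounded = 0, False
    for label in query:
        if label in ("?", "*"):
            if current:  # close the running segment
                segments.append(current)
                current = []
            if label == "?":
                lo += 1
            else:
                unbounded = True
        else:
            if not current:  # a new segment starts: record the gap before it
                gaps.append((lo, None if unbounded else lo))
                lo, unbounded = 0, False
            current.append(label)
    if current:
        segments.append(current)
    gaps.append((lo, None if unbounded else lo))
    return segments, gaps


def occurrences(segment, path):
    """Step 2: start offsets where a don't-care-free segment matches exactly."""
    n = len(segment)
    return [i for i in range(len(path) - n + 1) if list(path[i:i + n]) == segment]


def matches(query, path):
    segments, gaps = partition(query)
    occ = [occurrences(s, path) for s in segments]
    # Step 3: combine left to right; `ends` holds where the previous segment ended.
    ends = {0}
    for seg, starts, (lo, hi) in zip(segments, occ, gaps):
        ends = {s + len(seg) for s in starts for e in ends
                if s - e >= lo and (hi is None or s - e <= hi)}
    lo, hi = gaps[-1]
    return any(len(path) - e >= lo and (hi is None or len(path) - e <= hi) for e in ends)


print(matches(["A", "B", "?", "C", "*", "D"], ["A", "B", "X", "C", "Y", "Z", "D"]))  # True
print(matches(["A", "B", "?", "C", "*", "D"], ["A", "B", "C", "D"]))                # False
```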
Implementation: ATreeGrep
Graphs
Graphs
An abstract data type of elements (nodes or vertices) interconnected by edges.
A tree is a specialized graph; a general graph places no constraint on the number of paths between nodes.
A graph has no root and may contain cycles.
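The difference between a tree and a general graph can be checked mechanically: an undirected graph is a tree exactly when it is connected and has no cycle. The small union-find sketch below assumes nodes are numbered 0..n-1 with n at least 1.

```python
def is_tree(n: int, edges) -> bool:
    """True if the undirected graph on nodes 0..n-1 is connected and acyclic."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return len({find(x) for x in range(n)}) == 1  # exactly one component


print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # True
print(is_tree(3, [(0, 1), (1, 2), (2, 0)]))  # False: cycle
print(is_tree(4, [(0, 1), (2, 3)]))          # False: disconnected
```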
Keygraph Searching
Searching for a particular graph or ordering of elements inside one large graph (e.g., the Internet)
Searching for a particular graph or structure among many graphs (e.g., chemical structures)
Indexing is used to reduce the search cost.
Keygraph Searching
Three basic steps:
1. Reduce the search space by filtering.
2. Formulate the query into simple structures.
3. Match.
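The three steps map onto a brute-force matcher like the one below. It is a sketch, not any specific system:

- The filter compares label counts.
- The “simple structures” are a depth-first ordering of the query nodes, which turns matching into a chain of edge checks.
- Matching is plain backtracking for label- and edge-preserving injective maps (non-induced subgraph isomorphism).

The graph representation is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (node labels, undirected adjacency sets).
Graph = Tuple[Dict[int, str], Dict[int, Set[int]]]


def passes_filter(query: Graph, data: Graph) -> bool:
    """Step 1: the data graph needs at least as many nodes of each label."""
    need, have = Counter(query[0].values()), Counter(data[0].values())
    return all(have[label] >= k for label, k in need.items())


def dfs_order(query: Graph) -> List[int]:
    """Step 2: order query nodes depth-first, so every node other than a
    component's first is adjacent to some earlier node."""
    labels, adj = query
    order, seen = [], set()
    for start in labels:
        stack = [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            order.append(v)
            stack.extend(sorted(adj[v] - seen, reverse=True))
    return order


def match(query: Graph, data: Graph) -> List[Dict[int, int]]:
    """Step 3: all injective maps preserving labels and query edges."""
    q_labels, q_adj = query
    d_labels, d_adj = data
    order = dfs_order(query)
    results: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        q = order[len(mapping)]
        used = set(mapping.values())
        for d in d_labels:
            if d in used or d_labels[d] != q_labels[q]:
                continue
            # Every already-mapped query neighbor must map to a data neighbor.
            if all(mapping[n] in d_adj[d] for n in q_adj[q] if n in mapping):
                mapping[q] = d
                extend(mapping)
                del mapping[q]

    if passes_filter(query, data):
        extend({})
    return results


data = ({0: "C", 1: "O", 2: "C", 3: "N"}, {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}})
query = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
print(match(query, data))  # [{0: 0, 1: 1}, {0: 2, 1: 1}]
```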
Keygraph Searching (survey)
A* algorithm
GraphDB
Daylight
Lore
A*
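Seminal work by Nilsson (1980).
A route-finding (best-first) algorithm that keeps track of the nodes it has visited and the distance it has traveled. It always extends the path whose cost so far plus a heuristic estimate of the remaining distance is smallest.
Applications:
Protein databases (discovery and search)
Image databases
Chinese character databases
CAD circuit data and software source code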
A*
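Pseudocode (q is a priority queue ordered by f = g + h, where g is the cost of the path so far and h is a heuristic estimate of the remaining cost to the goal):

    function A*(start, goal)
        var closed := the empty set
        var q := make_queue(path(start))
        while q is not empty
            var p := remove_first(q)        // the path with the smallest f
            var x := the last node of p
            if x in closed
                continue
            if x = goal
                return p
            add x to closed
            foreach y in successors(x)
                enqueue(q, p, y)            // push p extended by y
        return failure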
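A runnable Python version of the pseudocode follows. The weighted successor function and the heuristic h are assumed inputs. With the closed-set shortcut, the first path returned is a cheapest one when h is consistent, i.e., h(x) <= cost(x, y) + h(y) for every edge and h(goal) = 0.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

Node = Hashable


def a_star(start: Node, goal: Node,
           successors: Callable[[Node], Iterable[Tuple[Node, float]]],
           h: Callable[[Node], float]) -> Optional[List[Node]]:
    """Best-first search ordered by f = g + h; returns a node path or None."""
    closed = set()
    counter = 0  # tie-breaker so heapq never has to compare two paths
    queue = [(h(start), 0.0, counter, [start])]
    while queue:
        _f, g, _, path = heapq.heappop(queue)  # path with the smallest f
        x = path[-1]
        if x in closed:
            continue
        if x == goal:
            return path
        closed.add(x)
        for y, cost in successors(x):
            if y not in closed:
                counter += 1
                heapq.heappush(queue, (g + cost + h(y), g + cost, counter, path + [y]))
    return None  # failure: goal unreachable


graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
estimate = {"S": 3, "A": 2, "B": 1, "G": 0}  # a consistent heuristic for this graph
print(a_star("S", "G", lambda n: graph[n], lambda n: estimate[n]))  # ['S', 'A', 'B', 'G']
```

With h returning 0 everywhere the same code degenerates to Dijkstra's algorithm.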
GraphDB
Specifies a data model and a query model:
1. Queries are in the form of regular expressions.
2. Nodes are classes representing data objects.
3. Edges are classes representing links between node objects.
4. Path classes and indexing data structures are used to index the database.
Provides graph and search operations for finding:
The shortest path between two nodes
The subgraph within a given range of a starting node
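Both GraphDB operations can be sketched with Dijkstra's algorithm. This is an assumed illustration, not GraphDB's implementation. shortest_path follows predecessor links back from the target. range_subgraph keeps the nodes within distance r of the start, together with the edges among them.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: node -> [(neighbor, edge weight)].
Weighted = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Weighted, source: str, limit: Optional[float] = None):
    """Shortest distances and predecessors; nodes beyond `limit` are never reached."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if limit is not None and nd > limit:
                continue
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def shortest_path(graph: Weighted, s: str, t: str) -> Optional[List[str]]:
    dist, prev = dijkstra(graph, s)
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


def range_subgraph(graph: Weighted, start: str, r: float):
    """Nodes within distance r of start, plus the edges among those nodes."""
    nodes = set(dijkstra(graph, start, limit=r)[0])
    edges = [(u, v, w) for u in nodes for v, w in graph.get(u, []) if v in nodes]
    return nodes, edges


g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)], "d": []}
print(shortest_path(g, "a", "d"))            # ['a', 'b', 'c', 'd']
print(sorted(range_subgraph(g, "a", 2)[0]))  # ['a', 'b', 'c']
```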
GraphDB
Daylight
"Provide the best known computer algorithms for chemical information processing to those who need them."
Uses fingerprinting to index the database and prune candidates.
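Fingerprint screening works roughly like this: every short labeled path in a structure sets one hashed bit, and a structure can contain the query only if it has every bit the query has. The sketch below is in the spirit of path-based fingerprints. The 64-bit width, the 3-node path limit, and the use of CRC-32 as the hash are illustrative assumptions, not Daylight's actual parameters. Bit collisions can cause false positives but never false negatives.

```python
import zlib
from typing import Dict, List, Set

N_BITS = 64   # assumed fingerprint width
MAX_LEN = 3   # assumed maximum path length, in nodes


def label_paths(labels: Dict[int, str], adj: Dict[int, Set[int]], max_len: int) -> Set[str]:
    """Label strings of all simple paths with 1..max_len nodes."""
    found: Set[str] = set()

    def walk(path: List[int]) -> None:
        found.add("-".join(labels[n] for n in path))
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in labels:
        walk([start])
    return found


def fingerprint(labels: Dict[int, str], adj: Dict[int, Set[int]]) -> int:
    """Fold each path's hash into one bit of an N_BITS-wide integer."""
    bits = 0
    for p in label_paths(labels, adj, MAX_LEN):
        bits |= 1 << (zlib.crc32(p.encode()) % N_BITS)
    return bits


def may_contain(mol_fp: int, query_fp: int) -> bool:
    """Screen: every bit set in the query must also be set in the structure."""
    return query_fp & ~mol_fp == 0


mol = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})  # C-C-O skeleton
c_o = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
print(may_contain(fingerprint(*mol), fingerprint(*c_o)))  # True
```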
ChemDB (contains 6.5 million unique structures or subgraphs)
Lore
A database management system for XML
Data are modeled as a rooted, labeled graph
Indexed in four ways for fast evaluation of regular path expressions: Vindex, Tindex, Lindex, and Pindex (DataGuide)
Lore
1) Vindex (value index): for each label l, indexes all nodes that have incoming l-labeled edges and atomic values satisfying some condition.
2) Tindex (text index): for each label l, indexes the nodes with incoming l-labeled edges whose string values contain specific words.
3) Lindex (link index): indexes nodes with outgoing l-labeled edges.
4) Pindex (DataGuide): indexes all nodes reachable from the root through a given labeled path.
Queries that start from the root use the DataGuide. Other queries traverse paths using indexes 1-3, pruning whatever cannot match.
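The Pindex (DataGuide) idea can be sketched as a subset construction, much like turning a nondeterministic automaton into a deterministic one. This is an assumed, simplified model:

- The database is a dict of labeled edges.
- Each guide state is the set of nodes reached from the root by some label path.
- A lookup simply follows labels from the start state.

The movie data is made up.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical OEM-style database: node id -> [(edge label, child id)].
Edges = Dict[int, List[Tuple[str, int]]]


def build_dataguide(edges: Edges, root: int):
    """Each label path from the root leads to exactly one state: the set of all
    nodes it reaches. Terminates on cyclic data because states are finite sets."""
    start = frozenset([root])
    transitions: Dict[FrozenSet[int], Dict[str, FrozenSet[int]]] = {}
    work = [start]
    while work:
        state = work.pop()
        if state in transitions:
            continue
        by_label: Dict[str, set] = {}
        for node in state:
            for label, child in edges.get(node, []):
                by_label.setdefault(label, set()).add(child)
        transitions[state] = {l: frozenset(t) for l, t in by_label.items()}
        work.extend(transitions[state].values())
    return start, transitions


def pindex_lookup(guide, label_path: List[str]) -> FrozenSet[int]:
    """Nodes reachable from the root through exactly this label path."""
    state, transitions = guide
    for label in label_path:
        state = transitions[state].get(label)
        if state is None:
            return frozenset()
    return state


db = {0: [("movie", 1), ("movie", 2)], 1: [("title", 3), ("actor", 4)],
      2: [("title", 5)], 4: [("name", 6)]}
guide = build_dataguide(db, 0)
print(sorted(pindex_lookup(guide, ["movie", "title"])))          # [3, 5]
print(sorted(pindex_lookup(guide, ["movie", "actor", "name"])))  # [6]
```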
T-index (1999)
A data structure that indexes the nodes of a semistructured database reachable by a given class of regular path expressions
The T-index may be more efficient than the P-index because it relaxes some constraints
Reportedly, for a graph of size 1,500, the T-index is 13% of the size of the database
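As we understand it, the T-index family includes the 1-index as its simplest member. The 1-index groups nodes that no incoming label path from the root can tell apart (backward bisimulation), so path queries can run on the smaller block graph. A naive signature-refinement sketch of that simplest case is below. The edge format is hypothetical, the sketch assumes every node is reachable from the root, and practical implementations use faster partition-refinement algorithms.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node -> [(edge label, child)].
Edges = Dict[int, List[Tuple[str, int]]]


def one_index(nodes: List[int], edges: Edges, root: int) -> Dict[int, int]:
    """Map each node to a block id; nodes in one block are reached from the
    root by exactly the same label paths."""
    incoming: Dict[int, List[Tuple[str, int]]] = {n: [] for n in nodes}
    for parent, out in edges.items():
        for label, child in out:
            incoming[child].append((label, parent))

    block = {n: int(n == root) for n in nodes}  # the root starts in its own block
    while True:
        # A node's signature: its current block plus the (label, parent block) pairs.
        signature = {
            n: (block[n], frozenset((label, block[p]) for label, p in incoming[n]))
            for n in nodes
        }
        ids: Dict[tuple, int] = {}
        new_block = {n: ids.setdefault(signature[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block.values())):  # no block was split: stable
            return new_block
        block = new_block


edges = {0: [("a", 1), ("a", 2)], 1: [("b", 3)], 2: [("b", 4)]}
print(one_index([0, 1, 2, 3, 4], edges, root=0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}
```

Here five data nodes collapse into three blocks, forming the index graph 0 -a-> {1, 2} -b-> {3, 4}.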
GraphGrep
Uses variable-length paths (cyclic or acyclic) to index the DB, which enables efficient filtering.
Nodes have ids (numbers) and labels (letters).
GraphGrep
Index Construction
1. Choose lp, the maximum indexed path length.
2. Create the “path-representation” of each graph.
3. Create the fingerprint of each graph.
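Steps 2 and 3 can be sketched as follows, under these assumptions:

- Graphs are undirected with single-letter labels.
- lp counts nodes.
- Only simple (acyclic) paths are enumerated.
- The fingerprint is a fixed-size table of id-path counts, indexed by a CRC-32 hash of the label-path.

GraphGrep's actual hashing and table size may differ.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TABLE_SIZE = 16  # assumed fingerprint (hash table) size


def path_representation(labels: Dict[int, str], adj: Dict[int, Set[int]], lp: int):
    """Map each label-path to the id-paths (node id sequences) that spell it.
    Simplification: only simple paths of 1..lp nodes; cyclic paths are skipped."""
    rep: Dict[Tuple[str, ...], List[Tuple[int, ...]]] = defaultdict(list)

    def walk(path: Tuple[int, ...]) -> None:
        rep[tuple(labels[n] for n in path)].append(path)
        if len(path) < lp:
            for nxt in sorted(adj[path[-1]]):
                if nxt not in path:
                    walk(path + (nxt,))

    for start in sorted(labels):
        walk((start,))
    return dict(rep)


def graph_fingerprint(rep, table_size: int = TABLE_SIZE) -> List[int]:
    """Hash each label-path to a row and count the id-paths that land there."""
    table = [0] * table_size
    for label_path, id_paths in rep.items():
        row = zlib.crc32("/".join(label_path).encode()) % table_size
        table[row] += len(id_paths)
    return table


labels = {1: "A", 2: "B", 3: "C"}
adj = {1: {2}, 2: {1, 3}, 3: {2}}
rep = path_representation(labels, adj, lp=2)
print(rep[("A", "B")])               # [(1, 2)]
print(sum(graph_fingerprint(rep)))   # 7: three single nodes plus four directed edges
```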
GraphGrep
Filtering the Database
1. The query graph is parsed and its fingerprint is built.
2. The query fingerprint is compared with the fingerprint of each graph in the DB:
   1. If a graph has at least one fingerprint value that is less than the corresponding query value, it is discarded.
   2. The remaining graphs are candidates that may contain one or more subgraphs matching the query.
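The comparison in step 2 is a row-by-row test. In this sketch the fingerprints are assumed to be equal-length lists of id-path counts, built with the same hash for the query and for every database graph. The graph names are hypothetical.

```python
from typing import Dict, List


def filter_db(query_fp: List[int], db_fps: Dict[str, List[int]]) -> List[str]:
    """Keep graphs whose every fingerprint row is at least the query's row.

    If the query is a subgraph, each query id-path maps to a distinct data
    id-path with the same label-path, so no row can hold fewer id-paths.
    """
    return [name for name, fp in db_fps.items()
            if all(g >= q for g, q in zip(fp, query_fp))]


db = {"g1": [2, 1, 2, 3], "g2": [1, 0, 1, 5], "g3": [1, 1, 2, 1]}
print(filter_db([1, 0, 2, 1], db))  # ['g1', 'g3']
```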
GraphGrep
Filtering the Database
Filtering takes time linear in the size of the database, but it discards 99% of the database.
GraphGrep
Finding Subgraphs Matching the Query
The query is traversed depth-first, and the branches of the resulting tree are decomposed into sequences of overlapping label-paths (patterns).
GraphGrep
Overlaps
1. The last node of a pattern coincides with the first node of the next pattern (e.g., with lp = 3, the branch ABCB is decomposed into ABC and CB).
2. If a node has branches, it is included in the first pattern of every branch.
3. The first node in a cycle is visited twice.
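Rule 1 alone can be sketched as a function that cuts a branch's label sequence into overlapping patterns of at most lp nodes. The branching and cycle rules are not modeled here, and representing a branch as a string of single-letter labels is an assumption.

```python
def decompose(branch: str, lp: int) -> list:
    """Split a branch into patterns of at most lp nodes, where each pattern
    starts at the node that ended the previous one."""
    if lp < 2:
        raise ValueError("lp must be at least 2 for patterns to overlap")
    if len(branch) <= lp:
        return [branch]
    patterns, i = [], 0
    while i < len(branch) - 1:
        patterns.append(branch[i:i + lp])
        i += lp - 1  # step back one node so consecutive patterns share it
    return patterns


print(decompose("ABCB", 3))    # ['ABC', 'CB']
print(decompose("ABCDEF", 3))  # ['ABC', 'CDE', 'EF']
```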
GraphGrep
Matching Example
1. Select the set of paths matching each pattern.
2. Combine the lists, subject to the overlap constraints.
3. Remove combinations in which nodes with equal ids appear in non-overlapping positions.
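Steps 2 and 3 for a chain of overlapping patterns can be sketched as a join on shared end nodes followed by a duplicate-id check. The id-path lists here are made-up inputs for the query ABCB split into ABC and CB. Cycles, where a repeated id is intended, are outside this sketch.

```python
from typing import List, Sequence, Tuple

IdPath = Tuple[int, ...]


def combine(pattern_lists: Sequence[List[IdPath]]) -> List[IdPath]:
    """Join id-path lists of consecutive overlapping patterns.

    Step 2: the last id of one pattern must equal the first id of the next.
    Step 3: drop joins where an id repeats at non-overlapping positions,
    since distinct query nodes must map to distinct data nodes.
    """
    partial: List[IdPath] = list(pattern_lists[0])
    for nxt in pattern_lists[1:]:
        # The shared node is kept once, so any remaining repeat is a conflict.
        partial = [p + q[1:] for p in partial for q in nxt if p[-1] == q[0]]
    return [p for p in partial if len(set(p)) == len(p)]


abc = [(1, 2, 3), (5, 2, 3)]
cb = [(3, 2), (3, 4)]
print(combine([abc, cb]))  # [(1, 2, 3, 4), (5, 2, 3, 4)]
```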
GraphGrep
Techniques for Queries with Wildcards
Consider separately the parts of the query graph that lie between wildcards (as in Pathfix).
Take the Cartesian product of the matches of those components.
An entry in the Cartesian product is valid if there is a path between the corresponding nodes whose length equals the one specified by the wildcards.
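The Cartesian-product step can be sketched as below, under these assumptions:

- Each component's matches are summarized by the data node where that component meets the wildcard.
- `hops` is the number of edges the wildcard must span.
- Walks may revisit nodes, which keeps the check to a simple breadth-first expansion.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def reachable_in_exactly(adj: Dict[int, Set[int]], src: int, hops: int) -> Set[int]:
    """Nodes reachable from src by a walk of exactly `hops` edges (nodes may repeat)."""
    frontier = {src}
    for _ in range(hops):
        frontier = {n for v in frontier for n in adj[v]}
    return frontier


def join_components(adj: Dict[int, Set[int]], left_ends: List[int],
                    right_starts: List[int], hops: int) -> List[Tuple[int, int]]:
    """Keep the pairs of the Cartesian product that the wildcard can bridge."""
    reach = {a: reachable_in_exactly(adj, a, hops) for a in set(left_ends)}
    return [(a, b) for a, b in product(left_ends, right_starts) if b in reach[a]]


# Hypothetical data graph: the path 1-2-3-4-5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(join_components(adj, [1, 2], [4, 5], hops=3))  # [(1, 4), (2, 5)]
```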
GraphGrep
1 GHz Pentium III
NCI databases (1,000–16,000 graphs)
Average of 20 nodes per graph (max 270 nodes)
Queries with 13–189 nodes
lp = 4 and 10
GraphGrep
Running time is linear in the size of the DB.
Different values of lp influence the running time.
Conclusions / Questions
Searching in Trees: introduces ATreeGrep
Searching in Graphs: introduces GraphGrep
Thanks to:
God, the class, Wikipedia, and various other Googled sources