Solr 6.0 Graph Query Overview Kevin Watters KMW Technology [email protected] http://www.kmwllc.com / 03/29/2016

TRANSCRIPT

Page 1: Solr 6.0 Graph Query Overview

Solr 6.0 Graph Query Overview

Kevin Watters
KMW Technology
[email protected]
http://www.kmwllc.com/
03/29/2016

Page 2: Solr 6.0 Graph Query Overview

KMW Technology Overview

Boston-based software consulting and professional services organization.
Founded in 2010.
Seven consultants with deep industry experience.
Boutique firm specializing in Search and Big Data technologies.
Custom Connectors, Pipelines, Search, Analytics, and UI development.

Page 3: Solr 6.0 Graph Query Overview

Search, Join, vs Graph

Which query should I use?

Search is for flat data with no relationships.
◦ Data is often de-normalized; updates can require large amounts of re-indexing.
Join is for one level of relationships.
◦ Data is normalized, but when more than 2 tables are involved, join queries must be nested.
Graph is for arbitrary depth/levels of relationships.
◦ Data can be completely normalized; arbitrary numbers of tables can be joined together.
A one-level hop on a graph is roughly equivalent to a join query.

Page 4: Solr 6.0 Graph Query Overview

What is a Graph?

A generic representation of all data models. “One data model to rule them all”!

G = <V,E> ?!?!

Vertices/Nodes
◦ Can have properties as key-value pairs.
Edges
◦ Can have properties as key-value pairs.

Page 5: Solr 6.0 Graph Query Overview

Graph Traversal

There are many graph traversal / exploration algorithms: DFS, BFS, A*, Alpha–beta, etc.

Solr graph query implements “BFS” (breadth-first search). Each hop expands the “frontier” of the graph: all current edges are explored in a single step, also known as a “hop”.
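The frontier idea can be illustrated outside Solr with a minimal Python sketch. It is not Solr's implementation: the documents are plain dicts, the field names `node_id` and `edge_ids` and the function `graph_traverse` are hypothetical, and each node id is assumed to belong to exactly one document.

```python
def graph_traverse(docs, root_ids, from_field="node_id", to_field="edge_ids", max_depth=-1):
    """Return the node ids reachable from root_ids by breadth-first hops."""
    by_node = {doc[from_field]: doc for doc in docs}   # toy stand-in for an index lookup
    frontier = [by_node[r] for r in root_ids if r in by_node]
    visited = {doc[from_field] for doc in frontier}    # remembers where we have been
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        # One hop: gather every edge value on the frontier, then match unseen nodes.
        edge_values = {v for doc in frontier for v in doc.get(to_field, [])}
        frontier = [by_node[v] for v in edge_values if v in by_node and v not in visited]
        visited.update(doc[from_field] for doc in frontier)
        depth += 1
    return visited


DOCS = [
    {"node_id": "a", "edge_ids": ["b", "c"]},
    {"node_id": "b", "edge_ids": ["d"]},
    {"node_id": "c", "edge_ids": ["d"]},
    {"node_id": "d", "edge_ids": ["a"]},   # cycle back to the root
    {"node_id": "e", "edge_ids": []},      # not reachable from "a"
]

print(sorted(graph_traverse(DOCS, ["a"])))               # ['a', 'b', 'c', 'd']
print(sorted(graph_traverse(DOCS, ["a"], max_depth=1)))  # ['a', 'b', 'c']
```

The cycle from "d" back to "a" does not cause a loop, because nodes already in `visited` are never added to the frontier again.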

Page 6: Solr 6.0 Graph Query Overview

Key Features and Design Goals

“Graph is a Filter on top of your data” -someone

Designed for large scale, large numbers of edges, and very deep traversals.

Limited memory usage for traversal
Cycle detection for “free”
Highly cacheable
Support multiValued fields for nodes and/or edges
Support filters during the traversal
Follow Every Edge! No edge left behind!
Works with Facets & Facet Queries!

Page 7: Solr 6.0 Graph Query Overview

A Word about Memory Usage

One bit set to rule them all!
BitSet provides cycle detection implicitly. (Have I been here before?)
The BitSet holds one bit per document in the index.
A 100 million doc index only uses about 12 MB per query! (Same size as 1 filter cache entry!)
Additional bitsets may be used during query execution depending on query params (leaf nodes and root nodes bitsets).
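The 12 MB figure follows from one bit per document. A small Python check of the arithmetic, where `bitset_bytes` is a hypothetical helper and any per-object overhead of a real bit set implementation is ignored:

```python
def bitset_bytes(num_docs: int) -> int:
    """Bytes needed for a bit set holding one bit per document (rounded up)."""
    return (num_docs + 7) // 8


size = bitset_bytes(100_000_000)
print(size)                    # 12500000 bytes
print(size / 1_000_000)        # 12.5 (decimal megabytes)
print(round(size / 2**20, 2))  # 11.92 (binary megabytes)
```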

Page 8: Solr 6.0 Graph Query Overview

Graph Query Parser Syntax

Parameter        Default  Description
from                      Field containing the node id.
to                        Field containing the edge id(s).
maxDepth         -1       The number of hops to traverse from the root of the graph. -1 means traverse until all edges and documents have been collected. maxDepth=1 behaves similarly to a JOIN.
traversalFilter  null     Arbitrary query string to apply at each hop of the traversal.
returnRoot       true     true|false: whether the documents matching the root query should be returned.
returnOnlyLeaf   false    true|false: return only documents in the result set that do not have a value in the “to” field.
useAutn          true     Performance trade-off based on use case. Mileage may vary.

Uses Solr’s query parser plugin and “local params” syntax:
{!graph param="value" … }
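When queries are assembled in client code, the local params prefix can be built from a dictionary. The sketch below is a hypothetical helper, not part of Solr or any client library; `graph_query` is an invented name, and the helper makes no attempt to escape quotes inside values.

```python
def graph_query(root_query, params):
    """Build a {!graph ...} local-params prefix followed by the root query."""
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):          # check bool first: bool is a subclass of int
            parts.append(f"{name}={'true' if value else 'false'}")
        elif isinstance(value, int):
            parts.append(f"{name}={value}")
        else:
            parts.append(f'{name}="{value}"')  # string values are double-quoted
    return "{!graph " + " ".join(parts) + "}" + root_query


q = graph_query(
    "sense_lemma:jaguar",
    {"from": "synset_id", "to": "hypernym_id", "maxDepth": 9},
)
print(q)  # {!graph from="synset_id" to="hypernym_id" maxDepth=9}sense_lemma:jaguar
```

A dictionary is used rather than keyword arguments because `from` is a reserved word in Python.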

Page 9: Solr 6.0 Graph Query Overview

Princeton Wordnet

Princeton Wordnet has an ontology for many of the words in the English language. These relationships contain hierarchies of words that represent more general and more specific classes of relationships. https://wordnet.princeton.edu/
Words have a “sense”, or meaning.
A hypernym is a more general related word.
A hyponym is a more specific related word.
◦ Jaguar is a type of Cat
◦ Large Cat is a type of Animal
Intersections of this hierarchy can answer questions: “Is a jaguar an animal?”

Page 10: Solr 6.0 Graph Query Overview

Wordnet Hypernym Traversal

Start traversing from the word sense “jaguar” up the hypernym graph 9 levels.
+{!graph from="synset_id" to="hypernym_id" maxDepth=9}sense_lemma:jaguar

Page 11: Solr 6.0 Graph Query Overview

Wordnet Graph Intersections

Is a jaguar an animal? Query for an intersection between the two graphs.

If a graph intersection exists, the answer is yes!
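One way to read the intersection test, sketched in Python on a toy hierarchy rather than the real WordNet data: collect everything reachable from “jaguar” by following hypernym links, then intersect that set with the records matching “animal”. The `HYPERNYMS` table and the function names are invented for illustration.

```python
# Toy hypernym links (word -> more general words); not real WordNet content.
HYPERNYMS = {
    "jaguar": ["large cat"],
    "large cat": ["animal"],
    "animal": [],
    "oak": ["tree"],
    "tree": ["plant"],
    "plant": [],
}


def hypernym_closure(word):
    """Return the word plus everything reachable by following hypernym links."""
    seen = {word}
    pending = [word]
    while pending:
        for parent in HYPERNYMS.get(pending.pop(), []):
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return seen


def is_a(word, category):
    """True if the traversal from word intersects the records matching category."""
    return bool(hypernym_closure(word) & {category})


print(is_a("jaguar", "animal"))  # True
print(is_a("oak", "animal"))     # False
```

Only reachability matters here, so the order in which pending words are visited does not affect the result.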

Page 12: Solr 6.0 Graph Query Overview

OpenCV, Video Recognition

Imagine indexing each frame of video from security cameras. Pass each frame of video through OpenCV for object recognition & face recognition.

Each frame record has its own frame number and the frame number of the previous frame.

Search for object/face “A” detected, followed by object/face “B” detected, across all of your video streams.

Page 13: Solr 6.0 Graph Query Overview

Users, Items and Actions

Model your browsing/purchase history as
◦ Users (have an ID)
◦ Items (have an ID, metadata, category, etc.)
◦ Actions (link between a user and items, such as rating, purchase, like/dislike)
User -> Action -> Item -> Action -> User …
Use Graph + maxDepth to control how far the traversal goes. maxDepth = 2 gets from a user to an item. maxDepth = 4 gets from one user to a new set of users, and on and on.
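The hop counts can be checked on a toy chain. The Python sketch below treats links as an abstract, undirected adjacency list and ignores how they would be stored in Solr fields; the ids, `LINKS`, and `within_hops` are all invented for illustration.

```python
from collections import deque

# user:1 -> action:10 -> item:A -> action:11 -> user:2 -> action:12 -> item:B
LINKS = [
    ("user:1", "action:10"), ("action:10", "item:A"),
    ("item:A", "action:11"), ("action:11", "user:2"),
    ("user:2", "action:12"), ("action:12", "item:B"),
]


def within_hops(start, max_depth, return_root=True):
    """Map each node within max_depth hops of start to its hop distance."""
    neighbors = {}
    for a, b in LINKS:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue                      # do not expand past the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    if not return_root:
        del depth[start]                  # mimic excluding the starting record
    return depth


print(sorted(within_hops("user:1", 2)))  # ['action:10', 'item:A', 'user:1']
users = [n for n in within_hops("user:1", 4, return_root=False) if n.startswith("user:")]
print(users)                             # ['user:2']
```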

Page 14: Solr 6.0 Graph Query Overview

Actions occur over time

These events can’t easily be aggregated or flattened onto a record.

Model this as a “person” record, with a set of “action” records.

Each action record has the id of the “previous” action.

Search for an action, graph traverse based on person id to another action, then finally to the person record.

Page 15: Solr 6.0 Graph Query Overview

Find similar users

Graph traversal from a user (or set of users) through their actions to items they like, to find similar users, and out to items they like.

Now, exclude the original starting set: “returnRoot=false”

Page 16: Solr 6.0 Graph Query Overview

Graph Query For Security

Graph queries are elegant and simple to use for traversing security hierarchies such as LDAP and AD.

They also fit custom security models that are hierarchical or folder based in nature.

Page 17: Solr 6.0 Graph Query Overview

Example Company with Security Model

Page 18: Solr 6.0 Graph Query Overview

Document/Security Model within the Solr Index

Page 19: Solr 6.0 Graph Query Overview

Graph Traversal for User 1

Page 20: Solr 6.0 Graph Query Overview

Graph Traversal for User 2

Page 21: Solr 6.0 Graph Query Overview

Security Query

Single security query term to traverse the entire graph:
{!graph from="node_id" to="edge_ids" returnOnlyLeaf="true"}id:user_1

The graph query is applied as a filter query on the request; the normal query is then used for filtering against the documents.
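A minimal Python sketch of how a client might attach this filter to a search request. The function name and the example search text are hypothetical; `q` and `fq` are Solr's standard query and filter-query parameters. The user id is assumed to come from a trusted source, since it is concatenated into the query unescaped.

```python
from urllib.parse import urlencode, parse_qs


def secured_search_params(user_query, user_id):
    """Pair the user's own query with a graph filter rooted at their user record."""
    graph_filter = (
        '{!graph from="node_id" to="edge_ids" returnOnlyLeaf="true"}id:' + user_id
    )
    return {"q": user_query, "fq": graph_filter}


params = secured_search_params("quarterly report", "user_1")
query_string = urlencode(params)   # URL-encoded form, ready to append to a request URL
print(params["fq"])
```

Keeping the security traversal in `fq` rather than `q` leaves the relevance scoring of the user's query untouched and lets the filter be cached separately.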

Page 22: Solr 6.0 Graph Query Overview

FoaF

Friend of a Friend of a Friend of a Friend…

2 ways to model in the index.
Multi-valued “friendid” field that points to other person records.
◦ More efficient and faster search.
◦ Filter traversal based on metadata on the person record.
Single-valued field on a document that represents the link/edge between two person records.
◦ More flexible, slower search.
◦ Can filter edges with metadata about the edge record.

Page 23: Solr 6.0 Graph Query Overview

Graph Analytics via Faceting

What do my friends’ friends who live in Boston like?

Identify a graph/dataset with a graph query to identify the people records.

Use facets to generate analytics on the result set based on the values in the person record “like” field.

Use drill down to understand characteristics of different demographics/cohorts.

Get counts at various levels using maxDepth graph queries as facet queries.
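Counting records at several depths might look like the following Python sketch, which builds one `facet.query` per maxDepth value. The field names `id` and `friendid`, the root query `id:user_1`, and the helper name are assumptions for illustration; `facet` and `facet.query` are standard Solr request parameters.

```python
def depth_facet_queries(root_query, from_field, to_field, depths):
    """Build one graph query per maxDepth value, for use as facet.query params."""
    return [
        f'{{!graph from="{from_field}" to="{to_field}" maxDepth={d}}}{root_query}'
        for d in depths
    ]


facet_queries = depth_facet_queries("id:user_1", "id", "friendid", [1, 2, 3])

# Match everything, return no rows, and let each facet query report its own count.
request_params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
request_params += [("facet.query", fq) for fq in facet_queries]

for fq in facet_queries:
    print(fq)
```

Each facet query count then reports how many records fall within that many hops of the starting record.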

Page 24: Solr 6.0 Graph Query Overview

What next?

Edge weights & Relevancy
◦ Based on tf/idf or bm25?
◦ Based on numerical field values (min/max/sum/avg weight application)?
Min distance computation
Better support for D3.js and other Visualization tools
Driving directions?
Distributed Traversal via Kafka frontier query broker
Spark RDD Support? GraphX?
minDepth parameter? Only return records that are at least N hops away?