© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
HOW TO BUILD AN AWESOME SEARCH APP
Stu McLean, Ph.D., Principal Consultant, Public Sector - Civilian, MarkLogic
Ganesh Vaideeswaran, Senior Director, Development, MarkLogic
SLIDE: 2
Agenda
Components of a search application
Deep dive
SLIDE: 3
SLIDE: 5
SLIDE: 6
SLIDE: 7
SLIDE: 8
SLIDE: 9
SLIDE: 10
SLIDE: 11
SLIDE: 12
SLIDE: 13
SLIDE: 14
SLIDE: 15
SLIDE: 16
SLIDE: 17
Awesome Search for the Enterprise
WHAT MARKLOGIC PROVIDES
360° VIEW
SLIDE: 19
NBCUniversal – SNL40 App
Smart Content
Enriched and targeted content
Constantly tuned recommendations
> 5,550 sketches
SLIDE: 20
NBCUniversal – SNL40 App
Smart Content: Enriched and targeted content, constantly tuned recommendations
[Diagram: graph of linked SNL entities]
Nodes: Talent (Kristen Wiig), Character (Maharelle Sister), Characteristic (Tiny hands), Segment (The Lawrence Welk Show), Episode 4 (Anne Hathaway and Killers), Season 34, Date (10/4/08), Era
Relationships: Acted in, Played, Has, Part of, Includes, Aired on
SLIDE: 21
Fairfax County – Mapping Crime Data
Data Variety
500,000 GIS data points
Service Calls
Source Variety
800 GIS layers
Structured & Unstructured
SLIDE: 23
Mitchell1 – Streamlined Auto Repair Information
MITCHELL1 PRODEMAND
REPAIR INFO
EXPERT ADVICE
REPAIR ORDERS
Structured & Unstructured Data
Sophisticated Search
Multi-device Delivery
SLIDE: 25
Needs for an Awesome Search App
QUERY: Relevance, Snippets, Highlight, Auto Complete, Spell Suggest, Synonyms
DATA: Structured & Unstructured, Multi-Model approach
PLATFORM: Alerts, Transactions, Security, Real-time, Languages, Configurability, Development Platform
Context
Presentation
SLIDE: 26
Aboutness – The Key to Successful Search
What is this document about?
What is this submitted query about?
What is the user’s activity about?
What are the responses about?
SLIDE: 27
Implementing Aboutness in MarkLogic
Data: What is this document about?
SLIDE: 28
Implementing Aboutness in MarkLogic
Data
SLIDE: 29
Implementing Aboutness in MarkLogic
Data
Query: What is this query about?
SLIDE: 30
Implementing Aboutness in MarkLogic
Data
Query
SLIDE: 31
Implementing Aboutness in MarkLogic
Data
Query
Context: What is the user’s activity about?
SLIDE: 32
Implementing Aboutness in MarkLogic
Data
Query
Context
SLIDE: 33
Implementing Aboutness in MarkLogic
Data
Query
Context
Results: What are the responses about?
SLIDE: 34
Implementing Aboutness in MarkLogic
Data
Query
Context
Results
SLIDE: 35
Implementing Aboutness in an Awesome Search App
Data
Categorization, Envelope Structure
Query
Query as a Dialogue
Context
Building an Information Resource
Results
Relevant Response
SLIDE: 36
Implementing Aboutness in an Awesome Search App
Data
Categorization, Envelope Structure
Query
Query as a Dialogue
Context
Building an Information Resource
Results
Relevant Response
SLIDE: 37
Domain Knowledge ‒ Tools for Establishing Aboutness
Taxonomies – vocabulary of the organization
Ontologies – Tie your documents to relevant data and documents
Thesaurus
Organizational units, people, products
Public vocabularies – Geographic, DBPedia
SLIDE: 38
Taxonomies
Terms to look for in your documents
SLIDE: 39
Taxonomies and Reverse Queries
Categorization ‒ Finding Aboutness
Public taxonomies
Getty – art and architecture
World Bank – humanitarian projects
Library of Congress – subject headings
National Library of Medicine – medical headings
Domain taxonomies
Best Buy
SLIDE: 40
Where does Aboutness reside?
title:
abstract:
glossary:
Body
Expert searchers compose carefully crafted searches with domain knowledge
The rest of us hope for the best
Categorizing Your Documents
SLIDE: 41
Turn a Taxonomy Entry into a Query
<cts:or-query xmlns:cts="http://marklogic.com/cts">
  <cts:near-query distance="5">
    <cts:near-query distance="5">
      <cts:element-word-query weight="10">
        <cts:element xmlns:_1="http://marklogic.com/solutions/obi/source">_1:title</cts:element>
        <cts:text xml:lang="en">Non-Government</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>stemmed</cts:option>
      </cts:element-word-query>
      <cts:element-word-query weight="10">
        <cts:element xmlns:_1="http://marklogic.com/solutions/obi/source">_1:title</cts:element>
        <cts:text xml:lang="en">Bond</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>stemmed</cts:option>
      </cts:element-word-query>
    </cts:near-query>
    <!-- the slide cuts off here; any remaining sub-queries are not shown -->
  </cts:near-query>
</cts:or-query>
Implementing Categorization
SLIDE: 42
Reverse Query Categorization
Reverse Queries
SLIDE: 43
Reverse Query Categorization
Reverse Queries
Ingest mydoc.xml (1 MB document)
Save as /tmp/mydoc.xml in collection tmp
SLIDE: 44
Reverse Query Categorization
Reverse Queries
Ingest mydoc.xml (1 MB document)
Save as /tmp/mydoc.xml in collection tmp
In-memory copy of abridged document (10% = 100 KB)
SLIDE: 45
Reverse Query Categorization
Reverse Queries
Ingest mydoc.xml (1 MB document)
Save as /tmp/mydoc.xml in collection tmp
In-memory copy of abridged document (10% = 100 KB)
SLIDE: 46
Reverse Query Categorization
Reverse Queries
Ingest mydoc.xml (1 MB document)
Save as /tmp/mydoc.xml in collection tmp
In-memory copy of abridged document (10% = 100 KB)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:collection-query("topicQueries"),
    cts:reverse-query($reducedDoc)
  )),
  "unfiltered"
)
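The reverse search above presumes that each taxonomy entry has already been stored as a serialized query in the topicQueries collection. The following is a minimal sketch of that setup, not the presenters' implementation: the wrapper elements (`topic`, `label`, `query`), the URI scheme, and the example terms are all assumptions made for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical taxonomy entry; label and terms are illustrative only. :)
let $label := "Non-Government Bond"
let $query :=
  cts:near-query(
    (cts:word-query("Non-Government"), cts:word-query("Bond")),
    5)
return
  xdmp:document-insert(
    "/topicQueries/non-government-bond.xml",  (: assumed URI scheme :)
    <topic>
      <label>{$label}</label>
      (: placing a cts:query inside an element constructor serializes it as XML :)
      <query>{$query}</query>
    </topic>,
    xdmp:default-permissions(),
    "topicQueries")
```

Once such documents exist, cts:reverse-query can select the stored queries that would match a given node. Enabling the database's fast reverse searches option is likely to help when the number of stored queries is large.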
SLIDE: 47
Reverse Query Categorization ‒ Scoring
Reverse Queries
/tmp/mydoc.xml
SLIDE: 48
Reverse Query Categorization ‒ Scoring
Reverse Queries
/tmp/mydoc.xml
for $q in $reverseSearchResults
let $score := cts:fitness(
  cts:search(
    fn:doc(),
    cts:and-query((
      cts:document-query($myURI),
      cts:query($q/query)
    )),
    ("score-logtf", "unfiltered")
  )
)
(: the return clause is not shown on the slide; returning the score is an assumption :)
return $score
SLIDE: 49
Envelope Pattern ‒ Know Your Data
Homogenize Variant Data Types
Remain Schema-Agnostic, Load As-Is
Annotate Your Data
SLIDE: 50
Building the Envelope
Ingested Content
Metadata
Category Annotation
SLIDE: 51
Envelope ‒ Know Your Data
Standardize ingested documents of multiple types
Common names may be different (e.g. Title, Heading, Subject)
Normalize your searchable elements into a common format
Multiple versions of the same document (e.g. Translations)
Preserve artifacts
Indexed search on metadata (facets)
Full-text search on content
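As a rough illustration of the envelope idea, the sketch below wraps a source record as-is and adds a normalized metadata section beside it. The element names (`envelope`, `metadata`, `title`, `category`, `content`), the sample record, and the category value are assumptions for illustration; the slides do not specify an envelope schema.

```xquery
xquery version "1.0-ml";

(: Hypothetical source record whose title lives in a <Heading> element. :)
let $source :=
  <report>
    <Heading>Quarterly Bond Review</Heading>
    <Body>Non-government bond issuance rose in the third quarter.</Body>
  </report>
(: Normalize the variant names (Title, Heading, Subject) into one field;
   the first one present wins. :)
let $title := fn:string(($source/Title, $source/Heading, $source/Subject)[1])
return
  <envelope>
    <metadata>
      <title>{$title}</title>
      (: assumed category annotation, e.g. produced by reverse-query categorization :)
      <category>Non-Government Bond</category>
    </metadata>
    (: original content preserved unchanged :)
    <content>{$source}</content>
  </envelope>
```

Keeping the original under `content` preserves the artifact for full-text search, while the normalized `metadata` elements give facets and indexes a single, predictable place to look.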
SLIDE: 52
Implementing Aboutness in an Awesome Search App
Data
Categorization, Envelope Structure
Query
Query as a Dialogue
Context
Building an Information Resource
Results
Relevant Response
SLIDE: 53
What is the Query About?
Establish Aboutness Through Dialogue
Type-ahead
Match Lexicons
Match Search History (user, global)
Background Search
Phrase Identification
Object Matching
Boost Queries
SLIDE: 54
What is the Query About?
Establishing Aboutness Through Dialogue (Cont’d)
Correction and Confirmation
Spell Check
Thesaurus Expansion
Semantic Expansion
Drill Down
Follow Up Search Box
Facets
Constraints on the Fly
Extending Suggest
REST API extension
Search multiple lexicons for each request
Type-ahead must be fast
Resist the temptation to put multiple suggest calls in the app
(: $options is not defined on the slide; it is assumed to be declared elsewhere in the extension module :)
declare function get(
  $context as map:map,
  $params as map:map
) as document-node()?
{
  let $suggestList := ("person", "country", "organization", "project", "region", "topic")
  let $partial := xdmp:url-decode(map:get($params, "partial-q"))
  let $suggestions := map:new((
    for $s in $suggestList
    let $term := fn:concat($partial, "*")
    return map:entry($s, search:suggest($term, $options, if ($s eq "person") then 20 else 3))
  ))
  return document { xdmp:to-json($suggestions) }
};
SLIDE: 56
Query Expansion ‒ Background Search
Finding phrases when there are no quotes
Approach 1
SPARQL - Match phrases in metadata, taxonomies
Put the matched text into a weighted custom constraint (e.g. phrase: )
OR the phrase: constraint with the full query
Approach 2
Build NEAR queries using pairs of nearby terms
OR the phrase and boost the weight on the NEAR query if matched
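Approach 2 could be sketched roughly as follows. This is one possible reading of the slide, not the presenters' code: the function name `local:phrase-boost`, the distance of 2 words, the distance weight of 4, and the choice of an AND query as the base are all assumptions.

```xquery
xquery version "1.0-ml";

declare function local:phrase-boost($q as xs:string) as cts:query
{
  let $terms := fn:tokenize(fn:normalize-space($q), " ")
  (: One ordered NEAR query per adjacent pair of terms. :)
  let $pairs :=
    for $i in 1 to fn:count($terms) - 1
    return
      cts:near-query(
        (cts:word-query($terms[$i]), cts:word-query($terms[$i + 1])),
        2,          (: assumed maximum distance, in words :)
        "ordered",
        4)          (: assumed distance weight :)
  (: Base query: every term must appear somewhere in the document. :)
  let $base := cts:and-query(for $t in $terms return cts:word-query($t))
  (: OR-ing lets documents that also satisfy a NEAR pair pick up extra score. :)
  return cts:or-query(($base, $pairs))
};

local:phrase-boost("non government bond")
```

Note that the OR also admits documents that match only a NEAR pair without containing every term; whether that broadening is wanted depends on the application.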
SLIDE: 57
Example – Constraints on the Fly
Selected documents share common elements
Elements not known in advance but can be used as in-app facets
Set as constraints in the submitted options
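One way to set constraints in the submitted options is to build the Search API options node at request time from whatever element names the selected documents share. The sketch below assumes two hypothetical element names (`make`, `model`) in no namespace and an illustrative query string; how the shared elements are discovered is not shown on the slide and is not covered here.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical element names discovered in the selected documents. :)
let $element-names := ("make", "model")
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">{
    for $name in $element-names
    return
      (: a word constraint needs no range index, so it can be created on the fly :)
      <constraint name="{$name}">
        <word>
          <element ns="" name="{$name}"/>
        </word>
      </constraint>
  }</options>
return search:search("make:Honda brake", $options)
```

Because word constraints resolve against the universal index, they fit elements that are not known in advance; range-based facets would instead require an index configured ahead of time.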
SLIDE: 58
Example – Selecting from Feature Fields
SLIDE: 59
Example – Selecting from Feature Fields
SLIDE: 60
Implementing Aboutness in an Awesome Search App
Data
Categorization, Envelope Structure
Query
Query as a Dialogue
Context
Building an Information Resource
Results
Relevant Response
SLIDE: 61
Aboutness – The Context Behind The Query
Document: LexisNexis, WestLaw, Medline, USPTO
SLIDE: 62
Aboutness – The Context Behind The Query
Document: LexisNexis, WestLaw, Medline, USPTO
Facts: Wikipedia, CIA World Factbook
SLIDE: 63
Aboutness – The Context Behind The Query
Document: LexisNexis, WestLaw, Medline, USPTO
Facts: Wikipedia, CIA World Factbook
Information: Google, Bing
SLIDE: 64
Reference Data – Link or Load
External Data
Real Time Read
SLIDE: 65
Reference Data – Link or Load
External Data
Ingest
Retrieval
Embedded Data
SLIDE: 66
Reference Data – Link or Load
External Data
Copied Reference Data
Embedded Links
additional-query
SPARQL
SLIDE: 67
Implementing Aboutness in an Awesome Search App
Data
Categorization, Envelope Structure
Query
Query as a Dialogue
Context
Building an Information Resource
Results
Relevant Response
SLIDE: 68
Planning Your Data For Results
When to index metadata
Facets
Sortable
Lexicon dropdowns
Document joins
When not to index
Wildcarding
search:search constraints
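To illustrate why facets and lexicon dropdowns call for an index, the sketch below reads distinct values and their counts straight from a value lexicon. It assumes a string element range index has already been configured on a hypothetical `category` element with a collation matching the query default; without that index the call raises an error.

```xquery
xquery version "1.0-ml";

(: Assumes an element range index on <category>; the element name is hypothetical. :)
for $value in
  cts:element-values(
    xs:QName("category"),
    (),
    ("frequency-order", "limit=10"))
return fn:concat($value, " (", cts:frequency($value), ")")
```

The values come from the index rather than from the documents themselves, which is what keeps dropdowns and facet counts fast on large collections.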
SLIDE: 69
Results ‒ Focus On Relevance
Boost the Document: Quality
Boost the Element: Term Weighting
Boost the Term: Query Weighting
Boost the Document
Set quality
xdmp:document-insert
xdmp:document-insert(
  "/example.xml",
  <a>aaa</a>,
  xdmp:default-permissions(),
  xdmp:default-collections(),
  10
)
xdmp:document-load
xdmp:document-load(
  "http://myCompany.com/file.xml",
  <options xmlns="xdmp:document-load">
    <uri>/documents/myFile.xml</uri>
    <repair>none</repair>
    <permissions>{xdmp:default-permissions()}</permissions>
    <format>xml</format>
    <quality>10</quality>
  </options>
)
SLIDE: 71
Boost the Element
In the database configuration
Select Word Query from the menu
Under the includes tab add your ranked element with its weight
SLIDE: 72
Boost the Term
With no query weighting, the MarkLogic Companies document appears first
SLIDE: 73
The Boost-Query
A conditional secondary query that increases the weight of results matching both criteria
The boost-query pushes the MarkLogic World document to the top
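A boost of this kind can be expressed with cts:boost-query, which returns the documents that match the first query and raises the score of those that also match the second. The search terms and the weight of 10 below are illustrative assumptions; the slide does not show the query that was actually used.

```xquery
xquery version "1.0-ml";

(: Terms and weight are assumptions chosen to mirror the slide's example. :)
let $matching := cts:word-query("MarkLogic")
let $boosting := cts:word-query("World", (), 10)
for $doc in cts:search(fn:doc(), cts:boost-query($matching, $boosting))[1 to 10]
return fn:concat(xdmp:node-uri($doc), " ", cts:score($doc))
```

Unlike an OR query, the boosting query never adds documents to the result set; it only reorders the ones the matching query already found.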
SLIDE: 74
Aboutness – The Key to Successful Search
What is this document about?
What is this query about?
What is the user’s activity about?
What are the responses about?
SLIDE: 75
Rendering Awesome Apps
SLIDE: 76
Q&A