how to build an awesome search app - marklogic · data categorization, envelope structure query...

Post on 14-Apr-2018

TRANSCRIPT

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

HOW TO BUILD AN AWESOME SEARCH APP

Stu McLean, Ph.D., Principal Consultant, Public Sector - Civilian, MarkLogic

Ganesh Vaideeswaran, Senior Director, Development, MarkLogic

SLIDE: 2

Agenda

Components of a search application

Deep dive

SLIDE: 3

SLIDE: 5

SLIDE: 6

SLIDE: 7

SLIDE: 8

SLIDE: 9

SLIDE: 10

SLIDE: 11

SLIDE: 12

SLIDE: 13

SLIDE: 14

SLIDE: 15

SLIDE: 16

SLIDE: 17

Awesome Search for the Enterprise

WHAT MARKLOGIC PROVIDES

360° VIEW

SLIDE: 19

NBCUniversal – SNL40 App

Smart Content

Enriched and targeted content

Constantly tuned recommendations

> 5,550 sketches

SLIDE: 20

NBCUniversal – SNL40 App

Smart Content: Enriched and targeted content, constantly tuned recommendations

[Figure: semantic graph of SNL content. Node labels: Talent (Kristen Wiig); Episode 4 (Anne Hathaway and Killers); Character (Maharelle Sister); Season 34; Segment (The Lawrence Welk Show); Date (10/4/08); Era; Characteristic (Tiny hands). Edge labels: Acted in, Part of, Played, Aired on, Includes, Has.]

SLIDE: 21

Fairfax County – Mapping Crime Data

Data Variety

500,000 GIS data points

Service Calls

Source Variety

800 GIS layers

Structured & Unstructured

SLIDE: 23

Mitchell1 – Streamlined Auto Repair Information

MITCHELL1 PRODEMAND

REPAIR INFO

EXPERT ADVICE

REPAIR ORDERS

Structured & Unstructured Data

Sophisticated Search

Multi-device Delivery

SLIDE: 25

Needs for an Awesome Search App

QUERY: Relevance, Snippets, Highlight, Auto Complete, Spell Suggest, Synonyms

DATA: Structured & Unstructured, Multi-Model approach

PLATFORM: Alerts, Transactions, Security, Real-time, Languages, Configurability, Development Platform

Context

Presentation

SLIDE: 26

Aboutness – The Key to Successful Search

What is this document about?

What is this submitted query about?

What is the user’s activity about?

What are the responses about?

SLIDE: 27

Implementing Aboutness in MarkLogic

Data: What is this document about?

Query: What is this query about?

Context: What is the user’s activity about?

Results: What are the responses about?

SLIDE: 35

Implementing Aboutness in an Awesome Search App

Data: Categorization, Envelope Structure

Query: Query as a Dialogue

Context: Building an Information Resource

Results: Relevant Response

SLIDE: 37

Domain Knowledge ‒ Tools for Establishing Aboutness

Taxonomies – vocabulary of the organization

Ontologies – Tie your documents to relevant data and documents

Thesaurus

Organizational units, people, products

Public vocabularies – Geographic, DBPedia

SLIDE: 38

Terms to look for in your documents

Taxonomies

SLIDE: 39

Taxonomies and Reverse Queries

Categorization ‒ Finding Aboutness

Public taxonomies

Getty – art and architecture

World Bank – humanitarian projects

Library of Congress – subject headings

National Library of Medicine – medical headings

Domain taxonomies

Best Buy

Google

SLIDE: 40

Where does Aboutness reside?

title:

abstract:

glossary:

Body

Expert searchers compose carefully crafted searches with domain knowledge

The rest of us hope for the best

Categorizing Your Documents

SLIDE: 41

Turn a Taxonomy Entry into a Query

<cts:or-query xmlns:cts="http://marklogic.com/cts">
  <cts:near-query distance="5">
    <cts:near-query distance="5">
      <cts:element-word-query weight="10">
        <cts:element xmlns:_1="http://marklogic.com/solutions/obi/source">_1:title</cts:element>
        <cts:text xml:lang="en">Non-Government</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>stemmed</cts:option>
      </cts:element-word-query>
      <cts:element-word-query weight="10">
        <cts:element xmlns:_1="http://marklogic.com/solutions/obi/source">_1:title</cts:element>
        <cts:text xml:lang="en">Bond</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>stemmed</cts:option>
      </cts:element-word-query>
    </cts:near-query>
    <!-- The slide is cut off here; the rest of the query is not shown. -->
  </cts:near-query>
</cts:or-query>

Implementing Categorization

SLIDE: 42

Reverse Query Categorization

Reverse Queries

Ingest mydoc.xml (1 MB document)

Save as /tmp/mydoc.xml in collection tmp

In-memory copy of abridged document (10% = 100K)

cts:search(
  fn:doc(),
  cts:and-query((
    cts:collection-query("topicQueries"),
    cts:reverse-query($reducedDoc)
  )),
  "unfiltered"
)
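The idea behind a reverse query can be sketched outside MarkLogic: instead of running one query against many documents, the document is run against many stored queries, and every query that matches names a category. The sketch below is plain Python, not MarkLogic code; `TopicQuery`, `tokenize`, and `categorize` are hypothetical names, and "all terms present" stands in for whatever the real stored queries express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicQuery:
    """A stored query (assumption): the topic matches when every term occurs."""
    topic: str
    terms: tuple


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def categorize(text, topic_queries):
    """Return the topics whose stored query matches the document text."""
    words = set(tokenize(text))
    return [q.topic for q in topic_queries
            if all(t.lower() in words for t in q.terms)]


# Hypothetical stored queries, loosely modeled on the taxonomy entry above.
topic_queries = [
    TopicQuery("Non-Government Bond", ("non-government", "bond")),
    TopicQuery("Crime Mapping", ("crime", "gis")),
]

doc = "The fund holds one non-government bond issued in 2015."
matched = categorize(doc, topic_queries)
```

Matching against an abridged copy of the document, as the slide suggests, would simply mean passing a shorter `text`.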

SLIDE: 47

Reverse Query Categorization ‒ Scoring

Reverse Queries

/tmp/mydoc.xml

for $q in $reverseSearchResults
let $score := cts:fitness(
  cts:search(
    fn:doc(),
    cts:and-query((
      cts:document-query($myURI),
      cts:query($q/query)
    )),
    ("score-logtf", "unfiltered")
  )
)
(: The slide ends here. The return clause is an assumption added to complete the FLWOR expression. :)
return ($q, $score)
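Once the matching topics are known, each one is scored against the full document so the categories can be ranked. The following is a toy illustration in plain Python, assuming a simple log term-frequency score; it does not reproduce how `cts:fitness` or `score-logtf` are computed, and `log_tf_score` and `rank_topics` are hypothetical names.

```python
import math


def log_tf_score(text, terms):
    """Sum log(1 + term frequency) over the terms; 0.0 if any term is missing."""
    words = text.lower().split()
    counts = [words.count(t.lower()) for t in terms]
    if min(counts) == 0:
        return 0.0
    return sum(math.log(1 + c) for c in counts)


def rank_topics(text, topics):
    """topics maps a topic name to its terms; returns (topic, score), best first."""
    scored = [(name, log_tf_score(text, terms)) for name, terms in topics.items()]
    return sorted((p for p in scored if p[1] > 0),
                  key=lambda p: p[1], reverse=True)


# Hypothetical topics and document text.
topics = {"bonds": ["bond"], "equities": ["stock"], "funds": ["fund", "bond"]}
text = "bond fund report: the bond desk sold one stock and bought a bond"
ranking = rank_topics(text, topics)
```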

SLIDE: 49

Envelope Pattern ‒ Know Your Data

Homogenized Variant Data Types

Remain Schema-Agnostic, Load As-Is

Annotate Your Data

SLIDE: 50

Building the Envelope

Ingested Content

Metadata

Category Annotation

SLIDE: 51

Envelope ‒ Know Your Data

Standardize ingested documents of multiple types

Common names may be different (e.g. Title, Heading, Subject)

Normalize your searchable elements into a common format

Multiple versions of the same document (e.g. Translations)

Preserve artifacts

Indexed search on metadata (facets)

Full-text search on content
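The points above can be sketched as a small wrapper that leaves the ingested record untouched and adds normalized metadata and category annotations around it. This is a plain Python illustration under stated assumptions: the envelope layout, the `TITLE_ALIASES` tuple, and `build_envelope` are hypothetical and do not describe a MarkLogic schema.

```python
# Source fields that may carry the title (assumption, based on the slide's example).
TITLE_ALIASES = ("title", "heading", "subject")


def build_envelope(source, categories=()):
    """Wrap a source record in an envelope without modifying the original."""
    lowered = {key.lower(): value for key, value in source.items()}
    title = next((lowered[a] for a in TITLE_ALIASES if a in lowered), None)
    return {
        "envelope": {
            "metadata": {"title": title},      # normalized, searchable fields
            "categories": list(categories),    # category annotation
            "content": source,                 # ingested content, loaded as-is
        }
    }


# Two hypothetical documents that name their title differently.
memo = {"Subject": "Budget memo", "Body": "Full text of the memo"}
report = {"Heading": "Q3 report", "Sections": ["Summary", "Details"]}
envelopes = [build_envelope(memo, ["finance"]), build_envelope(report)]
titles = [e["envelope"]["metadata"]["title"] for e in envelopes]
```

Both documents can then be searched and faceted on the same `title` field while the originals are preserved.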

SLIDE: 52

Implementing Aboutness in an Awesome Search App

Data: Categorization, Envelope Structure

Query: Query as a Dialogue

Context: Building an Information Resource

Results: Relevant Response

SLIDE: 53

What is the Query About?

Establish Aboutness Through Dialogue

Type-ahead

Match Lexicons

Match Search History (user, global)

Background Search

Phrase Identification

Object Matching

Boost Queries

SLIDE: 54

What is the Query About?

Establishing Aboutness Through Dialogue (Cont’d)

Correction and Confirmation

Spell Check

Thesaurus Expansion

Semantic Expansion

Drill Down

Follow Up Search Box

Facets

Constraints on the Fly

declare function get(
  $context as map:map,
  $params as map:map
) as document-node()?
{
  let $suggestList := ("person", "country", "organization", "project", "region", "topic")
  let $partial := xdmp:url-decode(map:get($params, "partial-q"))
  (: $options is not defined on the slide; it is presumably declared elsewhere in the module :)
  let $suggestions := map:new((
    for $s in $suggestList
    let $term := fn:concat($partial, "*")
    return map:entry($s, search:suggest(
      $term,
      $options,
      if ($s eq "person") then 20 else 3
    ))
  ))
  return document { xdmp:to-json($suggestions) }
};

Extending Suggest

REST API extension

Search multiple lexicons for each request

Type-ahead must be fast

Resist temptation to put multiple suggest calls in the app
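The design point is that one request should return suggestions from every lexicon, rather than the app issuing one call per lexicon. A minimal Python sketch of that shape follows; it uses in-memory lists in place of real lexicons, and `suggest`, its parameters, and the sample data are hypothetical.

```python
def suggest(lexicons, partial, limits=None, default_limit=3):
    """Return prefix matches for every lexicon in a single call."""
    limits = limits or {}
    prefix = partial.lower()
    result = {}
    for name, values in lexicons.items():
        limit = limits.get(name, default_limit)
        matches = sorted(v for v in values if v.lower().startswith(prefix))
        result[name] = matches[:limit]
    return result


# Hypothetical lexicons; a real application would read these from indexes.
lexicons = {
    "person": ["Anne Hathaway", "Kristen Wiig", "Anna Example"],
    "country": ["Andorra", "Angola", "Antigua and Barbuda", "Argentina"],
}

# As on the slide, the person lexicon gets a larger limit than the others.
suggestions = suggest(lexicons, "an", limits={"person": 20})
```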

SLIDE: 56

Query Expansion ‒ Background Search

Finding phrases when there are no quotes

Approach 1 SPARQL - Match phrases in metadata, taxonomies

Put the matched text into a weighted custom constraint (e.g. phrase: )

OR the phrase: constraint with the full query

Approach 2 Build NEAR queries using pairs of nearby terms

OR the phrase and boost the weight on the NEAR query if matched
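Approach 2 can be illustrated with a toy scorer: adjacent query terms are paired, and a document earns extra weight when both terms of a pair occur close together. This is plain Python, not a MarkLogic query; the function names, the default distance of 5 words, and the weights are assumptions chosen for the example.

```python
def near_pairs(query_terms):
    """Pair each term with its right-hand neighbor: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(query_terms, query_terms[1:]))


def matches_near(words, pair, distance=5):
    """True if both terms of the pair occur within `distance` words of each other."""
    first = [i for i, w in enumerate(words) if w == pair[0]]
    second = [i for i, w in enumerate(words) if w == pair[1]]
    return any(abs(i - j) <= distance for i in first for j in second)


def score(text, query, base_weight=1.0, near_weight=4.0):
    """Base weight per matched term, plus a boost for each nearby pair."""
    words = text.lower().split()
    terms = query.lower().split()
    total = sum(base_weight for t in terms if t in words)
    total += sum(near_weight for p in near_pairs(terms) if matches_near(words, p))
    return total


close = score("the lawrence welk show aired", "lawrence welk")
far = score("lawrence went far away over the hills and rivers to meet welk",
            "lawrence welk")
```

Both texts contain both terms, but only the first gets the proximity boost, so it ranks higher without the user having typed quotes.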

SLIDE: 57

Example – Constraints on the Fly

Selected documents share common elements

Elements not known in advance but can be used as in-app facets

Set as constraints in the submitted options
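One way to read the steps above: inspect the selected documents, find the element names they all share, and turn each shared name into a constraint in the options sent with the next search. The Python sketch below assumes documents are flat dictionaries; `common_elements`, `build_options`, and the options layout are hypothetical and are not the MarkLogic search options format.

```python
def common_elements(documents):
    """Return the element names present in every selected document, sorted."""
    if not documents:
        return []
    shared = set(documents[0])
    for doc in documents[1:]:
        shared &= set(doc)
    return sorted(shared)


def build_options(element_names):
    """Build an illustrative options structure with one constraint per element."""
    return {"constraints": [{"name": n, "element": n} for n in element_names]}


# Hypothetical selection: the shared elements are not known in advance.
selected = [
    {"title": "Road repair", "region": "North", "year": 2015},
    {"title": "School funding", "region": "South"},
]
shared = common_elements(selected)
options = build_options(shared)
```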

SLIDE: 58

Example – Selecting from Feature Fields

SLIDE: 59

Example – Selecting from Feature Fields

SLIDE: 60

Implementing Aboutness in an Awesome Search App

Data: Categorization, Envelope Structure

Query: Query as a Dialogue

Context: Building an Information Resource

Results: Relevant Response

SLIDE: 61

Aboutness – The Context Behind The Query

Document: LexisNexis, WestLaw, Medline, USPTO

Facts: Wikipedia, CIA World Factbook

Information: Google, Bing

SLIDE: 64

Reference Data – Link or Load External Data

Real Time Read

SLIDE: 65

Reference Data – Link or Load External Data

Ingest Retrieval Embedded Data

SLIDE: 66

Reference Data – Link or Load External Data

Copied Reference Data Embedded Links

additional-query SPARQL

SLIDE: 67

Implementing Aboutness in an Awesome Search App

Data: Categorization, Envelope Structure

Query: Query as a Dialogue

Context: Building an Information Resource

Results: Relevant Response

SLIDE: 68

Planning Your Data For Results

When to index metadata

Facets

Sortable

Lexicon dropdowns

Document joins

When not to index

Wildcarding

search:search constraints

SLIDE: 69

Results ‒ Focus On Relevance

Boost the Document: Quality

Boost the Element: Term Weighting

Boost the Term: Query Weighting

xdmp:document-insert(
  "/example.xml",
  <a>aaa</a>,
  xdmp:default-permissions(),
  xdmp:default-collections(),
  10  (: document quality :)
)

xdmp:document-load(
  "http://myCompany.com/file.xml",
  <options xmlns="xdmp:document-load">
    <uri>/documents/myFile.xml</uri>
    <repair>none</repair>
    <permissions>{xdmp:default-permissions()}</permissions>
    <format>xml</format>
    <quality>10</quality>
  </options>
)

Boost the Document

Set quality

xdmp:document-insert

xdmp:document-load

SLIDE: 71

In the database configuration

Select Word Query from the menu

Under the includes tab add your ranked element with its weight

Boost the Element

SLIDE: 72

Boost the Term

With no query weighting, the MarkLogic Companies document appears first

SLIDE: 73

The Boost-Query

A conditional secondary query that increases the weight of results matching both criteria

The boost-query pushes the MarkLogic World document to the top
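The three levers (document quality, element weight, and a boost query) can be seen working together in a toy scorer. The Python below is only an illustration: the scoring arithmetic is invented for the example and is not MarkLogic's relevance formula, and the document contents, the query term, and the boost term are assumptions.

```python
def score(doc, query_term, boost_term=None, boost_weight=2.0, element_weights=None):
    """Toy score: weighted term hits, plus an optional boost, plus document quality."""
    element_weights = element_weights or {}
    total = 0.0
    matched = False
    for element, text in doc["elements"].items():
        hits = text.lower().split().count(query_term.lower())
        if hits:
            matched = True
            total += hits * element_weights.get(element, 1.0)
    if not matched:
        return 0.0  # the boost never brings in documents that miss the main query
    if boost_term and any(boost_term.lower() in t.lower().split()
                          for t in doc["elements"].values()):
        total += boost_weight
    return total + doc.get("quality", 0)


def rank(docs, query_term, **kwargs):
    """Return URIs of matching documents, highest score first."""
    scored = [(score(d, query_term, **kwargs), d["uri"]) for d in docs]
    return [uri for s, uri in sorted(scored, reverse=True) if s > 0]


# Hypothetical documents named after the two on the slides.
docs = [
    {"uri": "/companies.xml",
     "elements": {"title": "MarkLogic Companies", "body": "marklogic marklogic partners"}},
    {"uri": "/world.xml",
     "elements": {"title": "MarkLogic World", "body": "marklogic conference"}},
]

plain = rank(docs, "marklogic")
boosted = rank(docs, "marklogic", boost_term="world")
```

In this setup the Companies document wins on raw term hits, and the boost on "world" is what moves the World document to the top.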

SLIDE: 74

Aboutness – The Key to Successful Search

What is this document about?

What is this query about?

What is the user’s activity about?

What are the responses about?

SLIDE: 75

Rendering Awesome Apps

SLIDE: 76

Q&A
