advanced query parsing techniques

Advanced Relevancy Ranking Paul Nelson Chief Architect / Search Technologies

Upload: lucenerevolution

Post on 09-Jul-2015


Page 1: Advanced query parsing techniques

Advanced Relevancy Ranking

Paul Nelson, Chief Architect / Search Technologies

Page 2: Advanced query parsing techniques

Search Technologies Overview

• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent

Page 3: Advanced query parsing techniques

Lucene Relevancy: Simple Operators

• term(A) = TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
• and(A, B) = A * B
  • Implemented with BooleanQuery()
• or(A, B) = A + B
  • Implemented with BooleanQuery()
• max(A, B) = max(A, B)
  • Implemented with DisjunctionMaxQuery()
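
As a rough illustration of the term score above, here is a minimal Python sketch of the slide's simplified TF and IDF formulas. It deliberately leaves out the other factors that Lucene's classic TF-IDF scoring applies (for example length norms and query normalization). The function names, parameter names, and sample numbers are chosen for this example and are not part of Lucene.

```python
import math

def tf(term_in_doc_count):
    """TF(A): square root of how often the term occurs in the document."""
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection, docs_with_term_count):
    """IDF(A) as written on the slide; the natural log is assumed here."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    """term(A) = TF(A) * IDF(A), the slide's simplified model."""
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)

# Hypothetical numbers: the term occurs 4 times in one document
# and appears in 9 of the 1,000 documents in the collection.
score = term_score(4, 1000, 9)
print(round(score, 3))
```

Rare terms (small docsWithTermCount) get a larger IDF, so they contribute more to the final score than common terms.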


Page 4: Advanced query parsing techniques

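Simple Operators - Example

and(or(george, martha), max(washington, custis))

george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90

or(george, martha) = 0.1 + 0.2 = 0.30
max(washington, custis) = max(0.6, 0.9) = 0.90
and(or, max) = 0.3 * 0.9 = 0.27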
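
The arithmetic in this example can be checked with a small Python sketch. It models the three operators exactly as the earlier slide defines them (and = product, or = sum, max = maximum); the helper names carry a trailing underscore only to avoid Python keywords, and none of this is Lucene code.

```python
def and_(a, b):
    """and(A, B) = A * B in the slide's simplified model."""
    return a * b

def or_(a, b):
    """or(A, B) = A + B in the slide's simplified model."""
    return a + b

def max_(a, b):
    """max(A, B) keeps only the better of the two scores."""
    return max(a, b)

# Per-term scores taken from the slide.
scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

left = or_(scores["george"], scores["martha"])        # or(george, martha)
right = max_(scores["washington"], scores["custis"])  # max(washington, custis)
total = and_(left, right)                             # and(or, max)

print(round(left, 2), round(right, 2), round(total, 2))  # prints: 0.3 0.9 0.27
```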

Page 5: Advanced query parsing techniques

Less Used Operators

• boost(f, A) = (A * f)
  • Implemented with Query.setBoost(f)
• constant(f, A) = if(A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) = if(A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) = if(B) then (A * f) else A
  • Implemented with BoostingQuery()
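
To make the four definitions above concrete, here is a hedged Python sketch of their scoring behavior. It assumes a score of 0.0 means "the clause did not match", which is a convention adopted for this example only; the function names mirror the slide's operator names and are not Lucene APIs.

```python
def boost(f, a):
    """boost(f, A): scale A's score by f."""
    return a * f

def constant(f, a):
    """constant(f, A): a fixed score f whenever A matches, otherwise 0.0."""
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    """boostPlus(A, B): A is required; B only adds to the score when A matches."""
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    """boostMul(f, A, B): multiply A's score by f only when B also matches."""
    return a * f if b > 0.0 else a

# Hypothetical clause scores.
print(constant(0.5, 0.3))        # A matched, so the score is the constant
print(boost_plus(0.0, 0.25))     # A did not match, so B cannot rescue the document
print(boost_mul(0.5, 0.8, 0.1))  # B matched, so A is scaled by f
```

With f below 1.0, boostMul demotes documents that also match B without excluding them, which is the usual reason to reach for it.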


Page 6: Advanced query parsing techniques

Problem: Need for More Flexibility

• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
  • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck

Page 7: Advanced query parsing techniques

Query Processing Language

[Diagram labels: Solr, User Interface, QPL Engine, Search, QPL Script]

Page 8: Advanced query parsing techniques

Introducing: QPL

• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
  • Cores, Analyzers, Embedded Search, Results XML

Page 9: Advanced query parsing techniques

Solr Plug-Ins

Page 10: Advanced query parsing techniques

QPL Configuration – solrconfig.xml

Query Parser Configuration:

<queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>

Search Component Configuration:

<searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>

Page 11: Advanced query parsing techniques

QPL Example #1

Tokenize:

myTerms = solr.tokenize(query);

Phrase Query:

phraseQ = phrase(myTerms);

And Query:

andQ = and(myTerms);

Or Query:

orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);

Put It All Together:

return phraseQ^3.0 | andQ^2.0 | orQ;
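
The orMin clause in this example only applies to queries of three or more terms and asks for roughly half of them to match. The Python sketch below illustrates that threshold under two stated assumptions: that the division rounds down to a whole number, and that orMin(k, terms) means "at least k of the terms must be present". Neither is confirmed by the slide, and the function names are hypothetical.

```python
def or_min_threshold(num_terms):
    """Minimum number of terms that must match, or None when no or-clause is built.

    Mirrors (myTerms.size() <= 2) ? null : (myTerms.size() + 1) / 2,
    assuming the division is truncated to an integer.
    """
    if num_terms <= 2:
        return None
    return (num_terms + 1) // 2

def matches_or_min(min_count, terms, doc_tokens):
    """Assumed orMin semantics: at least min_count distinct terms occur in the document."""
    present = sum(1 for term in set(terms) if term in doc_tokens)
    return present >= min_count

terms = ["george", "martha", "washington"]
threshold = or_min_threshold(len(terms))
print(threshold, matches_or_min(threshold, terms, {"george", "washington", "custis"}))
```

Short queries skip the or-clause entirely, so a one- or two-word query is only satisfied by the phrase and and-clauses.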

Page 12: Advanced query parsing techniques

Thesaurus Example #2

Tokenize:

myTerms = solr.tokenize(query);

Load Thesaurus: (cached)

thes = Thesaurus.load("thesaurus.xml")

Thesaurus Expansion:

thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);

Put It All Together:

return and(thesQ);

Original Query: bathroom humor

[or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
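
The expansion shown for "bathroom humor" can be reproduced with a short Python sketch. The in-memory dictionary stands in for thesaurus.xml, whose real format is not shown in the slides, and the expand function here is a hypothetical stand-in for thes.expand that builds query strings rather than query objects.

```python
def expand(thesaurus, weight, terms):
    """Replace each term with or(term, synonym^weight, ...) when synonyms exist."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            alternatives = [term] + [f"{syn}^{weight}" for syn in synonyms]
            expanded.append("or(" + ", ".join(alternatives) + ")")
        else:
            # Terms without thesaurus entries pass through unchanged.
            expanded.append(term)
    return expanded

# Hypothetical thesaurus content matching the slide's example.
THESAURUS = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

clauses = expand(THESAURUS, 0.8, ["bathroom", "humor"])
query = "and(" + ", ".join(clauses) + ")"
print(clauses)
print(query)
```

Weighting synonyms at 0.8 keeps documents that use the user's own wording ranked above documents that only match a synonym.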

Page 13: Advanced query parsing techniques

More Operators

Boolean Query Parser:

pQ = parseQuery("(george or martha) near/5 washington")

Relevancy Ranking Operators:

q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)

Composite Queries:

compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))

Page 14: Advanced query parsing techniques

News Feed Use Case

Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older

Page 15: Advanced query parsing techniques

News Feed Use Case – Step 1

Segments:

markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));

Terms:

terms = solr.tokenize(query);
termsQ = field("body", or(thesaurus.expand(0.9f, terms)))

Companies:

compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
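
The split pattern used for markets and company IDs tolerates spaces around each semicolon. A minimal Python sketch of that behavior follows; it assumes solr.markets holds a semicolon-delimited string, and the sample value and the field_or helper are made up for illustration.

```python
import re

def split_list(value):
    """Split on semicolons, swallowing any whitespace on either side of them."""
    return re.split(r"\s*;\s*", value)

def field_or(field_name, values):
    """Render a field-restricted or-clause as text (a stand-in for field(..., or(...)))."""
    return f'field("{field_name}", or(' + ", ".join(values) + "))"

# Hypothetical request value.
markets = split_list("energy ; metals;fx")
print(markets)
print(field_or("markets", markets))
```

Note that the pattern does not trim whitespace at the very start or end of the string, so callers may still want to strip the input first.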

Page 16: Advanced query parsing techniques

News Feed Use Case – Step 2

sdf = new SimpleDateFormat("yyyy-MM-dd")
c = Calendar.getInstance()

Today:

todayDate = sdf.format(c.getTime())
todayQ = field("date_s", todayDate)

Yesterday:

c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s", yesterdayDate)
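
For readers less familiar with Calendar arithmetic, the same "today" and "yesterday" strings can be sketched in Python. The function name is hypothetical, and a fixed date is passed in so the month rollover is visible; a real script would start from the current date.

```python
from datetime import date, timedelta

def today_and_yesterday(today):
    """Return both dates formatted as yyyy-MM-dd, matching the date_s field values."""
    yesterday = today - timedelta(days=1)
    return today.isoformat(), yesterday.isoformat()

# Fixed sample date chosen to show that subtracting a day crosses the month boundary.
today_str, yesterday_str = today_and_yesterday(date(2015, 3, 1))
print(today_str, yesterday_str)  # prints: 2015-03-01 2015-02-28
```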

Page 17: Advanced query parsing techniques

News Feed Use Case

Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older

Page 18: Advanced query parsing techniques

News Feed Use Case – Step 3

Weighted Subject Queries:

sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)

Weighted Time Queries:

tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)

Put it All Together:

recentQ = and(subjectQ, timeQ)

return max(recentQ, or(marketsQ, compIdsQ)^0.01)
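
To see why these weights produce the nine-row ordering from the use case table, the Python sketch below scores each row with the simplified operator model from the earlier slides (and = product, or = sum, max = maximum, constant = fixed score on match). It assumes every matching clause inside the 0.01-boosted fallback has a raw score of 1.0, which the slides do not state; the function names are illustrative only.

```python
def constant(f, matched):
    """constant(f, A): score f when the clause matched, otherwise 0.0."""
    return f if matched else 0.0

def score(markets, terms, companies, day):
    """Score one document given which clauses it matches and its date bucket."""
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, day == "today"), constant(1.0, day == "yesterday"))
    recent = subject * time  # and(subjectQ, timeQ)
    # or(marketsQ, compIdsQ)^0.01, assuming a raw score of 1.0 per matching clause.
    fallback = 0.01 * (constant(1.0, markets) + constant(1.0, companies))
    return max(recent, fallback)

rows = [
    ("markets+terms today", score(True, True, False, "today")),
    ("markets today", score(True, False, False, "today")),
    ("terms today", score(False, True, False, "today")),
    ("companies today", score(False, False, True, "today")),
    ("markets+terms yesterday", score(True, True, False, "yesterday")),
    ("markets yesterday", score(True, False, False, "yesterday")),
    ("terms yesterday", score(False, True, False, "yesterday")),
    ("companies yesterday", score(False, False, True, "yesterday")),
    ("markets older", score(True, False, False, "older")),
]
for label, value in rows:
    print(f"{label:26s} {value}")
```

Because the "today" weight (10.0) exceeds the largest subject weight (4.0), every document from today outranks every document from yesterday, and the 0.01 fallback keeps older market or company matches in the results at the bottom. Under this model an older document that matches only the terms scores 0.0 and drops out.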

Page 19: Advanced query parsing techniques

Embedded Search Example #1

qTerms = solr.tokenize(query);

Execute an Embedded Search:

results = solr.search('subjectsCore', or(qTerms), 50)

Create a query from the results:

subjectsQ = or(results*.subjectId)

Put it all together:

return field("title", and(qTerms)) | subjectsQ^0.9;
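
The two-phase pattern in this example (search a side index first, then fold its results into the main query) can be sketched in Python with an in-memory list standing in for the subjects core. The sample records, the embedded_search helper, and the string rendering of the query are all hypothetical; the list comprehension plays the role of Groovy's results*.subjectId spread operator.

```python
def embedded_search(index, terms, limit):
    """Return up to `limit` records whose text contains any of the terms."""
    hits = [doc for doc in index if any(t in doc["text"].split() for t in terms)]
    return hits[:limit]

# Hypothetical contents of the subjects index.
SUBJECTS = [
    {"subjectId": "S1", "text": "american presidents"},
    {"subjectId": "S2", "text": "american revolution"},
    {"subjectId": "S3", "text": "french cooking"},
]

q_terms = ["american", "presidents"]
results = embedded_search(SUBJECTS, q_terms, 50)

# Equivalent of results*.subjectId: collect one field from every result.
subject_ids = [r["subjectId"] for r in results]
subjects_q = "or(" + ", ".join(subject_ids) + ")"
print(subjects_q)  # prints: or(S1, S2)
```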

Page 20: Advanced query parsing techniques

Embedded Search Example #2

qTerms = solr.tokenize(query);

Execute an Embedded Search:

results = solr.search('categories', and(qTerms), 10)

Create a Solr named list:

myList = solr.newList();
myList.add("relatedCategories", results*.title);

Add it to the XML response:

solr.addResponse(myList)

Page 21: Advanced query parsing techniques

Other Features

• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees

Page 22: Advanced query parsing techniques

Query Processing Language

[Diagram labels: Solr, User Interface, QPL Engine, Search, QPL Script; "Data as entered by user"; "Boolean Query Expression"; Application Dev Team; Search Team]

Page 23: Advanced query parsing techniques

QPL: Using External Sources to Build Queries

[Diagram labels: Solr, User Interface, QPL Engine, Search, QPL Script; external sources: RDBMS, Other Indexes, Thesaurus]

Page 24: Advanced query parsing techniques

CONTACT

Paul Nelson