Understanding and visualizing Solr explain information - Rafal Kuc

Understanding and visualising Solr explain information. Rafał Kuć, Marek Rogoziński, Solr.pl ([email protected], [email protected]), 18.10.2011

Upload: lucenerevolution

Post on 27-Jan-2015


DESCRIPTION

Conference video: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

A talk and presentation about how to use, understand and visualize Solr 'explain' information, essential output from Solr that lets you better tune and debug your search application. In the talk, I'll show free software, currently in development, that visualizes Solr 'explain' information: how the score of each document was calculated, what it was built from, which tokens mattered the most, and so on.

TRANSCRIPT

Page 1: Understanding and visualizing solr explain information - Rafal Kuc

Understanding and visualising Solr explain information

Rafał Kuć, Marek Rogoziński, [email protected], [email protected], 18.10.2011


My Background

• Rafał Kuć
  ◦ Working with Lucene since 2002
  ◦ Working with Solr since 2007
• Solr.pl
  ◦ Co-founder (with Marek Rogoziński)
• Area of expertise
  ◦ Lucene and Solr consultant and architect for many major e-commerce sites in Poland
  ◦ Author of "Solr 3.1 Cookbook" by Packt Publishing
  ◦ Father, husband, StarCraft II player and a gardener after hours ☺


What I Will Cover

• Understanding and visualising Solr explain information
• How to make the information given by Apache Solr explain easily readable by a Solr user (even a not very technical one)
• Context
  ◦ Complicated explain made simple
  ◦ Explain other made even simpler
• What's next to come


A typical use case


The Challenge

• Common questions like:
  ◦ Why was this document found?
  ◦ Why wasn't this document found?
  ◦ Why is this document ranked higher than the other one?
  ◦ Why does the results list look like this?
• Considerations
  ◦ Do we always have to answer those questions?
• So how do we let users get the answers they want?
  ◦ That's how http://explain.solr.pl was born


Let’s look at a typical example

• You run a query
  ◦ q=ddr&defType=dismax&qf=name^1000+description^100&bf=pow(price,1.5)&debugQuery=true&indent=true
• And you see the explain information

1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
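The query above can also be assembled programmatically, which avoids hand-escaping characters such as `^` and `,`. This is a minimal sketch only: the host, port and `/solr/select` path are assumptions about a local test instance and are not taken from the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of a local Solr instance; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# The same parameters as in the example query on the slide.
params = {
    "q": "ddr",
    "defType": "dismax",
    "qf": "name^1000 description^100",  # the space is encoded as '+'
    "bf": "pow(price,1.5)",
    "debugQuery": "true",               # asks Solr to include explain output
    "indent": "true",
}

url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

With `debugQuery=true`, Solr returns the explain text for each document in the debug section of the response.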


Some theory

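• tf – term frequency (how often the term occurs in the field of the document)
• df – document frequency (how many documents contain the term)
• idf – inverse document frequency (the rarer the term, the higher the value)
• norm – normalization factor
  ◦ queryNorm – query normalization factor
  ◦ fieldNorm – field normalization factor
• coord – coordination factor, a score factor based on how many of the query terms matched the document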
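The tf and idf values that appear in the explain output can be reproduced with the formulas used by Lucene's default similarity in the 3.x line: tf is the square root of the term frequency, and idf is 1 + ln(maxDocs / (docFreq + 1)). The function names below are illustrative, not Lucene API names.

```python
import math


def tf(term_freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(term_freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency: rarer terms get a higher value."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Inputs taken from the explain output: termFreq=2, docFreq=3, maxDocs=17.
print(tf(2))        # compare with 1.4142135 in the explain output
print(idf(3, 17))   # compare with 2.446919 in the explain output
```

Small differences in the last digits are expected, because Lucene computes these values as 32-bit floats.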


Let's take a look at it again

1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
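Each "product of" and "sum of" line in this explain output can be checked by hand. The sketch below redoes the arithmetic with the leaf values copied from the output; the variable names are illustrative, and the results agree with the printed scores only up to float rounding.

```python
# Leaf values copied from the explain output for document 6.
BOOST = 1000.0
IDF = 2.446919
QUERY_NORM = 4.0867718e-4
TF = 1.4142135
FIELD_NORM = 0.1875
FUNCTION_VALUE = 2516.272   # pow(price, 1.5) for price=185.0
FUNCTION_BOOST = 1.0

# queryWeight = boost * idf * queryNorm
query_weight = BOOST * IDF * QUERY_NORM
# fieldWeight = tf * idf * fieldNorm
field_weight = TF * IDF * FIELD_NORM
# weight of the term in the document = queryWeight * fieldWeight
term_score = query_weight * field_weight
# FunctionQuery score = function value * boost * queryNorm
function_score = FUNCTION_VALUE * FUNCTION_BOOST * QUERY_NORM
# Top-level "sum of": the single dismax clause plus the boost function.
total_score = term_score + function_score

print(query_weight, field_weight, term_score, function_score, total_score)
```

Note that the boost function contributes more to the final score than the text match itself.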


A little more complicated example

36.50278 = (MATCH) sum of:
  1.54896 = (MATCH) sum of:
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:hard^20.0 in 2), product of:
        0.5461986 = queryWeight(name:hard^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:hard in 2), product of:
          1.0 = tf(termFreq(name:hard)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:drive^20.0 in 2), product of:
        0.5461986 = queryWeight(name:drive^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:drive in 2), product of:
          1.0 = tf(termFreq(name:drive)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.61543787 = (MATCH) max of:
      0.098470055 = (MATCH) weight(manu:maxtor in 2), product of:
        0.03135923 = queryWeight(manu:maxtor), product of:
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        3.1400661 = (MATCH) fieldWeight(manu:maxtor in 2), product of:
          1.0 = tf(termFreq(manu:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          1.0 = fieldNorm(field=manu, doc=2)
      0.61543787 = (MATCH) weight(name:maxtor^20.0 in 2), product of:
        0.6271846 = queryWeight(name:maxtor^20.0), product of:
          20.0 = boost
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        0.9812707 = (MATCH) fieldWeight(name:maxtor in 2), product of:
          1.0 = tf(termFreq(name:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
  34.95382 = (MATCH) FunctionQuery(float(price)), product of:
    350.0 = float(price)=350.0
    10.0 = boost
    0.009986806 = queryNorm
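Explain output of this size is hard to follow by eye, but it is a tree: Solr indents each nested component under its parent. The sketch below shows one possible way to turn that indented text into nested nodes. It is an illustration under stated assumptions (every line has the form `value = description`, and nesting is expressed by leading spaces), not the parser used by explain.solr.pl; `ExplainNode` and `parse_explain` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# One explain line: leading indentation, a number, " = ", then a description.
LINE_RE = re.compile(r"^(?P<indent>\s*)(?P<value>[-+0-9.Ee]+) = (?P<desc>.*)$")


@dataclass
class ExplainNode:
    value: float
    description: str
    children: list = field(default_factory=list)


def parse_explain(text):
    """Build a tree of ExplainNode objects from indented explain text."""
    root = None
    stack = []  # (indent width, node) pairs from the root down to the current branch
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        indent = len(match.group("indent"))
        node = ExplainNode(float(match.group("value")), match.group("desc"))
        # Leave every branch that is indented as deep as, or deeper than, this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        elif root is None:
            root = node
        stack.append((indent, node))
    return root


SAMPLE = """\
34.95382 = (MATCH) FunctionQuery(float(price)), product of:
  350.0 = float(price)=350.0
  10.0 = boost
  0.009986806 = queryNorm
"""

tree = parse_explain(SAMPLE)
print(tree.description, [child.value for child in tree.children])
```

Once the output is a tree, each node can be rendered, collapsed or compared with its siblings, which is the kind of processing a visualisation needs.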


And now, a real-life example

1.6287426 = (MATCH) sum of:
  0.8143703 = (MATCH) sum of:
    0.40718514 = (MATCH) max plus 0.01 times others of:
      4.154771E-7 = (MATCH) weight(description_nostemm:harry^10.0 in 36647), product of:
        4.4066886E-7 = queryWeight(description_nostemm:harry^10.0), product of:
          10.0 = boost
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          5.8423506E-9 = queryNorm
        0.94283295 = (MATCH) fieldWeight(description_nostemm:harry in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:harry)=1)
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:harri^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:harri^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:harri in 36647), product of:
          1.0 = tf(termFreq(category_search:harri)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.976383E-8 = (MATCH) weight(description:harri in 36647), product of:
        4.2931266E-8 = queryWeight(description:harri), product of:
          7.348286 = idf(docFreq=967, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.3920817 = (MATCH) fieldWeight(description:harri in 36647), product of:
          1.7320508 = tf(termFreq(description:harri)=3)
          7.348286 = idf(docFreq=967, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
    0.40718514 = (MATCH) max plus 0.01 times others of:
      5.0300997E-7 = (MATCH) weight(description_nostemm:potter^10.0 in 36647), product of:
        4.84872E-7 = queryWeight(description_nostemm:potter^10.0), product of:
          10.0 = boost
          8.299262 = idf(docFreq=373, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.0374078 = (MATCH) fieldWeight(description_nostemm:potter in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:potter)=1)
          8.299262 = idf(docFreq=373, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:Potter^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:Potter^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:Potter in 36647), product of:
          1.0 = tf(termFreq(category_search:Potter)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.7398886E-8 = (MATCH) weight(description:Potter in 36647), product of:
        4.656172E-8 = queryWeight(description:Potter), product of:
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.2327484 = (MATCH) fieldWeight(description:Potter in 36647), product of:
          1.4142135 = tf(termFreq(description:Potter)=2)
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
  1.8327936E-6 = (MATCH) max plus 0.01 times others of:
    1.8327936E-6 = (MATCH) weight(description_nostemm:"harry potter"~100^10.0 in 36647), product of:
      9.255408E-7 = queryWeight(description_nostemm:"harry potter"~100^10.0), product of:
        10.0 = boost
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        5.8423506E-9 = queryNorm
      1.9802407 = fieldWeight(description_nostemm:"harry potter" in 36647), product of:
        1.0 = tf(phraseFreq=1.0)
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        0.125 = fieldNorm(field=description_nostemm, doc=36647)
  0.81437016 = (MATCH) sum of:
    0.40718508 = (MATCH) weight(category_the:harri in 36647), product of:
      0.12338993 = queryWeight(category_the:harri), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:harri in 36647), product of:
        1.0 = tf(termFreq(category_the:harri)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
    0.40718508 = (MATCH) weight(category_the:Potter in 36647), product of:
      0.12338993 = queryWeight(category_the:Potter), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:Potter in 36647), product of:
        1.0 = tf(termFreq(category_the:Potter)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
  3.394099E-7 = (MATCH) FunctionQuery(pow(int(sold),const(1.5))), product of:
    58.09475 = pow(int(sold)=15,const(1.5))
    1.0 = boost
    5.8423506E-9 = queryNorm
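The lines reading "max plus 0.01 times others of" come from a disjunction-max query with a tie breaker of 0.01: the best-scoring field wins, and the remaining fields add only 1% of their scores. With a tie breaker of 0 the line reads simply "max of", as in the earlier examples. A minimal sketch of that combination rule, with an illustrative function name, looks like this:

```python
def dismax_score(clause_scores, tie=0.0):
    """Best clause score plus `tie` times the sum of all the other clause scores."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


# Scores of the three fields that matched the first query term in the example:
# description_nostemm, category_search and description.
harry_clauses = [4.154771e-7, 0.40718514, 5.976383e-8]
print(dismax_score(harry_clauses, tie=0.01))
```

Here the heavily boosted category_search field dominates so strongly that the other two fields change the result only far beyond the digits shown in the explain output.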


Let’s visualize now


History view


Basic information


The real thing


Even more ☺


What if we can't match?


And the no-matched explain
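Solr can produce explain output for a document that did not make it into the results through the `explainOther` parameter, which takes a query selecting the document to explain and works together with `debugQuery=true`. The sketch below only builds such a request URL. The endpoint, the `id` field name and the document id are assumptions for illustration, and the slides do not show the exact request used for this view.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of a local Solr instance; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def explain_other_url(user_query, doc_id):
    """Build a URL asking Solr to explain doc_id against user_query."""
    params = {
        "q": user_query,
        "debugQuery": "true",
        # Assumes the unique key field is called "id".
        "explainOther": f"id:{doc_id}",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(explain_other_url("ddr", "doc-42"))
```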


What you gain from explain.solr.pl

• View Solr explain information in a human-readable form
• Easily recognize the most influential elements of the scoring process
• Answer the questions faster
• More things to come in the future
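One simple way to recognize the most influential elements is to express each top-level component as a share of the final score. The sketch below does this for the first example (the "ddr" query), with values copied from its explain output. The function name and labels are illustrative, and this is not a description of how explain.solr.pl computes its views.

```python
def contribution_shares(total, parts):
    """Return each component's fraction of the total score."""
    return {name: value / total for name, value in parts.items()}


shares = contribution_shares(
    1.6771803,
    {
        "name:ddr^1000 (text match)": 0.64883727,
        "pow(price,1.5) (boost function)": 1.028343,
    },
)

# Print the components from most to least influential.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:35s} {share:6.1%}")
```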


Plans for the future

• Support for more formats of Apache Solr explain (right now, only Solr 3.x is supported)
• Visualisation of additional data
• More functionality, such as:
  ◦ query problem analysis
  ◦ query syntax analysis and explanation
  ◦ query time analysis and visualization
  ◦ result comparison between cores or instances
• Very distant future: an additional web application deployed alongside Solr to enable real-time analysis of the influence of boosts


Wrap Up

• http://explain.solr.pl should be available very soon (probably end of October or mid-November)
• The code of explain.solr.pl will be available on GitHub soon after the initial release
• There will be a Java version of http://explain.solr.pl, which will cover much more information


Sources

• Links
  ◦ http://www.solr.pl
  ◦ http://explain.solr.pl
  ◦ http://lucene.apache.org ☺
• We would like to thank:
  ◦ Łukasz Lewandowski ( http://llewandowski.pl/ ) for his work on the GUI
  ◦ Hubert 'depesz' Lubaczewski ( http://depesz.com ) for the idea ☺


Contact

• Rafał Kuć
  ◦ [email protected]
  ◦ http://solr.pl
• Marek Rogoziński
  ◦ [email protected]
  ◦ http://solr.pl


Thank you