Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP
![Page 1: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/1.jpg)
Learning To Rank For Solr
Michael Nilsson – Software Engineer
Diego Ceccarelli – Software Engineer
Joshua Pantony – Software Engineer
Bloomberg LP
Copyright 2015 Bloomberg L.P. All rights reserved.
![Page 2: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/2.jpg)
OUTLINE
● Search at Bloomberg
● Why do we need machine learning for search?
● Learning to Rank
● Solr Learning to Rank Plugin
![Page 3: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/3.jpg)
8 million searches PER DAY
1 million PER DAY
400 million stories in the index
![Page 4: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/4.jpg)
SOLR IN BLOOMBERG
● Search engine of choice at Bloomberg
─ Large community / Well distributed committers
─ Open source Apache Project
─ Used within many commercial products
─ Large feature set and rapid growth
● Committed to open-source
─ Ability to contribute to core engine
─ Ability to fix bugs ourselves
─ Contributions in almost every Solr release since 4.5.0
![Page 5: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/5.jpg)
PROBLEM SETUP
score: 30
score: 1.0
![Page 6: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/6.jpg)
PROBLEM SETUP
$\text{Score} = 100 \cdot \text{scoreOnTitle} + 10 \cdot \text{scoreOnDescription}$
score: 52.2
score: 30.8
![Page 7: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/7.jpg)
PROBLEM SETUP
$\text{Score} = 100 \cdot \text{scoreOnTitle} + 10 \cdot \text{scoreOnDescription}$
![Page 8: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/8.jpg)
PROBLEM SETUP
$\text{Score} = \mathbf{150} \cdot \text{scoreOnTitle} + \mathbf{3.14} \cdot \text{scoreOnDescription} + \mathbf{42} \cdot \text{clicks}$
![Page 9: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/9.jpg)
PROBLEM SETUP
$\text{Score} = \mathbf{99.9} \cdot \text{scoreOnTitle} + \mathbf{3.1114} \cdot \text{scoreOnDescription} + \mathbf{42.42} \cdot \text{clicks} + 5 \cdot \text{timeElapsedFromLastUpdate}$
![Page 10: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/10.jpg)
PROBLEM SETUP
$\text{Score} = \mathbf{99.9} \cdot \text{scoreOnTitle} + \mathbf{3.1114} \cdot \text{scoreOnDescription} + \mathbf{42.42} \cdot \text{clicks} + 5 \cdot \text{timeElapsedFromLastUpdate}$
query = solr, query = lucene, query = austin, query = bloomberg, query = …
● It’s hard to manually tweak the ranking
─ You must be an expert in the domain
─ … or a magician
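The hand-tuned formula on these slides is a weighted sum of per-document signals. A minimal Python sketch of that idea follows; the weights are the ones shown on the slide, while the function name `linear_score` and the sample feature values are illustrative assumptions, not part of the talk.

```python
# Weights copied from the slide; everything else here is a hypothetical illustration.
WEIGHTS = {
    "scoreOnTitle": 99.9,
    "scoreOnDescription": 3.1114,
    "clicks": 42.42,
    "timeElapsedFromLastUpdate": 5.0,
}


def linear_score(features, weights=WEIGHTS):
    """Return the weighted sum of feature values; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Made-up feature values for a single document.
doc_features = {"scoreOnTitle": 0.5, "scoreOnDescription": 0.2, "clicks": 1.0}
print(linear_score(doc_features))
```

Every new signal adds another weight to tune by hand, and a weight that suits one query may hurt another, which is the difficulty the slide points at.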
![Page 11: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/11.jpg)
PROBLEM SETUP
It’s easier with Machine Learning
● 2,000+ parameters (non-linear, factorially larger than linear form)
● 8,000+ queries that are regularly tuned
● Early on we spent many days hand tuning…
![Page 12: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/12.jpg)
SEARCH PIPELINE (ONLINE)
[Diagram: User Query → Index (People, Commodities News, Other Sources) → Top-x retrieval (x >> k) → ReRanking Model → Top-k reranked. A plain "Top-k retrieval" label is also shown.]
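The two-stage flow in this diagram (retrieve the top x with a cheap scorer, then rerank down to the top k with the learned model, with x >> k) can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the scorers are passed in as plain callables; this is an illustration of the idea, not the plugin's implementation.

```python
def rerank(query, docs, first_pass, model, x=1000, k=10):
    """Two-stage ranking: cheap first pass keeps x docs, costly model keeps k."""
    # Stage 1: score every matching document with the inexpensive scorer.
    candidates = sorted(docs, key=lambda d: first_pass(query, d), reverse=True)[:x]
    # Stage 2: apply the expensive model to the x candidates only.
    return sorted(candidates, key=lambda d: model(query, d), reverse=True)[:k]


# Toy data: "cheap" and "rich" stand in for the two scorers' outputs.
docs = [{"id": i, "cheap": i, "rich": -i} for i in range(20)]
top = rerank("q", docs, lambda q, d: d["cheap"], lambda q, d: d["rich"], x=5, k=2)
print([d["id"] for d in top])
```

The model only ever sees x documents, so its cost does not grow with the size of the index.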
![Page 13: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/13.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Index (People, Commodities News, Other Sources) and Training Query-Document Pairs → Feature Extraction → Learning Algorithm → Ranking Model, with Metrics.]
![Page 14: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/14.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Index (People, Commodities News, Other Sources) and Training Query-Document Pairs → Feature Extraction → Learning Algorithm → Ranking Model, with Metrics.]
![Page 15: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/15.jpg)
TRAINING DATA: IMPLICIT VS EXPLICIT
What is explicit data?
● A set of judges manually assesses the search results for a given query
─ Experts
─ Crowd
Pros:
─ Data is very clean
Cons:
─ Can be very expensive!
What is implicit data?
● Infer user preferences based on user behavior
─ Aggregated result clicks
─ Query reformulation
─ Dwell time
Pros:
─ A lot of data!
Cons:
─ Extremely noisy
─ Privacy concerns
![Page 16: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/16.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Index (People, Commodities News, Other Sources) and Training Query-Document Pairs → Feature Extraction → Learning Algorithm → Ranking Model, with Metrics.]
![Page 17: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/17.jpg)
FEATURES
● A feature is an individual measurable property
● Given a query and a collection, we can produce many features for each document in the collection
─ Whether the query matches the title
─ Length of the document
─ Number of views
─ How old is it?
─ Can it be displayed on a mobile device?
![Page 18: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/18.jpg)
FEATURES
Extract “features”
Was the result a cofounder? 0
Features are signals that give an indication of a result’s importance
![Page 19: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/19.jpg)
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the document have an exec. position? 1
Query : APPL US
![Page 20: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/20.jpg)
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the query match the document title? 0
Does the document have an exec. position? 1
![Page 21: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/21.jpg)
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the query match the document title? 0
Does the document have an exec. position? 1
Popularity (%) 0.9
![Page 22: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/22.jpg)
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the query match the document title? 1
Does the document have an exec. position? 0
Popularity (%) 0.6
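The four signals on these slides can be pictured as a function from a (query, document) pair to a named feature vector. The Python sketch below is a hypothetical illustration: the document fields `name`, `primary_position`, and `category` appear later in the deck, while `is_cofounder`, `popularity`, the helper name, and the simple substring title match are assumptions made for this example.

```python
# Executive positions taken from the feature definition shown later in the deck.
EXEC_POSITIONS = {"ceo", "cto", "cfo", "president"}


def extract_features(query, doc):
    """Map one (query, document) pair to a dict of named feature values."""
    is_exec = (
        doc.get("category") == "person"
        and doc.get("primary_position") in EXEC_POSITIONS
    )
    return {
        # "is_cofounder" is a hypothetical field name.
        "isCofounder": 1.0 if doc.get("is_cofounder") else 0.0,
        # Naive substring test, standing in for a real title match.
        "matchTitle": 1.0 if query.lower() in doc.get("name", "").lower() else 0.0,
        "isPersonAndExecutive": 1.0 if is_exec else 0.0,
        # "popularity" is a hypothetical field name.
        "popularity": float(doc.get("popularity", 0.0)),
    }


tim = {"name": "Tim Cook", "primary_position": "ceo", "category": "person", "popularity": 0.9}
print(extract_features("APPL US", tim))
```

For the query "APPL US" this reproduces the values shown for the first result: no cofounder flag, no title match, an executive position, and popularity 0.9.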
![Page 23: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/23.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Index (People, Commodities News, Other Sources) and Training Query-Document Pairs → Feature Extraction → Learning Algorithm → Ranking Model, with Metrics.]
![Page 24: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/24.jpg)
METRICS
How do we know if our model is doing better?
● Offline metrics
─ Precision/Recall/F1 score
─ nDCG (Normalized Discounted Cumulative Gain)
─ Other metrics (e.g., ERR, MAP, …)
● Online metrics
─ Click-through rates → higher
─ Time to first click → lower
─ Interleaving¹
¹ O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1), 2012.
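As an illustration of the nDCG metric listed above, here is a small Python sketch. It assumes the common exponential-gain form, DCG = Σ (2^rel − 1) / log2(rank + 1) with ranks starting at 1; other formulations exist, and the talk does not say which one was used.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    # enumerate() is 0-based, so rank + 2 gives log2(2) for the first result.
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(gains, k=None):
    """DCG at cutoff k divided by the DCG of the ideal ordering (0.0 if no gain)."""
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


# Relevance grades of the results as returned by the ranker.
print(ndcg([1, 3, 2]))
```

A perfectly ordered list scores 1.0, and any ordering that puts a less relevant result above a more relevant one scores lower.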
![Page 25: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/25.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Index (People, Commodities News, Other Sources) and Training Query-Document Pairs → Feature Extraction → Learning Algorithm → Ranking Model, with Metrics.]
![Page 26: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/26.jpg)
LEARNING TO RANK
● Learn how to combine the features to optimize one or more metrics
● Many learning algorithms
─ RankSVM¹
─ LambdaMART²
─ …
¹ T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.
² C.J.C. Burges, "From RankNet to LambdaRank to LambdaMART: An Overview", Microsoft Research Technical Report MSR-TR-2010-82, 2010.
![Page 27: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/27.jpg)
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr → Index (People, Commodities News, Other Sources) → Top-k retrieval.]
![Page 28: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/28.jpg)
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr → Index (People, Commodities News, Other Sources) → Top-k retrieval. Offline: Training Data → Learning Algorithm → Ranking Model.]
![Page 29: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/29.jpg)
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr → Index (People, Commodities News, Other Sources) → Top-k retrieval. Online: Ranking Model → Top-x reranked.]
![Page 30: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/30.jpg)
SEARCH PIPELINE: SOLR INTEGRATION
[Diagram: User Query → Solr → Index (People, Commodities News, Other Sources) → Top-k retrieval. Online: Ranking Model → Top-x reranked.]
![Page 31: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/31.jpg)
SOLR RELEVANCY
● Pros
─ Simple and quick scoring computation
─ Phrase matching
─ Function query boosting on time, distance, popularity, etc.
─ Customized fields for stemming, synonyms, etc.
● Cons
─ Creating a well-tuned query takes a lot of manual effort
─ Weights are brittle and may stop working well as more documents or fields are added
![Page 32: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/32.jpg)
LTR PLUGIN: GOALS
● Don’t tune the relevancy manually!
─ Use machine learning to power automatic relevancy tuning
● Significant relevancy improvements
● Allow comparable scores across collections
─ Collections of different sizes
● Maintain low latency
─ Re-use the vast Solr search functionality that is already built in
─ Less data transport
● Make it simple to use domain knowledge to rapidly create features
─ Features are no longer coded but rather scripted
![Page 33: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/33.jpg)
STANDARD SOLR SEARCH REQUEST
[Diagram: User Query → Index (People, Commodities News, Other Sources) → Top-k retrieval.]
![Page 34: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/34.jpg)
STANDARD SOLR SEARCH REQUEST
[Diagram: User Query → Solr Query → Index [10 Million] (People, Commodities News, Other Sources) → Matches [10k] → Score [10k] → Top-10 retrieval.]
![Page 35: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/35.jpg)
LTR SOLR SEARCH REQUEST
[Diagram: User Query → Solr Query → Index [10 Million] (People, Commodities News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → LTR Query → Ranking Model → Top-10 reranked.]
![Page 36: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/36.jpg)
LTR PLUGIN: RERANKING
<!-- Query parser used to rerank top docs with a provided model -->
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
● LTRQuery extends Solr’s RankQuery
─ Wraps main query to fetch initial results
─ Returns custom TopDocsCollector for reranked ordered results
● Solr rerank request parameter
rq={!ltr model=myModel1 reRankDocs=100 efi.user_query='james' efi.my_var=123}
─ !ltr – name used in the solrconfig.xml for the LTRQParserPlugin
─ model – name of deployed model to use for reranking
─ reRankDocs – total number of documents to rerank
─ efi.* – custom parameters used to pass external feature information for your features to use
• Query intent
• Personalization
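To make the `rq` parameter concrete, the sketch below assembles it in Python from the pieces named on the slide (`model`, `reRankDocs`, `efi.*`) and URL-encodes the result. The helper name `ltr_params` is hypothetical, and pairing `rq` with a plain `q` parameter is an assumption about how the request would be sent.

```python
from urllib.parse import urlencode


def ltr_params(user_query, model="myModel1", rerank_docs=100, efi=None):
    """Build Solr request parameters with an {!ltr ...} rerank query."""
    # Each external feature value is passed through as efi.<name>=<value>.
    efi_part = "".join(f" efi.{name}={value}" for name, value in (efi or {}).items())
    return {
        "q": user_query,
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs}{efi_part}}}",
    }


params = ltr_params("james", efi={"user_query": "'james'", "my_var": 123})
print(params["rq"])
print(urlencode(params))  # query string to append to the select URL
```

Values such as query intent or personalization signals would travel the same way, as additional `efi.*` entries.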
![Page 37: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/37.jpg)
SEARCH PIPELINE (ONLINE)
[Diagram: User Query → Index [10 Million] (People, Commodities News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → Feature Extraction → Ranking Model → Top-10 reranked.]
![Page 38: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/38.jpg)
FEATURES
Extract “features”
{ "name": "Tim Cook", "primary_position": "ceo", "category": "person", … }
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the query match the document title? 0
Does the document have an exec. position? 1
Popularity (%) 0.9
![Page 39: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/39.jpg)
LTR PLUGIN: FEATURES BEFORE
![Page 40: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/40.jpg)
LTR PLUGIN: FEATURES AFTER
[
  {
    "name": "isPersonAndExecutive",
    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "params": {
      "fq": [
        "{!terms f=category}person",
        "{!terms f=primary_position}ceo, cto, cfo, president"
      ]
    }
  },
  …
]
![Page 41: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/41.jpg)
LTR PLUGIN: FUNCTION QUERIES
[
  {
    "name": "documentRecency",
    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "params": {
      "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
    }
  },
  …
]
1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated 2 years ago, etc.
See http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
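The recency values quoted on this slide follow from Solr's `recip(x, m, a, b)` function, which computes a / (m·x + b). A short Python check of that arithmetic, using the constants from the feature definition, is below; the names `document_recency` and `MS_PER_YEAR` are illustrative only.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.16e10 milliseconds


def recip(x, m, a, b):
    """Solr's reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def document_recency(age_ms):
    """Recency feature with the slide's constants: recip(age, 3.16e-11, 1, 1)."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2):
    print(years, round(document_recency(years * MS_PER_YEAR), 3))
```

Because 3.16e-11 is roughly one divided by the number of milliseconds in a year, the product m·x is approximately the document's age in years, which gives the 1, 1/2, 1/3 progression.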
![Page 42: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/42.jpg)
LTR PLUGIN: FEATURE STORE
● FeatureStore is a Solr Managed Resource
─ REST API endpoint for performing CRUD operations on Solr objects
─ Stored and maintained in ZooKeeper
● Deploy
─ curl -XPUT 'http://yoursolrserver/solr/collection/config/fstore' --data-binary @./features.json -H 'Content-type:application/json'
● View
─ http://yoursolrserver/solr/collection/config/fstore
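For readers who would rather script the deploy step than call curl, the same PUT request can be prepared with Python's standard library. This is a sketch under assumptions: the helper name `build_fstore_request` is hypothetical, the endpoint path is the one shown on the slide, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_fstore_request(base_url, features):
    """Prepare (but do not send) a PUT of feature definitions to the feature store."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config/fstore",
        data=body,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


features = [
    {
        "name": "isPersonAndExecutive",
        "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
        "params": {"fq": ["{!terms f=category}person"]},
    }
]
req = build_fstore_request("http://yoursolrserver/solr/collection", features)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req)
```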
![Page 43: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/43.jpg)
LTR PLUGIN: FEATURES
● Simplifies feature engineering through a configuration file
● Utilizes rich search functionality built into Solr
─ Phrase matching
─ Synonyms, stemming, etc.
● Extend the Feature class for specialized features
![Page 44: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/44.jpg)
SEARCH PIPELINE (ONLINE)
[Diagram: User Query → Index [10 Million] (People, Commodities News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → Feature Extraction → Ranking Model → Top-10 reranked.]
![Page 45: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/45.jpg)
TRAINING PIPELINE (OFFLINE)
[Diagram: Training Queries → Index [10 Million] (People, Commodities News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → Feature Extraction → Learning Algorithm → Ranking Model.]
![Page 46: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/46.jpg)
FEATURES
Extract “features”
{ "name": "Tim Cook", "primary_position": "ceo", "category": "person", … }
Features are signals that give an indication of a result’s importance
Was the result a cofounder? 0
Does the query match the document title? 0
Does the document have an exec. position? 1
Popularity (%) 0.9
![Page 47: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/47.jpg)
LTR PLUGIN: FEATURE EXTRACTION
<!-- Document transformer adding feature vectors with each retrieved document -->
<transformer name="fv" class="org.apache.solr.ltr.ranking.LTRFeatureTransformer" />
● Feature extraction uses Solr’s TransformerFactory
─ Returns a custom field with each document
● fl = *, [fv]
{
  "name": "Tim Cook",
  "primary_position": "ceo",
  "category": "person",
  …
  "[fv]": "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9"
}
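When building training data, the `[fv]` field returned with each document has to be turned back into numbers. A minimal Python sketch, assuming the comma-separated `name:value` layout shown in the example response (the helper name `parse_fv` is hypothetical):

```python
def parse_fv(fv):
    """Parse 'name:value, name:value' into a dict of floats."""
    features = {}
    for pair in fv.split(","):
        name, _, value = pair.strip().partition(":")
        if name:  # skip empty fragments such as a trailing comma
            features[name] = float(value)
    return features


fv = "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9"
print(parse_fv(fv))
```

Each parsed vector, paired with a relevance label for the same query and document, forms one training example for the learning algorithm.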
![Page 48: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/48.jpg)
LTR PLUGIN: MODEL
{
  "type": "org.apache.solr.ltr.ranking.LambdaMARTModel",
  "name": "mymodel1",
  "features": [
    { "name": "matchedTitle" },
    { "name": "isPersonAndExecutive" }
  ],
  "params": {
    "trees": [
      {
        "weight": 1,
        "tree": {
          "feature": "matchedTitle",
          "threshold": 0.5,
          "left": { "value": -100 },
          "right": {
            "feature": "isPersonAndExecutive",
            "threshold": 0.5,
            "left": { "value": 50 },
            "right": { "value": 75 }
          }
        }
      }
    ]
  }
}
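The model on this slide is a weighted sum of regression trees. The Python sketch below walks such a tree for a given feature vector. It assumes that a feature value less than or equal to the threshold goes left and anything larger goes right; the slide does not spell out that convention, so treat it as an assumption. The function names are hypothetical.

```python
def score_tree(node, fv):
    """Walk one regression tree until a leaf ('value') is reached."""
    while "value" not in node:
        # Assumed convention: value <= threshold goes left, otherwise right.
        go_left = fv.get(node["feature"], 0.0) <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return float(node["value"])


def score_model(model, fv):
    """Weighted sum of the scores of all trees in the model."""
    return sum(
        float(t["weight"]) * score_tree(t["tree"], fv)
        for t in model["params"]["trees"]
    )


# The single tree from the slide.
model = {"params": {"trees": [{"weight": 1, "tree": {
    "feature": "matchedTitle", "threshold": 0.5,
    "left": {"value": -100},
    "right": {"feature": "isPersonAndExecutive", "threshold": 0.5,
              "left": {"value": 50}, "right": {"value": 75}}}}]}}

print(score_model(model, {"matchedTitle": 1.0, "isPersonAndExecutive": 1.0}))
```

Under that convention, a document whose title does not match scores -100, a title match alone scores 50, and a title match on an executive scores 75.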
![Page 49: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/49.jpg)
LTR PLUGIN: MODEL
● ModelStore is also a Solr Managed Resource
● Deploy
─ curl -XPUT 'http://yoursolrserver/solr/collection/config/mstore' --data-binary @./model.json -H 'Content-type:application/json'
● View
─ http://yoursolrserver/solr/collection/config/mstore
● Inherit from the model class for new scoring algorithms
─ score()
─ explain()
![Page 50: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/50.jpg)
LTR PLUGIN: EVALUATION
● Offline Metrics
─ nDCG increased approximately 10% after reranking
● Online Metrics
─ Clicks @ 1 up by approximately 10%
![Page 51: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/51.jpg)
BEFORE AND AFTER
Query: “unemployment”
[Side-by-side comparison: Solr Ranking vs. Machine Learned Reranking]
![Page 52: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/52.jpg)
LTR PLUGIN: EVALUATION
● Offline Metrics
─ nDCG increased approximately 10% after reranking
● Online Metrics
─ Clicks @ 1 up by approximately 10%
● Performance
─ About 30% faster than previous external ranking system
─ 10 million documents in collection, 100k queries, 1k features, 1k documents/query reranked
![Page 53: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/53.jpg)
LTR PLUGIN: BENEFITS
● Simpler feature engineering, without compiling
● Access to rich internal Solr search functionality for feature building
● Search result relevancy improvements vs regular Solr relevance
● Automatic relevancy tuning
● Compatible scores across collections
● Performance benefits vs external ranking system
![Page 54: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/54.jpg)
FUTURE WORK
● Continue work to open source the plugin
● Support pipelining multiple reranking models
● Allow a simple ranking model to be used in the first pass
![Page 55: Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP](https://reader031.vdocuments.mx/reader031/viewer/2022030305/5870672e1a28ab48378b52ef/html5/thumbnails/55.jpg)
QUESTIONS?