Elasticsearch in Outbrain recommender system - looking at content recommendations through a search...
TRANSCRIPT
Looking at Content Recommendation through a Search Lens
People want GREAT content
Content Recommendation Engine
Relevance
Rec Engine
Content Inventory
Challenges
• Personalization
• A Jungle of Market Rules: geo targeting, publisher blacklisting of sites, URLs, titles
• Scale: 35K req/sec, 50 ms latency, millions of potential content recs
Search Engines: What can they do?
1. Score documents by relevance to query
Query: Donald Trump
Search Engine
Relevance
2. Filter documents by certain attributes
3. Work Efficiently and at Scale
what the day brings
Open Source
Distributed
Scalable
RESTful
Real-time search
How Do We Reduce the Problem of Recommending Content to Users to a Search Problem?
Translate user and context to a query of interests and market rules
John, www.angelina.com
Television and Celebrities
Blacklist site: www.brad.com
Translate articles to searchable documents in the same feature space of user interests and market rules
Breakup: What's Next?
Brad's acting career continues to flourish while he films a new …
Is about: Celebrities
Site: www.brad.com
What is a Document About?
Semantic Features
Categories: Entertainment/Television
Topics: Story, Murder, Television
Entities: Dolores, Westworld, HBO
NLP
Constructing a User Profile
Time
User Profile
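The slide only states that a user profile is built up over time. As a rough illustration, the sketch below aggregates the semantic features of the articles a user has read into weighted interests. The exponential time decay, the `half_life_days` parameter, and the function name are assumptions made for this example, not details from the talk.

```python
from collections import defaultdict

FEATURE_FIELDS = ("categories", "topics", "entities")


def build_user_profile(read_events, now, half_life_days=30.0):
    """Aggregate features of read articles into weighted interests.

    read_events: iterable of (day_read, article) pairs, where article is a
    dict holding lists under "categories", "topics" and "entities".
    The time decay is an assumption of this sketch: a read that is
    half_life_days old counts half as much as a read from today.
    """
    profile = {field: defaultdict(float) for field in FEATURE_FIELDS}
    for day_read, article in read_events:
        weight = 0.5 ** ((now - day_read) / half_life_days)
        for field in FEATURE_FIELDS:
            for value in article.get(field, []):
                profile[field][value] += weight
    return {field: dict(weights) for field, weights in profile.items()}
```

The resulting weights could then serve as per-interest boosts when the profile is turned into a query.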
Indexing Our Inventory to Elasticsearch
Every ES document has one or more fields
Fields can be of different types
• Strings
• Numeric
• Boolean
• Array of [strings | numbers | …]
Every article becomes an ES document
Every article feature becomes a field
{
  "title": "Westworld season 1 ends with explosive finale",
  "categories": ["entertainment_television"],
  "topics": ["story", "murder", "television"],
  "entities": ["dolores", "westworld", "hbo"]
}
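A minimal sketch of the article-to-document step, assuming the semantic features have already been extracted. The function name and the normalization rule (lowercase, with "/" and spaces replaced by "_") are assumptions chosen to reproduce the example document from the slide; the talk does not describe the actual normalization.

```python
import json


def normalize(values):
    """Lowercase each feature and replace separators with underscores (assumed rule)."""
    return [v.strip().lower().replace("/", "_").replace(" ", "_") for v in values]


def article_to_document(title, categories, topics, entities):
    """Map each article feature to a field of a document (a plain dict)."""
    return {
        "title": title,
        "categories": normalize(categories),
        "topics": normalize(topics),
        "entities": normalize(entities),
    }


doc = article_to_document(
    "Westworld season 1 ends with explosive finale",
    ["Entertainment/Television"],
    ["Story", "Murder", "Television"],
    ["Dolores", "Westworld", "HBO"],
)
print(json.dumps(doc, indent=2))
```

The printed JSON is what would be sent as the document body when indexing.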
Querying Elasticsearch
{
  "query": {
    "filtered": {
      "query": { "term": { "category": "celebrities" } },
      "filter": { "term": { "site": "www.cnn.com" } }
    }
  }
}
Create Elasticsearch Query with User Interests
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "categories": ["television", "celebrities"] } },
        { "terms": { "topics": ["business", "cinema", "murder"] } },
        { "terms": { "entities": ["hbo", "dolores", "nyse"] } }
      ]
    }
  }
}
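The query above can be generated from a user profile. The sketch below assumes the profile is a dict mapping each feature field to a list of interests; the function name and profile shape are hypothetical.

```python
def interests_query(profile):
    """Build a bool/should query with one terms clause per non-empty field."""
    should = [
        {"terms": {field: list(values)}}
        for field, values in profile.items()
        if values  # skip fields for which the user has no known interests
    ]
    return {"query": {"bool": {"should": should}}}


profile = {
    "categories": ["television", "celebrities"],
    "topics": ["business", "cinema", "murder"],
    "entities": ["hbo", "dolores", "nyse"],
}
query = interests_query(profile)
```

Because the clauses sit under `should`, a document does not need to match every interest; each clause that matches contributes to its score.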
Using Weights to Improve Relevance
{
  "query": {
    "bool": {
      "should": [
        { "term": { "categories": { "value": "television", "boost": 2.3 } } },
        { "term": { "categories": { "value": "investments", "boost": 1.6 } } },
        { "term": { "entities": { "value": "dolores", "boost": 1.2 } } }
      ]
    }
  }
}
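A sketch of how per-interest weights could be turned into boosted clauses. It assumes a weighted profile shaped as `{field: {interest: weight}}` (a hypothetical structure) and emits one `term` clause per interest, using the `value`/`boost` form of the term query.

```python
def weighted_interests_query(weighted_profile):
    """Emit one boosted term clause per interest, strongest interests first."""
    should = []
    for field, weights in weighted_profile.items():
        for value, boost in sorted(weights.items(), key=lambda kv: -kv[1]):
            should.append({"term": {field: {"value": value, "boost": boost}}})
    return {"query": {"bool": {"should": should}}}


weighted_profile = {
    "categories": {"television": 2.3, "investments": 1.6},
    "entities": {"dolores": 1.2},
}
query = weighted_interests_query(weighted_profile)
```

A document matching "television" then gains more score than one matching only "dolores", so stronger interests dominate the ranking.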
What about Cold-Start Users?
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "categories": "?" } },
        { "terms": { "topics": "?" } },
        { "terms": { "entities": "?" } }
      ]
    }
  }
}
Display the most popular content
How? Index popularity score
{
  "title": "Westworld season 1 ends ..",
  "categories": ["entertainment_television"],
  "popularity": 0.6
}
{
  "title": "10 Best NY Restaurants",
  "categories": ["lifestyle/food"],
  "popularity": 0.3
}
Score by this field in the query
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "field_value_factor": { "field": "popularity" },
      "boost_mode": "replace"
    }
  }
}
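To make the effect of `"boost_mode": "replace"` concrete, the sketch below builds the same query as a Python dict and simulates the resulting ranking locally: the query score is discarded and each document is ranked by its `popularity` field alone. The helper names are hypothetical, and the local sort is only a stand-in for what the search engine would do.

```python
def popularity_query():
    """Cold-start query: match everything, score by the popularity field."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "field_value_factor": {"field": "popularity"},
                "boost_mode": "replace",
            }
        }
    }


def rank_by_popularity(documents):
    """Local simulation: with boost_mode 'replace', score equals popularity."""
    return sorted(documents, key=lambda doc: doc["popularity"], reverse=True)


inventory = [
    {"title": "10 Best NY Restaurants", "popularity": 0.3},
    {"title": "Westworld season 1 ends ..", "popularity": 0.6},
]
ranked = rank_by_popularity(inventory)
```

A recommender could fall back to `popularity_query()` whenever the user's profile has no interests to query on.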
From Market Rules to Elasticsearch Filters
Query with Blacklisted Sites
"www.angelina.com"
Blacklisted: "www.brad.com"
{
  "must_not": [
    { "terms": { "site": ["www.brad.com"] } }
  ]
}
{ "title": "Breakup: what's next?", "site": "www.brad.com" }
Document is Filtered Out
{ "title": "Top news of the week", "site": "www.cnn.com" }
Document Passes Filter
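The blacklist walkthrough above can be sketched as two small helpers: one that builds the `must_not` clause from a publisher's blacklist, and one that simulates locally which documents would survive it. Both function names are hypothetical, and the local check is only a stand-in for the search engine's filtering.

```python
def blacklist_clause(blacklisted_sites):
    """Build the must_not fragment of a bool query from a site blacklist."""
    return {"must_not": [{"terms": {"site": sorted(blacklisted_sites)}}]}


def passes_blacklist(document, clause):
    """Return False if any must_not terms condition matches the document."""
    for condition in clause["must_not"]:
        for field, banned_values in condition["terms"].items():
            if document.get(field) in banned_values:
                return False
    return True


clause = blacklist_clause({"www.brad.com"})
breakup = {"title": "Breakup: what's next?", "site": "www.brad.com"}
top_news = {"title": "Top news of the week", "site": "www.cnn.com"}
filtered_out = not passes_blacklist(breakup, clause)
kept = passes_blacklist(top_news, clause)
```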
From Market Rules to Elasticsearch Filters: Geo Targeting
"Music World – everything on NY Music Scene"
Targeting "US" users only
Index Geo Field in the Document
{
  "title": "Music World – everything on NY Music Scene",
  "categories": ["music"],
  "entities": ["aerosmith", "ny"],
  "geo": ["us"]
}
Add a Geo Filter to the Query
{
  "query": {
    "filtered": {
      "query": { "terms": { … } },
      "filter": { "terms": { "geo": ["us"] } }
    }
  }
}
Apply Filter on Documents
{
  "query": {
    "filtered": {
      "query": { "terms": { … } },
      "filter": { "terms": { "geo": ["us"] } }
    }
  }
}
{
  "title": "Music World – everything on NY Music Scene",
  "categories": ["music"],
  "entities": ["aerosmith", "ny"],
  "geo": ["us"]
}
Document Passes Filter
{
  "query": {
    "filtered": {
      "query": { "terms": { … } },
      "filter": { "terms": { "geo": ["fr"] } }
    }
  }
}
{
  "title": "Music World – everything on NY Music Scene",
  "categories": ["music"],
  "entities": ["aerosmith", "ny"],
  "geo": ["us"]
}
Document is Filtered Out
What about Documents Without a Specific Targeting?
{
  "query": {
    "filtered": {
      "query": { "terms": { … } },
      "filter": { "terms": { "geo": ["us"] } }
    }
  }
}
{
  "title": "Music Around the World",
  "categories": ["music"],
  "entities": ["coldplay", "muse"],
  "geo": [""]
}
Document is Filtered Out
Solution – Index & Query the Value "all"
{
  "query": {
    "filtered": {
      "query": { "terms": { … } },
      "filter": { "terms": { "geo": ["us", "all"] } }
    }
  }
}
{
  "title": "Music Around the World",
  "categories": ["music"],
  "entities": ["coldplay", "muse"],
  "geo": ["all"]
}
Document Passes Filter
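The "all" convention can be sketched end to end: untargeted documents are indexed with `"geo": ["all"]`, and every query filters on the user's country plus "all". The helper names below are hypothetical, and `passes_geo` is only a local simulation of how a `terms` filter matches (a document passes if any of its values is in the filter's list).

```python
def index_geo(targeted_countries):
    """Value for the geo field: the targeted countries, or ["all"] if none."""
    return list(targeted_countries) if targeted_countries else ["all"]


def geo_filter(user_country):
    """Terms filter that accepts the user's country or untargeted documents."""
    return {"terms": {"geo": [user_country, "all"]}}


def passes_geo(document, geo_terms_filter):
    """Local simulation of a terms filter on the geo field."""
    allowed = set(geo_terms_filter["terms"]["geo"])
    return any(value in allowed for value in document.get("geo", []))


untargeted = {"title": "Music Around the World", "geo": index_geo([])}
us_only = {"title": "Music World", "geo": index_geo(["us"])}

shown_to_us_user = passes_geo(untargeted, geo_filter("us"))
hidden_from_fr_user = not passes_geo(us_only, geo_filter("fr"))
```

Without the "all" value, an untargeted document would carry no value the filter could match, which is the failure shown on the previous slides.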
Adding Domain Specific Functionality to Elasticsearch
Indexing Marketer Cost Per Click Without Indexing
CPC values change rapidly
Limitation: you cannot update a document in place in Elasticsearch (an update reindexes the whole document)
Requirement: to keep up with throughput, the index should be immutable
Solution: store & update CPCs in separate off-heap storage
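The idea of keeping CPCs outside the index can be illustrated with a small sketch. Here a plain dict stands in for the off-heap storage, and the final score is taken to be relevance multiplied by CPC; the class, the function, and that combination formula are all assumptions made for illustration, since the talk does not specify them.

```python
class CpcStore:
    """Stand-in for a separate CPC store keyed by document id.

    A plain dict is used here for illustration only; the talk describes
    off-heap storage, which this sketch does not implement.
    """

    def __init__(self):
        self._cpc = {}

    def update(self, doc_id, cpc):
        # Updating a CPC never touches the search index.
        self._cpc[doc_id] = cpc

    def get(self, doc_id, default=0.0):
        return self._cpc.get(doc_id, default)


def rescore(hits, store):
    """Combine relevance with the current CPC (assumed: simple product).

    hits: list of (doc_id, relevance) pairs from the search step.
    """
    scored = [(doc_id, relevance * store.get(doc_id)) for doc_id, relevance in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


store = CpcStore()
store.update("a", 0.1)
store.update("b", 0.5)
hits = [("a", 0.9), ("b", 0.4)]
before = rescore(hits, store)
store.update("a", 1.0)  # the CPC changes, the index stays as it is
after = rescore(hits, store)
```

The ranking changes as soon as a CPC is updated, without reindexing any document.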
Writing a Custom Scoring Function
Combining high-granularity behavioral signals
Applying supervised learning models to compute scores
Use dynamic scripting (e.g., Groovy)
OR
Use native Java via the Elasticsearch plugin mechanism
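As an illustration of what such a scoring function might compute, the sketch below applies a logistic regression model to a handful of signals. It is written in Python purely to show the arithmetic; it is not a Groovy script or an Elasticsearch plugin. The feature names and weights are hypothetical, and the talk does not say which model family is used.

```python
import math


def logistic_score(features, weights, bias=0.0):
    """Score a candidate with a logistic regression model.

    features: {signal_name: value} for one (user, document) pair.
    weights:  {signal_name: learned_weight}; unknown signals contribute 0.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical learned weights and signals, for illustration only.
weights = {"relevance": 1.5, "popularity": 0.8, "recent_clicks": 0.4}
candidate = {"relevance": 0.7, "popularity": 0.6, "recent_clicks": 2.0}
score = logistic_score(candidate, weights, bias=-1.0)
```

In a deployment like the one described, the same computation would run inside the search engine, per document, through one of the two mechanisms listed above.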
Thank you