2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
DESCRIPTION
This session presents a detailed teardown and walk-through of a working, soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross-recommendation. The system is built using Mahout for off-line analysis and Solr for real-time recommendations. The presentation also includes enough theory to give useful working intuitions to those who want to adapt this design. The entire system, including a data generator, off-line analysis scripts, Solr configurations and sample web pages, will be made available on GitHub for attendees to modify as they like.
TRANSCRIPT
SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE
Michael Hausenblas
Chief Data Engineer EMEA, MapR Technologies
Twitter: @mhausenblas
What does Machine Learning look like?
A1 A2!"
#$TA1 A2
!"
#$=
A1T
A2T
!
"
%%
#
$
&&
A1 A2!"
#$
=A1
TA1 A1TA2
AT2A1 AT
2A2
!
"
%%
#
$
&&
r1r2
!
"%%
#
$&&=
A1TA1 A1
TA2
AT2A1 AT
2A2
!
"
%%
#
$
&&
h1h2
!
"%%
#
$&&
r1 = A1TA1 A1
TA2!"%
#$&h1h2
!
"%%
#
$&&
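The last relation on this slide, r1 = [A1^T A1  A1^T A2] [h1; h2], can be tried out on toy data. The following is a minimal sketch in plain Python; the matrices A1 and A2 (rows are users, columns are items), the history vectors h1 and h2, and the helper names are all illustrative assumptions, not data or code from the talk.

```python
def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical data: 3 users, two kinds of behavior.
A1 = [[1, 0, 1],   # behavior 1 (e.g. purchases) over 3 items
      [1, 0, 0],
      [0, 1, 0]]
A2 = [[1, 0],      # behavior 2 (e.g. views) over 2 items
      [1, 0],
      [0, 1]]

# Top row of blocks of the combined co-occurrence matrix.
A1tA1 = matmul(transpose(A1), A1)   # co-occurrence within behavior 1 (3 x 3)
A1tA2 = matmul(transpose(A1), A2)   # cross-occurrence between behaviors (3 x 2)

# Hypothetical history of one new user for each kind of behavior.
h1 = [1, 0, 0]
h2 = [1, 0]

# r1 = A1^T A1 h1 + A1^T A2 h2
r1 = [a + b for a, b in zip(matvec(A1tA1, h1), matvec(A1tA2, h2))]
print(r1)
```

The score vector mixes evidence from both kinds of behavior, which is the cross-recommendation idea. Raw counts like these also score the items the user already has, so a real system would filter those out.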
What does Machine Learning look like?
O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality
• Input data for the recommender model: observations of interactions between users taking actions and items
• Goal: suggest additional appropriate or desirable interactions
• Example applications:
  – similar movies, music, books (topic, style, etc.)
  – map-based restaurant choices
  – suggesting sale items for e-stores or cash-register receipts
Recommendations as Machine Learning
Recommendations
Recap: Behavior of a crowd helps us understand what individuals will do
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
[Figure labels: Alice, Charles]
Recommendations
Charles got a bicycle
Bob got an apple
[Figure labels: Alice, Bob, Charles]
Alice got an apple and a puppy
Recommendations
What else would Bob like?
[Figure labels: Alice, Bob, Charles, ?]
Recommendations
A puppy, of course!
[Figure labels: Alice, Bob, Charles]
You get the idea of how recommenders work …
Recommendations
What if everybody gets a pony?
[Figure labels: Alice, Bob, Charles, Amelia, ?]
What else would you recommend for Amelia?
Recommendations
[Figure labels: Alice, Bob, Charles, Amelia, ?]
If everybody gets a pony, it’s not a very good indicator of what else to predict ...
• Very popular items co-occur with everything
  – Examples: welcome document; elevator music
• Very widespread occurrence is not interesting as a way to generate indicators
  – Unless you want to offer an item that is constantly desired, such as razor blades
• What we want is anomalous co-occurrence
  – This is the source of interesting indicators of preference on which to base recommendation
Problems with Raw Co-occurrence
1. Use log files to build a history matrix of users x items
  – Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
  – Log Likelihood Ratio (LLR) can help judge which co-occurrences can be used with confidence as indicators of preference
  – RowSimilarityJob in Apache Mahout uses LLR
Get Useful Indicators from Behaviors
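The first two steps of this recipe (log files to history matrix, history matrix to co-occurrence matrix) can be sketched in a few lines of plain Python. The log records below are illustrative assumptions based on the Alice/Bob/Charles story, and the variable names are hypothetical. In the talk this analysis runs off-line in Mahout, not in code like this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log: one (user, item) record per observed interaction.
log = [
    ("Alice", "apple"),
    ("Alice", "puppy"),
    ("Bob", "apple"),
    ("Charles", "bicycle"),
    ("Alice", "apple"),   # repeated interactions collapse to a single entry
]

# Step 1: sparse history matrix, stored as user -> set of items.
history = defaultdict(set)
for user, item in log:
    history[user].add(item)

# Step 2: items x items co-occurrence counts (number of users with both items).
cooccurrence = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] += 1
        cooccurrence[(b, a)] += 1

for pair, count in sorted(cooccurrence.items()):
    print(pair, count)
```

Only the pairs that actually occur are stored, which mirrors the sparsity noted in step 1. Step 3 still needs a significance test to decide which of these counts are anomalous enough to become indicators.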
Log Files
[Figure: log entries by user, in order: Alice, Bob, Charles, Alice, Bob, Charles, Alice]
Log Files
[Figure: the same log entries as ids, in order]
users: u1, u3, u2, u1, u3, u2, u1
things: t1, t4, t3, t2, t3, t3, t1
Log Files and Dimensions
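[Figure: the log entries as ids, with the two dimensions listed]
users: u1, u3, u2, u1, u3, u2, u1
things: t1, t4, t3, t2, t3, t3, t1
Things: t1, t2, t3, t4
Users: u1 (Alice), u3 (Bob), u2 (Charles)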
History Matrix: Users by Items
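[Figure: history matrix with one row per user (Alice, Bob, Charles) and one column per item; check marks show observed interactions]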
Co-occurrence Matrix: Items by Items
[Figure: items-by-items matrix of co-occurrence counts]
How do you tell which co-occurrences are useful?
Use LLR test to turn co-occurrence into indicators…
Co-occurrence Binary Matrix
[Figure: co-occurrence matrix reduced to binary entries]
Spot the Anomaly
(1)      A       not A
B        13      1000
not B    1000    100,000

(2)      A       not A
B        1       0
not B    0       2

(3)      A       not A
B        1       0
not B    0       10,000

(4)      A       not A
B        10      0
not B    0       100,000

What conclusion do you draw from each situation?
• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR
Spot the Anomaly
(1)      A       not A
B        13      1000
not B    1000    100,000
Root LLR: 0.90

(2)      A       not A
B        1       0
not B    0       2
Root LLR: 1.95

(3)      A       not A
B        1       0
not B    0       10,000
Root LLR: 4.52

(4)      A       not A
B        10      0
not B    0       100,000
Root LLR: 14.3
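The four scores on this slide can be reproduced with the standard log-likelihood ratio (G-test) formula for a 2x2 contingency table. The sketch below is a plain-Python illustration under that assumption: the function names are hypothetical, it is not a copy of Mahout's implementation, and it returns an unsigned root LLR.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table: 2 * sum k * ln(k * N / (row * col))."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing (k * ln k -> 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total

def root_llr(k11, k12, k21, k22):
    """Square root of the LLR, clamped at zero against rounding noise."""
    return math.sqrt(max(llr(k11, k12, k21, k22), 0.0))

# The four tables from the slide, as (A and B, B only, A only, neither).
tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 2),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
]
scores = [root_llr(*t) for t in tables]
for table, score in zip(tables, scores):
    print(table, round(score, 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because 13 is close to what chance alone predicts from the row and column totals. The last table scores highest because ten co-occurrences with no exceptions are very unlikely under independence.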
Indicator Matrix: Anomalous Co-occurrence
[Figure: indicator matrix with the significant cells checked]
Result: The marked row will be added to the indicator field in the item document …
Significant co-occurrences → indicators
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don’t need to create a separate index for the indicators.
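As a rough illustration of the note above, the puppy document can be written as a JSON record in which the indicators sit next to the ordinary metadata. The field names are taken from the slide. Treating keywords and indicators as multi-valued lists is an assumption, and the schema and update endpoint of the demo system are not shown in the talk.

```python
import json

# Item document: ordinary metadata plus the indicator row for item t4.
item_doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],   # assumed to be a multi-valued field
    "indicators": ["t1"],                  # items with anomalous co-occurrence with t4
}

# A list of documents serialized as JSON, ready to hand to an indexing step.
payload = json.dumps([item_doc], indent=2)
print(payload)
```

Because the indicators are just another field of the item document, refreshing recommendations after a new off-line analysis amounts to re-indexing the affected documents.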
Demo time!
Internals of the Recommender Engine
What to recommend if a new user listened to 2122: Fats Domino and 303: Beatles? The recommendation is “1710: Chuck Berry”.
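At query time, the user's recent history becomes an ordinary search against the indicator field. The sketch below only builds the request parameters. The helper name, the field name `indicators` (taken from the earlier puppy document) and the `rows` value are assumptions, and the query parameters actually used in the demo are not shown in the talk.

```python
from urllib.parse import urlencode

def recommendation_query(history_ids, field="indicators", rows=10):
    """Build search parameters that match items whose indicators overlap the history."""
    terms = " ".join(str(item_id) for item_id in history_ids)
    return {"q": f"{field}:({terms})", "rows": rows, "wt": "json"}

# History from the slide: the user listened to 2122 (Fats Domino) and 303 (Beatles).
params = recommendation_query([2122, 303])
query_string = urlencode(params)
print(params["q"])
print(query_string)
```

The search engine's relevance ranking then orders the matching items. Which item comes first depends on the indexed indicators; in the demo the answer was 1710: Chuck Berry.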
Looking Inside LucidWorks
[Figure: system components, numbered as in the diagram]
(1) User behavior generator
(2) Presentation tier
(3) Session collector
(4) Search engine
(5) Metrics and logs
(6) History collector
(7) Cooccurrence analysis
(8) Post to search engine
(9) Diagnostic browsing
http://bita.ly/18vbbaT
Example: search based recommendation
• Sample Query (recommendation query)
  – Current location
  – Recent merchant descriptions
  – Recent merchant id’s
  – Recent SIC codes
  – Recent accepted offers
  – Local Top40
• Sample Document
  – Original data and meta-data: Merchant Id; Field for text description; Phone; Address; Location
  – Derived from co-occurrence analysis: Indicator merchant id’s; Indicator industry (SIC) id’s; Indicator offers; Indicator text; Local Top40
Search-based recommendation
[Figure: data-flow diagram with the components: complete history, co-occurrence (Mahout), item meta-data, Solr indexers, Solr indexing, index shards]
Analyze with MapReduce
[Figure: data-flow diagram with the components: user history, web tier, Solr search, Solr indexers, item meta-data, index shards]
Deploy with Conventional Search System
• Kudos to Ted Dunning, Grant Ingersoll and LucidWorks, for the idea & the demo!
• Get in touch: Twitter—@mhausenblas, @MapR
• Ah, and, btw: we’re hiring ;)
Outro