![Page 1: Get on with it! - DAI-Labor · Get on with it! Recommender system industry challenges move towards real-world, online evaluation Padova – March 23th, 2016 Andreas Lommatzsch - TU](https://reader033.vdocuments.mx/reader033/viewer/2022042309/5ed617fcbcb22c51e2620cf1/html5/thumbnails/1.jpg)
Get on with it!
Recommender system industry
challenges move towards real-world,
online evaluation
Padova – March 23rd, 2016
Andreas Lommatzsch - TU Berlin, Berlin, Germany
Jonas Seiler - plista, Berlin, Germany
Daniel Kohlsdorf - XING, Hamburg, Germany
CrowdRec - www.crowdrec.eu
Jonas Seiler
http://www.plista.com
Where are recommender system challenges headed?
Direction 1: Use info beyond the user-item matrix.
Direction 2: Online evaluation + multiple metrics.
Moving towards real-world evaluation
Flickr credit: rodneycampbell
Why evaluate?
<Images showing “our” use cases>
● plista
● Improve the results of the algorithms
● Handle technical constraints
● User satisfaction
• Evaluation is crucial for the success of real-life systems
• How should we evaluate?
● Improve user satisfaction
● Increase sales, earnings
● Optimize the technical platform for providing the service
Precision and Recall
Technical complexity
Influence on sales
Required hardware resources
Business models
Scalability
Diversity of the presented results
User satisfaction
Evaluation Settings
• A static collection of documents
• A set of queries
• A list of relevant documents defined by
experts for each query
Traditional Evaluation in IR
The Cranfield paradigm was designed in the early 1960s when
information access was via Boolean queries against manually indexed
documents and there was (virtually) no text online. Cyril Cleverdon,
Librarian of the College of Aeronautics, Cranfield, England, built a test
collection that modeled university researchers, including abstracts of
aeronautical papers, one-line queries based on questions gathered
from the researchers, and complete relevance judgments for each
query submitted by these users. The idea of carefully modeling some
user application continued with Prof. Gerard Salton and the SMART
collections, such as searching MEDLINE abstracts using real questions
submitted to MEDLINE, or searching full text TIME articles with real
questions from several sources, etc. A 1969 paper by Michael Lesk
and Salton used experiments on the ISPRA collection to show that
relevance judgments made by a person who was not the user would
still allow valid system comparison, a precursor to the paper by Ellen
Voorhees in SIGIR 1998.
IR based on static collections
A set of queries. For each query there is a list of relevant documents defined by experts
“The Cranfield paradigm”
Advantages
• Reproducible setting
• All researchers have exactly the same information
• Optimized for measuring precision
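To make the Cranfield-style setting concrete, here is a minimal Python sketch of how precision and recall could be computed for one query from expert relevance judgments. The document IDs and the function name `precision_recall` are illustrative assumptions, not part of any particular test collection.

```python
def precision_recall(ranked, relevant, k):
    """Precision@k and recall@k for one query, given expert judgments."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs; a real test collection supplies both lists.
ranked = ["d3", "d7", "d1", "d9", "d4"]   # system output for the query
relevant = {"d1", "d3", "d5"}             # expert-defined ground truth

p, r = precision_recall(ranked, relevant, k=5)
# p = 0.4 (2 of the top 5 are relevant), r = 2/3 (2 of 3 relevant found)
```

Because the collection and the judgments are static, every researcher running this on the same data gets the same numbers, which is the reproducibility advantage listed above.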
Traditional Evaluation in IR
Weaknesses of traditional IR evaluation
• High costs for creating dataset
• Datasets are not up-to-date
• Domain-specific documents
• The expert-defined ground truth does not consider individual user preferences
• Context-awareness is not considered
• Technical aspects are ignored
Context is everything
Industry and recsys challenges
• Challenges benefit both industry and academic research.
• We look at how industry challenges have evolved since the Netflix Prize (2009).
Traditional Evaluation in RecSys
Rating prediction
Cross-validation
Individual user preferences / personalization
Large dataset
Sparsity
Evaluation Settings
• Rating prediction on user-item matrices
• Large, sparse dataset
• Predict personalized ratings
• Cross-validation, RMSE
Advantages
• Reproducible setting
• Personalization
• Dataset is based on
real user ratings
“The Netflix paradigm”
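The rating-prediction setting above (cross-validation scored by RMSE) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the round-robin fold assignment, and the mean-rating baseline are all made up for the example and are not the Netflix Prize protocol.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def k_fold_splits(n_items, k):
    """Yield (train, test) index lists; item i goes to test fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Toy (user, item, rating) triples; purely illustrative.
ratings = [("u1", "i1", 4), ("u1", "i2", 5), ("u2", "i1", 3),
           ("u2", "i3", 2), ("u3", "i2", 4), ("u3", "i3", 1)]

fold_errors = []
for train, test in k_fold_splits(len(ratings), k=3):
    # Baseline predictor: the mean rating of the training folds.
    mean_rating = sum(ratings[i][2] for i in train) / len(train)
    truth = [ratings[i][2] for i in test]
    fold_errors.append(rmse([mean_rating] * len(truth), truth))

mean_rmse = sum(fold_errors) / len(fold_errors)
```

Note that the folds ignore timestamps, which is one reason this kind of evaluation can diverge from a live setting.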
Traditional Evaluation in RecSys
Weaknesses of traditional Recommender evaluation
• Static data
• Only one type of data - only user ratings
• User ratings are noisy
• Temporal aspects tend to be ignored
• Context-awareness is not considered
• Technical aspects are ignored
Static data
Context is not taken into account
Cross-validation does not match real-life settings
Why Netflix did not implement the winner: https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml
Challenges of Developing Applications
Challenges
• Data streams - continuous changes
• Big data
• Combine knowledge from different sources
• Context-Awareness
• Users expect personally relevant results
• Heterogeneous devices
• Technical complexity, real-time requirements
How to address these challenges in the Evaluation?
• Realistic evaluation setting
– Heterogeneous data sources
– Streams
– Dynamic user feedback
• Appropriate metrics
– Precision and User satisfaction
– Technical complexity
– Sales and Business models
• Online and Offline Evaluation
How to set up a better evaluation?
● Online Evaluation
● Consider the context
● Data streams
● Business model-oriented metrics
Approaches for a better Evaluation
• News recommendations @ plista
• Job recommendations @ XING
The plista Recommendation Scenario
Setting
● 250 ms response time
● 350 million AI/day
● In 10 countries
Challenges
● News changes continuously
● Users do not log in explicitly
● Seasonality, context-dependent user preferences
Evaluation @ plista
Offline
• Cross-validation
– Metric Optimization Engine (https://github.com/Yelp/MOE)
– Integration into Spark
• How well does it correlate with online evaluation?
• Time complexity
Online
• A/B tests
– Limited by caching memory and computational resources
– MOE*
Offline
• Estimate the mean and variance over the parameter space with a Gaussian process
• Evaluate the parameter with the highest Expected Improvement (EI), Upper Confidence Interval ….
• REST API
Evaluation using MOE
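As a rough illustration of the Expected Improvement criterion named above, the following Python sketch computes the closed-form EI for a Gaussian posterior and picks the candidate with the highest value. It does not use MOE or its REST API; the function name, the candidate parameter values, and the posterior means and standard deviations are hypothetical placeholders for what a Gaussian process would supply.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if std <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical posterior (mean, std) of the objective for three
# candidate parameter values; a real run would query the GP instead.
candidates = {
    0.1: (0.020, 0.001),
    0.5: (0.018, 0.006),
    0.9: (0.015, 0.002),
}
best_observed = 0.019

next_param = max(
    candidates,
    key=lambda p: expected_improvement(*candidates[p], best_observed),
)
# next_param is 0.5: its mean is lower, but its uncertainty is largest
```

The example shows why EI is useful when evaluations are costly: it trades off a promising mean against remaining uncertainty instead of always re-testing the current best setting.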
Online
• A/B Tests are expensive
• Model non-stationarity
• Integrate out non-stationarity to get mean EI
Evaluation using MOE
Provide an API enabling researchers to test their own ideas
• The CLEF-NewsREEL challenge
• A challenge in CLEF (Conference and Labs of the Evaluation Forum)
• 2 Tasks: Online and Offline Evaluation
The CLEF-NewsREEL challenge
How does the challenge work?
• Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day
• Technical requirements: 100 ms per request
• Live evaluation based on CTR
CLEF-NewsREEL
Online Task
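The two quantities on this slide, CTR and the 100 ms limit, could be tallied from a message log roughly as follows. This is a minimal sketch: the message dictionaries, the `latency_ms` field, and the definition of CTR as clicks divided by recommendation requests are assumptions made for the example, not the actual NewsREEL message format.

```python
from collections import Counter

TIME_LIMIT_MS = 100  # per-request limit stated on the slide


def summarize(messages):
    """Return (CTR, share of requests answered within the time limit)."""
    counts = Counter(m["type"] for m in messages)
    requests = counts["request"]
    # Assumption: CTR = clicks / recommendation requests.
    ctr = counts["click"] / requests if requests else 0.0
    latencies = [m["latency_ms"] for m in messages if m["type"] == "request"]
    on_time = sum(1 for ms in latencies if ms <= TIME_LIMIT_MS)
    return ctr, (on_time / len(latencies) if latencies else 0.0)


# Toy stream with hypothetical fields.
stream = [
    {"type": "impression"},
    {"type": "request", "latency_ms": 40},
    {"type": "request", "latency_ms": 85},
    {"type": "click"},
    {"type": "request", "latency_ms": 120},
    {"type": "impression"},
    {"type": "request", "latency_ms": 60},
]

ctr, on_time_share = summarize(stream)  # 0.25 and 0.75 for this toy stream
```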
Online vs. Offline Evaluation
• Technical aspects can be evaluated without user feedback
• Analyze the required resources and the response time
• Simulate the online evaluation by replaying a recorded
stream
CLEF-NewsREEL
Offline Task
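Replaying a recorded stream to measure response time, as described above, might look like the sketch below. It is not the Idomaar framework; the `replay` helper, the message fields, and the `MostRecent` baseline recommender are hypothetical and only show the idea of timing each call without any user feedback.

```python
import time
from collections import deque


class MostRecent:
    """Baseline: recommend the most recently seen items."""

    def __init__(self, size=3):
        self.items = deque(maxlen=size)

    def __call__(self, message):
        if message["type"] == "impression":
            self.items.appendleft(message["item"])
            return []
        return list(self.items)


def replay(recorded, recommender, time_limit_ms=100.0):
    """Feed recorded messages to the recommender and time each call."""
    latencies = []
    for message in recorded:
        start = time.perf_counter()
        recommender(message)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(latencies),
        "max_ms": max(latencies) if latencies else 0.0,
        "violations": sum(1 for ms in latencies if ms > time_limit_ms),
    }


# Hypothetical recorded stream.
recorded = [
    {"type": "impression", "item": "a"},
    {"type": "impression", "item": "b"},
    {"type": "request"},
    {"type": "impression", "item": "c"},
    {"type": "request"},
]

stats = replay(recorded, MostRecent(size=2))
```

Because the stream is fixed, two algorithms can be compared on identical input, and the measured latencies depend only on the algorithm and the hardware it runs on.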
Challenge
• Realistic simulation of streams
• Reproducible setup of computing environments
Solution
• A framework simplifying the setup of the evaluation environment
• The Idomaar framework developed in the CrowdRec project
CLEF-NewsREEL
Offline Task
http://rf.crowdrec.eu
More Information
• SIGIR forum Dec 2015 (Vol 49, #2)
http://sigir.org/files/forum/2015D/p129.pdf
Evaluate your algorithm online and offline in NewsREEL
• Register for the challenge!
http://crowdrec.eu/2015/11/clef-newsreel-2016/
(register by 22 April)
• Tutorials and Templates are provided at orp.plista.com
CLEF-NewsREEL
https://recsys.xing.com/
XING - RecSys Challenge
Job Recommendations @ XING
XING - Evaluation based on interaction
● On XING, users can give explicit feedback on recommendations.
● Explicit feedback is far less frequent than implicit measures.
● A/B tests focus on click-through rate.
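One common way to judge whether a click-through-rate difference in an A/B test is more than noise is a two-proportion z-test. The sketch below is a generic illustration with made-up counts; it is not a description of how XING analyses its tests.

```python
import math


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0.0:
        return 0.0, 1.0  # no clicks (or only clicks) in both groups
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return z, p_value


# Hypothetical counts: variant B gets more clicks on equal traffic.
z, p_value = two_proportion_z(100, 10_000, 150, 10_000)
```

With rates this low, many impressions are needed before a difference becomes detectable, which is part of why online tests are costly.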
XING - RecSys Challenge, Scoring, Space on Page
● Predict 30 items for each user.
● Score: weighted combination of the precision
○ precisionAt(2)
○ precisionAt(4)
○ precisionAt(6)
○ precisionAt(20)
Top 6
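A weighted combination of precision at several cutoffs, as listed on this slide, can be sketched as follows. The slide names the cutoffs (2, 4, 6, 20) but not the weights, so the equal weights here are a placeholder assumption, and the function names are illustrative rather than the challenge's official scoring code.

```python
def precision_at(k, recommended, relevant):
    """Share of the top-k recommended items that the user interacted with."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


# Placeholder weights: the slide gives the cutoffs but not the weights.
WEIGHTS = {2: 0.25, 4: 0.25, 6: 0.25, 20: 0.25}


def score(recommended, relevant, weights=WEIGHTS):
    """Weighted sum of precision@k over the configured cutoffs."""
    return sum(w * precision_at(k, recommended, relevant)
               for k, w in weights.items())


# Hypothetical example: 30 ranked job IDs and the jobs the user acted on.
recommended = list(range(1, 31))
relevant = {1, 3, 10}

user_score = score(recommended, relevant)
# precision@2 = 0.5, @4 = 0.5, @6 = 1/3, @20 = 0.15
```

Small cutoffs reward getting the first few positions right, which matters when only the top of the list is visible on the page.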
XING - RecSys Challenge, User Data
• User ID
• Job Title
• Educational Degree
• Field of Study
• Location
XING - RecSys Challenge, User Data
• Number of past jobs
• Years of Experience
• Current career level
• Current discipline
• Current industry
XING - RecSys Challenge, Item Data
• Job title
• Desired career level
• Desired discipline
• Desired industry
XING - RecSys Challenge, Interaction Data
• Timestamp
• User
• Job
• Type:
– Deletion
– Click
– Bookmark
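The interaction fields listed above map naturally onto a small record type. The sketch below is an assumption about how one might represent them in Python; the field names, integer IDs, and the rule for deriving relevant jobs (clicked or bookmarked, minus deleted) are illustrative and not the challenge's data specification.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionType(Enum):
    CLICK = "click"
    BOOKMARK = "bookmark"
    DELETION = "deletion"


@dataclass(frozen=True)
class Interaction:
    timestamp: int
    user_id: int
    job_id: int
    type: InteractionType


def relevant_jobs(interactions, user_id):
    """Jobs the user clicked or bookmarked, minus jobs the user deleted."""
    positive, deleted = set(), set()
    for event in interactions:
        if event.user_id != user_id:
            continue
        if event.type is InteractionType.DELETION:
            deleted.add(event.job_id)
        else:
            positive.add(event.job_id)
    return positive - deleted


# Hypothetical log entries.
log = [
    Interaction(1000, 1, 10, InteractionType.CLICK),
    Interaction(1001, 1, 11, InteractionType.BOOKMARK),
    Interaction(1002, 1, 10, InteractionType.DELETION),
    Interaction(1003, 2, 10, InteractionType.CLICK),
]

jobs_for_user_1 = relevant_jobs(log, user_id=1)  # {11}
```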
XING - RecSys Challenge, Anonymization
XING - RecSys Challenge, Future
• Live Challenge
– Users submit predicted future interactions
– The solution is recommended on the platform
– Participants get points for actual user clicks
Figure labels: Release to Challenge, Collect Clicks, Work On Predictions, Score
How to set up a better evaluation
• Consider different quality criteria (prediction, technical, business models)
• Aggregate heterogeneous information sources
• Consider user feedback
• Use online and offline analyses to understand users and their requirements
Concluding ...
Concluding ...
Participate in challenges based on real-life scenarios
• NewsREEL challenge: http://orp.plista.com
• RecSys 2016 challenge: http://2016.recsyschallenge.com/
=> Organize a challenge. Focus on real-life data.
More Information
• http://www.crowdrec.eu
• http://www.clef-newsreel.org
• http://orp.plista.com
• http://2016.recsyschallenge.com
• http://www.xing.com
Thank You