The YouTube Video Recommendation System
James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet (Google Inc.)
Presented by Thuat Nguyen
Introduction
• YouTube – the most popular video community
• 1 billion users watch videos each month
• More than 24 hours of video are uploaded every minute (2010)
• It’s a very information-rich environment
Goals
• The recommendation system should:
• Find videos related to users' interests
• Help users discover new videos
• Keep users engaged: users come not only to watch a specific video or search for one, but also simply to be entertained
Challenges
• Uploaded videos often have no or very poor metadata
• User interactions are relatively short and noisy (compared to Netflix or Amazon, where renting a movie or buying an item is a clear declaration of intent)
• Videos usually have a short life cycle (from upload to viral within days), so recommendations must stay fresh
System Design
1. Input data
2. Related videos
3. Generating recommendation candidates
4. Ranking
5. System implementation
-> Goal: recommendations that are recent, fresh, diverse, and relevant to the user's recent actions
Input Data
• Two main classes of data:
1. Content data
• Title, description…
2. User activity data
• Rating, liking, subscribing, etc. (explicit)
• Watching a video, stopping before it finishes (implicit)
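A minimal sketch of how activity events might be split into explicit and implicit signals before they feed the recommender. The `Event` type, the action names, and the 0.5 watch-fraction cutoff in `is_completed_watch` are all hypothetical illustrations, not YouTube's actual log schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names; real activity logs use their own schema.
EXPLICIT_ACTIONS = {"rate", "like", "favorite", "subscribe"}
IMPLICIT_ACTIONS = {"watch"}


@dataclass
class Event:
    user_id: str
    video_id: str
    action: str
    watch_fraction: Optional[float] = None  # share of the video watched ("watch" only)


def classify(event: Event) -> str:
    """Label an activity event as an explicit or an implicit signal."""
    if event.action in EXPLICIT_ACTIONS:
        return "explicit"
    if event.action in IMPLICIT_ACTIONS:
        return "implicit"
    raise ValueError(f"unknown action: {event.action}")


def is_completed_watch(event: Event, threshold: float = 0.5) -> bool:
    """Hypothetical rule: count a watch as meaningful only past `threshold`.

    Closing a video early yields a weak, noisy implicit signal.
    """
    return event.action == "watch" and (event.watch_fraction or 0.0) >= threshold
```

Explicit actions are clear statements of preference, while a watch that is abandoned after a few seconds says little; a threshold like this is one simple way to filter that noise.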
Related Videos
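• Relatedness score r(vi, vj) based on co-visitation counts: how often vi and vj are watched in the same session
• Normalization function f(vi, vj) that accounts for the global popularity of both videos
• For each video vi, the related set Ri is the top N candidates by score (with a minimum score threshold)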
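In the underlying paper the relatedness score is defined from co-visitation counts: r(vi, vj) = cij / f(vi, vj), where cij counts sessions (within a given time period) in which both videos were watched, and the simplest normalization is f(vi, vj) = ci · cj, the product of each video's total session count. The Python sketch below implements that simple choice; the session input format and the default values of `n` and `min_score` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def relatedness_scores(sessions):
    """Compute r(vi, vj) = c_ij / (c_i * c_j) from co-visitation counts.

    `sessions` is a list of sets of video IDs watched together in one
    session (an assumed input format for this sketch).
    """
    video_counts = Counter()  # c_i: number of sessions containing video i
    pair_counts = Counter()   # c_ij: number of sessions containing both i and j
    for session in sessions:
        videos = sorted(set(session))
        video_counts.update(videos)
        for vi, vj in combinations(videos, 2):
            pair_counts[(vi, vj)] += 1
            pair_counts[(vj, vi)] += 1  # co-visitation is symmetric
    return {
        (vi, vj): c_ij / (video_counts[vi] * video_counts[vj])
        for (vi, vj), c_ij in pair_counts.items()
    }


def related_videos(scores, seed, n=10, min_score=0.0):
    """R_i: the top-n candidates for `seed`, dropping scores below `min_score`."""
    candidates = [
        (vj, score)
        for (vi, vj), score in scores.items()
        if vi == seed and score >= min_score
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [vj for vj, _ in candidates[:n]]
```

Dividing by ci · cj keeps globally popular videos from appearing as "related" to everything merely because they co-occur with many sessions.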
Generating Recommendation Candidates
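• Seed set S: videos the user has watched, liked, rated, or added to playlists
• C1 (the union of the related videos Ri of every seed) is narrow: candidates stay close to the seeds
• Broaden the diversity of the candidate set by following the related-videos graph for several hops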
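The paper broadens candidates with a limited transitive closure over the related-videos graph: C0 = S, Cn is the union of Ri over all vi in C(n-1), and the final candidate set is the union of C0 through the last hop, minus the seeds themselves. A minimal Python sketch, assuming the related lists Ri are already computed and stored in a plain dict; the `max_hops` default is an illustrative assumption.

```python
def generate_candidates(seeds, related, max_hops=2):
    """Expand seed videos over the related-videos graph.

    C_0 = seeds; C_n = union of related[v] for v in C_{n-1}.
    Returns (C_0 ∪ ... ∪ C_max_hops) minus the seeds.
    `related` maps a video ID to its list R_i of related video IDs.
    """
    seeds = set(seeds)
    reached = set(seeds)   # running union C_0 ∪ ... ∪ C_n
    frontier = set(seeds)  # current layer C_n
    for _ in range(max_hops):
        next_frontier = set()
        for video in frontier:
            next_frontier.update(related.get(video, []))
        reached |= next_frontier
        frontier = next_frontier
    return reached - seeds
```

With `max_hops=1` this yields exactly C1; each extra hop trades some relevance for more diverse candidates.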
Generating Recommendation Candidates (cont.)
Ranking
• Candidates are ranked using three categories of signals:
• Video quality (view count, ratings…)
• User specificity (user’s taste and preferences)
• Diversification
• Limit the number of recommendations associated with any single seed video
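One way to read this slide as code: score each candidate as a weighted linear combination of its quality and user-specificity signals, then walk the sorted list and cap how many picks may come from any one seed. The `Candidate` fields, the weights, `max_per_seed`, and `k` are hypothetical; this is a minimal sketch, not the production ranking function.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    seed_id: str        # seed video this candidate was reached from
    quality: float      # e.g. from view count and ratings (hypothetical scale)
    specificity: float  # match to the user's taste (hypothetical scale)


def rank(candidates, w_quality=0.5, w_specificity=0.5, max_per_seed=2, k=10):
    """Rank by a weighted sum of signals, then diversify with a per-seed cap."""
    ordered = sorted(
        candidates,
        key=lambda c: w_quality * c.quality + w_specificity * c.specificity,
        reverse=True,
    )
    per_seed = Counter()
    result = []
    for c in ordered:
        if len(result) >= k:
            break
        if per_seed[c.seed_id] >= max_per_seed:
            continue  # this seed already contributed enough picks
        per_seed[c.seed_id] += 1
        result.append(c.video_id)
    return result
```

The cap keeps one heavily watched seed from filling the whole list, at the cost of sometimes skipping a higher-scoring candidate.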
System Implementation
• Three main steps:
• Data collection (log files)
• Recommendation generation (MapReduce)
• Recommendation serving
• Batch-oriented pre-computation approach
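• Takes advantage of large amounts of data and ample CPU resources, while serving pre-computed results stays low-latency
• Causes a delay between generating and serving recommendations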
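The batch trade-off can be illustrated with a toy pipeline: an expensive generation step runs offline for all users and writes results into a table, and serving is a cheap lookup that also reports how old the stored result is. The class and method names are hypothetical, and the in-memory dict only stands in for the MapReduce jobs and serving store of the real system.

```python
import time


class PrecomputedRecommendations:
    """Toy stand-in for batch generation plus low-latency serving."""

    def __init__(self, generate_fn):
        self._generate_fn = generate_fn  # expensive per-user recommendation function
        self._table = {}                 # user_id -> (recommendations, generated_at)

    def run_batch(self, user_ids, now=None):
        """Offline step: pre-compute recommendations for every user."""
        generated_at = time.time() if now is None else now
        for user_id in user_ids:
            self._table[user_id] = (self._generate_fn(user_id), generated_at)

    def serve(self, user_id, now=None):
        """Online step: a dict lookup returning (recommendations, age in seconds).

        The age is the generation-to-serving delay; unknown users get ([], None).
        """
        if user_id not in self._table:
            return [], None
        recs, generated_at = self._table[user_id]
        current = time.time() if now is None else now
        return recs, current - generated_at
```

Rerunning `run_batch` more often shrinks the reported age, which is the lever for keeping pre-computed recommendations fresh.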
Evaluation and Results
Questions?