1
Large-scale Recommendations in a Dynamic Marketplace
Jay Katukuri, Rajyashree Mukherjee
Tolga Konik, Chu-Cheng Hsieh
LSRS 2013
2
Meet John Doe
John is interested in an item: “iPhone 5 64gb white”. Should we recommend
– “iPhone 5 case”, or
– “iPhone 5s gold”?
3
Recommendation on e-marketplace
• Recommendation “before” purchase: Similar Item Recommendation (SIR)
  – iPhone 5S gold
• Recommendation “after” purchase: Related Item Recommendation (RIR)
  – iPhone 5 case
4
SIR Example 1
5
SIR Example 2
6
Related Item Recommendation
Recommendations for Xbox 360 4GB on Checkout page
7
Main Idea
• Similar Item Clustering (SIC)
  – Titles
  – Attributes (Price, etc.)
  – Images
• Recommendation
  – SIR: (same cluster)
  – RIR: (neighbor clusters)
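The main idea can be sketched in a few lines: SIR returns items from the seed item's own cluster, RIR returns items from neighboring clusters. This is only an illustration. The item ids, the two dictionaries, and the function names `similar_to` and `related_to` are hypothetical; the cluster labels are borrowed from examples in the slides.

```python
from collections import defaultdict

# Hypothetical toy data: item id -> cluster label.
ITEM_TO_CLUSTER = {
    "phone-1": "samsung galaxy s4",
    "phone-2": "samsung galaxy s4",
    "protector-1": "samsung galaxy s4 screen protector",
    "protector-2": "samsung galaxy s4 screen protector",
}

# Hypothetical cluster-cluster relations: seed cluster -> neighbor clusters.
RELATED_CLUSTERS = {
    "samsung galaxy s4": ["samsung galaxy s4 screen protector"],
}

# Invert the item -> cluster map once so lookups by cluster are cheap.
CLUSTER_TO_ITEMS = defaultdict(list)
for item_id, cluster in ITEM_TO_CLUSTER.items():
    CLUSTER_TO_ITEMS[cluster].append(item_id)


def similar_to(item_id):
    """SIR: other items from the seed item's own cluster."""
    cluster = ITEM_TO_CLUSTER[item_id]
    return [i for i in CLUSTER_TO_ITEMS[cluster] if i != item_id]


def related_to(item_id):
    """RIR: items from clusters that neighbor the seed item's cluster."""
    cluster = ITEM_TO_CLUSTER[item_id]
    related = []
    for neighbor in RELATED_CLUSTERS.get(cluster, []):
        related.extend(CLUSTER_TO_ITEMS[neighbor])
    return related
```

Because both lookups go through the cluster rather than the individual listing, a recommendation can survive the seed item selling out, which matters in a marketplace with short-lived listings.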
8
Models
• Item clusters
  – Cluster represented by meaningful keywords
  – “clarks women shoe pumps classics”
  – “authentic handmade amish quilt”
• Cluster-Cluster Relations
  – “samsung galaxy s4” – “samsung galaxy s4 screen protector”
  – “wolfgang puck electric pressure cooker” – “kitchenaid food processor”
9
System Architecture - Overview
[Diagram: three components]
• Offline Model Generation
  – Clusters Model Generation
  – Related Clusters Model Generation
• The Data Store
  – Inventory, Clickstream, Transactions
  – Conceptual Knowledgebase
  – Clusters, Cluster-Cluster Relations
• Real-time Performance System
  – Similar Items Recommender (SIR): Lost Item → ?similarTo(item) → Similar Items
  – Related Items Recommender (RIR): Bought Item → ?relatedTo(item) → Related Items
10
Cluster Generation (offline)
11
Data on eBay
• Item-item co-occurrences on transaction logs• Large Data – Much bigger data set in both users and inventory
than other ecommerce sites.• Scale – More than 300M listings.– More than 10M new items every day
12
Challenges
• Global clustering not feasible
• Size bias on different categories
• Performance
13
Model Generation - Clusters
1. Select a few keywords to represent “big notions”, e.g. iPhone, Handbags, etc.
  – How to select?
2. Clustering by K-means
  – How to set K?
14
Model Generation - Clusters
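[Diagram: Clusters Model Generation. The Data Store (Inventory, Clickstream, Conceptual Knowledgebase, Clusters) supplies items, user queries, and concepts/categories to Query-Recall Generation; the resulting query-to-items data feeds Cluster Generation, and the new clusters are written back to the Data Store.]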
• Problem: Global clustering not feasible
• Solution: Partition input data by user queries
• Parallel distributed K-Means in Hadoop MapReduce
• Dedupe and merge overlapping clusters (100X reduction in size over inventory with over 90% coverage)
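A rough single-machine sketch of the last step follows: clusters are built independently per query partition, so the same items can surface under several queries and the overlapping clusters need to be merged. The Jaccard criterion, the 0.5 threshold, the greedy merge order, and the toy partitions are all assumptions made for illustration; the slides do not say how overlap is measured.

```python
def jaccard(a, b):
    """Overlap between two item sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_overlapping(clusters, threshold=0.5):
    """Greedily merge clusters whose item sets overlap by >= threshold.

    `clusters` is a list of sets of item ids. Larger clusters are visited
    first so that smaller near-duplicates fold into them.
    """
    merged = []
    for cluster in sorted(clusters, key=len, reverse=True):
        for existing in merged:
            if jaccard(cluster, existing) >= threshold:
                existing |= cluster  # fold into the first close-enough cluster
                break
        else:
            merged.append(set(cluster))  # copy so the input is not mutated
    return merged


# Hypothetical per-query partitions, each already clustered on its own.
partitions = {
    "nike airmax": [{"a", "b", "c"}, {"x", "y"}],
    "nike airmax shoes": [{"a", "b", "c", "d"}],
    "amish quilt": [{"q1", "q2"}],
}
all_clusters = [c for clusters in partitions.values() for c in clusters]
result = merge_overlapping(all_clusters)
```

Here the two "nike airmax" clusters that share three of four items collapse into one, while the unrelated clusters are left alone.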
15
Base Cluster Generation
• Base Cluster ≡ Query
• Find merge candidates based on query term overlap
  – E.g.: “nike airmax tennis shoes” -> “nike airmax”
• Score candidates using cosine similarity
  – Term weight: TF-IDF in the query space (document = query)
    • TF: Query Demand
    • IDF: Number of Queries
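One possible reading of this scoring step is sketched below. The slide only names the ingredients, so the exact weighting is an assumption: here a term's TF is the total demand of all queries containing it, and its IDF is log(N / df), where N is the number of queries and df the number of queries containing the term. The query log and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical query log: query text -> demand (how often it was issued).
QUERY_DEMAND = {
    "nike airmax tennis shoes": 40,
    "nike airmax": 100,
    "nike running shoes": 60,
    "amish quilt": 30,
}
NUM_QUERIES = len(QUERY_DEMAND)

# df: number of distinct queries containing each term (document = query).
DOC_FREQ = Counter(t for q in QUERY_DEMAND for t in set(q.split()))

# Assumed TF: total demand of the queries in which the term appears.
TERM_DEMAND = Counter()
for query, demand in QUERY_DEMAND.items():
    for term in set(query.split()):
        TERM_DEMAND[term] += demand


def tfidf_vector(query):
    """Sparse term -> weight vector for one query."""
    return {
        term: TERM_DEMAND[term] * math.log(NUM_QUERIES / DOC_FREQ[term])
        for term in set(query.split())
    }


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_score(query, candidate):
    """Score a merge candidate for a query; higher means more similar."""
    return cosine(tfidf_vector(query), tfidf_vector(candidate))
```

With this toy log, `merge_score("nike airmax tennis shoes", "nike airmax")` is positive because the queries share weighted terms, while the score against "amish quilt" is zero because they share none.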
16
Step 1: base cluster candidates
• Method for choosing the “base clusters” (initial states):
  – Minimum frequency
  – Supply threshold (enough inventory)
  – Min and max token constraint (length of queries)
  – Heuristic constraints
    • Queries that have only numbers are not allowed: “10 5”
    • …
  – Merge similar clusters into one
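The selection constraints above could be expressed as a simple predicate over queries. This is a sketch only: the function name, its parameters, and every threshold value are hypothetical, since the slides do not give the actual cutoffs, and only the one heuristic that the slide spells out (numbers-only queries) is implemented.

```python
def is_base_cluster_candidate(query, frequency, supply,
                              min_frequency=50, min_supply=20,
                              min_tokens=2, max_tokens=6):
    """Return True if a query may seed a base cluster.

    frequency: how often users issued the query (assumed input)
    supply:    number of live items the query recalls (assumed input)
    All default thresholds are placeholders, not values from the talk.
    """
    tokens = query.split()
    if frequency < min_frequency:      # minimum frequency
        return False
    if supply < min_supply:            # supply threshold (enough inventory)
        return False
    if not (min_tokens <= len(tokens) <= max_tokens):  # token constraint
        return False
    if all(tok.isdigit() for tok in tokens):  # numbers-only queries, e.g. "10 5"
        return False
    return True
```

For example, `is_base_cluster_candidate("10 5", 500, 500)` is rejected by the numbers-only rule even though it passes the frequency and supply checks.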
17
Candidates merge
• 4.34M base clusters merged into 1.95M
• Example
  – phrase(hand,made) phrase(king,s) queen quilt
  – phrase(hand,made) phrase(pink,s) quilt
  – phrase(hand,made) phrase(prae,owned) queen quilt
  – phrase(hand,made) queen quilt
  – phrase(hand,made) phrase(prae,owned) quilt
  – phrase(hand,made) quilt size twin
  – phrase(hand,made) quilt silk
  – phrase(hand,made) quilt twin
  – phrase(hand,made) phrase(patch,work) quilt
  – phrase(hand,made) quilt white
  – phrase(hand,made) phrase(king,size) quilt
  – phrase(hand,made) phrase(yo,yo,s) quilt
  – phrase(hand,made) quilt sale
  – phrase(hand,made) quilt red
  -> phrase(hand,made) quilt
18
Step 2: K-Means Clustering
[Diagram: pipeline with the stages Query to Items Data, Base Cluster Generation, Generate Item Features, K-Means Clustering of Base Clusters, and Split Clusters; Transaction Logs and Inventory Logs are inputs, alongside Scoring Models.]
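To make the K-Means step concrete, here is a plain single-machine Lloyd's iteration that splits the items of one base cluster. It is a sketch, not the distributed Hadoop MapReduce job the talk describes. The two-dimensional feature vectors, the choice of the first k points as initial centroids, and the function name `kmeans` are assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (assignment, centroids), where assignment[i] is the index of
    the centroid nearest to points[i].
    """
    centroids = [points[i] for i in range(k)]  # deterministic seed (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return assignment, centroids


# Hypothetical item feature vectors for one base cluster.
items = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
assignment, centroids = kmeans(items, k=2)
```

Because each base cluster is clustered on its own, many such jobs can run in parallel, which is what makes the per-query partitioning attractive at this scale.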
19
Clusters on Item Signature
• Cluster: “apple ipod touch 4g clear film protector screen”
• Cluster: “clarks women shoe pumps classics”
20
Recommendation (online)
21
Performance System
[Diagram: SIR flow. Lost Item → ?similarTo(item) → Cluster Assignment → SIR Query Formation → query → Item Search → items → Item Selection → SIR Ranking → recommendations → Similar Items. The flow reads from the Data Store: Clusters, Inventory, Conceptual Knowledgebase.]
[Diagram: RIR flow. Bought Item → ?relatedTo(item) → Cluster Assignment → clusters → Cluster-Cluster Relations → related clusters → RIR Query Formation → queries → Item Search → items → Item Selection → RIR Ranking → recommendations → Related Items. The flow reads from the Data Store: Clusters, Cluster-Cluster Relations, Inventory, Conceptual Knowledgebase.]
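The SIR half of the performance system can be sketched end to end as cluster assignment, query formation, item search, and ranking. Everything concrete here is assumed for illustration: the token-overlap assignment rule, the use of cluster keywords as the search query, the in-memory inventory, the quality scores, and the function names. The cluster labels are taken from examples in the slides.

```python
# Hypothetical cluster model produced offline: clusters named by keywords.
CLUSTERS = [
    "apple ipod touch 4g clear film protector screen",
    "clarks women shoe pumps classics",
]

# Hypothetical live inventory: item id -> (title, quality score).
INVENTORY = {
    "i1": ("apple ipod touch 4g screen protector clear film 3 pack", 0.9),
    "i2": ("clear film screen protector for apple ipod touch 4g", 0.7),
    "i3": ("clarks women classics pumps shoe size 8", 0.8),
}


def assign_cluster(title):
    """Cluster assignment: the cluster sharing the most tokens with the title."""
    tokens = set(title.lower().split())
    return max(CLUSTERS, key=lambda c: len(tokens & set(c.split())))


def search(query):
    """Item search: live items whose titles contain every query token."""
    wanted = set(query.split())
    return [
        item_id
        for item_id, (title, _) in INVENTORY.items()
        if wanted <= set(title.split())
    ]


def recommend_similar(seed_id):
    """SIR: assign a cluster, form a query from it, search, then rank."""
    seed_title = INVENTORY[seed_id][0]
    query = assign_cluster(seed_title)  # query formation: cluster keywords
    hits = [i for i in search(query) if i != seed_id]
    return sorted(hits, key=lambda i: INVENTORY[i][1], reverse=True)
```

The RIR path would differ only in looking up related clusters before forming queries, so that several queries, one per related cluster, are sent to item search.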
22
Items in the same cluster
23
Similar Item Recommendations
24
Experimental Results
• A/B Tests comparing against legacy systems
  – SIR legacy system
    • Completely online
    • Naïve approach of using seed item title as a search query
  – RIR legacy system
    • Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, ACM SIGIR 2011
    • Collaborative Filtering on stable representations of items
  – Significant improvements at the 90% confidence level
    • SIR resulted in 38.18% higher user engagement (CTR)
    • RIR resulted in 10.5% higher CTR
    • Statistically significant improvement in site-wide business metrics from both SIR & RIR
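For readers who want to see how a relative CTR lift and its significance might be computed in an A/B test, a minimal sketch follows. The slides do not say which statistical test was used, so the pooled two-proportion z-test is an assumption, and the click and view counts are made-up placeholders, not data from the experiments.

```python
import math


def ctr_lift(clicks_a, views_a, clicks_b, views_b):
    """Relative CTR change of variant B over control A."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    return (ctr_b - ctr_a) / ctr_a


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic of a pooled two-proportion test (B minus A)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / std_err


# Placeholder counts for illustration only.
lift = ctr_lift(1000, 100_000, 1100, 100_000)
z = two_proportion_z(1000, 100_000, 1100, 100_000)

# 1.645 is the two-sided critical value at the 90% confidence level.
significant_at_90 = abs(z) > 1.645
```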
25
Conclusion
• Balance between similarity and quality crucial in driving user engagement and conversion
• Clusters of similar items in the inventory
  – Local clustering in the coverage set of user queries
• Offline models built using Map-Reduce
  – Huge input datasets including inventory, clickstream and transactional data
• Efficient real-time performance system
• Currently deployed on ebay.com
26
Acknowledgments
• Current & Past team members
  – Kranthi Chalasani
  – Santanu Kolay
  – Riyaaz Shaik
  – Venkat Sundaranatha
27
WE’RE HIRING
Chu-Cheng Hsieh