New Directions in Mahout's Recommenders
New Directions in Mahout’s Recommenders
Sebastian Schelter, Apache Software Foundation
Recommender Systems Get-together Berlin
New Directions?
Mahout in Action is the prime source of information for using Mahout in practice.
As it is more than two years old, it is missing a lot of recent developments.
This talk describes what has been added to the recommenders of Mahout since then.
Single machine recommenders
MyMediaLite, a scientific library of recommender system algorithms
Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner.
New recommenders and factorizers
- BiasedItemBasedRecommender, item-based kNN with user-item-bias estimation
  (Koren: Factor in the Neighbors: Scalable and Accurate Collaborative Filtering, TKDD ’09)
- RatingSGDFactorizer, biased matrix factorization
  (Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09)
- SVDPlusPlusFactorizer, SVD++
  (Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08)
- ALSWRFactorizer, matrix factorization using Alternating Least Squares
  (Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08;
  Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08)
Batch Item-Similarities on a single machine
Simple but powerful way to deploy Mahout: use item-based collaborative filtering with periodically precomputed item similarities.
Mahout now supports multithreaded item similarity computation on a single machine, for data sizes that don't require a Hadoop-based solution.
DataModel dataModel = new FileDataModel(new File("movielens.csv"));
ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
ItemBasedRecommender recommender =
    new GenericItemBasedRecommender(dataModel, similarity);
BatchItemSimilarities batch =
    new MultithreadedBatchItemSimilarities(recommender, k);
batch.computeItemSimilarities(numThreads, maxDurationInHours,
    new FileSimilarItemsWriter(resultFile));
Parallel processing
Collaborative Filtering
idea: infer recommendations from patterns found in the historical user-item interactions

data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews), represented in the interaction matrix A

        item1  ...  item3  ...
user1     3    ...    4    ...
user2     -    ...    4    ...
user3     5    ...    1    ...
 ...     ...   ...   ...   ...

row a_i denotes the interaction history of user i

we target use cases with millions of users and hundreds of millions of interactions
MapReduce
- paradigm for data-intensive parallel processing
- data is partitioned in a distributed file system
- computation is moved to the data
- system handles distribution, execution, scheduling, failures
- fixed processing pipeline where the user specifies two functions:

  map: (k1, v1) → list(k2, v2)
  reduce: (k2, list(v2)) → list(v2)
[Figure: MapReduce data flow — input partitions read from the DFS are processed by map tasks, shuffled, aggregated by reduce tasks, and written back to the DFS]
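The fixed pipeline above can be sketched without Hadoop at all. The following is a toy driver (all class and method names are illustrative, not Hadoop or Mahout API): it applies a user-supplied map function to every input record, groups the emitted pairs by key (the "shuffle"), and invokes reduce once per distinct key.

```java
import java.util.*;
import java.util.function.BiFunction;

public class MiniMapReduce {

    // map: (k1, v1) -> list(k2, v2);  reduce: (k2, list(v2)) -> list(v2)
    static <K1, V1, K2, V2> Map<K2, List<V2>> run(
            Map<K1, V1> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<V2>> reduce) {

        // map phase: apply map() to every input record
        Map<K2, List<V2>> groups = new HashMap<>();
        for (Map.Entry<K1, V1> record : input.entrySet()) {
            for (Map.Entry<K2, V2> emitted : map.apply(record.getKey(), record.getValue())) {
                // shuffle: group the emitted values by key
                groups.computeIfAbsent(emitted.getKey(), k -> new ArrayList<>())
                      .add(emitted.getValue());
            }
        }

        // reduce phase: one invocation per distinct key
        Map<K2, List<V2>> output = new HashMap<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.put(group.getKey(), reduce.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // classic word count: map emits (word, 1), reduce sums the ones
    static Map<String, List<Integer>> wordCount(Map<Integer, String> lines) {
        return run(lines,
            (lineNo, line) -> {
                List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                for (String word : line.split(" ")) {
                    pairs.add(Map.entry(word, 1));
                }
                return pairs;
            },
            (word, ones) -> List.of(ones.stream().mapToInt(Integer::intValue).sum()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Map.of(0, "to be or not", 1, "to be")));
    }
}
```

A real MapReduce system adds what this sketch omits: partitioned storage, moving the computation to the data, and failure handling.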
Scalable neighborhood methods
Neighborhood Methods

Item-Based Collaborative Filtering is one of the most widely deployed CF algorithms, because it:
- is simple and intuitively understandable
- additionally gives non-personalized, per-item recommendations (people who like X might also like Y)
- can produce recommendations for new users without model retraining
- offers comprehensible explanations (we recommend Y because you liked X)
Cooccurrences
start with a simplified view: imagine the interaction matrix A was binary
→ we look at cooccurrences only

item similarity computation becomes matrix multiplication:

r_i = (AᵀA) a_i

scale-out of the item-based approach reduces to finding an efficient way to compute the item similarity matrix

S = AᵀA
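On a tiny binary example, scoring a user against the cooccurrence matrix is two matrix products. A plain-Java sketch (illustrative names, not the Mahout implementation):

```java
public class CooccurrenceRecommender {

    // S = AᵀA : item-item cooccurrence counts from a binary interaction matrix
    static int[][] cooccurrences(int[][] a) {
        int numItems = a[0].length;
        int[][] s = new int[numItems][numItems];
        for (int[] user : a) {
            // every pair of items this user interacted with cooccurs once
            for (int f = 0; f < numItems; f++) {
                for (int j = 0; j < numItems; j++) {
                    s[f][j] += user[f] * user[j];
                }
            }
        }
        return s;
    }

    // r_i = (AᵀA) a_i : recommendation scores for one user's history
    static int[] scores(int[][] s, int[] userHistory) {
        int numItems = userHistory.length;
        int[] r = new int[numItems];
        for (int f = 0; f < numItems; f++) {
            for (int j = 0; j < numItems; j++) {
                r[f] += s[f][j] * userHistory[j];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // three users, three items, binary interactions
        int[][] a = { {1, 0, 1}, {1, 1, 0}, {0, 1, 1} };
        int[][] s = cooccurrences(a);
        int[] r = scores(s, a[0]);   // score the items for user 0
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

Items the user already interacted with score highest here; a real recommender would filter those out and rank the rest.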
Parallelizing S = AᵀA

standard approach of computing item cooccurrences requires random access to both users and items:

foreach item f do
  foreach user i who interacted with f do
    foreach item j that i also interacted with do
      S_fj = S_fj + 1

→ not efficiently parallelizable on partitioned data

row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned A:

S = AᵀA = Σ_{i∈A} a_i a_iᵀ

mappers compute the outer products of rows of A, emit the results row-wise, reducers sum these up to form S
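A self-contained sketch of the outer-product formulation (plain Java with illustrative names; in Mahout the per-row outer products are computed by mappers and summed by reducers). Each row of A contributes one outer product that can be computed independently of all other rows; summing the contributions yields S.

```java
import java.util.*;

public class OuterProductSimilarity {

    // outer product a_i a_iᵀ of one user's (binary) interaction row
    static int[][] outerProduct(int[] row) {
        int n = row.length;
        int[][] p = new int[n][n];
        for (int f = 0; f < n; f++) {
            if (row[f] == 0) continue;          // sparse rows: skip zero entries
            for (int j = 0; j < n; j++) {
                p[f][j] = row[f] * row[j];
            }
        }
        return p;
    }

    // "reducer": sum the row-wise partial results into S = Σ_i a_i a_iᵀ
    static int[][] sum(List<int[][]> partials, int n) {
        int[][] s = new int[n][n];
        for (int[][] p : partials) {
            for (int f = 0; f < n; f++) {
                for (int j = 0; j < n; j++) {
                    s[f][j] += p[f][j];
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[][] a = { {1, 0, 1}, {1, 1, 0}, {0, 1, 1} };  // row-partitioned A
        // "mappers": one outer product per row of A, each computable independently
        List<int[][]> partials = new ArrayList<>();
        for (int[] row : a) {
            partials.add(outerProduct(row));
        }
        int[][] s = sum(partials, 3);   // S = AᵀA
        System.out.println(Arrays.deepToString(s));
    }
}
```

Because every partial result only depends on a single row, the map phase needs no random access across partitions, which is exactly what makes the formulation MapReduce-friendly.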
Parallel similarity computation

real datasets are not binary and we want to use a variety of similarity measures, e.g. Pearson correlation

express similarity measures by 3 canonical functions, which can be efficiently embedded into the computation (cf. VectorSimilarityMeasure):

- preprocess adjusts an item rating vector:
  f̂ = preprocess(f)    ĵ = preprocess(j)
- norm computes a single number from the adjusted vector:
  n_f = norm(f̂)    n_j = norm(ĵ)
- similarity computes the similarity of two vectors from the norms and their dot product:
  S_fj = similarity(dot_fj, n_f, n_j)
Example: Jaccard coefficient

- preprocess binarizes the rating vectors:

  f = (3, −, 5)    j = (4, 4, 1)
  f̂ = bin(f) = (1, 0, 1)    ĵ = bin(j) = (1, 1, 1)

- norm computes the number of users that rated each item:

  n_f = ‖f̂‖₁ = 2    n_j = ‖ĵ‖₁ = 3

- similarity finally computes the Jaccard coefficient from the norms and the dot product of the vectors:

  jaccard(f, j) = |f ∩ j| / |f ∪ j| = dot_fj / (n_f + n_j − dot_fj) = 2 / (2 + 3 − 2) = 2/3
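The three canonical functions for the Jaccard case can be written down directly. This sketch mirrors the shape of the decomposition, not the actual signatures of Mahout's VectorSimilarityMeasure:

```java
import java.util.Arrays;

public class JaccardMeasure {

    // preprocess: binarize the rating vector (a missing rating is encoded as 0)
    static double[] preprocess(double[] ratings) {
        return Arrays.stream(ratings).map(r -> r != 0 ? 1.0 : 0.0).toArray();
    }

    // norm: number of users that rated the item (L1 norm of the binarized vector)
    static double norm(double[] binarized) {
        return Arrays.stream(binarized).sum();
    }

    // similarity: jaccard(f, j) = |f ∩ j| / |f ∪ j| = dot / (nf + nj - dot)
    static double similarity(double dot, double normF, double normJ) {
        return dot / (normF + normJ - dot);
    }

    public static void main(String[] args) {
        double[] f = {3, 0, 5};          // item f's ratings, '-' encoded as 0
        double[] j = {4, 4, 1};          // item j's ratings

        double[] fBin = preprocess(f);   // (1, 0, 1)
        double[] jBin = preprocess(j);   // (1, 1, 1)

        double dot = 0;
        for (int i = 0; i < fBin.length; i++) {
            dot += fBin[i] * jBin[i];    // dot product = 2
        }

        System.out.println(similarity(dot, norm(fBin), norm(jBin)));
    }
}
```

Only the dot product couples the two vectors, so the norms can be computed once per item in the first pass and the dot products in the second, which is how the decomposition embeds into the two MapReduce passes below.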
Implementation in Mahout
o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob computes the top-k pairwise similarities for each row of a matrix using some similarity measure

o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob computes the top-k similar items per item using RowSimilarityJob

o.a.m.cf.taste.hadoop.item.RecommenderJob computes recommendations and similar items using RowSimilarityJob
MapReduce pass 1
- data partitioned by items (row-partitioned Aᵀ)
- invokes preprocess and norm for each item vector
- transposes the input to form A
[Figure: worked example of pass 1 on a small dataset — map, combine, shuffle and reduce transpose the item-partitioned input Aᵀ (items pointing to users) into the binarized A (users pointing to items), computing the item "norms" along the way]
MapReduce pass 2
- data partitioned by users (row-partitioned A)
- computes the dot products of the columns
- loads the norms and invokes similarity
- implementation contains several optimizations (sparsification, exploiting symmetry, thresholds)
[Figure: worked example of pass 2 — map, combine, shuffle and reduce compute the dot products of the columns of the binarized A, which together with the item "norms" yield the upper triangle of "AᵀA", the matrix holding the item similarities]
Cost of the algorithm
major cost in our algorithm is the communication in the second MapReduce pass: for each user, we have to process the square of the number of his interactions

S = Σ_{i∈A} a_i a_iᵀ

→ cost is dominated by the densest rows of A (the users with the highest number of interactions)

distribution of interactions per user is usually heavy-tailed
→ a small number of power users with a disproportionately high number of interactions drastically increases the runtime

- if a user has more than p interactions, only use a random sample of size p of his interactions
- we saw a negligible effect on prediction quality for moderate p
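The down-sampling step can be sketched as follows (an illustrative helper, not the Mahout class; Mahout applies the same capping idea inside the similarity job):

```java
import java.util.*;

public class InteractionSampler {

    // keep at most p interactions per user, chosen uniformly at random
    static List<Long> sampleInteractions(List<Long> itemIds, int p, Random random) {
        if (itemIds.size() <= p) {
            return itemIds;                       // ordinary users stay untouched
        }
        List<Long> shuffled = new ArrayList<>(itemIds);
        Collections.shuffle(shuffled, random);    // uniform random sample of size p
        return shuffled.subList(0, p);
    }

    public static void main(String[] args) {
        List<Long> powerUser = new ArrayList<>();
        for (long item = 0; item < 10_000; item++) {
            powerUser.add(item);                  // a power user with 10,000 interactions
        }
        List<Long> sampled = sampleInteractions(powerUser, 500, new Random(42));
        System.out.println(sampled.size());       // capped at p = 500
    }
}
```

Since the pass-2 cost per user is quadratic in the interaction count, capping a 10,000-interaction user at p = 500 shrinks that user's cost by a factor of 400.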
Scalable Neighborhood Methods: Experiments
Setup
- 26 machines running Java 7 and Hadoop 1.0.4
- two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
Results
Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), 26 machines, similarity computation takes less than 40 minutes
Scalable matrix factorization
Latent factor models: idea
interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies)

these factors are in general not obvious; we might be able to think of some of them, but it's hard to estimate their impact on the interactions

need to infer those so-called latent factors from the interaction data
low-rank matrix factorization
approximately factor A into the product of two rank-r feature matrices U and M such that A ≈ UM

U models the latent features of the users, M models the latent features of the items

the dot product u_iᵀ m_j in the latent feature space predicts the strength of the interaction between user i and item j

to obtain a factorization, minimize the regularized squared error over the observed interactions, e.g.:

min_{U,M} Σ_{(i,j)∈A} (a_ij − u_iᵀ m_j)² + λ ( Σ_i n_{u_i} ‖u_i‖² + Σ_j n_{m_j} ‖m_j‖² )
Alternating Least Squares
ALS rotates between fixing U and M. When U is fixed, the system recomputes M by solving a least-squares problem per item, and vice versa.

easy to parallelize, as all users (and vice versa, all items) can be recomputed independently

additionally, ALS is able to solve non-sparse models from implicit data
[Figure: A (u × i) ≈ U (u × k) × M (k × i)]
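One half-iteration of ALS reduces to solving, per user, the small regularized normal equations u_i = (MᵢᵀMᵢ + λ n_i I)⁻¹ Mᵢᵀ a_i, where Mᵢ holds the feature vectors of the items the user interacted with (the weighted-λ form of Zhou et al.). A minimal sketch with a naive elimination solver (illustrative names, not Mahout's solver classes):

```java
public class AlsUserUpdate {

    // recompute one user's feature vector with M fixed:
    // solve the k x k system (MᵢᵀMᵢ + λ nᵢ I) u = Mᵢᵀ aᵢ
    static double[] recomputeUserFeatures(double[][] ratedItemFeatures,
                                          double[] ratings, double lambda) {
        int k = ratedItemFeatures[0].length;
        int n = ratedItemFeatures.length;       // nᵢ = number of interactions

        double[][] lhs = new double[k][k];      // MᵢᵀMᵢ + λ nᵢ I
        double[] rhs = new double[k];           // Mᵢᵀ aᵢ
        for (int item = 0; item < n; item++) {
            for (int r = 0; r < k; r++) {
                rhs[r] += ratedItemFeatures[item][r] * ratings[item];
                for (int c = 0; c < k; c++) {
                    lhs[r][c] += ratedItemFeatures[item][r] * ratedItemFeatures[item][c];
                }
            }
        }
        for (int d = 0; d < k; d++) {
            lhs[d][d] += lambda * n;            // weighted-λ regularization
        }
        return solve(lhs, rhs);
    }

    // naive Gaussian elimination with partial pivoting, fine for tiny k
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) pivot = row;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                b[row] -= factor * b[col];
                for (int c = col; c < n; c++) a[row][c] -= factor * a[col][c];
            }
        }
        double[] x = new double[n];
        for (int row = n - 1; row >= 0; row--) {
            double sum = b[row];
            for (int c = row + 1; c < n; c++) sum -= a[row][c] * x[c];
            x[row] = sum / a[row][row];
        }
        return x;
    }

    public static void main(String[] args) {
        // a user who rated two items, item features of rank k = 2
        double[][] m = { {1.0, 0.5}, {0.2, 1.0} };
        double[] ratings = {4.0, 3.0};
        double[] u = recomputeUserFeatures(m, ratings, 0.05);
        // with the new u, the predictions uᵀm_j move close to the ratings
        System.out.println(u[0] * m[0][0] + u[1] * m[0][1]);
    }
}
```

Nothing in the per-user solve depends on any other user, which is why all rows of U can be recomputed independently and in parallel.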
Implementation in Mahout
o.a.m.cf.taste.hadoop.als.ParallelALSFactorizationJob computes a factorization using Alternating Least Squares, has different solvers for explicit and implicit data
(Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08; Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08)

o.a.m.cf.taste.hadoop.als.FactorizationEvaluator computes the prediction error of a factorization on a test set

o.a.m.cf.taste.hadoop.als.RecommenderJob computes recommendations from a factorization
Scalable Matrix Factorization: Implementation

Recompute the user feature matrix U using a broadcast-join:

1. run a map-only job using multithreaded mappers
2. load the item-feature matrix M into memory from HDFS to share it among the individual mappers
3. mappers read the interaction histories of the users
4. multithreaded: solve a least-squares problem per user to recompute its feature vector
[Figure: broadcast-join — the item features M are broadcast to all machines; each machine locally joins its partition of the user histories A with M and recomputes its partition of the user features U]
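The per-machine part of this broadcast-join can be sketched with a plain thread pool. Everything here is illustrative (not the Mahout mapper code), and the least-squares solve is replaced by a trivial feature average so the sketch stays focused on the sharing and threading:

```java
import java.util.*;
import java.util.concurrent.*;

public class BroadcastJoinSketch {

    // stand-in for the broadcast item-feature matrix M, shared read-only by all threads
    static final double[][] itemFeatures = { {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0} };

    // stand-in for "solve a least-squares problem per user": averaging the
    // features of the user's items keeps the sketch short
    static double[] recompute(int[] userHistory) {
        double[] u = new double[itemFeatures[0].length];
        for (int item : userHistory) {
            for (int d = 0; d < u.length; d++) {
                u[d] += itemFeatures[item][d] / userHistory.length;
            }
        }
        return u;
    }

    public static void main(String[] args) throws Exception {
        // this machine's partition of the user interaction histories A
        Map<Integer, int[]> userHistories = Map.of(0, new int[]{0, 2}, 1, new int[]{1});

        ExecutorService pool = Executors.newFixedThreadPool(2);
        Map<Integer, Future<double[]>> futures = new HashMap<>();
        for (Map.Entry<Integer, int[]> user : userHistories.entrySet()) {
            // each task joins one user's history against the shared M
            futures.put(user.getKey(), pool.submit(() -> recompute(user.getValue())));
        }
        for (Map.Entry<Integer, Future<double[]>> f : futures.entrySet()) {
            System.out.println(f.getKey() + " -> " + Arrays.toString(f.getValue().get()));
        }
        pool.shutdown();
    }
}
```

Since M is only read, it can be shared across all threads of a mapper without locking; only the recomputed rows of U are written out.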
Scalable Matrix Factorization: Experiments
Setup
- 26 machines running Java 7 and Hadoop 1.0.4
- two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
- configured Hadoop to reuse JVMs, ran multithreaded mappers
Results
Yahoo Songs dataset (700M datapoints), 26 machines, a single iteration (two map-only jobs) takes less than 2 minutes
Thanks for listening!
Follow me on twitter at http://twitter.com/sscdotopen
Join Mahout’s mailing lists at http://s.apache.org/mahout-lists
picture on slide 3 by Tim Abott, http://www.flickr.com/photos/theabbott/
picture on slide 21 by Crimson Diabolics, http://crimsondiabolics.deviantart.com/