mahout tutorial and hands-on (version 2015)
TRANSCRIPT
![Page 1: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/1.jpg)
Apache Mahout – Tutorial (2015) Cataldo Musto, Ph.D.
Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale
Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2014/2015
07/01/2015
1
![Page 2: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/2.jpg)
Outline
• What is Mahout ?
– Overview
• How to use Mahout ?
– Hands-on session
2
![Page 3: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/3.jpg)
Part 1
What is Mahout?
3
![Page 4: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/4.jpg)
What is (a) Mahout?
4
an elephant driver
![Page 5: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/5.jpg)
• Mahout is a Java library
– Implementing Machine Learning techniques
5
What is (a) Mahout?
![Page 6: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/6.jpg)
• Mahout is a Java library
– Implementing Machine Learning techniques
• Clustering
• Classification
• Recommendation
• Frequent ItemSet
6
What is (a) Mahout?
![Page 7: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/7.jpg)
• Mahout is a Java library
– Implementing Machine Learning techniques
• Clustering
• Classification
• Recommendation
• Frequent ItemSet (removed)
7
What is (a) Mahout?
![Page 8: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/8.jpg)
What can we do?
• Currently Mahout supports mainly four use cases: – Recommendation - takes users' behavior and tries
to find items users might like.
– Clustering - takes e.g. text documents and groups them into groups of topically related documents.
– Classification - learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
8
![Page 9: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/9.jpg)
Why Mahout?
• Mahout is not the only ML framework
– Weka
– R (http://www.r-project.org/)
• Why do we prefer Mahout?
(http://www.cs.waikato.ac.nz/ml/weka/)
9
![Page 10: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/10.jpg)
Why Mahout?
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation
10
![Page 11: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/11.jpg)
Why Mahout?
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation
–Scalable
11
![Page 12: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/12.jpg)
Why Mahout?
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation
–Scalable • Based on Hadoop (not mandatory!)
12
![Page 13: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/13.jpg)
Why do we need a scalable framework?
Big Data!
13
![Page 14: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/14.jpg)
When do we need a scalable framework? (e.g. Recommendation Task)
Over 100m user-preferences connections
14
http://mahout.apache.org/users/recommender/recommender-first-timer-faq.html
![Page 15: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/15.jpg)
Use Cases
15
![Page 16: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/16.jpg)
Use Cases
Recommendation Engine on Foursquare 16
![Page 17: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/17.jpg)
Use Cases
User Interest Modeling on Twitter 17
![Page 18: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/18.jpg)
Use Cases
Pattern Mining on Yahoo! (as anti-spam) 18
![Page 19: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/19.jpg)
Algorithms
• Recommendation
– User-based Collaborative Filtering
– Item-based Collaborative Filtering
– Matrix Factorization-based CF
• Several factorization techniques
19
![Page 20: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/20.jpg)
Algorithms
• Clustering
– K-Means
– Fuzzy K-Means
– Streaming K-Means
– etc.
• Topic Modeling
– LDA (Latent Dirichlet Allocation)
20
![Page 21: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/21.jpg)
Algorithms
• Classification
– Logistic Regression
– Bayes (only on Hadoop, from release 0.9)
– Random Forests (only on Hadoop, from release 0.9)
– Hidden Markov Models
– Perceptrons
21
![Page 22: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/22.jpg)
Algorithms
• Other
– Text processing
• Creation of sparse vectors from text
– Dimensionality Reduction techniques
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
– (and much more.)
22
![Page 23: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/23.jpg)
Mahout in the Apache Software Foundation
23
![Page 24: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/24.jpg)
Mahout in the Apache Software Foundation
Original Mahout Project
24
![Page 25: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/25.jpg)
Mahout in the Apache Software Foundation
Taste: collaborative filtering framework
25
![Page 26: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/26.jpg)
Mahout in the Apache Software Foundation
Lucene: information retrieval software library
26
![Page 27: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/27.jpg)
Mahout in the Apache Software Foundation
Hadoop: framework for distributed storage and programming based on MapReduce
27
![Page 28: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/28.jpg)
Next Releases: Watch out!
Hadoop is going to be replaced by Apache Spark
28
X
http://spark.apache.org
![Page 29: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/29.jpg)
General Architecture
Three-tiers architecture (Application, Algorithms and Shared Libraries)
29
![Page 30: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/30.jpg)
General Architecture
Business Logic
30
![Page 31: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/31.jpg)
General Architecture
Data Storage and Shared Libraries
31
![Page 32: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/32.jpg)
General Architecture
External Applications invoking Mahout APIs
32
![Page 33: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/33.jpg)
In this tutorial we will focus on
Recommendation
33
![Page 34: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/34.jpg)
Recommendation
• Mahout implements a Collaborative Filtering framework – Uses historical data (ratings, clicks, and purchases) to
provide recommendations • User-based: recommend items by finding similar users. This is
often harder to scale because of the dynamic nature of users; • Item-based: calculate similarity between items and make
recommendations. Items usually don't change much, so this often can be computed offline; – Popularized by Amazon and others
• Matrix factorization-based: split the original user-item matrix in smaller matrices in order to analyze item rating patterns and learn some latent factors explaining users’ behavior and items characteristics. – Popularized by Netflix Prize
34
![Page 35: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/35.jpg)
Recommendation - Architecture
35
![Page 36: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/36.jpg)
Recommendation Workflow
Inceptive Idea:
A Java/J2EE application invokes a Mahout
Recommender whose DataModel is based on a set of User Preferences that are
built on the ground of a physical DataStore
36
![Page 37: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/37.jpg)
Physical Storage (database, files, etc.)
37
Recommendation Workflow
![Page 38: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/38.jpg)
Physical Storage (database, files, etc.)
Data Model
38
Recommendation Workflow
![Page 39: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/39.jpg)
Physical Storage (database, files, etc.)
Data Model
Recommender
39
Recommendation Workflow
![Page 40: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/40.jpg)
External Application
Physical Storage (database, files, etc.)
Data Model
Recommender
40
Recommendation Workflow
![Page 41: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/41.jpg)
Recommendation in Mahout
• Input: raw data (user preferences) • Output: preferences estimation
• Step 1
– Mapping raw data into a DataModel Mahout-compliant
• Step 2 – Tuning recommender components
• Similarity measure, neighborhood, etc.
• Step 3 – Computing rating estimations
• Step 4 – Evaluating recommendation
41
![Page 42: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/42.jpg)
Recommendation Components
• Mahout key abstractions are implemented through Java interfaces : – DataModel interface
• Methods for mapping raw data to a Mahout-compliant form
– UserSimilarity interface • Methods to calculate the degree of correlation between two users
– ItemSimilarity interface • Methods to calculate the degree of correlation between two items
– UserNeighborhood interface • Methods to define the concept of ‘neighborhood’
– Recommender interface • Methods to implement the recommendation step itself
42
![Page 43: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/43.jpg)
Recommendation Components
• Mahout key abstractions are implemented through Java interfaces :
– example: DataModel interface
• Each methods for mapping raw data to a Mahout-compliant form is an implementation of the generic interface
• e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database
• (and so on)
43
![Page 44: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/44.jpg)
Components: DataModel
• A DataModel is the interface to draw information about user preferences.
• Which sources is it possible to draw? – Database
• MySQLJDBCDataModel, PostgreSQLDataModel • NoSQL databases supported: MongoDBDataModel, CassandraDataModel
– External Files • FileDataModel
– Generic (preferences directly feed through Java code) • GenericDataModel
(They are all implementations of the DataModel interface)
44
![Page 45: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/45.jpg)
• GenericDataModel – Feed through Java calls
• FileDataModel – CSV (Comma Separated Values)
• JDBCDataModel – JDBC Driver – Standard database structure
Components: DataModel
45
![Page 46: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/46.jpg)
FileDataModel – CSV input
46
![Page 47: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/47.jpg)
Components: DataModel
• Regardless the source, they all share a common implementation.
• Basic object: Preference
– Preference is a triple (user,item,score)
– Stored in UserPreferenceArray
47
![Page 48: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/48.jpg)
Components: DataModel
• Basic object: Preference
– Preference is a triple (user,item,score)
– Stored in UserPreferenceArray
• Two implementations
– GenericUserPreferenceArray
• It stores numerical preference, as well.
– BooleanUserPreferenceArray
• It skips numerical preference values.
48
![Page 49: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/49.jpg)
Components: UserSimilarity
• UserSimilarity defines a notion of similarity between two Users. – (respectively) ItemSimilarity defines a notion of similarity
between two Items.
• Which definition of similarity are available?
– Pearson Correlation – Spearman Correlation – Euclidean Distance – Tanimoto Coefficient – LogLikelihood Similarity
– Already implemented!
49
![Page 50: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/50.jpg)
Example: TanimotoDistance
50
![Page 51: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/51.jpg)
Example: CosineSimilarity
51
![Page 52: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/52.jpg)
Different Similarity definitions
influence neighborhood formation
52
![Page 53: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/53.jpg)
Pearson’s vs. Euclidean distance
53
![Page 54: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/54.jpg)
Pearson’s vs. Euclidean distance
54
![Page 55: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/55.jpg)
Pearson’s vs. Euclidean distance
55
![Page 56: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/56.jpg)
Components: UserNeighborhood
• Which definition of neighborhood are available? – Nearest N users
• The first N users with the highest similarity are labeled as ‘neighbors’
– Thresholds • Users whose similarity is above a threshold are labeled as ‘neighbors’
– Already implemented!
56
![Page 57: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/57.jpg)
Components: Recommender
• Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item
• Which recommendation algorithms are implemented? – User-based CF
– Item-based CF
– SVD-based CF
(and much more…)
57
![Page 58: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/58.jpg)
Recap
• Many implementations of a CF-based recommender! – Different recommendation algorithms – Different neighborhood definitions – Different similarity definitions
• Evaluation fo the different implementations is actually very time-consuming – The strength of Mahout lies in that it is possible to save
time in the evaluation of the different combinations of the parameters!
– Standard interface for the evaluation of a Recommender System
58
![Page 59: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/59.jpg)
Evaluation
• Mahout provides classes for the evaluation of a recommender system
– Prediction-based measures
• Mean Average Error
• RMSE (Root Mean Square Error)
– IR-based measures
• Precision, Recall, F1-measure, F1@n
• NDCG (ranking measure)
59
![Page 60: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/60.jpg)
Evaluation
• Prediction-based Measures
– Class: AverageAbsoluteDifferenceEvaluator
– Method: evaluate()
– Parameters:
• Recommender implementation
• DataModel implementation
• TrainingSet size (e.g. 70%)
• % of the data to use in the evaluation (smaller % for fast prototyping)
60
![Page 61: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/61.jpg)
Evaluation
• IR-based Measures
– Class: GenericRecommenderIRStatsEvaluator
– Method: evaluate()
– Parameters:
• Recommender implementation
• DataModel implementation
• Relevance Threshold (mean+standard deviation)
• % of the data to use in the evaluation (smaller % for fast prototyping)
61
![Page 62: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/62.jpg)
Part 2
How to use Mahout? Hands-on
62
![Page 63: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/63.jpg)
Download Mahout
• Download – The latest Mahout release is 0.9
– Available at: http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.zip
– Extract all the libraries and include them in a new NetBeans (Eclipse) project
• Requirement: Java 1.6.x or greater.
• Hadoop is not mandatory!
63
![Page 64: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/64.jpg)
Important
JavaDoc https://builds.apache.org/job/Mahout-Quality/javadoc/
64
![Page 65: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/65.jpg)
Exercise 1
• Create a Preference object
• Set preferences through some simple Java call
• Print some statistics about preferences
– How many preferences? On which items?
– Wheter a user has expressed preference on a certain item.
– Which one is the item with the highest score?
65
![Page 66: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/66.jpg)
Hints
• Hints about objects to be used:
– Preference
• Methods setUserId, setItemId, setValue;
– GenericUserPreferenceArray
• Dimension = number of preferences to be defined;
• Methods: getIds(), sortByValueReversed(),hasPrefWithItemId(id);
66
![Page 67: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/67.jpg)
Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;
class CreatePreferenceArray {
private CreatePreferenceArray() {
}
public static void main(String[] args) {
PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
user1Prefs.setUserID(0, 1L);
user1Prefs.setItemID(0, 101L);
user1Prefs.setValue(0, 2.0f);
user1Prefs.setItemID(1, 102L);
user1Prefs.setValue(1, 3.0f);
Preference pref = user1Prefs.get(1);
System.out.println(pref);
}
} 67
![Page 68: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/68.jpg)
Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;
class CreatePreferenceArray {
private CreatePreferenceArray() {
}
public static void main(String[] args) {
PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
user1Prefs.setUserID(0, 1L);
user1Prefs.setItemID(0, 101L);
user1Prefs.setValue(0, 2.0f);
user1Prefs.setItemID(1, 102L);
user1Prefs.setValue(1, 3.0f);
Preference pref = user1Prefs.get(1);
System.out.println(pref);
}
}
Score 2 for Item 101
68
![Page 69: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/69.jpg)
Exercise 2
• Create a DataModel
• Feed the DataModel through some simple Java calls
• Print some statistics about data (how many users, how many items, maximum ratings, etc.)
69
![Page 70: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/70.jpg)
Exercise 2
• Hints about objects to be used: – FastByIdMap
• PreferenceArray stores the preferences of a single user • Where do the preferences of all the users are stored?
– An HashMap? No. – Mahout introduces data structures optimized for
recommendation tasks – HashMap are replaced by FastByIDMap
– Model • Methods: getNumItems(),
getNumUsers(),getMaxPreference() • General statistics about the model.
70
![Page 71: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/71.jpg)
Exercise 2: data model import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
class CreateGenericDataModel {
private CreateGenericDataModel() {
}
public static void main(String[] args) {
FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(10);
prefsForUser1.setUserID(0, 1L);
prefsForUser1.setItemID(0, 101L);
prefsForUser1.setValue(0, 3.0f);
prefsForUser1.setItemID(1, 102L);
prefsForUser1.setValue(1, 4.5f);
preferences.put(1L, prefsForUser1);
DataModel model = new GenericDataModel(preferences);
System.out.println(model);
}
}
71
![Page 72: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/72.jpg)
Exercise 3
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
72
![Page 73: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/73.jpg)
Exercise 3
• Hints about objects to be used:
– FileDataModel
• Argument: new File with the path of the CSV
– PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.
• Argument: the model
73
![Page 74: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/74.jpg)
Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.model.*;
import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;
class Example3_Similarity {
public static void main(String[] args) throws Exception {
// Istanzia il DataModel e crea alcune statistiche
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
System.out.println("Pearson:"+pearson.userSimilarity(1, 2));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));
System.out.println("Pearson:"+pearson.userSimilarity(1, 3));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));
}
}
74
![Page 75: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/75.jpg)
Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.model.*;
import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;
class Example3_Similarity {
public static void main(String[] args) throws Exception {
// Istanzia il DataModel e crea alcune statistiche
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
System.out.println("Pearson:"+pearson.userSimilarity(1, 2));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));
System.out.println("Pearson:"+pearson.userSimilarity(1, 3));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));
}
} FileDataModel
75
![Page 76: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/76.jpg)
Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.model.*;
import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;
class Example3_Similarity {
public static void main(String[] args) throws Exception {
// Istanzia il DataModel e crea alcune statistiche
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
System.out.println("Pearson:"+pearson.userSimilarity(1, 2));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));
System.out.println("Pearson:"+pearson.userSimilarity(1, 3));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));
}
} Similarity Definitions
76
![Page 77: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/77.jpg)
Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.model.*;
import org.apache.mahout.cf.taste.impl.model.file.FileDatModel;
class Example3_Similarity {
public static void main(String[] args) throws Exception {
// Istanzia il DataModel e crea alcune statistiche
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
System.out.println("Pearson:"+pearson.userSimilarity(1, 2));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2));
System.out.println("Pearson:"+pearson.userSimilarity(1, 3));
System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3));
}
} Output
77
![Page 78: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/78.jpg)
Exercise 4
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
• Generate neighboorhood
• Generate recommendations
78
![Page 79: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/79.jpg)
Exercise 4
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
• Generate neighboorhood
• Generate recommendations
– Compare different combinations of parameters!
79
![Page 80: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/80.jpg)
Exercise 4
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
• Generate neighboorhood
• Generate recommendations
– Compare different combinations of parameters!
80
![Page 81: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/81.jpg)
Exercise 4
• Hints about objects to be used:
– NearestNUserNeighborhood
– GenericUserBasedRecommender
• Parameters: – data model already shown
– Neighborhood
» Class: NearestNUserNeighborhood(n,similarity,model)
» Class: ThresholdUserNeighborhood(thr,similarity,model)
– similarity measure already shown
81
![Page 82: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/82.jpg)
Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(1, 1);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
} 82
![Page 83: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/83.jpg)
Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(1, 1);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
}
FileDataModel
83
![Page 84: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/84.jpg)
Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(1, 1);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
}
2 neighbours
84
![Page 85: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/85.jpg)
Exercise 4: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(1, 1);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
}
Top-1 Recommendation for User 1 85
![Page 86: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/86.jpg)
• Download the GroupLens dataset (100k) – Its format is already Mahout compliant
– http://files.grouplens.org/datasets/movielens/ml-100k.zip
• Preparatory Exercise: repeat exercise 3 (similarity calculations) with a bigger dataset
• Next: now we can run the recommendation framework against a state-of-the-art dataset
Exercise 5: MovieLens Recommender
86
![Page 87: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/87.jpg)
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("ua.base"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(1, 20);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
}
Exercise 5: MovieLens Recommender
87
![Page 88: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/88.jpg)
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
class RecommenderIntro {
private RecommenderIntro() {
}
public static void main(String[] args) throws Exception {
DataModel model = new FileDataModel(new File("ua.base"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
List<RecommendedItem> recommendations = recommender.recommend(10, 50);
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
}
}
Exercise 5: MovieLens Recommender
We can play with parameters! 88
![Page 89: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/89.jpg)
Exercise 5: MovieLens Recommender
• Analyze Recommender behavior with different combinations of parameters
– Do the recommendations change with a different similarity measure?
– Do the recommendations change with different neighborhood sizes?
– Which one is the best one?
• …. Let’s go to the next exercise!
89
![Page 90: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/90.jpg)
• Evaluate different CF recommender configurations on MovieLens data
• Metrics: RMSE, MAE, Precision
Exercise 6: Recommender Evaluation
90
![Page 91: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/91.jpg)
• Evaluate different CF recommender configurations on MovieLens data
• Metrics: RMSE, MAE
• Hints: useful classes – Implementations of RecommenderEvaluator
interface • AverageAbsoluteDifferenceRecommenderEvaluator
• RMSRecommenderEvaluator
Exercise 6: Recommender Evaluation
91
![Page 92: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/92.jpg)
• Further Hints: – Use RandomUtils.useTestSeed()to ensure
the consistency among different evaluation runs
– Invoke the evaluate() method
• Parameters
– RecommenderBuilder: recommender instance (as in previous exercises.
– DataModelBuilder: specific criterion for training
– Split Training-Test: double value (e.g. 0.7 for 70%)
– Amount of data to use in the evaluation: double value (e.g 1.0 for 100%)
Exercise 6: Recommender Evaluation
92
![Page 93: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/93.jpg)
Example 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
}
}
Ensures the consistency between different evaluation runs.
93
![Page 94: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/94.jpg)
Exercise 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
}
}
94
![Page 95: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/95.jpg)
Exercise 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
}
}
70%training (whole dataset evaluation)
95
![Page 96: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/96.jpg)
Exercise 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
}
}
Recommendation Engine
96
![Page 97: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/97.jpg)
Example 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
}
} We can add more measures
97
![Page 98: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/98.jpg)
Exercise 6: evaluation class EvaluatorIntro {
private EvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse);
}
} 98
![Page 99: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/99.jpg)
Exercise 7: item-based recommender
• Mahout provides Java classes for building an item-based recommender system
– Amazon-like
– Recommendations are based on similarities among items (generally pre-computed offline)
– Evaluate it with the MovieLens dataset!
99
![Page 100: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/100.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
return new GenericItemBasedRecommender(model, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse); }
}
Exercise 7: item-based recommender
100
![Page 101: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/101.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
return new GenericItemBasedRecommender(model, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse); }
} ItemSimilarity
Example 7: item-based recommender
101
![Page 102: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/102.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
return new GenericItemBasedRecommender(model, similarity);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse); }
} No Neighborhood definition for item-based recommenders
Example 7: item-based recommender
102
![Page 103: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/103.jpg)
Exercise 8: MF-based recommender
• Mahout provides Java classes for building an a CF recommender system based on state-of-the-art matrix factorization techniques – Class: SVDRecommender
• Parameters: DataModel, Factorizer
• Factorizer: a factorization algorithm – Alternating Least Squares (ALSWRFactorizer)
– SVD++ (SVDPlusPlusFactorizer)
– Stochastic Gradient Descent (ParallelSGDFactorizer)… etc
– Several parameters to tune!
– Evaluate it with the MovieLens dataset!
103
![Page 104: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/104.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
ALSWRFactorizer = new ALSWRFactorizer(model, 10, 0.065, 60);
return new SVDRecommender(model, factorizer);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse); }
}
104
Exercise 8: MF-based recommender
![Page 105: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/105.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderEvaluator rmse = new RMSEEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
ALSWRFactorizer = new ALSWRFactorizer(model, 10, 0.065, 60);
return new SVDRecommender(model, factorizer);
}
};
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println(score);
System.out.println(rmse); }
}
105
Exercise 8: MF-based recommender
Hyperparameters: latent factors, lambda, iterations
![Page 106: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/106.jpg)
Mahout Strengths
• Fast-prototyping and evaluation
– To evaluate a different configuration of the same algorithm we just need to update a parameter and run again.
– Example
• Different Neighborhood Size
• Different similarity measures, etc.
106
![Page 107: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/107.jpg)
5 minutes to look for the best configuration
107
![Page 108: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/108.jpg)
• Evaluation of CF algorithms through IR measures
• Metrics: Precision, Recall
Exercise 9: Recommender Evaluation
108
![Page 109: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/109.jpg)
• Evaluation of CF algorithms through IR measures
• Metrics: Precision, Recall
• Hints: useful classes
– GenericRecommenderIRStatsEvaluator
– Evaluate() method
• Same parameters of exercise 6 and 7
Exercise 9: Recommender Evaluation
109
![Page 110: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/110.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}
Exercise 9: IR-based evaluation
Precision@5 , Recall@5, etc. 110
![Page 111: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/111.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}
Exercise 9: IR-based evaluation
Precision@5 , Recall@5, etc. 111
![Page 112: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/112.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}
Exercise 9: IR-based evaluation
Set Neighborhood to 500 112
![Page 113: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/113.jpg)
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}
Exercise 9: IR-based evaluation
Set Euclidean Distance 113
![Page 114: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/114.jpg)
• Write a class that automatically runs evaluation with different parameters
– e.g. fixed neighborhood sizes from an Array of values
– Print the best scores and the configuration
Exercise 10: Recommender Evaluation
114
![Page 115: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/115.jpg)
• Find the best configuration for several datasets
– Download datasets from http://mahout.apache.org/users/basics/collections.html
–Write classes to transform input data in a Mahout-compliant form
– Extend exercise 10!
Exercise 11: more datasets!
115
![Page 116: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/116.jpg)
End. Do you want more?
116
![Page 117: Mahout Tutorial and Hands-on (version 2015)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/55a2cc811a28ab422e8b45cd/html5/thumbnails/117.jpg)
Do you want more?
• Recommendation – Deploy of a Mahout-based Web Recommender
– Integration with Hadoop/Spark
– Integration of content-based information
– Custom similarities, Custom recommenders, Re-scoring functions
• Content-based Recommender Systems through Classification Algorithms!
117