
Personalized Food Recommendations: Exploring Content-Based Methods

Jorge Miguel Tavares Soares de Almeida

Thesis to obtain the Master of Science Degree in Telecommunications and Informatics Engineering

Supervisors: Prof. Pável Pereira Calado, Prof. Bruno Emanuel da Graça Martins

Examination Committee
Chairperson: Prof. Paulo Jorge Pires Ferreira
Supervisor: Prof. Pável Pereira Calado
Member of the Committee: Prof. João Miguel da Costa Magalhães

November 2015





Acknowledgments

I would like to acknowledge a few people for their help and availability during the course of my M.Sc. dissertation.

First, I would like to thank my dissertation supervisors, Prof. Pável Calado and Prof. Bruno Martins, for their guidance, knowledge and constructive criticism, which greatly improved the quality of this work.

I would also like to thank Prof. Miguel Mira da Silva for the opportunity to be a part of the YoLP project and for providing me with a research scholarship at INOV that supported my study of recommendation systems in the food domain.

Lastly, I would like to thank my parents for their continued support throughout the years, allowing me to focus on my academic studies and on completing my Master's Degree.




Resumo

This dissertation explores the applicability of content-based methods to personalized recommendations in the food domain. Recommendation in this domain is a relatively new area, with few systems deployed in real settings that are based on user preferences. Methods frequently used in other areas, such as Rocchio's algorithm in document classification, can be adapted to recommendations in the food domain. With the objective of exploring content-based methods in the area of food recommendation, a platform was developed to evaluate the applicability of Rocchio's algorithm to this domain. Besides the validation of the algorithm explored in this study, other tests were performed, such as the impact of the standard deviation on the recommendation error and the learning curve of the algorithm.

Keywords: Recommendation Systems, Content-Based Recommendation, Food, Recipe, Machine Learning




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms

1 Introduction
  1.1 Dissertation Structure
2 Fundamental Concepts
  2.1 Recommendation Systems
    2.1.1 Content-Based Methods
    2.1.2 Collaborative Methods
    2.1.3 Hybrid Methods
  2.2 Evaluation Methods in Recommendation Systems
3 Related Work
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  3.2 Content-Boosted Collaborative Recommendation
  3.3 Recommending Food: Reasoning on Recipes and Ingredients
  3.4 User Modeling for Adaptive News Access
4 Architecture
  4.1 YoLP Collaborative Recommendation Component
  4.2 YoLP Content-Based Recommendation Component
  4.3 Experimental Recommendation Component
    4.3.1 Rocchio's Algorithm using FF-IRF
    4.3.2 Building the Users' Prototype Vector
    4.3.3 Generating a rating value from a similarity value
  4.4 Database and Datasets
5 Validation
  5.1 Evaluation Metrics and Cross Validation
  5.2 Baselines and First Results
  5.3 Feature Testing
  5.4 Similarity Threshold Variation
  5.5 Standard Deviation Impact in Recommendation Error
  5.6 Rocchio's Learning Curve
6 Conclusions
  6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on their consumer behaviour, recommendations specifically adjusted to their personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval (IR) methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Knowledge-based systems suggest products based on inferences about a user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods consist in matching the attributes of an object against a user profile and recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \tag{2.3}$$
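As an illustration of the TF, IDF and TF-IDF definitions above, the following minimal Python sketch computes the weights for a toy collection. The function name `tf_idf`, the example documents and the use of the natural logarithm are assumptions made for this example; the text does not fix the base of the logarithm.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents in which each term occurs at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values())  # frequency of the most frequent term in this document
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy collection: three "documents" described by keywords.
docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["pasta", "cream", "bacon"],
    ["tomato", "onion"],
]
weights = tf_idf(docs)
```

Note that a keyword occurring in every document would get an IDF of log(1) = 0, and therefore a weight of zero in all documents.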

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
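The cosine normalization and the cosine similarity described above can be sketched as follows. Vectors are represented as sparse `{keyword: weight}` dictionaries, which is a choice made for this example only; the function names and the sample weights are likewise hypothetical.

```python
import math

def normalize(vec):
    """Cosine (L2) normalization of a sparse {keyword: weight} vector."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)  # an all-zero vector cannot be normalized
    return {term: w / norm for term, w in vec.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; missing keywords count as 0."""
    dot = sum(w * b[term] for term, w in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical weights for one document and one user profile.
doc = {"tomato": 3.0, "basil": 4.0}
profile = {"tomato": 1.0, "pasta": 1.0}

unit_doc = normalize(doc)              # vector of length 1 pointing in the same direction
sim = cosine_similarity(doc, profile)  # only "tomato" is shared by both vectors
```

Because the similarity divides by both vector lengths, it gives the same value for `doc` and `unit_doc`; normalizing in advance simply reduces the computation to a dot product.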



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
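The prototype construction and the classification step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class names, the example weight vectors and the values chosen for β and γ are hypothetical, and documents are assumed to be already available as sparse `{term: weight}` vectors.

```python
import math
from collections import defaultdict

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Weighted average of positive examples minus weighted average of negative ones."""
    proto = defaultdict(float)
    for doc in positives:
        for term, w in doc.items():
            proto[term] += beta * w / len(positives)
    for doc in negatives:
        for term, w in doc.items():
            proto[term] -= gamma * w / len(negatives)
    return dict(proto)

def classify(doc, prototypes):
    """Assign the document to the class whose prototype vector is most similar to it."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

# Hypothetical training vectors (e.g. TF-IDF weights) for two classes.
liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8, "pasta": 0.6}]
disliked = [{"bacon": 1.0, "cream": 0.7}]

prototypes = {
    "liked": prototype(liked, disliked),      # positives of one class are negatives of the other
    "disliked": prototype(disliked, liked),
}
label = classify({"tomato": 0.9, "onion": 0.4}, prototypes)
```

In this sketch each class uses the examples of the other class as its negative examples, which is one common way of setting up a two-class problem; other choices are possible.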

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$

When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary $V$, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
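A minimal sketch of the multinomial model is given below. Two choices are assumptions of this example rather than part of the description above: the conditional probabilities are estimated with add-one (Laplace) smoothing, so that unseen words do not zero out the product, and scores are accumulated as sums of logarithms to avoid numerical underflow. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """Estimate class priors P(c) and smoothed term probabilities P(t|c)."""
    vocab = {term for doc in docs for term in doc}
    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    priors = {c: n / len(docs) for c, n in class_sizes.items()}
    cond = {}
    for c in class_sizes:
        total = sum(term_counts[c].values())
        # Add-one smoothing (assumption of this sketch)
        cond[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond

def classify(doc, priors, cond):
    """Return the class with the highest log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for term, n in Counter(doc).items():
            if term in cond[c]:  # terms outside the vocabulary are ignored
                score += n * math.log(cond[c][term])
        scores[c] = score
    return max(scores, key=scores.get)

docs = [["tomato", "basil", "tomato"], ["tomato", "pasta"], ["bacon", "cream", "bacon"]]
labels = ["relevant", "relevant", "irrelevant"]
priors, cond = train(docs, labels)

first = classify(["tomato", "tomato", "cream"], priors, cond)
second = classify(["bacon", "cream"], priors, cond)
```

Working in log space does not change which class wins, since the logarithm is monotonic.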

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
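To make the selection criterion concrete, the sketch below computes the expected information gain of splitting a set of labelled documents on the presence or absence of a single word. It is an illustrative example only: the function names and the toy data are hypothetical, and entropy is measured in bits (base-2 logarithm).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting on presence/absence of `term`."""
    with_term = [label for doc, label in zip(docs, labels) if term in doc]
    without_term = [label for doc, label in zip(docs, labels) if term not in doc]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part  # an empty partition contributes nothing
    )
    return entropy(labels) - remainder

docs = [{"tomato", "basil"}, {"tomato", "pasta"}, {"bacon", "cream"}, {"bacon", "pasta"}]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

gain_tomato = information_gain(docs, labels, "tomato")  # separates the two classes perfectly
gain_pasta = information_gain(docs, labels, "pasta")    # leaves both partitions evenly mixed
```

A tree learner would pick the word with the highest gain as the test for the current node, then repeat the procedure on each resulting subgroup.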

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when an item is classified.
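A k-nearest-neighbor classifier over VSM vectors can be sketched as follows. The majority vote used to derive the label, the value of k and the example data are assumptions of this illustration; weighted votes are an equally valid choice.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k=3):
    """training: list of (vector, label) pairs kept in memory; no model is built."""
    # All the work happens here, at classification time.
    neighbours = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical stored items with their class labels.
training = [
    ({"tomato": 1.0, "basil": 0.5}, "liked"),
    ({"tomato": 0.8, "pasta": 0.6}, "liked"),
    ({"bacon": 1.0, "cream": 0.7}, "disliked"),
    ({"bacon": 0.9, "pasta": 0.2}, "disliked"),
]
label = knn_classify({"tomato": 0.9, "pasta": 0.3}, training, k=3)
```

Every call compares the new item against all stored items, which is the classification-time cost mentioned above.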

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and it assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$

In the formula, $r_{as}$ is the rating that user $a$ gave to item $s$, $r_{bs}$ is the rating that user $b$ gave to the same item, and $S$ here denotes the set of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
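As a worked example, the following sketch applies the cosine measure to the ratings of Table 2.1, comparing Alice with every other user over the items they have both rated. The dictionary layout and the function name are choices made for this example.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two users, restricted to co-rated items."""
    common = ratings[a].keys() & ratings[b].keys()
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

similarities = {u: cosine_similarity("Alice", u) for u in ratings if u != "Alice"}
```

Since all ratings are positive, every similarity on this data is above 0.75, including the one for User4, whose ratings run almost opposite to Alice's. This is the kind of effect that a measure ignoring rating behaviour cannot separate.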

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:

Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, $\mathrm{pred}(a, p)$ is the predicted rating of user $a$ for item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
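The whole procedure can be sketched on the ratings of Table 2.1. Two assumptions are made in this example and are not fixed by the text: each user's average is taken over all the items that user has rated, and the neighborhood N contains the two most similar users. All function names are hypothetical.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    """Average over all items the user has rated (assumption of this sketch)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson-style similarity over the items rated by both users."""
    common = ratings[a].keys() & ratings[b].keys()
    mean_a, mean_b = mean(a), mean(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den_a = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
    den_b = math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def nearest(a, item, k=2):
    """The k users most similar to `a` among those who rated `item`."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    return sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:k]

def predict(a, item, k=2):
    neighbours = nearest(a, item, k)
    num = sum(pearson(a, b) * (ratings[b][item] - mean(b)) for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    # Fall back to the user's own average when the weights cancel out.
    return mean(a) + num / den if den else mean(a)

prediction = predict("Alice", "Item5")
```

With these assumptions the two nearest neighbors are User1 and User2, both of whom rated Item5 above their own average, so the prediction ends up above Alice's average of 4. The sketch keeps only positively correlated neighbors here by construction of the data; with negative similarities in N, the denominator would need more care.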

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].
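An item-based variant can be illustrated on the same ratings of Table 2.1. The sketch below is not the exact method of [15]; it assumes plain cosine similarity between item columns and a similarity-weighted average of the target user's own ratings as the prediction, and the function names are hypothetical.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def item_similarity(i, j):
    """Cosine similarity between two items, over the users who rated both."""
    users = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def predict(user, item):
    """Average of the user's own ratings, weighted by each item's similarity to `item`."""
    rated = ratings[user]
    sims = {j: item_similarity(item, j) for j in rated}
    total = sum(sims.values())
    if total == 0:
        return sum(rated.values()) / len(rated)
    return sum(sims[j] * rated[j] for j in rated) / total

prediction = predict("Alice", "Item5")
```

Item similarities depend only on the rating columns, so in a real system they can be precomputed offline and reused for every user, which is one reason item-based approaches tend to scale well.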

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
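The weighting scheme and the "few ratings" example mentioned above can be illustrated with the following Python sketch. The function names, the recommender labels, and the `min_ratings` threshold are hypothetical and chosen only for illustration; the source does not prescribe a specific implementation.

```python
def weighted_hybrid(scores, weights):
    """Weighted average of the scores that each recommender gave to one item.

    scores:  dict recommender_name -> predicted score for the item
    weights: dict recommender_name -> weight (assigned manually or learned)
    """
    total = sum(weights[name] for name in scores)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weights[name] * score for name, score in scores.items()) / total


def switching_hybrid(num_user_ratings, popularity_score, collaborative_score,
                     min_ratings=5):
    """Use item popularity for users with few ratings, collaborative filtering otherwise.

    min_ratings is an assumed threshold, not a value taken from the source.
    """
    if num_user_ratings < min_ratings:
        return popularity_score
    return collaborative_score


combined = weighted_hybrid({"cf": 4.0, "cb": 3.0}, {"cf": 0.75, "cb": 0.25})
```

In both cases the individual recommenders run independently on the same input, and only their outputs are combined.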

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \tag{2.13}$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \tag{2.14}$$
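As a minimal sketch of how Precision and Recall can be computed for one user, the following Python function compares a list of recommended items with the set of items the user actually likes. The function name, the item identifiers, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommendations.

    recommended: items returned by the recommender
    relevant:    items the user actually likes
    """
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    # Assumption: an undefined ratio (empty denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended.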

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
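The two error measures can be sketched in Python as follows. The function names and the sample ratings are illustrative assumptions; the example only shows that RMSE penalizes a single large deviation more strongly than MAE does.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical predictions with one large error (5 predicted, 2 actual).
predicted, actual = [4, 3, 5], [5, 3, 2]
print(mae(predicted, actual), rmse(predicted, actual))
```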



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$



The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
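To make the FF-IRF score for favourite ingredients more concrete, the following Python sketch computes it from a cooking history and a recipe collection. The function name, the data layout (each recipe as a set of ingredient names), the sample data, and the use of the natural logarithm are assumptions; the source does not state the logarithm base.

```python
import math
from collections import Counter


def ff_irf_scores(cooked_recipes, all_recipes, period_days):
    """Score ingredients by frequency of use times inverse recipe frequency.

    cooked_recipes: ingredient sets of the recipes cooked during the period
    all_recipes:    ingredient sets of every recipe in the collection
    period_days:    length D of the observation period
    """
    total = len(all_recipes)                                       # M
    containing = Counter(i for r in all_recipes for i in set(r))   # M_k
    used = Counter(i for r in cooked_recipes for i in set(r))      # F_k
    return {
        ingredient: (count / period_days) * math.log(total / containing[ingredient])
        for ingredient, count in used.items()
        if containing[ingredient] > 0   # skip ingredients unknown to the collection
    }


# Hypothetical data: four recipes in the collection, two of them cooked in 10 days.
collection = [{"rice", "egg"}, {"rice", "tofu"}, {"rice", "egg", "leek"}, {"tofu"}]
history = [{"rice", "egg"}, {"rice", "egg", "leek"}]
scores = ff_irf_scores(history, collection, period_days=10)
```

In this toy data, rice is used as often as egg but appears in more recipes of the collection, so its inverse recipe frequency lowers its score relative to egg.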

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
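The construction of a pseudo user-ratings vector can be sketched as below. This is only an illustration of the idea of filling in missing ratings; the function name, the dictionary layout, and the constant stand-in for the content-based predictor are assumptions, and the hybrid correlation weight from [20] is not reproduced here.

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Build a dense ratings vector for one user.

    user_ratings:    dict item -> actual rating given by the user
    content_predict: callable item -> rating predicted by a content-based model
    items:           all items in the database
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }


# Hypothetical content-based predictor that always predicts 3.0.
vector = pseudo_ratings_vector({"m1": 5, "m3": 2}, lambda item: 3.0, ["m1", "m2", "m3"])
```

Stacking such vectors for all users yields the dense pseudo ratings matrix on which the collaborative step then operates.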

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

Figure 3.1: Recipe - ingredient breakdown and reconstruction

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
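The recipe-to-ingredient breakdown and the reconstruction of a recipe prediction can be sketched as follows. The ingredient scores are averages of the ratings of the recipes in which each ingredient occurs, as described above; averaging the known ingredient scores to obtain the recipe prediction is an assumption made for this sketch, as are the function names and the sample data.

```python
from collections import defaultdict


def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Average, per ingredient, the user's ratings of the recipes containing it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ingredient in set(recipe_ingredients[recipe]):
            sums[ingredient] += rating
            counts[ingredient] += 1
    return {ingredient: sums[ingredient] / counts[ingredient] for ingredient in sums}


def predict_recipe(ingredients, scores):
    """Assumed reconstruction step: mean of the known ingredient scores."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


# Hypothetical user who rated two recipes.
ratings = {"soup": 4, "salad": 2}
contents = {"soup": ["carrot", "onion"], "salad": ["carrot", "lettuce"]}
scores = ingredient_scores(ratings, contents)
prediction = predict_recipe(["carrot", "onion"], scores)
```

A recipe with no previously scored ingredient yields no prediction (`None`), which is where the hybrid strategies described next would help.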

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
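The short-term model described above can be sketched in Python as follows. This is a simplified illustration, not the Daily-Learner implementation: the function names, the sparse-vector layout, the `KNOWN` label, and the threshold values `t_min` and `t_max` are all assumptions.

```python
import math

KNOWN = "known"  # label for stories the user is assumed to have seen already


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts term -> weight."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.95):
    """Return a predicted score, KNOWN, or None when no stored story can vote.

    rated_stories: list of (tfidf_vector, user_score) pairs.
    t_min, t_max:  assumed minimum and maximum similarity thresholds.
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return KNOWN               # too similar: the user already knows this
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return None                    # would be handed to the long-term model
    return sum(sim * score for sim, score in voters) / sum(sim for sim, _ in voters)


story = {"a": 1.0, "b": 1.0}
print(short_term_predict(story, [({"a": 1.0}, 0.8), ({"b": 1.0}, 0.4)]))
```

The same two-threshold idea could be reused to avoid recommending dishes that are nearly identical to ones a user has just eaten.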

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.
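A minimal Python sketch of the item-to-item Pearson correlation is given below. It assumes that each recipe's ratings are stored as a dictionary from user to rating and that the recipe averages are taken over all ratings of the recipe; the function name, the handling of empty overlaps, and the sample ratings are likewise assumptions.

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes.

    ratings_a, ratings_b: dict user -> rating for recipe a and recipe b.
    Only users who rated both recipes (the set P) contribute to the sums.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # assumption: no shared raters means no measurable similarity
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    spread_a = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    spread_b = math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    denominator = spread_a * spread_b
    return numerator / denominator if denominator else 0.0


# Hypothetical ratings from three users for two recipes.
recipe_a = {"u1": 5, "u2": 3, "u3": 1}
recipe_b = {"u1": 4, "u2": 3, "u3": 2}
print(pearson_item_similarity(recipe_a, recipe_b))
```

Because these similarities depend only on the ratings matrix, they can be precomputed for the recipes of a restaurant and reused for every user.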


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, pred(u, a) is the predicted rating of user u for item a, N is the set of items rated by user u, and $r_{b,u}$ is the rating given by user u to item b. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is obtained by normalizing this weighted sum by the sum of the similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
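The binary user profile and its comparison with a recipe can be sketched as follows. Since all values are binary, each sparse vector is represented here simply as the set of features that are set to 1; this representation, the function names, and the feature labels such as `"cat:fish"` are assumptions made for the example and do not reflect the actual YoLP data model.

```python
import math


def build_profile(rated_recipes, threshold=4):
    """Collect the features of every recipe the user rated at or above the threshold.

    rated_recipes: list of (feature_set, rating) pairs.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical history: one positively rated recipe and one negatively rated recipe.
history = [({"cat:fish", "ing:cod", "region:north"}, 5), ({"cat:meat", "ing:beef"}, 2)]
profile = build_profile(history)
candidate = {"cat:fish", "ing:cod", "ing:potato", "region:south"}
print(cosine_binary(profile, candidate))
```

For binary vectors the dot product is the number of shared features and each norm is the square root of the number of features, which is why the set-based formula is equivalent to the usual cosine measure.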

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as described in [3]:

$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
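The construction of a prototype vector from IRF-weighted features can be sketched as below. This is a simplified illustration under several assumptions: the function names, the dictionary-based data layout, the natural logarithm, and the sample recipes are hypothetical, and the `fixed_threshold_5pt` rule only mirrors the first approach described above for a 1 to 5 rating scale.

```python
import math
from collections import defaultdict


def irf_weights(recipes):
    """recipes: dict recipe_id -> iterable of features. Returns feature -> log(M / M_k)."""
    total = len(recipes)
    containing = defaultdict(int)
    for features in recipes.values():
        for feature in set(features):
            containing[feature] += 1
    return {feature: math.log(total / count) for feature, count in containing.items()}


def build_prototype(user_ratings, recipes, weights, observation):
    """Add feature weights for positive observations and subtract them for negative ones.

    observation(rating) returns +1 (positive), -1 (negative) or 0 (ignored).
    """
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = observation(rating)
        if sign == 0:
            continue
        for feature in set(recipes[recipe_id]):
            prototype[feature] += sign * weights[feature]
    return dict(prototype)


def fixed_threshold_5pt(rating):
    """First approach on a 1-5 scale: 3 is neutral, above is positive, below is negative."""
    if rating == 3:
        return 0
    return 1 if rating > 3 else -1


# Hypothetical dataset of three recipes and one user's training ratings.
recipes = {"r1": ["egg", "rice"], "r2": ["egg", "tofu"], "r3": ["leek"]}
weights = irf_weights(recipes)
prototype = build_prototype({"r1": 5, "r2": 1, "r3": 3}, recipes, weights, fixed_threshold_5pt)
```

The second approach can be obtained by passing a different `observation` function, for example one that compares each rating with the user's average rating computed from the training set.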

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \times (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default prediction.
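
The per-user mapping can be sketched as below. The function and parameter names are illustrative assumptions; the only behaviour taken from the text is Eq. 4.5 itself and the fallback to the user average when the similarity interval cannot be computed (interpreted here as an interval of zero width).

```python
def min_max(a: float, a_min: float, a_max: float, c: float, d: float) -> float:
    """Eq. 4.5: map a from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_rating(similarity: float,
                   sim_min: float, sim_max: float,
                   rating_min: float, rating_max: float,
                   user_avg: float) -> float:
    """Translate a similarity into a rating using per-user ranges."""
    if sim_max <= sim_min:
        # Degenerate similarity interval: fall back to the user's average rating.
        return user_avg
    return min_max(similarity, sim_min, sim_max, rating_min, rating_max)
```

For example, a similarity halfway between the user's minimum and maximum similarity is mapped to the middle of that user's rating range.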

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \tag{4.6}$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

This approach is intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61
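
The three-case rule of Eq. 4.6 can be sketched as follows, using the initial thresholds U = 0.75 and L = 0.25. The helper `combined_stats` reflects one possible reading of the "combined" variant (the mean of the user and recipe averages and the mean of their standard deviations); that reading, like the function names, is an assumption of this sketch.

```python
from typing import Tuple


def similarity_to_rating(similarity: float, avg: float, std: float,
                         upper: float = 0.75, lower: float = 0.25) -> float:
    """Eq. 4.6: shift the average rating by one standard deviation
    depending on which similarity band the recipe falls into."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_avg: float, user_std: float,
                   item_avg: float, item_std: float) -> Tuple[float, float]:
    """Assumed combination: plain mean of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The same function covers the user-based, recipe-based and combined variants; only the `avg` and `std` arguments change.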


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
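
A single-pass version of such a sparsity filter might look like the sketch below. The text does not say in which order the two conditions were applied or whether the filter was iterated until stable, so the order used here (recipes first, then users, applied once) is an assumption, as are the names.

```python
from collections import Counter
from typing import List, Tuple

Rating = Tuple[str, str, int]  # (userID, recipeID, rating)


def filter_sparse(ratings: List[Rating],
                  max_recipe_count: int = 3,
                  max_user_count: int = 5) -> List[Rating]:
    """Drop recipes rated no more than 3 times, then users with no more than 5 ratings."""
    recipe_counts = Counter(recipe for _, recipe, _ in ratings)
    kept = [r for r in ratings if recipe_counts[r[1]] > max_recipe_count]
    # User counts are taken after the recipe filter (assumed order).
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] > max_user_count]
```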

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a non-exhaustive form of the leave-p-out cross-validation method was used, taking a subset of the observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, as in exhaustive leave-p-out cross-validation, the process would be repeated until all possible combinations of p held-out observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the data was divided into 5 folds, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
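
The 80%/20% splitting scheme can be illustrated with the following sketch. It shuffles the rating events once and deals them into 5 folds; the shuffling, the fixed seed and the function name are assumptions made for the example, not details reported in this work.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(events: Sequence, k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List, List]]:
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled events into k folds of (almost) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The error measures are then computed on each validation fold and averaged over the 5 repetitions.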

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
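
For reference, both measures can be computed from paired lists of predicted and actual ratings using their standard definitions, as in the short sketch below (the function names are chosen here for illustration).

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error: average size of the deviations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root Mean Squared Error: squaring penalizes large deviations more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```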

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                     Epicurious           Food.com
                                     MAE       RMSE       MAE       RMSE
YoLP Content-based component         0.6389    0.8279     0.3590    0.6536
YoLP Collaborative component         0.6454    0.8678     0.3761    0.6834
User Average                         0.6315    0.8338     0.4077    0.6207
Item Average                         0.7701    1.0930     0.4385    0.7043
Combined Average                     0.6628    0.8572     0.4180    0.6250

Table 5.2: Test Results

Epicurious
                                                      Observation User Average    Observation Fixed Threshold
                                                      MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                    0.8217    1.0606            0.7759    1.0283
Item Avg + Item Standard Deviation                    0.8914    1.1550            0.8388    1.1106
User/Item Avg + User and Item Standard Deviation      0.8304    1.0296            0.7824    0.9927
Min-Max                                               0.8539    1.1533            0.7721    1.0705

Food.com
                                                      Observation User Average    Observation Fixed Threshold
                                                      MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                    0.4448    0.6812            0.4287    0.6624
Item Avg + Item Standard Deviation                    0.4561    0.7251            0.4507    0.7207
User/Item Avg + User and Item Standard Deviation      0.4390    0.6506            0.4324    0.6449
Min-Max                                               0.6648    0.9847            0.6303    0.9384
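
The average-based baselines can be sketched as follows. Rating events are assumed to be (userID, recipeID, rating) tuples taken from the training set; the helper names are hypothetical, and the handling of users or recipes that never appear in the training set is left out of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_by_key(training: Iterable[Tuple[str, str, float]],
                key_index: int) -> Dict[str, float]:
    """Average rating per user (key_index=0) or per recipe (key_index=1)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for event in training:
        totals[event[key_index]] += event[2]
        counts[event[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def combined_average(user_avg: Dict[str, float], item_avg: Dict[str, float],
                     user: str, item: str) -> float:
    """(UserAvg + ItemAvg) / 2 baseline."""
    return (user_avg[user] + item_avg[item]) / 2
```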

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE       RMSE       MAE       RMSE
Ingredients + Cuisine + Dietaries    0.7824    0.9927     0.4324    0.6449
Ingredients + Cuisine                0.7915    1.0012     0.4384    0.6502
Ingredients + Dietary                0.7874    0.9986     0.4342    0.6468
Cuisine + Dietary                    0.8266    1.0616     0.4324    0.7087
Ingredients                          0.7932    1.0054     0.4411    0.6537
Cuisine                              0.8553    1.0810     0.5357    0.7431
Dietary                              0.8772    1.0807     0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
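
The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the group names and the function names are assumptions made for this illustration; vectors are represented as sparse dictionaries, and features are keyed by (group, name) so that equal names in different groups do not collide.

```python
import math
from typing import Dict, Iterable, Tuple

Vector = Dict[Tuple[str, str], float]


def merge_groups(stored: Dict[str, Dict[str, float]],
                 groups: Iterable[str]) -> Vector:
    """Build one sparse vector from the selected per-group vectors."""
    merged: Vector = {}
    for group in groups:
        for feature, weight in stored.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Testing a feature combination then only requires changing the `groups` argument, for example `["ingredients", "cuisine"]`, without recomputing any prototype vector.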

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \tag{4.6}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } \text{similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation test on the experimental recommendation component, adjusting the upper similarity threshold between each run.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest error values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
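
The opposite movement of MAE and RMSE can be illustrated with a small worked example. The numbers are hypothetical and are not taken from the experiments: predictor A misses every one of four ratings by 0.5, while predictor B is exact three times and misses once by 1.6.

```latex
\begin{aligned}
\text{A:}\quad & \text{MAE} = \tfrac{0.5 + 0.5 + 0.5 + 0.5}{4} = 0.5,
  & \text{RMSE} &= \sqrt{\tfrac{4 \times 0.5^2}{4}} = 0.5 \\
\text{B:}\quad & \text{MAE} = \tfrac{0 + 0 + 0 + 1.6}{4} = 0.4,
  & \text{RMSE} &= \sqrt{\tfrac{1.6^2}{4}} = 0.8
\end{aligned}
```

Predictor B has the lower MAE because it is exact more often, but its single large miss dominates the squared error, so its RMSE is higher.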

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.
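
The points and the averaging line of such a graph could be derived as in the sketch below. Several details are assumptions of this sketch rather than facts about the experiments: the standard deviation is computed as a population standard deviation over the user's evaluated ratings, the per-user error is the mean absolute error, and the line is approximated by averaging the points in fixed-width bins of 0.25.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def user_points(results: Dict[str, List[Tuple[float, float]]]
                ) -> Dict[str, Tuple[float, float]]:
    """Map each user to (rating standard deviation, mean absolute error).

    results: user -> list of (predicted rating, actual rating) pairs.
    """
    points = {}
    for user, pairs in results.items():
        actual = [a for _, a in pairs]
        error = sum(abs(p - a) for p, a in pairs) / len(pairs)
        points[user] = (statistics.pstdev(actual), error)
    return points


def binned_average(points: Dict[str, Tuple[float, float]],
                   width: float = 0.25) -> Dict[float, float]:
    """Average error of the users falling in each standard-deviation bin."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for std, error in points.values():
        bins[int(std // width)].append(error)
    return {b * width: sum(v) / len(v) for b, v in sorted(bins.items())}
```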

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
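
The round-by-round procedure can be sketched generically as follows. The recommender itself is passed in as two callables, `train` and `predict`, which are placeholders for the Rocchio-based component; the function name, the event layout and the use of MAE per round are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, float]  # (recipeID, rating), in the order the reviews are added


def learning_curve(user_events: Dict[str, List[Event]],
                   train: Callable[[List[Event]], object],
                   predict: Callable[[object, str], float],
                   max_round: int) -> List[Optional[float]]:
    """MAE per round; round n trains on each user's first n reviews
    and validates on that user's remaining reviews."""
    curve: List[Optional[float]] = []
    for n in range(1, max_round + 1):
        errors: List[float] = []
        for events in user_events.values():
            training, validation = events[:n], events[n:]
            if not validation:
                continue
            model = train(training)
            errors.extend(abs(predict(model, item) - rating)
                          for item, rating in validation)
        curve.append(sum(errors) / len(errors) if errors else None)
    return curve
```

Any predictor with this interface can be plugged in; for instance, a stand-in that always predicts the mean of the training ratings is enough to exercise the loop.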



Chapter 6


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. The two datasets have very different characteristics, so failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.





Acknowledgments

I would like to acknowledge a few people for their help and availability during the course of my M.Sc. dissertation.

First, I would like to thank my dissertation supervisors, Prof. Pável Calado and Prof. Bruno Martins, for their guidance, knowledge, and constructive criticism, which greatly improved the quality of this work.

I would also like to thank Prof. Miguel Mira da Silva for the opportunity to be a part of the YoLP project and for providing me with a research scholarship at INOV that supported my study of recommendation systems in the food domain.

Lastly, I would like to thank my parents for their continued support throughout the years, allowing me to focus on my academic studies and on completing my Master's Degree.




Resumo

This dissertation explores the applicability of content-based methods to personalized recommendations in the food domain. Recommendation in this domain is a relatively new area, with few systems deployed in real settings that are based on user preferences. Methods frequently used in other areas, such as Rocchio's algorithm in document classification, can be adapted for recommendations in the food domain. With the goal of exploring content-based methods in the area of food recommendation, a platform was developed to evaluate the applicability of Rocchio's algorithm to this domain. Besides the validation of the algorithm explored in this study, other tests were carried out, such as the impact of the standard deviation on the recommendation error and the learning curve of the algorithm.

Keywords: Recommendation Systems, Content-Based Recommendation, Food, Recipe, Machine Learning




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
    1.1 Dissertation Structure
2 Fundamental Concepts
    2.1 Recommendation Systems
        2.1.1 Content-Based Methods
        2.1.2 Collaborative Methods
        2.1.3 Hybrid Methods
    2.2 Evaluation Methods in Recommendation Systems
3 Related Work
    3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    3.2 Content-Boosted Collaborative Recommendation
    3.3 Recommending Food: Reasoning on Recipes and Ingredients
    3.4 User Modeling for Adaptive News Access
4 Architecture
    4.1 YoLP Collaborative Recommendation Component
    4.2 YoLP Content-Based Recommendation Component
    4.3 Experimental Recommendation Component
        4.3.1 Rocchio's Algorithm using FF-IRF
        4.3.2 Building the Users' Prototype Vector
        4.3.3 Generating a rating value from a similarity value
    4.4 Database and Datasets
5 Validation
    5.1 Evaluation Metrics and Cross Validation
    5.2 Baselines and First Results
    5.3 Feature Testing
    5.4 Similarity Threshold Variation
    5.5 Standard Deviation Impact in Recommendation Error
    5.6 Rocchio's Learning Curve
6 Conclusions
    6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction

Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?

Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes and thus provides the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval based methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching up the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects how relevant it is to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1} \]

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords z in the document j.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log \frac{N}{n_i} \tag{2.2} \]

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

\[ w_{ij} = TF_{ij} \times IDF_i \tag{2.3} \]
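As an illustration of Eqs. (2.1) to (2.3), the following Python sketch computes TF-IDF weights for a few toy documents. The function name `tf_idf`, the sample ingredient lists, and the use of the natural logarithm are assumptions made for this example; the text does not fix the base of the logarithm, and every document is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document (Eqs. 2.1 to 2.3)."""
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values())  # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus: each document is a list of keywords.
docs = [
    ["rice", "chicken", "rice", "garlic"],
    ["pasta", "garlic", "tomato"],
    ["rice", "tomato", "basil"],
]
weights = tf_idf(docs)
```

In this toy corpus a keyword that occurs in every document would receive a weight of zero, which matches the remark that such keywords do not help in distinguishing relevance levels.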

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

\[ w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4} \]

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \tag{2.5} \]
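The normalization of Eq. (2.4) and the similarity of Eq. (2.5) can be sketched in Python as follows. Vectors are represented here as sparse dictionaries mapping keywords to weights; this representation, the function names, and the sample values are assumptions made for the example.

```python
import math

def normalize(vector):
    """Cosine-normalize a {keyword: weight} vector (Eq. 2.4)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)  # an all-zero vector is returned unchanged
    return {k: w / norm for k, w in vector.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical user profile and item vectors.
profile = {"rice": 0.8, "garlic": 0.6}
recipe_same_direction = {"rice": 0.4, "garlic": 0.3}
recipe_unrelated = {"basil": 1.0}

similar = cosine_similarity(profile, recipe_same_direction)
unrelated = cosine_similarity(profile, recipe_unrelated)
```

Two vectors that point in the same direction are maximally similar regardless of their length, while vectors that share no keyword have a similarity of zero.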



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6} \]

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
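A minimal Python sketch of the prototype computation in Eq. (2.6), followed by classification with cosine similarity, is given below. The default values β = 1.0 and γ = 0.5, the class names, and the toy TF-IDF vectors are illustrative assumptions and are not taken from the text.

```python
import math

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector of one class from its positive and negative examples (Eq. 2.6)."""
    terms = {t for doc in positives + negatives for t in doc}
    prototype = {}
    for t in terms:
        pos = sum(d.get(t, 0.0) for d in positives) / len(positives) if positives else 0.0
        neg = sum(d.get(t, 0.0) for d in negatives) / len(negatives) if negatives else 0.0
        prototype[t] = beta * pos - gamma * neg
    return prototype

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, prototypes):
    """Assign doc to the class whose prototype vector is most similar to it."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

# Hypothetical TF-IDF vectors of documents the user liked and disliked.
liked = [{"garlic": 0.9, "tomato": 0.4}, {"garlic": 0.7, "basil": 0.5}]
disliked = [{"anchovy": 0.8, "tomato": 0.3}]

prototypes = {
    "relevant": rocchio_prototype(liked, disliked),
    "irrelevant": rocchio_prototype(disliked, liked),
}
label_a = classify({"garlic": 0.6, "basil": 0.2}, prototypes)
label_b = classify({"anchovy": 1.0}, prototypes)
```

Terms that occur mostly in negative examples end up with negative prototype weights, which pushes documents containing them away from that class.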

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c) P(d|c)}{P(d)} \tag{2.7} \]

When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

\[ c = \arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \tag{2.8} \]

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probability estimates, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined by individual word occurrences rather than by the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. The simplification relies on the conditional independence assumption, which is clearly violated in practice, since terms in a document are not independent from each other. Even so, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that records whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9} \]

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
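The multinomial model of Eq. (2.9) can be sketched in Python as shown below. Two choices in this sketch are assumptions that the text does not prescribe: add-one (Laplace) smoothing of the term probabilities, so that unseen terms do not zero the product, and scoring in log space to avoid numerical underflow. The function names and the training documents are also hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Estimate P(c) and P(t|c) from a list of (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    priors = {c: n / len(labelled_docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Add-one smoothing (an assumption of this sketch).
        likelihoods[c] = {
            t: (term_counts[c][t] + 1) / (total + len(vocabulary))
            for t in vocabulary
        }
    return priors, likelihoods

def classify(tokens, priors, likelihoods):
    """Return the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for term, n in Counter(tokens).items():
            if term in likelihoods[c]:  # terms outside the vocabulary are ignored
                score += n * math.log(likelihoods[c][term])
        scores[c] = score
    return max(scores, key=scores.get)

training = [
    (["garlic", "tomato", "basil"], "relevant"),
    (["garlic", "rice", "garlic"], "relevant"),
    (["anchovy", "olive", "anchovy"], "irrelevant"),
]
priors, likelihoods = train(training)
label_a = classify(["garlic", "garlic", "olive"], priors, likelihoods)
label_b = classify(["anchovy", "anchovy"], priors, likelihoods)
```

Since P(d) is the same for every class, it is left out of the score, as noted in the discussion of Eq. (2.8).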

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
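To make the selection criterion concrete, the sketch below computes the expected information gain of splitting a small labelled collection on the presence or absence of a word. The entropy-based definition used here is the standard one; the helper names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Expected reduction in label entropy when splitting on presence of term."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder

# Hypothetical documents, each represented as a set of words.
docs = [
    {"garlic", "tomato"},
    {"garlic", "rice"},
    {"anchovy", "tomato"},
    {"anchovy", "olive"},
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

gain_garlic = information_gain(docs, labels, "garlic")
gain_tomato = information_gain(docs, labels, "tomato")
```

Here "garlic" separates the two classes perfectly, while "tomato" occurs equally often in both, so a tree learner using this criterion would pick "garlic" for the first split.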

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when an item is classified.
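A small Python sketch of a k-nearest-neighbor classifier over VSM vectors is shown below. It uses cosine similarity and a simple majority vote among the k most similar stored items; the majority vote, the value of k, and the sample data are assumptions made for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """Label item by majority vote among its k most similar stored items."""
    # All the work happens here, at classification time: every stored
    # item is compared with the new one.
    neighbours = sorted(
        training, key=lambda pair: cosine(item, pair[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical stored items: (weight vector, class label).
training = [
    ({"garlic": 1.0, "tomato": 0.5}, "relevant"),
    ({"garlic": 0.8, "basil": 0.6}, "relevant"),
    ({"anchovy": 1.0}, "irrelevant"),
    ({"anchovy": 0.7, "olive": 0.7}, "irrelevant"),
]

label_a = knn_classify({"garlic": 0.9, "basil": 0.1}, training, k=3)
label_b = knn_classify({"anchovy": 1.0, "olive": 0.2}, training, k=1)
```

The sort over all stored items makes the cost of one classification grow with the size of the training set, which is the inefficiency mentioned above.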

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and it assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10} \]

In the formula, S denotes the set of items rated by both users, $r_{a,s}$ is the rating that user a gave to item s, and $r_{b,s}$ is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
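As a rough illustration, the Python sketch below applies the cosine measure to the ratings of Table 2.1, restricted to the four items that Alice has rated (Item1 to Item4). The variable and function names are assumptions made for the example.

```python
import math

# Ratings for Item1..Item4 from Table 2.1 (the items Alice has rated).
ratings = {
    "Alice": [5, 3, 4, 4],
    "User1": [3, 1, 2, 3],
    "User2": [4, 3, 4, 3],
    "User3": [3, 3, 1, 5],
    "User4": [1, 5, 5, 2],
}

def cosine(a, b):
    """Cosine similarity between two equally long rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

similarities = {
    user: cosine(ratings["Alice"], user_ratings)
    for user, user_ratings in ratings.items()
    if user != "Alice"
}
```

Because all ratings are positive, every pair of users obtains a fairly high cosine value in this example, even User4, whose ratings move in the opposite direction to Alice's. This is one way to see why the measure alone does not capture differences in rating behaviour.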

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^{2} \, \sum_{s \in S} (r_{b,s} - \bar{r}_b)^{2}}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.
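The following sketch illustrates the Pearson correlation between two users. It assumes that each user's average is taken over all the items that user has rated, and that the sums run over the co-rated items only; the example ratings are hypothetical.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.
    Assumption: each user's mean is computed over all of that user's ratings."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[s] - mean_a) * (ratings_b[s] - mean_b) for s in common)
    den = sqrt(
        sum((ratings_a[s] - mean_a) ** 2 for s in common)
        * sum((ratings_b[s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


# Hypothetical users: identical ratings on the four shared items, but the
# first user usually rates low and the second usually rates high.
alice = {"Item1": 3, "Item2": 4, "Item3": 3, "Item4": 4, "Item6": 1, "Item7": 1}
user1 = {"Item1": 3, "Item2": 4, "Item3": 3, "Item4": 4, "Item8": 5, "Item9": 5}
print(pearson_similarity(alice, user1))
```

In this example the shared ratings are identical, yet the correlation is negative, because the four items sit above the first user's average and at or below the second user's average.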

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
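The prediction function can be sketched as follows. The tuple layout and the numeric values are illustrative assumptions, chosen only to show how the weighted deviations are combined.

```python
def predict_rating(mean_a, neighbours):
    """Predict a rating for the target user.

    mean_a     -- average rating of the target user
    neighbours -- list of (similarity, neighbour_rating_for_item, neighbour_mean)
    """
    weight_sum = sum(sim for sim, _, _ in neighbours)
    if weight_sum == 0:
        return mean_a  # no usable neighbours: fall back to the user's average
    weighted_dev = sum(sim * (rating - mean_b) for sim, rating, mean_b in neighbours)
    return mean_a + weighted_dev / weight_sum


# Illustrative values: two neighbours, both rating the item above their own mean.
prediction = predict_rating(4.0, [(0.85, 3, 2.4), (0.70, 5, 3.8)])
print(round(prediction, 2))
```

Both hypothetical neighbours rated the item above their own averages, so the prediction ends up above the target user's average of 4.0.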

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [15, 16].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by the user) can be associated with content features (e.g. comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, otherwise use collaborative methods).
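A minimal sketch of two ways to combine parallel recommenders is given below: a weighted average of their scores, and a simple switching rule for the few-ratings case mentioned above. The function names, the weights and the `min_ratings` cut-off are hypothetical and only serve as an illustration.

```python
def weighted_hybrid(scores, weights):
    """Weighted average of the scores that each recommender gives to one item."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * score for name, score in scores.items()) / total


def switching_hybrid(n_user_ratings, popularity_score, collaborative_score,
                     min_ratings=5):
    """Use item popularity while the user has few ratings,
    and the collaborative score otherwise (min_ratings is an assumed cut-off)."""
    if n_user_ratings < min_ratings:
        return popularity_score
    return collaborative_score


combined = weighted_hybrid(
    {"content": 4.0, "collaborative": 3.0},
    {"content": 0.25, "collaborative": 0.75},
)
print(combined)
```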

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
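As a small illustration of both measures, the sketch below computes Precision and Recall from a set of recommended items and a set of items the user actually likes. The item identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended.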

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}$$
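The difference between the two error measures can be seen in a short sketch. The rating lists below are hypothetical and were chosen so that both predictors have the same MAE.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


actual = [4, 4, 5, 5]
evenly_off = [3, 3, 4, 4]   # every prediction is off by 1
one_big_miss = [4, 3, 5, 2]  # errors of 0, 1, 0 and 3

print(mae(evenly_off, actual), rmse(evenly_off, actual))
print(mae(one_big_miss, actual), rmse(one_big_miss, actual))
```

Both hypothetical predictors have the same MAE, but the one with a single large miss has the higher RMSE, reflecting the emphasis on larger deviations.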


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$


The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$
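A minimal sketch of the FF-IRF score is given below. It assumes that recipes are represented as sets of ingredient names, that every cooked recipe belongs to the full collection, and that the natural logarithm is used (the base is not stated in the text). The recipes themselves are made up.

```python
from math import log


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each ingredient the user cooked with as FF_k * IRF_k.

    cooked_recipes -- recipes (sets of ingredients) cooked during the period D
    all_recipes    -- the complete recipe collection (M recipes)
    period_days    -- length of the period D
    """
    m = len(all_recipes)
    scores = {}
    for k in set().union(*cooked_recipes):
        f_k = sum(1 for recipe in cooked_recipes if k in recipe)  # uses of k
        m_k = sum(1 for recipe in all_recipes if k in recipe)     # recipes with k
        scores[k] = (f_k / period_days) * log(m / m_k)
    return scores


# Hypothetical collection of four recipes, two of which were cooked in 30 days.
all_recipes = [
    {"rice", "egg"},
    {"rice", "tofu"},
    {"rice", "egg", "leek"},
    {"pasta", "egg"},
]
scores = ff_irf([all_recipes[0], all_recipes[2]], all_recipes, period_days=30)
print(sorted(scores, key=scores.get, reverse=True))
```

In this example the rare ingredient, used once, outranks the common ingredients that were used twice, which is the effect the inverse recipe frequency is meant to produce.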



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^{2}} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^{+}$ and $I_k^{-}$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
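The construction of a pseudo user-ratings vector can be sketched as follows. The `content_predict` callable is a hypothetical stand-in for the content-based predictor (a naive Bayesian classifier in [20]); here it simply returns a constant so that the example is self-contained.

```python
def pseudo_ratings_vector(user_ratings, all_items, content_predict):
    """Use the actual rating where the user rated the item,
    and the content-based prediction otherwise."""
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in all_items
    }


def constant_predictor(item):
    """Hypothetical content-based predictor used only for illustration."""
    return 3.0


vector = pseudo_ratings_vector({"m1": 5, "m3": 2}, ["m1", "m2", "m3"],
                               constant_predictor)
print(vector)
```

Stacking one such vector per user yields a dense matrix, on which the usual neighbourhood-based collaborative filtering can then be applied.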

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
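The breakdown of recipe ratings into ingredient scores can be sketched as follows. The first function follows the averaging described above; the second one, which rebuilds a recipe score as the plain average of its known ingredient scores, is an assumption made for illustration, since the exact reconstruction step is not detailed here.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes):
    """Average, for each ingredient, the ratings of the recipes that contain it.

    rated_recipes -- list of (set_of_ingredients, rating) pairs for one user
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for ingredients, rating in rated_recipes:
        for k in ingredients:
            totals[k] += rating
            counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


def predict_recipe(ingredients, scores):
    """Assumed reconstruction: mean score of the ingredients the user has scores for."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None


# Hypothetical ratings given by one user.
scores = ingredient_scores([({"tomato", "basil"}, 5), ({"tomato", "onion"}, 3)])
print(scores)
print(predict_recipe({"tomato", "basil", "garlic"}, scores))
```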

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient-level scores obtained after the rated recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
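The short-term voting procedure described above can be sketched as follows. The threshold values `t_min` and `t_max` and the returned labels are hypothetical choices for illustration; the similarities are assumed to have been computed beforehand with the cosine measure.

```python
def short_term_predict(voters_input, t_min=0.3, t_max=0.9):
    """Classify a new story from (similarity, score) pairs of rated stories.

    Returns ("unclassified", None) when there are no voters,
    ("known", None) when a voter is too similar to the new story,
    and ("predicted", score) otherwise.
    """
    voters = [(sim, score) for sim, score in voters_input if sim >= t_min]
    if not voters:
        return ("unclassified", None)  # to be handled by the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return ("known", None)         # the user has probably seen this already
    total = sum(sim for sim, _ in voters)
    return ("predicted", sum(sim * score for sim, score in voters) / total)


print(short_term_predict([(0.5, 0.8), (0.4, 0.2), (0.1, 1.0)]))
print(short_term_predict([(0.95, 0.9)]))
print(short_term_predict([(0.1, 0.5)]))
```

The same pattern could be reused in food recommendation, with recently eaten dishes playing the role of the stories the user already knows.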

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^{2} \, \sum_{p \in P} (r_{b,p} - \bar{r}_b)^{2}}} \tag{4.1}$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$




In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
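A sketch of an item-based prediction is shown below. It implements the plain similarity-weighted average of the user's own ratings, which is a common form of item-based prediction; the mean-centring that appears in the equation above is deliberately left out, so this should be read as a simplified variant rather than the exact formula used in YoLP. Recipe names and similarity values are made up.

```python
def predict_item_based(user_ratings, similarity_to_target):
    """Similarity-weighted average of the user's ratings.

    user_ratings         -- {recipe: rating} for the recipes rated by the user
    similarity_to_target -- {recipe: similarity to the target recipe}
    Only positively similar recipes are used (an assumption of this sketch).
    """
    num = den = 0.0
    for recipe, rating in user_ratings.items():
        sim = similarity_to_target.get(recipe, 0.0)
        if sim > 0:
            num += sim * rating
            den += sim
    return num / den if den else None


predicted = predict_item_based(
    {"soup": 5, "salad": 2, "steak": 4},
    {"soup": 0.8, "salad": 0.2},
)
print(predicted)
```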

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that he positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
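The profile construction and comparison can be sketched with sets, since a sparse binary vector is fully described by the set of its active features. The feature names used below (e.g. `category:soup`) are hypothetical labels, not the actual YoLP encoding.

```python
from math import sqrt


def build_profile(rated_recipes, min_positive=4):
    """Collect the features of every recipe the user rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_positive:
            profile |= set(features)
    return profile


def binary_cosine(a, b):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


profile = build_profile([
    ({"category:soup", "ingredient:tomato"}, 5),
    ({"category:dessert", "ingredient:sugar"}, 2),
])
candidate = {"category:soup", "ingredient:tomato", "ingredient:basil", "region:italy"}
print(binary_cosine(profile, candidate))
```

For binary vectors the dot product is the number of shared features and each norm is the square root of the number of active features, which is why the set-based form is equivalent to the usual cosine formula.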

YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
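The transformation from a similarity value into a rating can be sketched as below. How the user and item averages are combined is not specified in the text, so taking the mean of the two is an assumption of this sketch.

```python
def similarity_to_rating(similarity, user_avg, item_avg,
                         threshold=0.8, bonus=0.5):
    """Turn a cosine similarity into an approximate rating.

    Assumption: avgTotal is the mean of the user's and the item's average rating.
    """
    avg_total = (user_avg + item_avg) / 2
    if similarity > threshold:
        return avg_total + bonus
    return avg_total


print(similarity_to_rating(0.9, user_avg=4.0, item_avg=3.0))
print(similarity_to_rating(0.5, user_avg=4.0, item_avg=3.0))
```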

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as described in [3]:


$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. In essence, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
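The weighting step can be illustrated with a minimal sketch, which is not the implementation used in this work. It assumes that recipes are available as a mapping from a recipe identifier to its set of features and that the natural logarithm is used; the names `irf_weights`, `recipe_vector` and the toy recipes are hypothetical.

```python
import math
from collections import Counter

def irf_weights(recipes):
    """Return {feature: log(M / M_k)} for every feature in the dataset.

    `recipes` maps a recipe id to the set of features it contains
    (hypothetical input format, assumed for this sketch).
    """
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes that contain feature k
    counts = Counter(f for feats in recipes.values() for f in set(feats))
    return {f: math.log(total / m_k) for f, m_k in counts.items()}

def recipe_vector(features, weights):
    """Weight vector of one recipe, with F_k fixed to 1 as in the text."""
    return {f: weights[f] for f in features if f in weights}

# Toy data, invented for illustration only.
recipes = {
    "r1": {"tomato", "basil", "pasta"},
    "r2": {"tomato", "beef"},
    "r3": {"tomato", "pasta"},
    "r4": {"apple"},
}
weights = irf_weights(recipes)
vector_r1 = recipe_vector(recipes["r1"], weights)
```

In this toy example a feature present in most recipes (tomato) receives a weight close to zero, while a rare feature (apple) receives the highest weight.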

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. To determine whether a rating event is a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, except for ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. For positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. For negative observations, the feature weights are subtracted.
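The two labelling rules and the add/subtract update can be sketched as follows. This is an illustrative sketch under assumptions, not the code used in the experiments: recipe vectors are assumed to be dictionaries of feature weights, and all function names and toy values are hypothetical.

```python
from collections import defaultdict

def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule; None marks a neutral rating that is ignored."""
    if scale_max == 4:      # Epicurious: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:         # Food.com: a rating of 3 is neutral
        return None
    return rating > 3       # Food.com: 1-2 negative, 4-5 positive

def is_positive_user_avg(rating, user_avg):
    """User-average rule: equal to or above the user's mean is positive."""
    return rating >= user_avg

def build_prototype(rated, recipe_vectors, label):
    """Add feature weights for positive observations, subtract for negative."""
    proto = defaultdict(float)
    for recipe_id, rating in rated:
        positive = label(rating)
        if positive is None:
            continue
        sign = 1.0 if positive else -1.0  # both observation weights are 1
        for feature, weight in recipe_vectors[recipe_id].items():
            proto[feature] += sign * weight
    return dict(proto)

# Toy data on a 1-5 scale, invented for illustration only.
recipe_vectors = {
    "r1": {"tomato": 0.5, "basil": 1.0},
    "r2": {"tomato": 0.5, "beef": 2.0},
    "r3": {"beef": 2.0},
}
rated = [("r1", 5), ("r2", 2), ("r3", 3)]
fixed = build_prototype(rated, recipe_vectors, lambda r: is_positive_fixed(r, 5))
user_avg = sum(r for _, r in rated) / len(rated)
by_avg = build_prototype(rated, recipe_vectors,
                         lambda r: is_positive_user_avg(r, user_avg))
```

With the fixed threshold the neutral rating of 3 leaves the vector untouched, whereas with the user-average rule the same rating falls below the user's mean and counts as a negative observation.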

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe's feature vector and the user's profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale of each user is mapped into that user's rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
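A minimal sketch of this per-user mapping is given below. It assumes the user's similarity values and training ratings are available as plain lists; the function names are hypothetical and the fallback condition (an empty similarity interval) is one possible reading of "not enough user ratings".

```python
def min_max(value, src_min, src_max, dst_min, dst_max):
    """Eq. 4.5: map `value` from [src_min, src_max] into [dst_min, dst_max]."""
    return (value - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

def predict_rating(similarity, user_sims, user_ratings):
    """Per-user prediction; falls back to the user's average rating."""
    user_avg = sum(user_ratings) / len(user_ratings)
    lo, hi = min(user_sims), max(user_sims)
    if hi == lo:  # the similarity interval cannot be computed
        return user_avg
    return min_max(similarity, lo, hi, min(user_ratings), max(user_ratings))

# A user whose similarities span [0, 1] and whose ratings span [2, 4].
mid_rating = predict_rating(0.5, [0.0, 1.0], [2, 4])
# A user with a single similarity value: the average rating is returned.
fallback = predict_rating(0.3, [0.3], [4, 5])
```

Because both scales are taken per user, the same similarity value can map to different ratings for users with different rating habits.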

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

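\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]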


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                              Food.com    Epicurious
Number of users                        24741       8117
Number of food items                   226025      14976
Number of rating events                956826      86574
Number of ratings above avg            726467      46588
Number of groups                       108         68
Number of ingredients                  5074        338
Number of categories                   28          14
Sparsity on the ratings matrix         0.02        0.07
Avg rating values                      4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item         4.23        5.78
Avg number of ingredients per item     8.57        3.71
Avg number of categories per item      2.33        0.60
Avg number of food groups per item     0.87        0.61
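The three-case rule and its three variants can be sketched as follows, using the initial thresholds of 0.75 and 0.25. The sketch makes two assumptions that the text does not state: the population standard deviation is used, and the combined variant averages the two means and the two standard deviations. All names and values are hypothetical.

```python
import statistics

def rating_from_stats(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case rule of Eq. 4.6 for a given average and standard deviation."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

def stats(ratings):
    """Mean and population standard deviation (the latter is an assumption)."""
    return statistics.mean(ratings), statistics.pstdev(ratings)

def combined_stats(user_ratings, item_ratings):
    """Average of user and item statistics (assumed way of combining them)."""
    (u_avg, u_std), (i_avg, i_std) = stats(user_ratings), stats(item_ratings)
    return (u_avg + i_avg) / 2, (u_std + i_std) / 2

# Toy training ratings, invented for illustration only.
user_ratings = [3, 5]   # mean 4, standard deviation 1
item_ratings = [2, 2]   # mean 2, standard deviation 0
u_avg, u_std = stats(user_ratings)
c_avg, c_std = combined_stats(user_ratings, item_ratings)
high = rating_from_stats(0.80, u_avg, u_std)
middle = rating_from_stats(0.50, u_avg, u_std)
low = rating_from_stats(0.10, u_avg, u_std)
high_combined = rating_from_stats(0.80, c_avg, c_std)
```

Only the statistics passed to `rating_from_stats` change between the user, recipe and combined variants; the thresholds are shared.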


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it has been filtered. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 parts, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
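The 5-fold procedure can be sketched as below. This is a generic illustration, assuming rating events are held in a list and that a caller-supplied `evaluate(training, validation)` function returns an error score; both function names and the fixed random seed are hypothetical choices.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists, one pair per fold."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint parts
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]

def cross_validate(events, evaluate, k=5):
    """Average the score returned by `evaluate(training, validation)`."""
    scores = [evaluate(tr, va) for tr, va in k_fold_splits(events, k)]
    return sum(scores) / len(scores)
```

Every rating event appears in exactly one validation set, so each fold uses 20% of the data for validation and 80% for training when k is 5.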

Figure 5.1: 10 Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
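For reference, the two error measures can be computed as in the following sketch, which uses the standard definitions of MAE and RMSE; the toy predictions are invented for illustration.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; larger deviations weigh more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

predicted = [4, 4, 4, 4]
actual = [4, 4, 4, 2]
example_mae = mae(predicted, actual)    # one miss of 2 over 4 predictions
example_rmse = rmse(predicted, actual)
```

In this example a single large miss yields an RMSE twice as large as the MAE, which illustrates the stronger penalty RMSE places on large deviations.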

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation          Observation
                                                    User Average         Fixed Threshold
Method                                              MAE      RMSE        MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606      0.7759   1.0283
Item Avg + Item Standard Deviation                  0.8914   1.1550      0.8388   1.1106
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296      0.7824   0.9927
Min-Max                                             0.8539   1.1533      0.7721   1.0705

Food.com
                                                    Observation          Observation
                                                    User Average         Fixed Threshold
Method                                              MAE      RMSE        MAE      RMSE
User Avg + User Standard Deviation                  0.4448   0.6812      0.4287   0.6624
Item Avg + Item Standard Deviation                  0.4561   0.7251      0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.4390   0.6506      0.4324   0.6449
Min-Max                                             0.6648   0.9847      0.6303   0.9384
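The three average-based baselines can be sketched as follows, assuming the training set is a list of (userID, itemID, rating) tuples. The function name and the toy data are hypothetical, and users or recipes absent from the training set are not handled in this sketch.

```python
from collections import defaultdict

def average_baselines(training):
    """Build the user-average, item-average and combined-average predictors.

    `training` is a list of (user_id, item_id, rating) tuples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, item_id, rating in training:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}

    def combined(user_id, item_id):
        # (UserAvg + ItemAvg) / 2, as defined in the text
        return (user_avg[user_id] + item_avg[item_id]) / 2

    return user_avg, item_avg, combined

# Toy training set, invented for illustration only.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg, combined = average_baselines(training)
prediction = combined("u1", "r1")
```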

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                   Epicurious          Food.com
Features                           MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616     0.4324   0.7087
Ingredients                        0.7932   1.0054     0.4411   0.6537
Cuisine                            0.8553   1.0810     0.5357   0.7431
Dietary                            0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user's prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
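The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The storage layout (a dictionary keyed by group name), the function names and the toy profile are assumptions made for illustration; the cosine similarity follows its standard definition.

```python
import math

def merge(profile, groups):
    """Merge the stored per-group vectors into a single vector."""
    merged = {}
    for group in groups:
        for feature, weight in profile[group].items():
            merged[(group, feature)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy user prototype and recipe, invented for illustration only.
user = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"french": 1.0}, "dietary": {}}

only_ingredients = ["ingredients"]
with_cuisine = ["ingredients", "cuisine"]
sim_ingredients = cosine(merge(user, only_ingredients), merge(recipe, only_ingredients))
sim_with_cuisine = cosine(merge(user, with_cuisine), merge(recipe, with_cuisine))
```

Switching the list of groups is enough to test a different feature combination, so the stored vectors never need to be rebuilt.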

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


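\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]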

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.
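The threshold sweep can be sketched as follows for the two-case rule of Eq. 5.1. The sketch assumes that each validation case already carries its similarity, the average and standard deviation to use, and the actual rating; the function names and the three toy cases are hypothetical.

```python
import math

def predict(similarity, avg, std, upper):
    """Eq. 5.1: add one standard deviation only at or above the threshold."""
    return avg + std if similarity >= upper else avg

def sweep_upper(cases, thresholds):
    """Return {U: (MAE, RMSE)}; `cases` holds (similarity, avg, std, actual)."""
    results = {}
    for upper in thresholds:
        errors = [predict(s, avg, std, upper) - actual
                  for s, avg, std, actual in cases]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results

# Toy validation cases, invented for illustration only.
cases = [(0.8, 4.0, 1.0, 5), (0.4, 4.0, 1.0, 5), (0.4, 4.0, 1.0, 3)]
results = sweep_upper(cases, [0.75, 0.25])
```

In this toy example lowering U from 0.75 to 0.25 leaves the MAE unchanged but raises the RMSE, because one prediction becomes exact while another misses by a larger margin.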

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
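The round-by-round simulation can be sketched for a single user as below. The `train` and `predict` callables are placeholders for the recommender under test, and the example uses a trivial mean-rating model only to make the sketch runnable; in the experiments the per-round errors would additionally be averaged over the selected group of users.

```python
def learning_curve(reviews, train, predict, max_rounds):
    """Mean absolute error after each round of one user's simulated learning.

    `reviews` is the user's list of (recipe_id, rating). `train` builds a
    model from a list of reviews and `predict(model, recipe_id)` returns a
    rating; both are hypothetical placeholders for the recommender.
    """
    curve = []
    for n in range(1, min(max_rounds, len(reviews) - 1) + 1):
        # Round n: the first n reviews train, the remaining ones validate.
        training, validation = reviews[:n], reviews[n:]
        model = train(training)
        errors = [abs(predict(model, rid) - rating) for rid, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Stand-in model for illustration: always predict the mean training rating.
def mean_model(training):
    return sum(rating for _, rating in training) / len(training)

def mean_predict(model, recipe_id):
    return model

reviews = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
curve = learning_curve(reviews, mean_model, mean_predict, max_rounds=3)
```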



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g. winter/fall or summer/spring), the time of day (e.g. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
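One possible reading of this proposal is sketched below. It is an assumption-laden illustration of future work, not an evaluated method: each class vector is taken to be the sum of the feature weights of the recipes the user rated with that value, and the predicted class is the one with the highest cosine similarity. All names and toy values are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_class_vectors(rated, recipe_vectors):
    """One vector per rating value, summing the weights of recipes so rated."""
    classes = defaultdict(lambda: defaultdict(float))
    for recipe_id, rating in rated:
        for feature, weight in recipe_vectors[recipe_id].items():
            classes[rating][feature] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}

def predict_class(recipe_vector, class_vectors):
    """Rating value of the class vector most similar to the recipe."""
    return max(class_vectors,
               key=lambda rating: cosine(recipe_vector, class_vectors[rating]))

# Toy data, invented for illustration only.
recipe_vectors = {"r1": {"tomato": 1.0, "basil": 1.0}, "r2": {"beef": 2.0}}
rated = [("r1", 5), ("r2", 2)]
class_vectors = build_class_vectors(rated, recipe_vectors)
liked = predict_class({"tomato": 1.0}, class_vectors)
disliked = predict_class({"beef": 1.0}, class_vectors)
```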

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.



[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.
URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender
Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and
Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to Know You: Learning New User Preferences in Recommender Systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN
1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based
Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and
User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696.
doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling
and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe
Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN
10450823. doi: 10.1067/mod.2000.109031.








Resumo

This dissertation explores the applicability of content-based methods to personalized
recommendations in the food domain. Recommendation in this domain is a relatively new area,
with few systems deployed in real settings that are based on user preferences. Methods frequently
used in other areas, such as Rocchio's algorithm in document classification, can be adapted to
recommendations in the food domain. With the goal of exploring content-based methods in the
area of food recommendation, a platform was developed to evaluate the applicability of Rocchio's
algorithm to this domain. Besides the validation of the algorithm explored in this study, other
tests were carried out, such as the impact of the standard deviation on the recommendation error
and the algorithm's learning curve.

Keywords (Palavras-chave): Recommendation Systems, Content-Based Recommendation, Food,
Recipe, Machine Learning




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user
preferences being deployed in real settings. In my MSc dissertation, the applicability of
content-based methods in personalized food recommendation is explored. Variations of popular
approaches used in other areas, such as Rocchio's algorithm for document classification, can be
adapted to provide personalized food recommendations. With the objective of exploring
content-based methods in this area, a system platform was developed to evaluate a variation of
the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in
this work, other tests were also performed, amongst them recipe feature testing, the impact of
the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation,
Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Dissertation Structure
2 Fundamental Concepts
2.1 Recommendation Systems
2.1.1 Content-Based Methods
2.1.2 Collaborative Methods
2.1.3 Hybrid Methods
2.2 Evaluation Methods in Recommendation Systems
3 Related Work
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
3.2 Content-Boosted Collaborative Recommendation
3.3 Recommending Food: Reasoning on Recipes and Ingredients
3.4 User Modeling for Adaptive News Access
4 Architecture
4.1 YoLP Collaborative Recommendation Component
4.2 YoLP Content-Based Recommendation Component
4.3 Experimental Recommendation Component
4.3.1 Rocchio's Algorithm using FF-IRF
4.3.2 Building the Users' Prototype Vector
4.3.3 Generating a rating value from a similarity value
4.4 Database and Datasets
5 Validation
5.1 Evaluation Metrics and Cross Validation
5.2 Baselines and First Results
5.3 Feature Testing
5.4 Similarity Threshold Variation
5.5 Standard Deviation Impact in Recommendation Error
5.6 Rocchio's Learning Curve
6 Conclusions
6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction

Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that better fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated
way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and on online
services related to movies, music, books, social bookmarking, and product sales in general. However,
new ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and how they can apply to food recommendation, is a topic of great
interest.


In this work, the applicability of content-based methods in personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with performance metrics that capture the accuracy of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. That work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were analysed. Furthermore, a feature test was performed to discover
the feature combination that better characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of
recommendation systems, introducing various fundamental concepts and describing some of the most
popular recommendation and evaluation methods. In Chapter 3, four previously proposed
recommendation approaches are analysed, and interesting features in the context of personalized food
recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the
developed system are described. The recommendation methods are explained in detail and the
datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments
performed in this work and describes the evaluation metrics used to validate the algorithms
implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main
aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the following chapter on related work. These
concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches, instead, relate more to classical Information Retrieval
methods and focus on keywords as content descriptors to generate recommendations.
Because of this, content-based methods are very popular when recommending documents, news
articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and
preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify the solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of
collaborative and content-based systems, such as the well-known cold-start problem that is explained
later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of
Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and
collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an
object against a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are
usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As
mentioned previously, content needs to be analysed and the information in it needs to be
translated into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its
relevance to the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document
Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$.
This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed
over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\frac{N}{n_i} \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in
which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword $i$ in a document $j$ as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$

It is important to notice that TF-IDF does not identify the context in which the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document. Two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document and their occurrence in other documents are taken into consideration when giving a
weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer
documents from being preferred over shorter ones [5]. To normalize these weights, a cosine
normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
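
The weighting and similarity steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the function names, the toy ingredient lists, and the use of the natural logarithm are assumptions made for the example, and documents are assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (TF, IDF, cosine normalization)."""
    n_docs = len(docs)
    # n_i: number of documents in which each keyword occurs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values())  # frequency of the most frequent keyword
        weights = {
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Cosine normalization; an all-zero vector is left unchanged
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy "documents" (lists of keywords), for illustration only
docs = [
    ["tomato", "basil", "pasta", "tomato"],
    ["tomato", "cheese", "bread"],
    ["chocolate", "butter", "sugar"],
]
vectors = tf_idf_vectors(docs)
shared = cosine_similarity(vectors[0], vectors[1])    # share "tomato"
disjoint = cosine_similarity(vectors[0], vectors[2])  # no keyword in common
```

Documents that share keywords obtain a positive similarity, while documents with no keyword in common obtain a similarity of zero.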



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system
according to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document
vectors of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then
classified according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each
class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight
for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for
class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the
influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with
the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.

Although this method has an intuitive justification, it does not have any theoretic underpinnings
and there are no performance or convergence guarantees [7]. In the general area of machine
learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied
extensively [8].
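
As an illustration of the prototype-based classification described above, the following Python sketch builds one prototype vector per class and assigns a document to the most similar prototype. It is only a sketch under stated assumptions: the values beta=16 and gamma=4, the class labels, and the toy weight vectors are hypothetical choices for the example, and are not taken from this work.

```python
import math


def prototype(positives, negatives, beta=16.0, gamma=4.0):
    """Rocchio prototype from lists of {term: weight} vectors.

    beta and gamma are illustrative defaults (an assumption of this sketch).
    """
    proto = {}
    for examples, factor in ((positives, beta), (negatives, -gamma)):
        for vec in examples:
            for term, weight in vec.items():
                # Average the term weight over the example set, scaled by beta or -gamma
                proto[term] = proto.get(term, 0.0) + factor * weight / len(examples)
    return proto


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, prototypes):
    """Return the label whose prototype vector is most similar to the document."""
    return max(prototypes, key=lambda label: cosine(doc, prototypes[label]))


# Hypothetical training vectors: for each class, its own examples are the
# positives and the examples of every other class are the negatives.
train = {
    "liked": [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8, "cheese": 0.6}],
    "disliked": [{"liver": 1.0, "onion": 0.4}],
}
prototypes = {
    label: prototype(
        vecs,
        [v for other, vs in train.items() if other != label for v in vs],
    )
    for label, vecs in train.items()
}
predicted = classify({"tomato": 1.0, "onion": 0.2}, prototypes)
```

In this toy example, the new vector is dominated by a term that appears only in the positive examples of the "liked" class, so it is assigned to that class.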


Aside from the keyword-based techniques presented above, Bayesian classifiers and various
machine learning methods are other examples of techniques used to perform content-based
recommendation. These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$
belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or
training data, as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be
estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest
probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is
determined based on individual word occurrences, rather than on the document as a whole. This
simplification is needed because it is very unlikely to see the exact same document more than once.
Without it, the observed data would not be enough to generate good probabilities. Although the
conditional independence assumption behind this simplification is clearly violated, since terms in a
document are not independent from each other, experiments show that the Naive Bayes classifier has
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute. This encoding relates to the appearance of
words in a document. The second, typically referred to as the multinomial event model, identifies the
number of times the words appear in the document. These models see the document as a vector
of values over a vocabulary $V$, and they both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$.
Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
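
The multinomial event model described above can be sketched as follows. This is an illustrative example only: the function names, the toy training data, the use of log-probabilities (to avoid numerical underflow), and the Laplace (add-one) smoothing of the term probabilities are assumptions of the sketch, and are not described in the text.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (label, list_of_terms). Returns (priors, term_probs, vocab)."""
    class_counts = Counter(label for label, _ in labelled_docs)
    term_counts = defaultdict(Counter)
    for label, terms in labelled_docs:
        term_counts[label].update(terms)
    vocab = {t for counts in term_counts.values() for t in counts}
    priors = {c: n / len(labelled_docs) for c, n in class_counts.items()}
    term_probs = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Laplace (add-one) smoothing: an assumption of this sketch
        term_probs[c] = {
            t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab
        }
    return priors, term_probs, vocab


def classify(terms, priors, term_probs, vocab):
    """Pick the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k | c)."""
    counts = Counter(t for t in terms if t in vocab)  # terms outside V are ignored

    def log_score(c):
        return math.log(priors[c]) + sum(
            n * math.log(term_probs[c][t]) for t, n in counts.items()
        )

    return max(priors, key=log_score)


# Hypothetical training data, for illustration only
training = [
    ("relevant", ["tomato", "basil", "pasta"]),
    ("relevant", ["tomato", "cheese"]),
    ("irrelevant", ["liver", "onion"]),
    ("irrelevant", ["liver", "cabbage"]),
]
priors, term_probs, vocab = train_multinomial_nb(training)
label = classify(["tomato", "tomato", "onion"], priors, term_probs, vocab)
```

Since $P(d)$ is the same for every class, it is omitted from the score, as discussed above.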

Decision trees and nearest neighbor methods are other examples of important learning
algorithms used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes are labelled by terms.
Branches originating from them are labelled according to tests done on the weight that the term
has in the document. Leaves are then labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10].
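
To make the selection criterion concrete, the sketch below computes the information gain obtained by splitting a set of labelled documents on the presence or absence of a single term. The helper names and the toy data are hypothetical, and the entropy is measured in bits (base-2 logarithm), which is an assumption of the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, term):
    """docs: list of (set_of_terms, label). Gain of splitting on presence of `term`."""
    labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]
    # Weighted entropy of the two partitions (empty partitions contribute nothing)
    remainder = sum(
        len(part) / len(docs) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


# Hypothetical labelled documents, for illustration only
docs = [
    ({"tomato", "basil"}, "liked"),
    ({"tomato", "cheese"}, "liked"),
    ({"liver", "onion"}, "disliked"),
    ({"cabbage", "onion"}, "disliked"),
]
gain_tomato = information_gain(docs, "tomato")
gain_basil = information_gain(docs, "basil")
```

In this toy example, "tomato" separates the two classes perfectly, so a tree learner would prefer it over "basil" as the test for the root node.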

Nearest neighbor algorithms simply store all training data in memory. When classifying a new
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working
with structured data. For items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback
is their inefficiency at classification time: since they do not have a training phase, all
the computation is made when an item is classified.
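
A k-nearest-neighbor classifier over structured data can be sketched as below. The example uses the Euclidean distance and a simple majority vote among the neighbors; the feature tuples, labels, and the choice of k=3 are hypothetical values for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(item, training, k=3):
    """training: list of (feature_tuple, label). Majority vote among the k nearest items."""
    # No training phase: every stored item is compared at classification time
    ranked = sorted(training, key=lambda pair: math.dist(item, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical structured items described by two numeric attributes
training = [
    ((1.0, 1.0), "liked"),
    ((1.2, 0.8), "liked"),
    ((5.0, 5.0), "disliked"),
    ((4.8, 5.3), "disliked"),
]
label = knn_classify((1.1, 1.0), training, k=3)
```

Every call to `knn_classify` scans the whole training set, which illustrates the classification-time cost mentioned above.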

These algorithms represent some of the most important methods used in content-based
recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed
automatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user
profiles, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a
particular user based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various
algorithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the
recommendation. These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are
essentially heuristics that make rating predictions based on the entire collection of items previously
rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\;\sqrt{\sum_{s \in S} r_{bs}^{2}}}$$



In the formula, $r_{as}$ is the rating that user a gave to item s, $r_{bs}$ is the rating that user b gave to the same item s, and S denotes here the set of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
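The limitation just mentioned can be illustrated with a minimal sketch that applies the cosine measure to the ratings of Table 2.1, restricted to the four items Alice has rated. The dictionary layout and the function name are illustrative assumptions, not part of any cited system.

```python
from math import sqrt

# Ratings from Table 2.1, restricted to Item1..Item4 (the items Alice rated).
ratings = {
    "Alice": [5, 3, 4, 4],
    "User1": [3, 1, 2, 3],
    "User4": [1, 5, 5, 2],
}

def cosine_similarity(ra, rb):
    """Cosine of the angle between two rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(ra, rb))
    norm_a = sqrt(sum(x * x for x in ra))
    norm_b = sqrt(sum(y * y for y in rb))
    return dot / (norm_a * norm_b)

sim_user1 = cosine_similarity(ratings["Alice"], ratings["User1"])
sim_user4 = cosine_similarity(ratings["Alice"], ratings["User4"])
```

With these ratings, User1 obtains a similarity of roughly 0.98, but User4, whose ratings mostly move in the opposite direction to Alice's, still obtains roughly 0.80. Since ratings are all positive, the cosine measure cannot separate the two cases well, which motivates a mean-centred measure.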

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items common to both users are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2}\; \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their own averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
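As a hedged illustration of the Pearson measure and the prediction function, the sketch below computes a prediction for Alice and Item5 from the ratings of Table 2.1. Two choices are assumptions made only for this example: each user's average is taken over all the items that user has rated, and the neighborhood N is limited to the two most similar users.

```python
from math import sqrt

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    """Average rating of a user over all items the user has rated."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [i for i in ratings[a] if i in ratings[b]]
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    den = sqrt(sum((ratings[a][i] - ma) ** 2 for i in common)
               * sum((ratings[b][i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(a, item, k=2):
    """Mean-centred weighted prediction using the k most similar raters."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:k]
    num = sum(pearson(a, b) * (ratings[b][item] - mean(b)) for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    return mean(a) + num / den

prediction = predict("Alice", "Item5")
```

Under these assumptions User1 and User2 are selected as neighbours, User4 is negatively correlated with Alice, and the predicted rating lies between 4.8 and 4.9, above Alice's own average of 4.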

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, and comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, and otherwise use collaborative methods).
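The two combination schemes mentioned for the parallelized design can be sketched as follows. This is only an illustration: the function names, the threshold of five ratings and the example weights are assumptions and are not taken from [2].

```python
def weighted_hybrid(scores, weights):
    """Combine the scores of independent components with a weighted average."""
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)

def switching_hybrid(cf_score, popularity_score, n_user_ratings, min_ratings=5):
    """Use item popularity while the user has few ratings, collaborative
    filtering otherwise. min_ratings is a hypothetical threshold."""
    if n_user_ratings < min_ratings:
        return popularity_score
    return cf_score

# Two components scored the same recipe; manually assigned weights.
combined = weighted_hybrid([4.0, 2.0], [0.75, 0.25])
cold_user = switching_hybrid(cf_score=4.2, popularity_score=3.1, n_user_ratings=2)
warm_user = switching_hybrid(cf_score=4.2, popularity_score=3.1, n_user_ratings=40)
```

In both cases the components receive the same input and are unaware of each other; only the final combination step differs.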

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: the increase in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
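A minimal sketch of how Precision and Recall could be computed from a list of recommended items and the set of items the user actually likes is given below. The recipe identifiers are hypothetical and serve only as an example.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Hypothetical example: four recommendations, five relevant recipes.
precision, recall = precision_recall(
    ["r1", "r2", "r3", "r4"],
    ["r2", "r4", "r5", "r6", "r7"],
)
```

Here two of the four recommended recipes are relevant (precision 0.5), and two of the five relevant recipes were recommended (recall 0.4).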

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:


$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}$$
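The difference between the two error measures can be seen in a small sketch. The predicted and actual ratings below are made-up values chosen so that one prediction is badly wrong.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating lists."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Hypothetical ratings: errors are 0, 1, 3 and 0.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 2.0, 2.0]

mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

For these values the MAE is 1.0 while the RMSE is about 1.58, since squaring gives the single error of 3 a much larger share of the total.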


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
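The FF-IRF weighting can be sketched as below. The cooking history, the recipe collection, the period length in days and the use of the natural logarithm are all assumptions made for this example; [3] may define the period and the logarithm base differently.

```python
from math import log

# Hypothetical cooking history: ingredients of each recipe the user cooked.
cooked = [["rice", "egg"], ["rice", "tofu"], ["egg", "onion"]]
period_days = 30  # D, assumed to be measured in days

# Hypothetical recipe collection (includes the cooked recipes).
all_recipes = cooked + [["rice", "beef"], ["rice", "onion"], ["rice", "carrot"]]

def ff_irf(ingredient):
    """Score I+_k = FF_k * IRF_k for one ingredient."""
    f_k = sum(ingredient in recipe for recipe in cooked)       # uses in period
    ff_k = f_k / period_days
    m = len(all_recipes)                                        # M
    m_k = sum(ingredient in recipe for recipe in all_recipes)   # M_k
    return ff_k * log(m / m_k)

scores = {k: ff_irf(k) for k in ["rice", "egg", "tofu", "onion"]}
```

In this toy data the user cooked with rice and egg equally often, but egg receives the higher score because rice appears in almost every recipe of the collection, which mirrors the role of IDF in text retrieval.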

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_{k(i)} - \bar{g}_k\right)^{2}} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_{k(i)}$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_{k(i)}$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^{+}$ and $I_k^{-}$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
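The two computations above can be sketched as follows. The ingredient quantities, preference values and weights are hypothetical; in particular, the rule that maps the deviation score to a weight $W_k$ is not reproduced in this chapter, so the weights are simply passed in as given values.

```python
from math import sqrt

def std_dev(quantities):
    """Population standard deviation of an ingredient's quantities (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients of a recipe (Eq. 3.6).
    Ingredients without a known preference contribute nothing."""
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in ingredients)

# Hypothetical quantities (grams) observed across recipes.
sigma_pepper = std_dev([2, 4, 6])
sigma_potato = std_dev([100, 200, 300])

# Hypothetical preferences (positive = liked) and quantity-based weights.
preference = {"potato": 0.5, "pepper": -0.2}
weight = {"potato": 1.0, "pepper": 2.0}
score = recipe_score(["potato", "pepper", "salt"], preference, weight)
```

The standard deviations give each ingredient its own scale, so a deviation of a few grams of pepper can count as much as a deviation of many grams of potato when the weights are derived.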

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
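The construction of the pseudo ratings matrix can be sketched as follows. The users, movies, actual ratings and content-based predictions are invented for illustration, and the content-based predictor itself is assumed to have been run beforehand.

```python
# Hypothetical item catalogue, actual ratings and content-based predictions.
items = ["m1", "m2", "m3", "m4"]
actual = {
    "u1": {"m1": 5, "m3": 2},
    "u2": {"m2": 4},
}
content_pred = {
    "u1": {"m2": 3.4, "m4": 4.1},
    "u2": {"m1": 2.0, "m3": 3.5, "m4": 4.8},
}

def pseudo_vector(user):
    """Actual rating where available, content-based prediction otherwise."""
    return [
        actual[user][i] if i in actual[user] else content_pred[user][i]
        for i in items
    ]

# Dense pseudo ratings matrix V: one fully filled row per user.
V = [pseudo_vector(u) for u in actual]
```

Every row of V is complete, so user-to-user similarities can be computed over all items instead of only the few items two users happen to have rated in common.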

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., that reach a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., it exceeds a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
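The voting procedure of the short-term model can be sketched as below. This is not Daily-Learner's actual code: the function name, the input format (pairs of similarity to a stored story and the user's score for that story) and both threshold values are assumptions chosen for illustration.

```python
def classify_story(neighbours, t_min=0.3, t_max=0.9):
    """neighbours: list of (similarity to a stored story, score of that story).
    Returns a (label, predicted_score) pair. Thresholds are hypothetical."""
    voters = [(sim, score) for sim, score in neighbours if sim >= t_min]
    if not voters:
        # No stored story is close enough: defer to the long-term model.
        return "long-term", None
    if any(sim >= t_max for sim, _ in voters):
        # Nearly identical to a story the user has already seen.
        return "known", None
    total = sum(sim for sim, _ in voters)
    predicted = sum(sim * score for sim, score in voters) / total
    return "predicted", predicted

label, score = classify_story([(0.5, 1.0), (0.25, 0.0), (0.1, 1.0)], t_min=0.2)
known_label, _ = classify_story([(0.95, 1.0)])
far_label, _ = classify_story([(0.05, 1.0)])
```

In the first call two stories vote, giving a similarity-weighted score of about 0.67; the second story is treated as already known, and the third is handed over to the long-term model.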

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^{2}\; \sum_{p \in P} (r_{bp} - \bar{r}_b)^{2}}} \tag{4.1}$$

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{bu} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{4.2}$$

In the formula, pred(u, a) is the predicted rating of item a for user u, N is the set of items rated by user u, $r_{bu}$ is the rating that user u gave to item b, and $\bar{r}_b$ is the average rating of item b. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
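An item-to-item prediction can be sketched as follows. The ratings are invented, and the prediction step deliberately uses a common simplified variant from the item-based literature, a weighted average of the user's raw ratings over positively similar items only, rather than a literal transcription of the prediction equation above. It should therefore be read as an illustration of the idea and not as the YoLP implementation.

```python
from math import sqrt

# Hypothetical ratings: user -> {recipe: rating}. u4 has not rated "salad".
ratings = {
    "u1": {"soup": 5, "salad": 4, "steak": 1},
    "u2": {"soup": 4, "salad": 5, "steak": 2},
    "u3": {"soup": 1, "salad": 2, "steak": 5},
    "u4": {"soup": 5, "steak": 2},
}

def item_mean(item):
    """Average rating of a recipe over all users who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)

def item_similarity(a, b):
    """Pearson correlation between recipes a and b over their common raters."""
    users = [u for u, r in ratings.items() if a in r and b in r]
    ma, mb = item_mean(a), item_mean(b)
    num = sum((ratings[u][a] - ma) * (ratings[u][b] - mb) for u in users)
    den = sqrt(sum((ratings[u][a] - ma) ** 2 for u in users)
               * sum((ratings[u][b] - mb) ** 2 for u in users))
    return num / den if den else 0.0

def predict(user, target):
    """Similarity-weighted average of the user's ratings, positive sims only."""
    sims = {b: item_similarity(target, b) for b in ratings[user]}
    positive = {b: s for b, s in sims.items() if s > 0}
    if not positive:
        return item_mean(target)  # fallback when no similar item was rated
    total = sum(positive.values())
    return sum(s * ratings[user][b] for b, s in positive.items()) / total

predicted = predict("u4", "salad")
```

In this toy data "salad" is rated like "soup" and opposite to "steak", so the prediction for u4 is driven by the high rating u4 gave to "soup".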

An item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, and comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
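The profile construction and ranking just described can be sketched as follows. The feature names, the tiny feature vocabulary and the example recipes are hypothetical; only the rule of adding the features of recipes rated 4 or 5 as binary values, and the ranking by cosine similarity, come from the description above.

```python
from math import sqrt

# Hypothetical feature vocabulary; each feature has a fixed vector position.
FEATURES = ["category:soup", "category:grill", "region:north",
            "ingredient:tomato", "ingredient:beef"]

def to_vector(features):
    """Binary vector with a 1 at the position of every present feature."""
    return [1 if f in features else 0 for f in FEATURES]

def build_profile(rated_recipes, threshold=4):
    """Add the features of every recipe rated >= threshold to the profile."""
    profile = [0] * len(FEATURES)
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile = [max(p, v) for p, v in zip(profile, to_vector(features))]
    return profile

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

profile = build_profile([
    ({"category:soup", "ingredient:tomato"}, 5),   # liked
    ({"category:grill", "ingredient:beef"}, 2),    # not added to the profile
])
menu = {
    "tomato soup": {"category:soup", "region:north", "ingredient:tomato"},
    "beef grill": {"category:grill", "ingredient:beef"},
}
ranking = sorted(menu, key=lambda r: cosine(profile, to_vector(menu[r])),
                 reverse=True)
```

The resulting list places "tomato soup" first, since it shares two features with the profile, while "beef grill" shares none.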

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if } \mathrm{similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented first. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as described in [3]:

\[
IRF_k = \log \frac{M}{M_k} \tag{4.4}
\]

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
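A minimal sketch of the IRF weighting follows, assuming recipes are available as sets of feature names. The natural logarithm is an assumption, since the text does not state the base, and the recipe data is hypothetical.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k) for every feature k that appears in the recipes."""
    total = len(recipes)  # M: total number of recipes
    counts = Counter(f for feats in recipes.values() for f in set(feats))  # M_k
    return {feature: math.log(total / m_k) for feature, m_k in counts.items()}

def recipe_vector(features, irf):
    """With the feature frequency fixed to 1, each weight is simply the IRF."""
    return {feature: irf[feature] for feature in features if feature in irf}

# Hypothetical recipes described by their ingredients.
recipes = {
    "r1": {"garlic", "beef"},
    "r2": {"garlic", "rice"},
    "r3": {"garlic", "beef", "onion"},
    "r4": {"garlic"},
}
irf = inverse_recipe_frequency(recipes)
vector_r3 = recipe_vector(recipes["r3"], irf)
print(irf)
```

An ingredient present in every recipe (garlic here) receives a weight of zero, while rare ingredients receive the largest weights, mirroring the behaviour of IDF in text retrieval.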

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5, and the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, they are subtracted.
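The two labelling approaches and the additive update can be sketched as below. The function names, the recipe weights and the rating history are hypothetical; both kinds of observation use a weight of 1, as stated in the text.

```python
from collections import defaultdict

def label_fixed_threshold(rating, scale_min, scale_max):
    """Return +1, -1 or 0 (neutral) relative to the middle of the rating scale."""
    midpoint = (scale_min + scale_max) / 2
    if rating > midpoint:
        return 1
    if rating < midpoint:
        return -1
    return 0

def label_user_average(rating, user_average):
    """Return +1 when the rating is at or above the user's average, else -1."""
    return 1 if rating >= user_average else -1

def build_prototype(rated, recipe_vectors, label):
    """Add the feature weights of positive recipes and subtract those of negative ones."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = label(rating)
        if sign == 0:  # neutral observations are ignored
            continue
        for feature, weight in recipe_vectors[recipe_id].items():
            prototype[feature] += sign * weight
    return dict(prototype)

# Hypothetical IRF-weighted recipe vectors and one user's ratings on a 1-5 scale.
recipe_vectors = {
    "r1": {"beef": 0.7, "rice": 1.4},
    "r2": {"beef": 0.7, "onion": 1.4},
    "r3": {"rice": 1.4},
}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]
user_average = sum(r for _, r in rated) / len(rated)

fixed = build_prototype(rated, recipe_vectors,
                        lambda r: label_fixed_threshold(r, 1, 5))
by_average = build_prototype(rated, recipe_vectors,
                             lambda r: label_user_average(r, user_average))
print(fixed, by_average)
```

On a 1 to 4 scale the midpoint is 2.5, so no rating is neutral; on a 1 to 5 scale the midpoint is 3, which reproduces the neutral treatment of that rating in the Food.com dataset.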

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets that combine relevant information on the recipes with rating events from users. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. Of the many normalization methods available, the technique chosen for this work was Min-Max normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:

\[
B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set and each user's rating range from the training set. At this point, the similarity scale of each user is mapped onto that user's rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
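The per-user mapping can be sketched as follows. The function names and the sample values are hypothetical, and the fallback to the user average is triggered here whenever the similarity interval has zero width, which is one possible reading of "not enough user ratings".

```python
def min_max(value, value_min, value_max, range_min, range_max):
    """Eq. 4.5: rescale `value` from [value_min, value_max] to [range_min, range_max]."""
    scaled = (value - value_min) / (value_max - value_min)
    return scaled * (range_max - range_min) + range_min

def predict_rating_min_max(similarity, user_similarities, user_ratings):
    """Map one similarity onto the user's own rating range.

    user_similarities: the user's similarity values (validation set).
    user_ratings: the user's known ratings (training set).
    """
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        # Similarity interval cannot be computed: default to the user average.
        return sum(user_ratings) / len(user_ratings)
    return min_max(similarity, sim_min, sim_max,
                   min(user_ratings), max(user_ratings))

prediction = predict_rating_min_max(0.5, [0.1, 0.5, 0.9], [2, 3, 5])
fallback = predict_rating_min_max(0.3, [0.3], [4, 5])
print(prediction, fallback)
```

A similarity halfway through the user's similarity range lands halfway through that user's rating range, so two users with the same similarity value can receive different predicted ratings.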

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
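The three-case rule of Eq. 4.6 can be sketched as follows. The default thresholds are the initial values reported in this work (0.75 and 0.25). Two details are assumptions made for the example: the population standard deviation is used, and the combined statistics are taken as the mean of the user and recipe values.

```python
from statistics import mean, pstdev

def rating_from_similarity(similarity, average, deviation, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating up or down by one standard deviation."""
    if similarity >= upper:
        return average + deviation
    if similarity >= lower:
        return average
    return average - deviation

def combined_stats(user_ratings, item_ratings):
    """Assumed combination: mean of the two averages and of the two deviations."""
    average = (mean(user_ratings) + mean(item_ratings)) / 2
    deviation = (pstdev(user_ratings) + pstdev(item_ratings)) / 2
    return average, deviation

# Hypothetical training ratings for one user and one recipe.
user_ratings = [3, 4, 5]
item_ratings = [2, 4]
avg, dev = combined_stats(user_ratings, item_ratings)
predictions = [rating_from_similarity(s, avg, dev) for s in (0.9, 0.5, 0.1)]
print(avg, dev, predictions)
```

The user-only and recipe-only variants follow by passing `mean(user_ratings), pstdev(user_ratings)` or the recipe equivalents instead of the combined values.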

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets graphically. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system once it has been trained. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the data was split into 5 partitions, each used once as the validation set, so the process is repeated 5 times; this is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
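A minimal sketch of the 5-fold split is shown below, assuming the rating events fit in a Python list. The shuffling seed and the function name are hypothetical choices for the example; the text does not describe how events were assigned to folds.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical rating events, identified here only by an index.
events = list(range(100))
splits = list(k_fold_splits(events, k=5))
for training, validation in splits:
    print(len(training), len(validation))
```

With 100 events and k = 5, every fold uses 80 events for training and 20 for validation, matching the 80%/20% proportions described above.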

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated from the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
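Both measures can be computed in a few lines. The sketch below uses the standard definitions of MAE and RMSE; the sample predictions are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but larger errors weigh more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

predicted = [4, 3, 5]
actual = [4, 4, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```

In this example the errors are 0, 1 and 2, so the RMSE exceeds the MAE because the single error of 2 is squared before averaging.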

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results (UA = Observation User Average, FT = Observation Fixed Threshold)

| Method | Epicurious UA MAE | Epicurious UA RMSE | Epicurious FT MAE | Epicurious FT RMSE | Food.com UA MAE | Food.com UA RMSE | Food.com FT MAE | Food.com FT RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which combination of methods had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the combination of methods that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this combination of methods.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, since the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
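The idea of storing one sub-vector per feature group and merging only the selected groups at recommendation time can be sketched as follows. The group names, the prefixed feature keys and the weights are hypothetical; the text does not describe the actual storage layout.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def merge(groups, selected):
    """Concatenate the stored sub-vectors of the selected feature groups."""
    merged = {}
    for group in selected:
        merged.update(groups.get(group, {}))
    return merged

# One stored sub-vector per feature group; keys are prefixed to avoid clashes.
user_prototype = {
    "ingredients": {"ing:beef": 1.2, "ing:rice": -0.4},
    "cuisine": {"cui:asian": 0.9},
    "dietary": {"diet:vegan": -1.1},
}
recipe = {
    "ingredients": {"ing:beef": 1.2},
    "cuisine": {"cui:asian": 0.9},
    "dietary": {},
}

combinations = [("ingredients", "cuisine", "dietary"), ("ingredients",), ("cuisine",)]
similarities = {
    combo: cosine(merge(user_prototype, combo), merge(recipe, combo))
    for combo in combinations
}
print(similarities)
```

Because the sub-vectors are built once, each feature combination only costs a merge and a similarity computation, which is what makes the tests in Table 5.3 cheap to run.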

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by repeating the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.
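The threshold sweep can be sketched as a loop over candidate values of U, applying the two-case rule of Eq. 5.1 and recording both error measures. The sample tuples and the function names are hypothetical; in the experiments each point was obtained from a full cross-validation run rather than from a fixed list of samples.

```python
import math

def two_case_rating(similarity, average, deviation, upper):
    """Eq. 5.1: add one standard deviation only when similarity reaches U."""
    return average + deviation if similarity >= upper else average

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for (similarity, average, deviation, actual) samples."""
    results = {}
    for upper in thresholds:
        errors = [two_case_rating(sim, avg, dev, upper) - actual
                  for sim, avg, dev, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results

# Hypothetical validation samples: (similarity, average, deviation, actual rating).
samples = [
    (0.9, 3.5, 0.5, 4),
    (0.4, 3.5, 0.5, 4),
    (0.2, 3.0, 1.0, 3),
]
results = sweep_upper_threshold(samples, thresholds=[0.75, 0.3])
for upper, (mae, rmse) in results.items():
    print(upper, mae, rmse)
```

The threshold with the lowest MAE need not be the one with the lowest RMSE, so both values are kept for each candidate.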

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is larger, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
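The opposite movement of MAE and RMSE observed in this section can be illustrated with a small worked example. The error values are hypothetical and chosen only for illustration; they are not taken from the experiments. Suppose four predictions are made: with a high threshold U every prediction misses by 0.5, while with a low threshold three predictions are exact and one misses by 1.5.

```latex
\begin{align*}
\text{MAE}_{\text{high } U}  &= \frac{0.5 + 0.5 + 0.5 + 0.5}{4} = 0.5 \\
\text{RMSE}_{\text{high } U} &= \sqrt{\frac{4 \times 0.5^2}{4}} = 0.5 \\
\text{MAE}_{\text{low } U}   &= \frac{0 + 0 + 0 + 1.5}{4} = 0.375 \\
\text{RMSE}_{\text{low } U}  &= \sqrt{\frac{1.5^2}{4}} = 0.75
\end{align*}
```

The low threshold gives more exact predictions and a lower MAE, but its single large miss is squared, so its RMSE is higher.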

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, implying that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
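The round-by-round simulation for a single user can be sketched as follows. The predictor is passed in as a function; the user-average predictor used here is only a stand-in so that the example runs, not the Rocchio-based component, and the review data is hypothetical.

```python
def learning_curve(reviews, predict, max_train):
    """Return [(training size, mean absolute error)] for one user.

    reviews: list of (recipe_id, rating) in the order they are revealed.
    predict: function (training_reviews, recipe_id) -> predicted rating.
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        errors = [abs(predict(training, recipe_id) - rating)
                  for recipe_id, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve

def user_average_predictor(training, recipe_id):
    """Stand-in predictor: always returns the user's average training rating."""
    return sum(rating for _, rating in training) / len(training)

reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(reviews, user_average_predictor, max_train=3)
print(curve)
```

Averaging the per-user curves over all selected users at each training size would give one point per round, as in the learning-curve figures.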



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

61 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method ex-

plored in this work would be an interesting experiment for a future work As mentioned in Section

32 by implementing this hybrid approach a performance increase of 92 as measured with the

MAE metric was obtained when compared to a pure content-based method [20] This experiment

would determine if a similar decrease in the MAE could be achieved by implementing this hybrid

approach in the food recommendation domain

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
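The idea of one class vector per rating value can be sketched as follows. This is only an illustration under stated assumptions, not an implementation from this work: each class vector is assumed to be the plain average of the feature vectors of the recipes that received that rating, similarity is assumed to be the cosine metric, and the names `build_class_vectors` and `predict_rating` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(name, 0.0) for name, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_class_vectors(rated_recipes):
    """Build one vector per rating value from (feature_vector, rating) pairs.

    Assumption: a class vector is the average of the feature vectors of all
    recipes that the user rated with that value.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for name, weight in features.items():
            sums[rating][name] += weight
    return {
        rating: {name: total / counts[rating] for name, total in vector.items()}
        for rating, vector in sums.items()
    }


def predict_rating(features, class_vectors):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda rating: cosine(features, class_vectors[rating]))


# Hypothetical rating history of a single user.
history = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "pasta": 1.0}, 5),
    ({"liver": 1.0}, 1),
]
class_vectors = build_class_vectors(history)
```

With this toy history, a recipe described only by tomato is closest to the class vector built from the recipes rated 5, so that rating value is returned directly as the prediction.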

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868. doi: 10.1109/DICTAP.2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.





Resumo

This dissertation explores the applicability of content-based methods to personalized recommendations in the food domain. Recommendation in this domain is a relatively new area, with few systems deployed in real settings that are based on user preferences. Methods frequently used in other areas, such as Rocchio's algorithm in document classification, can be adapted to recommendations in the food domain. With the objective of exploring content-based methods in the area of food recommendation, a platform was developed to evaluate the applicability of Rocchio's algorithm to this domain. Besides the validation of the algorithm explored in this study, other tests were performed, such as the impact of the standard deviation on the recommendation error and the learning curve of the algorithm.

Keywords: Recommendation Systems, Content-Based Recommendation, Food, Recipe, Machine Learning




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms

1 Introduction
    1.1 Dissertation Structure

2 Fundamental Concepts
    2.1 Recommendation Systems
        2.1.1 Content-Based Methods
        2.1.2 Collaborative Methods
        2.1.3 Hybrid Methods
    2.2 Evaluation Methods in Recommendation Systems

3 Related Work
    3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    3.2 Content-Boosted Collaborative Recommendation
    3.3 Recommending Food: Reasoning on Recipes and Ingredients
    3.4 User Modeling for Adaptive News Access

4 Architecture
    4.1 YoLP Collaborative Recommendation Component
    4.2 YoLP Content-Based Recommendation Component
    4.3 Experimental Recommendation Component
        4.3.1 Rocchio's Algorithm using FF-IRF
        4.3.2 Building the Users' Prototype Vector
        4.3.3 Generating a rating value from a similarity value
    4.4 Database and Datasets

5 Validation
    5.1 Evaluation Metrics and Cross Validation
    5.2 Baselines and First Results
    5.3 Feature Testing
    5.4 Similarity Threshold Variation
    5.5 Standard Deviation Impact in Recommendation Error
    5.6 Rocchio's Learning Curve

6 Conclusions
    6.1 Future Work

Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction

Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.

In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The new variations of content-based methods adapted to food recommendation are validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval based methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1} \]

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by $\max_z f_{zj}$, which corresponds to the maximum frequency observed over all keywords z in the document j.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

\[ w_{ij} = TF_{ij} \times IDF_i \tag{2.3} \]

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
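The TF-IDF weighting of Eqs. (2.1) to (2.3) can be sketched in a few lines of Python. This is a minimal illustration rather than code from this work: documents are assumed to be lists of keywords, the natural logarithm is assumed for the IDF term (the text does not fix the base), and the function name `tf_idf` and the toy ingredient lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1)-(2.3)."""
    n_docs = len(documents)                      # N
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)                    # f_ij for every keyword i in document j
        if not counts:                           # an empty document has no weights
            weights.append({})
            continue
        max_count = max(counts.values())         # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus: each "document" is a list of keywords (assumed data).
recipes = [
    ["tomato", "basil", "pasta", "tomato"],
    ["tomato", "rice", "chicken"],
    ["rice", "beans"],
]
weights = tf_idf(recipes)
```

In this toy corpus, "tomato" is the most frequent keyword of the first list but also occurs in two of the three lists, so it ends up with a lower weight there than "basil", which occurs only once but in a single list.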

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

\[ w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4} \]

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \tag{2.5} \]
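The cosine normalization and the cosine similarity metric can be sketched as follows. The sketch assumes that weight vectors are stored as sparse dictionaries mapping keywords to weights, with missing keywords counting as zero; the function names are hypothetical.

```python
import math


def normalize(vector):
    """Cosine-normalize a {keyword: weight} vector so that its length is 1 (Eq. 2.4)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0.0:                      # an all-zero vector cannot be scaled
        return dict(vector)
    return {term: w / norm for term, w in vector.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two {keyword: weight} vectors (Eq. 2.5)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:   # similarity is defined as 0 for empty vectors here
        return 0.0
    return dot / (norm_a * norm_b)
```

When both vectors have already been normalized, the denominator is 1 and the similarity reduces to the sum of the products of the weights of the shared keywords.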



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight of each term is given by the following formula:

\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6} \]

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
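A minimal sketch of Rocchio's method as a classifier is given below. It follows Eq. (2.6) for the prototype vectors and uses cosine similarity for the class assignment. The values chosen for β and γ are illustrative assumptions only, documents are assumed to be sparse dictionaries of TF-IDF weights, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector of one class, following Eq. (2.6).

    beta and gamma are illustrative defaults, not values taken from the text.
    """
    prototype = defaultdict(float)
    for doc in positives:                       # beta * average over POS_i
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:                       # minus gamma * average over NEG_i
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)


def classify(document, prototypes):
    """Assign the document to the class with the most similar prototype vector."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], document))


# Toy training data (assumed): weight vectors of liked and disliked items.
liked = [{"tomato": 1.0, "basil": 0.5}]
disliked = [{"liver": 1.0}]
prototypes = {
    "like": rocchio_prototype(liked, disliked),
    "dislike": rocchio_prototype(disliked, liked),
}
```

Here the positive examples of one class serve as the negative examples of the other, so a new document containing only "tomato" has a positive similarity with the "like" prototype and a negative one with the "dislike" prototype.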

Although this method has an intuitive justification, it does not have any theoretic underpinnings and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c) P(d|c)}{P(d)} \tag{2.7} \]

When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

\[ \arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \tag{2.8} \]

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9} \]

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
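The multinomial model of Eq. (2.9) can be sketched as follows. Two choices in this sketch are assumptions that the text does not prescribe: the term probabilities $P(t_k|c_j)$ are estimated with add-one (Laplace) smoothing so that unseen terms do not produce zero probabilities, and scores are accumulated as sums of logarithms to avoid numerical underflow. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled_documents):
    """Collect class and term counts from (list_of_terms, label) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in labelled_documents:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary


def classify(terms, model):
    """Return the class that maximizes Eq. (2.9), computed in log space."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)              # log P(c_j)
        for term, count in Counter(terms).items():         # count = N(d_i, t_k)
            if term not in vocabulary:                      # ignore terms outside V
                continue
            # Assumption: add-one smoothing for P(t_k | c_j)
            probability = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += count * math.log(probability)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set (assumed data).
model = train([
    (["tomato", "basil"], "relevant"),
    (["tomato", "pasta"], "relevant"),
    (["liver", "onion"], "irrelevant"),
])
```

As in the text, P(d) is left out of the score because it is the same for every class.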

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
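The expected information gain of splitting a set of documents on the presence or absence of a single word can be sketched as follows. The sketch assumes the usual entropy-based definition with base-2 logarithms, documents represented as sets of words, and hypothetical function names and toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(documents, labels, term):
    """Entropy reduction obtained by splitting on the presence of `term`."""
    with_term = [label for doc, label in zip(documents, labels) if term in doc]
    without_term = [label for doc, label in zip(documents, labels) if term not in doc]
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


# Toy data (assumed): documents as sets of words, with one label each.
documents = [
    {"tomato", "basil"},
    {"tomato", "pasta"},
    {"liver", "onion"},
    {"liver", "rice"},
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
```

A decision tree learner would evaluate this quantity for every candidate word and split on the one with the highest gain. In the toy data, "tomato" separates the two classes perfectly, while "basil" only isolates one relevant document.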

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is made during the classification time.
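A k-nearest-neighbor classifier over VSM vectors can be sketched as follows. The sketch assumes cosine similarity, a simple majority vote among the k most similar stored items and sparse dictionary vectors; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar training items.

    `training` is a list of (vector, label) pairs that is simply kept in memory.
    """
    ranked = sorted(training, key=lambda example: cosine(item, example[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data (assumed).
training = [
    ({"tomato": 1.0, "basil": 1.0}, "like"),
    ({"tomato": 1.0, "pasta": 1.0}, "like"),
    ({"liver": 1.0, "onion": 1.0}, "dislike"),
]
```

The sketch makes the drawback mentioned above visible: every call to `knn_classify` compares the new item with all stored items, so the cost of a single classification grows with the size of the training set.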

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\; \sqrt{\sum_{s \in S} r_{bs}^{2}}}$$



In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item, with the sums running over the items rated by both users. However, this measure does not take differences in rating behaviour between users into consideration.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items common to both are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2} \; \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.
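To make the difference between the two measures concrete, the following sketch compares them on the ratings of Alice and User1 from Table 2.1. The function names are illustrative, and the choice of taking each user's average over the co-rated items only is an assumption of this example.

```python
import math

# Ratings from Table 2.1, restricted to the four items Alice has rated.
alice = [5, 3, 4, 4]
user1 = [3, 1, 2, 3]

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson_sim(a, b):
    """Pearson correlation; averages are taken over the co-rated items (assumption)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

print(round(cosine_sim(alice, user1), 3))   # 0.975
print(round(pearson_sim(alice, user1), 3))  # 0.853
```

Both values are high for this pair, but only the Pearson coefficient discounts the fact that User1 rates systematically lower than Alice.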

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
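The whole procedure can be traced on the data of Table 2.1 with the short sketch below. It is only an illustration under stated assumptions: the neighborhood size of two, the use of co-rated items for the Pearson averages, and the use of all known ratings for the user averages in the prediction are choices made for this example, and the function names are hypothetical.

```python
import math

# Ratings from Table 2.1; None marks the rating to be predicted.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def mean(values):
    known = [v for v in values if v is not None]
    return sum(known) / len(known)

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = math.sqrt(sum((x - ma) ** 2 for x, _ in pairs) * sum((y - mb) ** 2 for _, y in pairs))
    return num / den if den else 0.0

def predict(user, item, n_neighbors=2):
    """Prediction function of Eq. (2.12) using the n most similar raters of `item`."""
    sims = {o: pearson(ratings[user], r) for o, r in ratings.items()
            if o != user and r[item] is not None}
    top = sorted(sims, key=sims.get, reverse=True)[:n_neighbors]
    num = sum(sims[o] * (ratings[o][item] - mean(ratings[o])) for o in top)
    den = sum(sims[o] for o in top)
    return mean(ratings[user]) + (num / den if den else 0.0)

print(round(predict("Alice", 4), 2))  # 4.87
```

With User1 and User2 as neighbors, both of whom rated Item5 above their own average, the prediction ends up above Alice's average of 4.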

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user, or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
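The two combination schemes mentioned above, weighting and switching between components, can be sketched as follows. The function names, the weights and the threshold of five ratings are hypothetical values chosen only for illustration.

```python
def weighted_hybrid(scores_by_component, weights):
    """Combine per-item scores from independent recommenders with a weighted sum."""
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=combined.get, reverse=True)

def switching_hybrid(n_user_ratings, popular, collaborative, min_ratings=5):
    """Fall back to popular items while the user has too few ratings."""
    return popular if n_user_ratings < min_ratings else collaborative

# Hypothetical scores produced by two components for the same input.
scores = {
    "content": {"soup": 0.9, "steak": 0.2},
    "collab": {"soup": 0.3, "steak": 0.8},
}
print(weighted_hybrid(scores, {"content": 0.3, "collab": 0.7}))  # ['steak', 'soup']
```

Because each component only contributes a score table, either one can be replaced without touching the other, which is what makes this design the least invasive.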

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):


Figure 2.7: Evaluating recommended items [2]

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
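As a small illustration of both measures, the sketch below counts tp, fp and fn from a recommendation list and the set of items the user actually likes. The function name and the item identifiers are assumptions made for the example.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the items the user likes."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Two of the four recommended items are liked; one liked item was missed.
precision, recall = precision_recall(["r1", "r2", "r3", "r4"], ["r2", "r4", "r7"])
print(precision)  # 0.5
```

True negatives do not appear in either formula, which is why these measures remain usable when the catalogue of non-recommended items is very large.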

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:


$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}$$
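The effect of the squared term can be seen in a short sketch. The function names and the four example ratings are illustrative only.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 5.0, 5.0]
print(mae(predicted, actual))             # 1.0
print(round(rmse(predicted, actual), 3))  # 1.581
```

The single error of three rating points dominates the RMSE, while the MAE treats it the same as three separate errors of one point.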


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D}$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k}$$

The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
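The favourite-ingredient score can be illustrated with the sketch below. It is not the code of [3]: the function name, the use of the natural logarithm and all the counts are assumptions made for the example.

```python
import math

def favourite_scores(cooked_counts, period_days, recipes_with, total_recipes):
    """Score I+_k = FF_k * IRF_k for every ingredient the user has cooked with.

    cooked_counts[k] -> F_k, times ingredient k was used during the period D
    recipes_with[k]  -> M_k, recipes in the collection that contain k
    """
    scores = {}
    for k, f_k in cooked_counts.items():
        ff_k = f_k / period_days
        irf_k = math.log(total_recipes / recipes_with[k])  # natural log (assumption)
        scores[k] = ff_k * irf_k
    return scores

# Hypothetical numbers: salt is used often but appears in almost every recipe.
scores = favourite_scores(
    cooked_counts={"salt": 20, "salmon": 6},
    period_days=30,
    recipes_with={"salt": 900, "salmon": 50},
    total_recipes=1000,
)
print(sorted(scores, key=scores.get, reverse=True))  # ['salmon', 'salt']
```

As with TF-IDF for words, the inverse recipe frequency keeps ubiquitous ingredients from dominating the list of favourites.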

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
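The reported F-measure for N = 20 can be checked directly from the reported precision and recall. The sketch below reads those figures as percentages (60.7% and 61%); the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Precision and recall reported for N = 20, read as percentages.
print(round(f_measure(0.607, 0.61), 3))  # 0.608
```

Because the harmonic mean is pulled towards the smaller of the two values, the very low recall at N = 1 would yield a low F-measure there despite the high precision.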

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with a higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^{2}} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^{+}$ and $I_k^{-}$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
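A minimal sketch of the two quantities involved is given below. The way a weight $W_k$ is derived from the deviation score is not detailed here, so the weights are simply passed in as hypothetical inputs; the function names and all numbers are assumptions made for the example.

```python
import math

def quantity_stats(quantities):
    """Mean and standard deviation of one ingredient's quantities across recipes."""
    n = len(quantities)
    mean = sum(quantities) / n
    sigma = math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)
    return mean, sigma

def recipe_score(preferences, weights):
    """Sum of I_k * W_k over the ingredients of the recipe (weights given as input)."""
    return sum(preferences.get(k, 0.0) * w for k, w in weights.items())

# Hypothetical quantities (in grams) of one ingredient in eight recipes.
mean, sigma = quantity_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, sigma)  # 5.0 2.0

# Hypothetical preferences (positive = liked) and quantity-based weights.
score = recipe_score({"pepper": 0.8, "potato": -0.5}, {"pepper": 1.5, "potato": 1.0})
```

An ingredient that is absent from the user's preferences contributes nothing to the score, whatever its weight.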

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
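The construction of the pseudo ratings matrix can be sketched as follows. The content-based predictor is replaced by a placeholder function, and the names `pseudo_ratings` and `content_predict`, as well as the data, are assumptions made for this illustration rather than part of [20].

```python
def pseudo_ratings(actual, content_predict, users, items):
    """Build the dense pseudo ratings matrix V as a dict of dicts.

    actual[u] maps items to the ratings user u really gave;
    content_predict(u, i) stands in for the content-based prediction c_ui.
    """
    return {
        u: {i: actual[u][i] if i in actual[u] else content_predict(u, i) for i in items}
        for u in users
    }

# Hypothetical data and a placeholder predictor that always answers 3.0.
actual = {"ana": {"m1": 5, "m3": 2}, "rui": {"m2": 4}}
V = pseudo_ratings(actual, lambda u, i: 3.0, users=["ana", "rui"], items=["m1", "m2", "m3"])
print(V["ana"])  # {'m1': 5, 'm2': 3.0, 'm3': 2}
```

Every user now has a value for every item, so the Pearson correlation can be computed over the full item set instead of only the co-rated items.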

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
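The breakdown and reconstruction steps of the content-based strategy can be sketched as below. Averaging is used in both directions, which matches the description of the ingredient scores but is an assumption for the reconstruction step; the function names and recipes are hypothetical.

```python
from collections import defaultdict

def ingredient_scores(user_recipe_ratings, recipe_ingredients):
    """Score each ingredient as the mean rating of the rated recipes containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in user_recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {k: totals[k] / counts[k] for k in totals}

def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean score of its ingredients with known scores."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None

recipes = {"omelette": ["egg", "cheese"], "carbonara": ["egg", "pasta", "bacon"]}
scores = ingredient_scores({"omelette": 5, "carbonara": 3}, recipes)
print(scores["egg"])                                      # 4.0
print(predict_recipe(["egg", "pasta", "tofu"], scores))   # 3.5
```

Ingredients the user has never rated indirectly, such as tofu here, are skipped, which is one place where the sparsity of the ingredient matrix becomes visible.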

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe rating scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
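The decision logic of the short-term model described above can be sketched as follows. The threshold values, the function name and the return convention are assumptions made for this example and are not taken from [23].

```python
def short_term_predict(similarities, scores, t_min=0.3, t_max=0.9):
    """Sketch of the short-term model; thresholds are illustrative, not from [23].

    similarities[i] is the cosine similarity between the new story and stored
    story i, and scores[i] is the score attached to stored story i.
    Returns ("known", None), ("unclassified", None) or ("scored", value).
    """
    voters = [(s, r) for s, r in zip(similarities, scores) if s >= t_min]
    if any(s >= t_max for s, _ in voters):
        return "known", None          # too close to a story already seen
    if not voters:
        return "unclassified", None   # hand over to the long-term model
    weighted = sum(s * r for s, r in voters) / sum(s for s, _ in voters)
    return "scored", weighted

status, value = short_term_predict([0.5, 0.4, 0.1], [1.0, 0.5, 0.0])
print(status, round(value, 3))  # scored 0.778
```

The upper threshold is what distinguishes this scheme from a plain nearest neighbor predictor: the most similar stories are treated as redundant rather than as the best candidates.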

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^{2} \; \sum_{p \in P} (r_{bp} - \bar{r}_b)^{2}}} \tag{4.1}$$

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bu} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{4.2}$$

In the formula, pred(u, a) is the predicted rating of user u for item a, $r_{bu}$ is the rating from user u to item b, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is obtained by normalizing this weighted sum by the sum of the similarities.
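The item-to-item idea can be sketched as follows. Note that this is a simplified variant and not necessarily the YoLP implementation: it uses the plain weighted average of the user's own ratings, as in the item-based scheme of Sarwar et al. [15], and it ignores items with non-positive similarity. The function names, that filtering rule and the data are assumptions made for the example.

```python
import math

def item_pearson(ratings, a, b):
    """Pearson correlation between items a and b over the users who rated both."""
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if len(users) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in users) / len(users)
    mean_b = sum(ratings[u][b] for u in users) / len(users)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in users)
    den = math.sqrt(sum((ratings[u][a] - mean_a) ** 2 for u in users)
                    * sum((ratings[u][b] - mean_b) ** 2 for u in users))
    return num / den if den else 0.0

def predict_item_based(ratings, user, target):
    """Weighted average of the user's own ratings, weighted by item similarity."""
    sims = {b: item_pearson(ratings, target, b) for b in ratings[user] if b != target}
    positive = {b: s for b, s in sims.items() if s > 0}  # assumption: skip dissimilar items
    den = sum(positive.values())
    if not den:
        return None
    return sum(s * ratings[user][b] for b, s in positive.items()) / den

# Hypothetical ratings; u4 has not rated the soup yet.
ratings = {
    "u1": {"soup": 5, "salad": 4, "steak": 1},
    "u2": {"soup": 4, "salad": 5, "steak": 2},
    "u3": {"soup": 1, "salad": 2, "steak": 5},
    "u4": {"salad": 5, "steak": 1},
}
print(predict_item_based(ratings, "u4", "soup"))
```

Only the items already rated by the target user enter the computation, which is why the approach suits a setting where predictions are needed for a fixed, small set of recipes.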

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the restaurantrsquos

recipesrsquo features with the user profile using the cosine similarity measure (Eq 25) The recom-

mended recipes are ordered from most to least similar In this case instead of referring recipes as

vectors of words recipes are represented by vectors of different features The features that compose

a recipe are category region restaurant ID and ingredients Context features are also considered

in the moment of the recommendation these are temperature period of the day and season of the

year Each feature has a specific location attributed to it in the recipe and user profile sparse vectors

The user profile is composed by binary values of the recipe features that he positively rated ie

when a user rates a recipe with a value of 4 or 5 all the recipe features are added as binary values

to the profile vector
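The profile construction and comparison described above can be sketched as follows. This is only an illustrative sketch: representing the sparse binary vectors as Python sets, as well as the function names, are assumptions of this example and not details of the YoLP code.

```python
from math import sqrt


def build_binary_profile(rated_recipes, min_positive_rating=4):
    """Collect the features of every positively rated recipe.

    rated_recipes: iterable of (feature_set, rating) pairs (hypothetical format).
    A set of feature names stands in for a sparse binary vector.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_positive_rating:  # ratings of 4 or 5 count as positive
            profile |= set(features)
    return profile


def cosine_binary(profile, recipe_features):
    """Cosine similarity between two binary vectors stored as sets."""
    recipe_features = set(recipe_features)
    if not profile or not recipe_features:
        return 0.0
    overlap = len(profile & recipe_features)  # dot product of binary vectors
    return overlap / sqrt(len(profile) * len(recipe_features))
```

For binary vectors the dot product reduces to the number of shared features, which is why a set intersection is sufficient here.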

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented first. Next, two different approaches to build the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$\mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
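A minimal sketch of how the IRF weights of Eq. 4.4 could be computed is shown below. The function name and the input format are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
from collections import Counter
from math import log


def inverse_recipe_frequency(recipes):
    """Compute IRF_k = log(M / M_k) for every feature k.

    recipes: dict mapping a recipe id to the set of features it contains
    (hypothetical format). Returns a dict feature -> IRF weight.
    """
    total_recipes = len(recipes)  # M
    # M_k: number of recipes that contain feature k
    recipe_counts = Counter(
        feature for features in recipes.values() for feature in set(features)
    )
    return {
        feature: log(total_recipes / count)
        for feature, count in recipe_counts.items()
    }
```

A feature present in every recipe receives a weight of 0, while rare features receive the largest weights, mirroring the behaviour of IDF for words in documents.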

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
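The two labelling approaches and the add/subtract update can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the dictionary representation of the prototype vector, and the use of `None` for neutral observations are choices made for this example only.

```python
from collections import defaultdict


def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule: True (positive), False (negative) or None (neutral)."""
    if scale_max == 4:   # Epicurious-style scale: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:      # Food.com-style scale: 3 is neutral and ignored
        return None
    return rating > 3


def is_positive_user_average(rating, user_average):
    """User-average rule: equal to or above the user's average is positive."""
    return rating >= user_average


def build_prototype(rated_recipes, irf, classify):
    """Add IRF weights for positive observations, subtract them for negative ones.

    rated_recipes: iterable of (feature_set, rating) pairs (hypothetical format).
    irf: dict feature -> IRF weight.
    classify: function rating -> True / False / None.
    """
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        label = classify(rating)
        if label is None:
            continue  # neutral observation
        sign = 1.0 if label else -1.0  # both observation types have weight 1
        for feature in features:
            prototype[feature] += sign * irf.get(feature, 0.0)
    return dict(prototype)
```

Passing the classifier as an argument makes it straightforward to switch between the fixed-threshold and the user-average approaches without changing the update logic.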

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of $A$ minus minimum value of $A$), the user average was used as the default for the recommendation.
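Eq. 4.5, together with the fallback to the user average, can be sketched in a few lines of Python. The function and parameter names are hypothetical and only serve this example.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, fallback):
    """Map a similarity value into the user's rating range (Eq. 4.5).

    sim_min / sim_max: the user's observed similarity interval.
    rating_min / rating_max: the user's observed rating interval [C, D].
    fallback: value returned when the similarity interval is empty,
              e.g. the user's average rating.
    """
    if sim_max == sim_min:
        return fallback  # interval cannot be computed, use the default
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For example, a similarity halfway through the user's similarity interval is mapped to the midpoint of that user's rating range.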

Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$

Three different approaches were tested: using the user's rating average and the user standard deviation; using the recipe's rating average and the recipe standard deviation; and using the combined average of the user and the recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61
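The thresholded conversion of Eq. 4.6 can be sketched as follows, using the initial thresholds of 0.75 and 0.25 as defaults. The function name and signature are assumptions of this example; `average` and `std_dev` may be the user values, the recipe values, or their combination, depending on the approach being tested.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating following Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev   # high similarity: above-average rating
    if similarity < lower:
        return average - std_dev   # low similarity: below-average rating
    return average                 # otherwise predict the average


# Usage: a user with average rating 3.5 and standard deviation 0.5
high = rating_from_similarity(0.8, 3.5, 0.5)
middle = rating_from_similarity(0.5, 3.5, 0.5)
low = rating_from_similarity(0.1, 3.5, 0.5)
```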


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but it was filtered in order to reduce data sparsity. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
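The 80%/20% splitting procedure can be sketched as follows. This is a minimal illustration rather than the evaluation module used in this work; the function name, the fixed random seed, and the striding strategy used to form the folds are assumptions of this example.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    events: list of rating events, e.g. (userID, itemID, rating) tuples.
    Each event appears in exactly one validation set.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint, near-equal parts
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds roughly 20% of the rating events, and the error measures are averaged over the five resulting runs.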

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10 Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
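For reference, the two error measures can be written in a few lines of Python. This sketch uses the standard definitions of MAE and RMSE; the function names are illustrative.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error; squaring penalizes large deviations more."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Usage: errors of 0, 1, 2 and 0 on four validation ratings
predictions = [4, 3, 5, 2]
truth = [4, 4, 3, 2]
example_mae = mae(predictions, truth)    # (0 + 1 + 2 + 0) / 4
example_rmse = rmse(predictions, truth)  # sqrt((0 + 1 + 4 + 0) / 4)
```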

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious            Food.com
                                  MAE       RMSE        MAE       RMSE
YoLP Content-based component      0.6389    0.8279      0.3590    0.6536
YoLP Collaborative component      0.6454    0.8678      0.3761    0.6834
User Average                      0.6315    0.8338      0.4077    0.6207
Item Average                      0.7701    1.0930      0.4385    0.7043
Combined Average                  0.6628    0.8572      0.4180    0.6250

Table 5.2: Test Results

                                                    Epicurious                                Food.com
                                                    Observation        Observation            Observation        Observation
                                                    User Average       Fixed Threshold        User Average       Fixed Threshold
                                                    MAE       RMSE     MAE       RMSE         MAE       RMSE     MAE       RMSE
User Avg + User Standard Deviation                  0.8217    1.0606   0.7759    1.0283       0.4448    0.6812   0.4287    0.6624
Item Avg + Item Standard Deviation                  0.8914    1.1550   0.8388    1.1106       0.4561    0.7251   0.4507    0.7207
User/Item Avg + User and Item Standard Deviation    0.8304    1.0296   0.7824    0.9927       0.4390    0.6506   0.4324    0.6449
Min-Max                                             0.8539    1.1533   0.7721    1.0705       0.6648    0.9847   0.6303    0.9384
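The three average-based baselines of Table 5.1 can be sketched as follows. The function names and the tuple format of the training events are assumptions made for this illustration.

```python
from collections import defaultdict


def average_baselines(training):
    """Compute per-user and per-item average ratings from the training set.

    training: iterable of (userID, itemID, rating) tuples.
    Returns two dicts: user -> average rating, item -> average rating.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user, item):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2
```

A complete baseline would also need a fallback for users or items that do not appear in the training set, which this sketch leaves out.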

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious            Food.com
                                      MAE       RMSE        MAE       RMSE
Ingredients + Cuisine + Dietaries     0.7824    0.9927      0.4324    0.6449
Ingredients + Cuisine                 0.7915    1.0012      0.4384    0.6502
Ingredients + Dietary                 0.7874    0.9986      0.4342    0.6468
Cuisine + Dietary                     0.8266    1.0616      0.4324    0.7087
Ingredients                           0.7932    1.0054      0.4411    0.6537
Cuisine                               0.8553    1.0810      0.5357    0.7431
Dietary                               0.8772    1.0807      0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
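The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the dictionary layout and the function names are hypothetical and serve only as an illustration of the technique.

```python
from math import sqrt


def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group vectors for the selected feature groups.

    vectors_by_group: dict group name -> dict feature -> weight.
    Keys of the merged vector are (group, feature) pairs so that features
    with the same name in different groups do not collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Usage: test the "Ingredients + Cuisine" combination without rebuilding vectors
user = {"ingredients": {"beef": 3.0}, "cuisine": {"italian": 4.0}, "dietary": {"vegan": -1.0}}
recipe = {"ingredients": {"beef": 3.0}, "cuisine": {"italian": 4.0}, "dietary": {}}
selected = ["ingredients", "cuisine"]
similarity = cosine(merge_groups(user, selected), merge_groups(recipe, selected))
```

Changing the `selected` list is enough to evaluate any row of the feature test, since the stored vectors themselves are never modified.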

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user preferences and items that the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the idea of breaking recipes down into ingredients, presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations

the similarity threshold test was performed to adjust the algorithm and seek improvements in the

recommendation results The final results of the experimental component showed improvements in

the recommendation performance when using the Foodcom dataset With the Epicurious dataset

some baselines like the content-based method implemented in YoLP registered lower error values

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contains its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that using all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the day (i.e. lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
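The per-rating class vectors proposed above could be sketched as follows. This is only an illustration under stated assumptions: the thesis does not specify how each class vector is built, so averaging the feature vectors of the recipes that share a rating is an assumption, and the names `build_class_vectors` and `predict_rating` as well as the toy recipes are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated_recipes):
    """Build one vector per rating value (assumption: mean of the recipe vectors)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for name, weight in features.items():
            sums[rating][name] += weight
    return {
        rating: {name: total / counts[rating] for name, total in vec.items()}
        for rating, vec in sums.items()
    }

def predict_rating(recipe, class_vectors):
    """The rating whose class vector is most similar to the recipe is the prediction."""
    return max(class_vectors, key=lambda r: cosine(recipe, class_vectors[r]))

# Hypothetical user history: (recipe feature vector, rating given by the user).
history = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "pasta": 1.0}, 5),
    ({"bacon": 1.0, "cream": 1.0}, 2),
]
classes = build_class_vectors(history)
predicted = predict_rating({"tomato": 1.0, "onion": 1.0}, classes)
```

With this design no similarity-to-rating transformation is needed, but a rating value that the user never gave can never be predicted.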




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 1011461529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 1011164936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444, 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.




Resumo

This dissertation explores the applicability of content-based methods for personalized recommendations in the food domain. Recommendation in this domain is a relatively new area, with few systems deployed in real settings that are based on user preferences. Methods frequently used in other areas, such as the Rocchio algorithm in document classification, can be adapted for recommendations in the food domain. With the objective of exploring content-based methods in the area of food recommendation, a platform was developed to evaluate the applicability of the Rocchio algorithm to this domain. Besides the validation of the algorithm explored in this study, other tests were performed, such as the impact of the standard deviation on the recommendation error and the learning curve of the algorithm.

Keywords: Recommendation Systems, Content-Based Recommendation, Food, Recipe, Machine Learning




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In this MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
  1.1 Dissertation Structure
2 Fundamental Concepts
  2.1 Recommendation Systems
    2.1.1 Content-Based Methods
    2.1.2 Collaborative Methods
    2.1.3 Hybrid Methods
  2.2 Evaluation Methods in Recommendation Systems
3 Related Work
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  3.2 Content-Boosted Collaborative Recommendation
  3.3 Recommending Food: Reasoning on Recipes and Ingredients
  3.4 User Modeling for Adaptive News Access
4 Architecture
  4.1 YoLP Collaborative Recommendation Component
  4.2 YoLP Content-Based Recommendation Component
  4.3 Experimental Recommendation Component
    4.3.1 Rocchio's Algorithm using FF-IRF
    4.3.2 Building the Users' Prototype Vector
    4.3.3 Generating a rating value from a similarity value
  4.4 Database and Datasets
5 Validation
  5.1 Evaluation Metrics and Cross Validation
  5.2 Baselines and First Results
  5.3 Feature Testing
  5.4 Similarity Threshold Variation
  5.5 Standard Deviation Impact in Recommendation Error
  5.6 Rocchio's Learning Curve
6 Conclusions
  6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Building on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular on e-commerce websites and on online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The new variations of content-based methods adapted to food recommendation are validated using performance metrics that capture the accuracy of the predicted ratings. To validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?

Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and features that are interesting in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile and recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$
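As a rough illustration of Eqs. (2.1) to (2.3), the weights can be computed with a few lines of Python. This is a minimal sketch, not the implementation used in this work: the function name `tf_idf` and the toy documents are hypothetical, the natural logarithm is assumed since the base is not specified, and every document is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1)-(2.3)."""
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)               # f_ij
        max_count = max(counts.values())    # max_z f_zj (assumes a non-empty document)
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy documents, each given as a list of keywords.
docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["pasta", "cream", "bacon"],
    ["tomato", "onion"],
]
weights = tf_idf(docs)
```

In the first document, "basil" appears once (TF = 1/2) and occurs in one of the three documents, so its weight is 0.5 × log(3), while "tomato" has TF = 1 but a lower IDF of log(3/2).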

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
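The cosine normalization of Eq. (2.4) and the cosine similarity of Eq. (2.5) can be sketched as follows. The sparse dictionary representation, the function names `normalize` and `cosine_similarity`, and the example vectors are assumptions made only for illustration.

```python
import math

def normalize(vector):
    """Cosine-normalize a sparse {keyword: weight} vector, as in Eq. (2.4)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0.0:
        return dict(vector)
    return {k: w / norm for k, w in vector.items()}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors, as in Eq. (2.5)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)

a = {"tomato": 1.0, "basil": 1.0}
b = {"tomato": 1.0, "pasta": 1.0}
similarity = cosine_similarity(a, b)
unit_a = normalize(a)
```

The two vectors share one of their two keywords, so the dot product is 1 and both norms are √2, giving a similarity of 0.5. Keywords missing from a dictionary are treated as having weight zero.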



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
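A minimal sketch of Eq. (2.6) and of the classification step is given below. It is not the implementation evaluated in this work: the function names, the toy vectors and the values chosen for β and γ are arbitrary assumptions made for illustration.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector of one class, following Eq. (2.6)."""
    proto = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            proto[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            proto[term] -= gamma * weight / len(negatives)
    return dict(proto)

def classify(doc, prototypes):
    """Assign the document to the class whose prototype is most similar to it."""
    return max(prototypes, key=lambda label: cosine(doc, prototypes[label]))

# Toy term-weight vectors; the examples of one class act as negatives for the other.
liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.5, "pasta": 1.0}]
disliked = [{"bacon": 1.0, "cream": 1.0}]
prototypes = {
    "like": rocchio_prototype(liked, disliked),
    "dislike": rocchio_prototype(disliked, liked),
}
label = classify({"tomato": 1.0, "pasta": 1.0}, prototypes)
```

In this toy setting the new document shares "tomato" and "pasta" with the positive examples of the "like" class, so that prototype yields the highest cosine similarity.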

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c
• P(d|c): probability of observing the document d given a class c
• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probability estimates, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier obtains very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
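The multinomial model of Eq. (2.9) can be sketched as follows. Two choices here are assumptions that are not stated in the text: the term probabilities are estimated with add-one (Laplace) smoothing, and scores are computed in log space to avoid numerical underflow. The function names and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Collect class counts, per-class term counts and the vocabulary."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab

def log_score(terms, label, model):
    """log of P(c_j) * prod P(t_k|c_j)^N(d_i,t_k), i.e. Eq. (2.9) in log space."""
    class_counts, term_counts, vocab = model
    total_terms = sum(term_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for term, n in Counter(terms).items():
        if term not in vocab:
            continue  # terms never seen in training are ignored
        # Add-one smoothing (an assumption) keeps unseen terms from zeroing the product.
        p = (term_counts[label][term] + 1) / (total_terms + len(vocab))
        score += n * math.log(p)
    return score

def classify(terms, model):
    """Return the class with the highest score; P(d) is dropped as in the text."""
    return max(model[0], key=lambda label: log_score(terms, label, model))

training = [
    (["tomato", "basil", "pasta"], "relevant"),
    (["tomato", "onion"], "relevant"),
    (["bacon", "cream", "pasta"], "irrelevant"),
]
model = train(training)
label = classify(["tomato", "tomato", "pasta"], model)
```

Because "tomato" appears twice in the new document, its estimated probability enters the product twice, which is the role of the exponent $N(d_i, t_k)$ in Eq. (2.9).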

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory When classifying a new

unlabeled item the algorithm compares it to all stored items using a similarity function and then

determines the nearest neighbor or the k nearest neighbors The class label for the unclassified

item is derived from the class labels of the nearest neighbors The similarity function used by the


algorithm depends on the type of data The Euclidean distance metric is often chosen when working

with structured data For items represented using the VSM cosine similarity is commonly adopted

Despite their simplicity nearest neighbor algorithms are quite effective The most important drawback

is their inefficiency at classification time due to the fact that they do not have a training phase and all

the computation is made during the classification time
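As a rough sketch of the nearest neighbor procedure described above, the following Python fragment classifies an item by majority vote among its k most similar stored items, using cosine similarity over sparse term vectors. The names `cosine` and `knn_classify`, the value of k and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored items."""
    # No training phase: all the work happens here, at classification time.
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored items: (term vector, class label).
training = [
    ({"tomato": 1.0, "pasta": 1.0}, "liked"),
    ({"tomato": 1.0, "cheese": 1.0}, "liked"),
    ({"sugar": 1.0, "cream": 1.0}, "disliked"),
    ({"chocolate": 1.0, "sugar": 1.0}, "disliked"),
]
label = knn_classify({"tomato": 1.0, "basil": 1.0}, training, k=3)
```

The full scan over `training` inside `knn_classify` makes the classification-time cost mentioned in the text visible: every prediction compares the new item with every stored item.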

These algorithms represent some of the most important methods used in content-based recom-

mendation systems A thorough review is presented in [5 7] Despite their popularity content-based

recommendation systems have several limitations These methods are constrained to the features

explicitly associated with the recommended object and when these features cannot be parsed au-

tomatically by a computer they have to be assigned manually which is often not practical due to

limitations of resources Recommended items will also not be significantly different from anything

the user has seen before Moreover if only items that score highly against a userrsquos profile can be

recommended the similarity between them will also be very high This problem is typically referred

to as overspecialization Finally in order to obtain reliable recommendations with implicit user pro-

files the user has to rate a sufficient number of items before the content-based recommendation

system can understand the userrsquos preferences

2.1.2 Collaborative Methods

Collaborative methods or collaborative filtering systems try to predict the utility of items for a par-

ticular user based on the items previously rated by other users This approach is also known as the

wisdom of the crowd and assumes that users who had similar tastes in the past will have similar

tastes in the future In order to better understand the usersrsquo tastes or preferences the system has

to be given item ratings either implicitly or explicitly

Collaborative methods are currently the most prominent approach to generate recommendations

and they have been widely used by large commercial websites With the existence of various algo-

rithms and variations these methods are very well understood and applicable in many domains

since the change in item characteristics does not affect the method used to perform the recom-

mendation These methods can be grouped into two general classes [11] namely memory-based

approaches (or heuristic-based) and model-based methods Memory-based algorithms are essen-

tially heuristics that make rating predictions based on the entire collection of previously rated items

by users In user-to-user collaborative filtering when predicting the rating of an unknown item p for

user c a set of ratings S is used This set contains ratings for item p obtained from other users

who have already rated that item usually the N most similar to user c A simple example on how to

generate a prediction and the steps required to do so will now be described


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 21 contains a set of users with five items in common between them namely Item1 to

Item5 We have that Item5 is unknown to Alice and the recommendation system needs to gener-

ate a prediction The set of ratings S previously mentioned represents the ratings given by User1

User2 User3 and User4 to Item5 These values will be used to predict the rating that Alice would

give to Item5 In the simplest case the predicted rating is computed using the average of the values

contained in set S However the most common approach is to use the weighted sum where the level

of similarity between users defines the weight value to use when computing the rating For example

the rating given by the user most similar to Alice will have the highest weight when computing the

prediction The similarity measure between users is used to simplify the rating estimation procedure

[12] Two users have a high similarity value when they both rate the same group of items in an iden-

tical way With the cosine similarity measure two users are treated as two vectors in m-dimensional

space where m represents the number of rated items in common The similarity measure results

from computing the cosine of the angle between the two vectors

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\, \sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$

In the formula, $S$ is the set of items rated by both users, $r_{as}$ is the rating that user $a$ gave to item $s$, and $r_{bs}$ is the rating that user $b$ gave to the same item $s$. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
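To make the cosine measure concrete, the sketch below applies it to the ratings of Alice and User1 from Table 2.1, restricted to the four items they have both rated. The dictionary layout and the function name `cosine_similarity` are illustrative assumptions.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
}

def cosine_similarity(a, b):
    """Cosine of the angle between the rating vectors of users a and b,
    computed over the items that both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b)

sim_alice_user1 = cosine_similarity("Alice", "User1")  # roughly 0.975
```

The value is close to 1 even though User1 consistently rates about two points lower than Alice, which illustrates why the measure ignores differences in rating behaviour.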

In Figure 22 it can be observed that Alice and User1 classified the same group of items in a

similar way The difference in rating values between the four items is practically consistent With

the cosine similarity measure these users are considered highly similar which may not always be

the case since only common items between them are contemplated In fact if Alice usually rates

items with low values we can conclude that these four items are amongst her favourites On the

other hand if User1 often gives high ratings to items these four are the ones he likes the least It

is then clear that the average ratings of each user should be analyzed in order to consider the

differences in user behaviour The Pearson correlation coefficient is a popular measure in user-

based collaborative filtering that takes this fact into account


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\, \sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users obtained using any of these two

similarity measures we can now generate a prediction using a common prediction function

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula pred(a p) is the prediction value to user a for item p and N is the set of users

most similar to user a that rated item p This function calculates if the neighborsrsquo ratings for Alicersquos

unseen Item5 are higher or lower than the average The rating differences are combined using the

similarity scores as a weight and the value is added or subtracted from Alicersquos average rating The

value obtained through this procedure corresponds to the predicted rating
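The whole procedure (Pearson similarity followed by the prediction function) can be sketched for the data in Table 2.1 as follows. Several choices here are assumptions rather than part of the text: each user's average is taken over all the items that user has rated, the neighborhood N is limited to the two most similar users, and the function names are hypothetical.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    """Average rating of a user over all items that user has rated (assumption)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a, mean_b = mean(a), mean(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den_a = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
    den_b = math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(a, item, n_neighbors=2):
    """Mean-centred weighted prediction using the most similar users who rated `item`."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:n_neighbors]
    num = sum(pearson(a, b) * (ratings[b][item] - mean(b)) for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    return mean(a) + num / den if den else mean(a)

prediction = predict("Alice", "Item5")  # roughly 4.85 under the assumptions above
```

Under these assumptions User1 and User2 are selected as neighbors; both rated Item5 above their own average, so the prediction lands above Alice's average of 4. A different convention for the averages or the neighborhood size would give slightly different numbers.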

Different recommendation systems may take different approaches in order to implement user

similarity calculations and rating estimations as efficiently as possible According to [12] one com-

mon strategy is to calculate all user similarities sim(ab) in advance and recalculate them only once

in a while since the network of peers usually does not change dramatically in a short period of

time Then when a user requires a recommendation the ratings can be efficiently calculated on

demand using the precomputed similarities Many other performance-improving modifications have

been proposed to extend the standard correlation-based and cosine-based techniques [11 13 14]


The techniques presented above have been traditionally used to compute similarities between

users Sarwar et al proposed using the same cosine-based and correlation-based techniques

to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance than the best available user-based collaborative filtering algorithms, with comparable or better quality results [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model which is then

used to make rating predictions Probabilistic approaches estimate the probability of a certain user

c giving a particular rating to item s given the userrsquos previously rated items This estimation can be

computed for example with cluster models where like-minded users are grouped into classes The

model structure is that of a Naive Bayesian model where the number of classes and parameters of

the model are learned from the data. Other collaborative filtering methods include statistical models,

linear regression Bayesian networks or various probabilistic modelling techniques amongst others

The new user problem also known as the cold start problem also occurs in collaborative meth-

ods The system must first learn the userrsquos preferences from previously rated items in order to

perform accurate recommendations Several techniques have been proposed to address this prob-

lem Most of them use the hybrid recommendation approach presented in the next section Other

techniques use strategies based on item popularity item entropy user personalization and combi-

nations of the above [12 17 18] New items also present a problem in collaborative systems Until

the new item is rated by a sufficient number of users the recommender system will not recommend

it Hybrid methods can also address this problem Data sparsity is another problem that should

be considered The number of rated items is usually very small when compared to the number of

ratings that need to be predicted User profile information like age gender and other attributes can

also be used when calculating user similarities in order to overcome the problem of rating sparsity

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limita-

tions The idea behind hybrid systems [19] is to combine two or more different elements in order to

avoid some shortcomings and even reach desirable properties not present in individual approaches

Monolithic parallel and pipelined approaches are three different hybridization designs commonly

used in hybrid recommendation systems [2]

In monolithic hybridization (Figure 23) the components of each recommendation strategy are

not architecturally separate The objective behind this design is to exploit different features or knowl-

edge sources from each strategy to generate a recommendation This design is an example of

content-boosted collaborative filtering [20] where social features (eg movies liked by user) can be


associated with content features (e.g. comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 24) is the least invasive design given that the recommendation

components have the same input as if they were working independently A weighting or a voting

scheme is then applied to obtain the recommendation Weights can be assigned manually or learned

dynamically. This design can be applied to two components that perform well individually but

complement each other in different situations (eg when few ratings exist one should recommend

popular items else use collaborative methods)

Pipelined hybridization designs (Figure 25) implement a process in which several techniques

are used sequentially to generate recommendations Two types of strategies are used [2] cascade

and meta-level Cascade hybrids are based on a sequenced order of techniques in which each

succeeding recommender only refines the recommendations of its predecessor In a meta-level

hybridization design one recommender builds a model that is exploited by the principal component

to make recommendations

In the practical development of recommendation systems it is well accepted that all base algo-

rithms can be improved by being hybridized with other techniques It is important that the recom-

mendation techniques used in the hybrid system complement each otherrsquos limitations For instance


content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives For instance from the

business perspective many variables can and have been studied increase in number of sales

profits and item popularity are some example measures that can be applied in practice From the

platform perspective the general interactivity with the platform and click-through-rates can be anal-

ysed From the customer perspective satisfaction levels loyalty and return rates represent valuable

feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.

When using IR measures the recommendation is viewed as an information retrieval task where

recommended items like retrieved items are predicted to be good or relevant Items are then

classified with one of four possible states as shown on Figure 27 Correct predictions also known

as true positives (tp) occur when the recommended item is liked by the user or established as

ldquoactually goodrdquo by a human expert in the item domain False negatives (fn) represent items liked by

the user that were not recommended by the system False positives (fp) designate recommended

items disliked by the user Finally correct omissions also known as true negatives (tn) represent

items correctly not recommended by the system

Precision measures the exactness of the recommendations ie the fraction of relevant items

recommended (tp) out of all recommended items (tp+ fp)


Figure 2.7: Evaluating recommended items [2]

$$Precision = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn).

$$Recall = \frac{tp}{tp + fn} \tag{2.14}$$
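A minimal sketch of both measures, assuming the recommended and the relevant items are available as plain lists of identifiers (the function name and the recipe identifiers are hypothetical):

```python
def precision_recall(recommended, relevant):
    """Compute Precision and Recall from the recommended and relevant item sets."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

precision, recall = precision_recall(
    recommended=["r1", "r2", "r3", "r4"],
    relevant=["r2", "r4", "r5", "r6", "r7"],
)
# Two of the four recommendations are relevant (precision 0.5),
# and two of the five relevant items were recommended (recall 0.4).
```

True negatives do not appear in either measure, which is why Precision and Recall remain informative even when the catalogue of non-recommended items is very large.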

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are

also very popular in the evaluation of recommendation systems capturing the accuracy at the level

of the predicted ratings MAE computes the deviation between predicted ratings and actual ratings

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula n represents the total number of items used in the calculation pi the predicted rating

for item i and ri the actual rating for item i RMSE is similar to MAE but places more emphasis on

larger deviations


$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
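Both error measures are straightforward to compute. The sketch below uses hypothetical predicted and actual ratings to show how a single large deviation affects RMSE more than MAE.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error; squaring emphasises larger deviations."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Hypothetical ratings: the absolute errors are 1, 0, 2 and 0.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 3.0, 2.0]
mae_value = mae(predicted, actual)    # 0.75
rmse_value = rmse(predicted, actual)  # about 1.118
```

Here the error of 2 contributes 4 to the squared sum, so RMSE exceeds MAE on the same data.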


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches The

works described in this chapter contain interesting features to further explore in the context of per-

sonalized food recommendation using content-based methods

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on userrsquos preferences extracted from recipe browsing (ie from recipes searched) and cook-

ing history (ie recipes actually cooked) the system described in [3] recommends recipes that

score highly regarding the userrsquos favourite and disliked ingredients To estimate the userrsquos favourite

ingredients I+k an equation based on the idea of TF-IDF is used

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

FFk is the frequency of use (Fk) of ingredient k during a period D

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq(44) through the Inverse Recipe

Frequency IRFk which uses the total number of recipes M and the number of recipes that contain

ingredient k (Mk)

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$

The userrsquos disliked ingredients Iminusk are estimated by considering the ingredients in the browsing

history with which the user has never cooked
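The favourite-ingredient score can be sketched as below. This is an illustration, not the implementation used in [3]: the function name, the toy recipe collection and the seven-day period are assumptions, and each cooked recipe is assumed to belong to the recipe collection so that $M_k$ is never zero.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each ingredient k as FF_k * IRF_k."""
    # F_k: number of cooked recipes that used ingredient k during the period.
    use_counts = Counter(k for recipe in cooked_recipes for k in set(recipe))
    # M_k: number of recipes in the whole collection that contain ingredient k.
    recipe_counts = Counter(k for recipe in all_recipes for k in set(recipe))
    total_recipes = len(all_recipes)  # M
    scores = {}
    for k, f_k in use_counts.items():
        ff_k = f_k / period_days
        irf_k = math.log(total_recipes / recipe_counts[k])
        scores[k] = ff_k * irf_k
    return scores

# Hypothetical recipe collection; each recipe is a list of ingredients.
all_recipes = [
    ["salt", "tomato", "pasta"],
    ["salt", "beef", "onion"],
    ["salt", "tomato", "basil"],
    ["salt", "salmon", "lemon"],
    ["salt", "pasta", "cheese"],
]
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
favourite = max(scores, key=scores.get)
```

In this toy data salt is used in every cooked recipe but also appears in every recipe of the collection, so its IRF (and therefore its score) is zero, while tomato, which is used often but is comparatively rare, receives the highest score. This mirrors the role of IDF in down-weighting terms that appear everywhere.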

To evaluate the accuracy of the system when extracting the userrsquos preferences 100 recipes were

used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half

million recipes and 20 million monthly users From a set of 100 recipes a list of 10 recipe titles was

presented each time and subjects would choose one recipe they liked to browse completely and one

recipe they would like to cook This process was repeated 10 times until the set of 100 recipes was

exhausted The labelled data for usersrsquo preferences was collected via a questionnaire Responses

were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
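As a quick check of how the reported figures relate, the F-measure can be recomputed from the precision and recall for N = 20, reading the reported values as 60.7% and 61% (the function name is hypothetical):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall for the top 20 ingredients, expressed as fractions.
f_top20 = f_measure(0.607, 0.61)  # about 0.608
```

The result agrees with the F-measure reported for N = 20. Because the harmonic mean is dominated by the smaller of the two values, the very low recall at N = 1 keeps the F-measure low there despite the high precision.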

In this work the recipesrsquo score is determined by whether the ingredients exist in them or not

This means that two recipes composed by the same set of ingredients have exactly the same score

even if they contain different ingredient proportions This method does not correspond to real eating

habits eg if a specific user does not like the ingredient k contained in both recipes the recipe

with higher quantity of k should have a lower score To improve this method an extension of this

work was published in 2014 [21] using the same methods to estimate the userrsquos preferences When

performing a recommendation, the system now also considered the ingredient quantity of a target recipe.

When considering ingredient proportions the impact on a recipe of 100 grams from two different



ingredients can not be considered equivalent ie 100 grams of pepper have a higher impact on a

recipe than 100 grams of potato as the variation from the usual observed quantity of the ingredient

pepper is higher Therefore the scoring method proposed in this work is based on the standard

quantity and dispersion quantity of each ingredient The standard deviation of an ingredient k is

obtained as follows

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula n denotes the number of recipes that contain ingredient k gk(i) denotes the quan-

tity of the ingredient k in recipe i and gk represents the average of gk(i) (ie the previously computed

average quantity of the ingredient k in all the recipes in the database) According to the deviation

score a weight Wk is assigned to the ingredient The recipersquos final score R is computed considering

the weight Wk and the userrsquos liked and disliked ingredients Ik (ie I+k and Iminusk respectively)

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods

[20] showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure

content-based predictor a pure collaborative filtering method and a naive hybrid of the two It is

also shown that CBCF overcomes the first-rater problem from collaborative filtering and reduces

significantly the impact that sparse data has on the prediction accuracy The domain of movie rec-

ommendations was used to demonstrate this hybrid approach

In the pure content-based method the prediction task was treated as a text-categorization prob-

lem The movie content information was viewed as a document and the user ratings between 0

and 5 as one of six class labels A naive Bayesian text classifier was implemented and extended to

represent movies Each movie is represented by a set of features (eg title cast etc) where each

feature is viewed as a set of words The classifier was used to learn a user profile from a set of rated


movies ie labelled documents Finally the learned profile is used to predict the label (rating) of

unrated movies

The pure collaborative filtering method implemented in this study uses a neighborhood-based

algorithm User-to-user similarity is computed with the Pearson correlation coefficient and the pre-

diction is computed with the weighted average of deviations from the neighborrsquos mean Both these

methods were explained in more detail in Section 212

The naive hybrid approach uses the average of the ratings generated by the pure content-based

predictor and the pure collaborative method to generate predictions

CBCF basically consists in performing a collaborative recommendation with less data sparsity

This is achieved by creating a pseudo user-ratings vector vu for every user u in the database The

pseudo user-ratings vector consists of the item ratings provided by the user u where available and

those predicted by the content-based method otherwise

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users the dense pseudo ratings matrix V is created

The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation

weight that allows similar users with more accurate pseudo vectors to have a higher impact on the

predicted rating The hybrid correlation weight is explained in more detail in [20]
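The construction of a pseudo user-ratings vector can be sketched as follows. The function name, the movie identifiers and the rating values are hypothetical; the content-based predictions are assumed to have been produced beforehand by a separate predictor.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Use the user's actual rating where available, the content-based prediction otherwise."""
    return {
        i: user_ratings[i] if i in user_ratings else content_predictions[i]
        for i in items
    }

# Hypothetical data for one user u.
items = ["m1", "m2", "m3", "m4"]
actual = {"m1": 4, "m3": 2}                                    # r_{u,i}
content_pred = {"m1": 3.6, "m2": 4.1, "m3": 2.5, "m4": 1.8}    # c_{u,i}
v_u = pseudo_ratings_vector(actual, content_pred, items)
```

Stacking such vectors for all users yields the dense matrix V, on which the usual neighborhood-based collaborative filtering steps can then be run.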

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations

of collaborative filtering and content-based methods since it has been shown to perform consistently

better than pure collaborative filtering


Figure 31 Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet

domain [22] The performance of collaborative filtering content-based and hybrid recommender al-

gorithms is evaluated based on a dataset of 43000 ratings from 512 users Although the main focus

of this article is the content or ingredients of a meal various other variables that impact a userrsquos

opinion in food recommendation are mentioned These other variables include cooking methods

ingredient costs and quantities preparation time and ingredient combination effects amongst oth-

ers The decomposition of recipes into ingredients (Figure 31) implemented in this experiment is

simplistic ingredient scores were computed by averaging the ratings of recipes in which they occur

As a baseline algorithm, a random recommender was implemented, which assigns a randomly

generated prediction score to a recipe Five different recommendation strategies were developed for

personalized recipe recommendations

The first is a standard collaborative filtering algorithm assigning predictions to recipes based

on the weighted ratings of a set of N neighbors The second is a content-based algorithm which

breaks down the recipe to ingredients and assigns ratings to them based on the userrsquos recipe scores

Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented namely hybrid recipe and hybrid ingr Both use a

simple pipelined hybrid design where the content-based approach provides predictions to missing

ingredient ratings in order to reduce the data sparsity of the ingredient matrix This matrix is then

used by the collaborative approach to generate recommendations These strategies differentiate

from one another by the approach used to compute user similarity The hybrid recipe method iden-

tifies a set of neighbors with user similarity based on recipe scores In hybrid ingr user similarity

is based on the recipe ratings scores after the recipe is broken down Lastly an intelligent strategy

was implemented In this strategy only the positive ratings for items that receive mixed ratings are

considered It is considered that common items in recipes with mixed ratings are not the cause of the

high variation in score The results of the study are represented in Figure 32 using the normalized


Figure 32 Normalized MAE score for recipe recommendation [22]

MAE as an evaluation metric

This work shows that the content-based approach in this case has the best overall performance

with a significant accuracy improvement over the collaborative filtering algorithm Furthermore the

authors concluded that this work implemented a simplistic version of what a recipe recommender

needs to achieve As mentioned earlier there are many other factors that influence a userrsquos rating

that can be considered to improve content-based food recommendations

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods Still similar items are not

the only ones that matter when calculating a prediction In some cases items that are too similar

to others which have already been seen are not to be recommended as well This idea is used

in Daily-Learner [23] a well-known news article content-based recommendation system When

helping the user to obtain more knowledge about a news topic a certain variety should exist when

performing the recommendation Items too similar to others known by the user probably carry the

same information and will not help him to gather more information about a particular news topic

These items are then excluded from the recommendation On the other hand items similar in topic

but not similar in content should be great recommendations in the context of this system Therefore

the use of similarity can be adjusted according to the objectives of the recommendation system

In order to identify the current user interests Daily-Learner uses the nearest neighbor algorithm

to model the usersrsquo short-term interests As previously mentioned in Section 211 nearest neighbor

algorithms simply store all training data in memory When classifying a new unlabeled item the

algorithm compares it to all stored items using a similarity function and then determines the nearest

neighbor or the k nearest neighbors Daily-Learner uses this method as it can quickly adapt to a

userrsquos novel interests The main advantage of the nearest-neighbor approach is that only a single

story of a new topic is needed to allow the algorithm to identify future follow-up stories


Stories in Daily-Learner are converted to TF-IDF vectors and the cosine similarity measure is

used to quantify the similarity between two vectors When computing a prediction for a new story all

the stories that are closer than a minimum threshold (eg a minimum similarity value) to the story to

be classified become voting stories The predicted score is then computed as the weighted average

over all the voting storiesrsquo scores using the similarity values as the weights If a voter is closer than

a maximum threshold (eg a maximum similarity value) to the new story the story is labeled as

known because the system assumes that the user is already aware of the event reported in it and

does not need to recommend a story he already knows If the story does not have any voters it

cannot be classified by the short-term model and is passed to the long-term model explained with

more detail in [23]
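The short-term model described above can be sketched as follows. The threshold values, the function names and the toy story vectors are assumptions made for illustration; they are not the settings used in Daily-Learner [23], and plain term weights stand in for the TF-IDF vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ('known', None), ('scored', value) or ('no_voters', None)."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            # Too close to a story the user has already seen.
            return "known", None
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        # Would be handed over to the long-term model.
        return "no_voters", None
    weighted = sum(sim * score for sim, score in voters) / sum(sim for sim, _ in voters)
    return "scored", weighted

# Hypothetical rated stories: (term vector, user score).
rated = [
    ({"election": 1.0, "vote": 1.0}, 0.8),
    ({"football": 1.0, "goal": 1.0}, 0.2),
]
related = short_term_predict({"election": 1.0, "poll": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_predict({"weather": 1.0, "storm": 1.0}, rated)
```

The three calls exercise the three outcomes described in the text: a related story is scored from its voters, a near-duplicate is labelled as known, and a story with no voters is left for the long-term model.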

This issue should be taken into consideration in food recommendations as usually users are not

interested in recommendations with contents too similar to dishes recently eaten



Chapter 4


In this chapter the modules that compose the architecture of the recommendation system are pre-

sented First an introduction to the recommendation module is made followed by the specification

of the methods used in the different recommendation components Afterwards the datasets chosen

to validate this work are analyzed and the database platform is described

The recommendation system contains three recommendation components (Fig 41) the YoLP

collaborative recommender, the YoLP content-based recommender and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input

in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the

experimental component The evaluation module independently evaluates each recommendation

component by measuring the performance of the algorithms using different metrics The methods

used in this module are explained in detail in the following chapter The programming language used

to develop these components was Python1

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collabo-

rative approach [24] This approach is very similar to the user-to-user approach explained in detail

in Section 212

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}
\]

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
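The Pearson correlation of Eq. 4.1 can be sketched in a few lines of Python. This is only an illustration, not the YoLP implementation: the function name, the dictionary-based input format, the choice of computing each recipe average over all of its ratings, and the decision to return 0 when the correlation is undefined are all assumptions made for the example.

```python
from math import sqrt


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes, following Eq. 4.1.

    ratings_a and ratings_b map a user id to the rating that user gave
    to recipe a and recipe b (hypothetical input format).
    """
    # P: the users that rated both recipes.
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # assumption: no shared users means no evidence of similarity

    # Recipe averages (assumed here to be taken over all ratings of each recipe).
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)

    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    norm_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    norm_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an undefined correlation is treated as 0
    return numerator / (norm_a * norm_b)
```

Two recipes whose ratings rise and fall together across the shared users obtain a similarity close to 1, and recipes rated in opposite ways obtain a value close to -1.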


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}
\]

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities to obtain the predicted rating.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors.

The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
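A minimal sketch of this profile construction and ranking is given below. It assumes that sparse binary vectors can be represented as Python sets of feature names (for binary vectors, the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes). The function names and the feature naming scheme such as "ingredient:tomato" are hypothetical and are not taken from YoLP.

```python
from math import sqrt


def build_binary_profile(rated_recipes, positive_min=4):
    """Union of the features of every recipe rated 4 or 5 by the user.

    rated_recipes is a list of (feature_set, rating) pairs (assumed format).
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_min:
            profile |= features
    return profile


def cosine_binary(profile, recipe_features):
    """Cosine similarity between two binary vectors stored as sets."""
    if not profile or not recipe_features:
        return 0.0
    shared = len(profile & recipe_features)
    return shared / sqrt(len(profile) * len(recipe_features))


def rank_recipes(profile, recipes):
    """Return recipe ids ordered from most to least similar to the profile.

    recipes maps a recipe id to its feature set (assumed format).
    """
    return sorted(recipes, key=lambda rid: cosine_binary(profile, recipes[rid]), reverse=True)
```

Storing only the active features keeps the representation sparse, which matters when the feature space includes every ingredient in the dataset.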

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
Rating = \begin{cases}
avgTotal + 0.5 & \text{if } similarity > 0.8 \\
avgTotal & \text{otherwise}
\end{cases} \tag{4.3}
\]


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

\[
IRF_k = \log \frac{M}{M_k} \tag{4.4}
\]

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
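The IRF weights of Eq. 4.4 can be computed in a single pass over the dataset, as in the following sketch. The function name and the input format are assumptions made for illustration, and the natural logarithm is used because the base of the logarithm is not stated in the text.

```python
from collections import Counter
from math import log


def inverse_recipe_frequency(recipes):
    """Compute IRF_k = log(M / M_k) for every feature k (Eq. 4.4).

    recipes maps a recipe id to the set of features it contains
    (hypothetical input format).
    """
    total_recipes = len(recipes)  # M
    # M_k: number of recipes containing feature k (each recipe counts once).
    recipe_counts = Counter(f for features in recipes.values() for f in features)
    # Natural logarithm: an assumption, the base is not given in the text.
    return {f: log(total_recipes / count) for f, count in recipe_counts.items()}
```

A feature that appears in every recipe receives a weight of 0, while rare features receive the largest weights, mirroring the behaviour of IDF in text retrieval.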

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
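The construction of the prototype vector described in this section can be sketched as follows. The function names, the sparse dictionary representation, and the `scale_max` parameter used to tell the two rating scales apart are hypothetical choices made for this example. Both labelling rules follow the text: a fixed threshold (with rating 3 ignored on the 1 to 5 scale) or the user's average rating.

```python
from collections import defaultdict


def label_fixed(rating, scale_max):
    """Fixed-threshold rule: +1 positive, -1 negative, 0 neutral (ignored)."""
    if scale_max == 4:  # Epicurious scale: 1-2 negative, 3-4 positive
        return 1 if rating >= 3 else -1
    if rating == 3:  # Food.com scale: 3 is neutral
        return 0
    return 1 if rating > 3 else -1


def label_user_average(rating, user_average):
    """User-average rule: ratings equal to or above the average are positive."""
    return 1 if rating >= user_average else -1


def build_prototype(rated_recipes, irf, label):
    """Add (positive) or subtract (negative) the IRF weight of each feature.

    rated_recipes: iterable of (feature_set, rating) pairs from the training set.
    irf: mapping feature -> IRF weight.
    label: function rating -> +1, -1 or 0.
    """
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = label(rating)
        if sign == 0:
            continue  # neutral observations do not change the vector
        for feature in features:
            # Observation weight is 1, so only the sign and the IRF weight matter.
            prototype[feature] += sign * irf.get(feature, 0.0)
    return dict(prototype)
```

For the user-average rule, the same builder can be reused by passing, for instance, `lambda r: label_user_average(r, user_average)` as the `label` argument.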

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:

\[
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \ast (D - C) + C \tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
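The per-user Min-Max mapping can be sketched as below. The function names and argument layout are hypothetical. The fallback condition (fewer than two similarity values, or a zero-width similarity interval) is one possible reading of "not enough user ratings to compute the similarity interval", and the sketch assumes the user has at least one rating in the training set.

```python
def min_max(a, a_min, a_max, c, d):
    """Eq. 4.5: map a from the interval [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_rating_min_max(similarity, user_similarities, user_ratings, user_average):
    """Translate a similarity into a rating on the user's own rating scale.

    user_similarities: similarity values observed for this user.
    user_ratings: the user's ratings in the training set (assumed non-empty).
    user_average: default prediction when the interval cannot be computed.
    """
    # Assumed fallback rule: the interval is undefined or has zero width.
    if len(user_similarities) < 2 or max(user_similarities) == min(user_similarities):
        return user_average
    return min_max(
        similarity,
        min(user_similarities),
        max(user_similarities),
        min(user_ratings),
        max(user_ratings),
    )
```

With this mapping, the recipe that is most similar to a user's prototype vector is predicted the highest rating that the user has ever given, and the least similar recipe is predicted the lowest.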

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \tag{4.6}
\]

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                              Food.com    Epicurious
Number of users                        24741       8117
Number of food items                   226025      14976
Number of rating events                956826      86574
Number of ratings above avg            726467      46588
Number of groups                       108         68
Number of ingredients                  5074        338
Number of categories                   28          14
Sparsity on the ratings matrix         0.02        0.07
Avg rating values                      4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item         4.23        5.78
Avg number of ingredients per item     8.57        3.71
Avg number of categories per item      2.33        0.60
Avg number of food groups per item     0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
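The three-case rule of this method can be written as a small function, sketched below with the initial thresholds as default arguments. The function names are hypothetical. The helper for the combined variant assumes that "combined" means the arithmetic mean of the user and recipe values, for both the average and the standard deviation.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: shift the average by one standard deviation.

    upper and lower are the similarity thresholds U and L.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Combined user/recipe statistics (assumed to be simple means of the two)."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The same function covers the three tested variants: it is enough to pass the user statistics, the recipe statistics, or the pair returned by `combined_stats`.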


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, in the leave-p-out cross-validation method, p observations are used as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set; ideally, it is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 parts and the process was repeated 5 times, each time with a different part as the validation set, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
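A 5-fold split of the rating events can be sketched as follows. The function name, the shuffling step and the fixed seed are assumptions made for illustration; the text does not say how the rating events were assigned to folds.

```python
import random


def k_fold_splits(rating_events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    rating_events: iterable of rating events, e.g. (userID, itemID, rating).
    """
    events = list(rating_events)
    # Shuffle once with a fixed seed so the folds are reproducible (assumption).
    random.Random(seed).shuffle(events)
    # Fold i takes every k-th event starting at position i.
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Each rating event appears in exactly one validation set, and the error measures obtained on the k validation sets are averaged to produce the final result.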

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
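For reference, the two error measures can be computed as in the following sketch, which uses their standard definitions. The function names and the list-based input format are choices made for this example.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation of the predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Because the deviations are squared before averaging, a few large misses raise the RMSE much more than the MAE. For example, absolute errors of (1, 1, 1, 1) give an MAE of 1 and an RMSE of 1, whereas errors of (0, 0, 0, 3) give a lower MAE of 0.75 but a higher RMSE of 1.5.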

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious           Food.com
Method                            MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                    Epicurious                             Food.com
                                                    Observation       Observation          Observation       Observation
                                                    User Average      Fixed Threshold      User Average      Fixed Threshold
Method                                              MAE      RMSE     MAE      RMSE        MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606   0.7759   1.0283      0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                  0.8914   1.1550   0.8388   1.1106      0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296   0.7824   0.9927      0.4390   0.6506   0.4324   0.6449
Min-Max                                             0.8539   1.1533   0.7721   1.0705      0.6648   0.9847   0.6303   0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious           Food.com
Features                              MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
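The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the dictionary layout and the requirement that feature keys be unique across groups (for example by prefixing them, as in "i:garlic") are assumptions made for this illustration.

```python
from math import sqrt


def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group sparse vectors into one sparse vector.

    vectors_by_group: e.g. {"ingredients": {...}, "cuisine": {...}, "dietary": {...}}
    Feature keys are assumed to be unique across groups.
    """
    merged = {}
    for group in groups:
        merged.update(vectors_by_group.get(group, {}))
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[f] for f, weight in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity restricted to the chosen feature groups."""
    return cosine(merge_groups(user_vectors, groups), merge_groups(recipe_vectors, groups))
```

Each row of a feature test then corresponds to a different `groups` argument, such as `["ingredients", "cuisine"]`, without rebuilding any prototype vector.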

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } similarity < U
\end{cases} \tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. Each point in the graphs corresponds to one value of the upper similarity threshold: the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.
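The threshold sweep can be sketched as a loop over candidate values of U that re-applies the two-case rule and recomputes both error measures. The function names and the tuple layout of `samples` are hypothetical. In the actual experiments each point came from a full cross-validation run, whereas this sketch only shows the evaluation of a precomputed list of similarities.

```python
from math import sqrt


def rating_from_similarity(similarity, average, std_dev, upper):
    """Two-case rule: add the standard deviation only above the threshold U."""
    return average + std_dev if similarity >= upper else average


def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for each candidate upper threshold.

    samples: list of (similarity, average, std_dev, actual_rating) tuples
    (hypothetical layout, one tuple per validation rating event).
    """
    results = {}
    for upper in thresholds:
        errors = [
            rating_from_similarity(sim, avg, std, upper) - actual
            for sim, avg, std, actual in samples
        ]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```

Plotting the two values of each entry against U gives curves of the kind shown in the upper-threshold figures.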

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while keeping a considerable number of users to average the recommendation errors over (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
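The round-by-round procedure described above can be sketched as follows. This is only an illustrative outline under stated assumptions: the function names and the `mean_rating_predictor` placeholder are hypothetical and stand in for the Rocchio-based predictor, and the thesis averages the per-round error over many users rather than over one.

```python
from statistics import mean

def learning_curve(user_ratings, predict, max_rounds):
    """Return the MAE of each round for one user.

    user_ratings: list of (recipe, rating) pairs in the order they were made.
    predict: hypothetical callable (training_pairs, recipe) -> predicted rating.
    Round k trains on the first k reviews and validates on the remaining ones.
    """
    errors = []
    for k in range(1, max_rounds + 1):
        train, validation = user_ratings[:k], user_ratings[k:]
        if not validation:
            break  # nothing left to validate against
        abs_errors = [abs(predict(train, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(mean(abs_errors))
    return errors

def mean_rating_predictor(train, recipe):
    # Placeholder predictor (assumption): ignores the recipe and returns the
    # user's mean training rating. A real run would plug in the Rocchio model.
    return mean(rating for _, rating in train)

ratings = [("r1", 4), ("r2", 2), ("r3", 3), ("r4", 3)]
curve = learning_curve(ratings, mean_rating_predictor, max_rounds=3)
```

Averaging `curve` element-wise across all users above the chosen rating threshold would give one point per round, which is the kind of series plotted in the learning-curve figures.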



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breakdown of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, which allowed the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the target recipe would automatically assign it a predicted rating.
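The per-rating class vectors proposed above could be sketched roughly as follows. This is a hypothetical illustration, not an implementation from this work: the function names are invented, each class vector is assumed to be the plain centroid of the recipes that received that rating (no negative-example term), and recipes are assumed to be sparse dictionaries of feature weights.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_rating_prototypes(rated_recipes):
    """Build one centroid vector per rating value from (vector, rating) pairs."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vector, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in vector.items():
            sums[rating][feature] += weight
    return {rating: {f: s / counts[rating] for f, s in features.items()}
            for rating, features in sums.items()}

def predict_rating(prototypes, recipe_vector):
    """The predicted rating is the class whose vector is most similar."""
    return max(prototypes, key=lambda r: cosine(prototypes[r], recipe_vector))

history = [({"tomato": 1.0, "basil": 0.5}, 5),
           ({"tomato": 0.8}, 5),
           ({"liver": 1.0}, 1)]
prototypes = build_rating_prototypes(history)
```

With this toy history, `predict_rating(prototypes, {"tomato": 1.0})` picks the 5-star class directly, so no similarity-to-rating conversion step is involved.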




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325-341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551-585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127-134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381-386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.





Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods to personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Dissertation Structure
2 Fundamental Concepts
2.1 Recommendation Systems
2.1.1 Content-Based Methods
2.1.2 Collaborative Methods
2.1.3 Hybrid Methods
2.2 Evaluation Methods in Recommendation Systems
3 Related Work
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
3.2 Content-Boosted Collaborative Recommendation
3.3 Recommending Food: Reasoning on Recipes and Ingredients
3.4 User Modeling for Adaptive News Access
4 Architecture
4.1 YoLP Collaborative Recommendation Component
4.2 YoLP Content-Based Recommendation Component
4.3 Experimental Recommendation Component
4.3.1 Rocchio's Algorithm using FF-IRF
4.3.2 Building the Users' Prototype Vector
4.3.3 Generating a rating value from a similarity value
4.4 Database and Datasets
5 Validation
5.1 Evaluation Metrics and Cross Validation
5.2 Baselines and First Results
5.3 Feature Testing
5.4 Similarity Threshold Variation
5.5 Standard Deviation Impact in Recommendation Error
5.6 Rocchio's Learning Curve
6 Conclusions
6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction

Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose but, instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general. However, new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.

In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1} \]

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

\[ w_{ij} = TF_{ij} \times IDF_i \tag{2.3} \]
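As a small worked illustration of the TF-IDF weighting defined above, the following sketch computes the weights for a toy collection. The function name and the sample data are hypothetical, the natural logarithm is an assumption (the text does not fix the base), and every document is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights for a list of documents (lists of keywords).

    Returns one {keyword: weight} dictionary per document, where TF is the
    keyword count divided by the highest count in that document, and IDF is
    log(N / n_i).
    """
    n_docs = len(documents)
    # n_i: number of documents in which each keyword occurs at least once
    doc_freq = Counter(k for doc in documents for k in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_freq = max(counts.values())
        weights.append({k: (f / max_freq) * math.log(n_docs / doc_freq[k])
                        for k, f in counts.items()})
    return weights

docs = [["rice", "rice", "egg"], ["rice", "tomato"], ["egg", "tomato", "basil"]]
w = tf_idf(docs)
```

In this toy collection "basil" appears in a single document, so it receives the largest IDF, while a keyword present in every document would get a weight of zero.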

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

\[ w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \tag{2.4} \]

With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \tag{2.5} \]
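Cosine normalization and cosine similarity (Eqs. 2.4 and 2.5) can be illustrated with a short sketch. The function names are hypothetical and vectors are assumed to be sparse dictionaries mapping keywords to weights, with missing keywords treated as zero.

```python
import math

def normalize(weights):
    """Scale a {keyword: weight} vector to unit Euclidean length (Eq. 2.4)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)  # an all-zero vector is returned unchanged
    return {k: w / norm for k, w in weights.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (Eq. 2.5)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

a = {"rice": 3.0, "egg": 4.0}
unit_a = normalize(a)
```

Because the similarity only depends on direction, a vector and its normalized version have similarity 1, while vectors with no keywords in common have similarity 0.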



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6} \]

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
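The prototype construction and classification step can be sketched as below. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, the default values chosen for `beta` and `gamma` are arbitrary assumptions, and documents are assumed to be sparse dictionaries of TF-IDF weights.

```python
import math

def mean_vector(vectors):
    """Average a list of sparse {term: weight} vectors (empty list gives {})."""
    totals = {}
    for vector in vectors:
        for term, weight in vector.items():
            totals[term] = totals.get(term, 0.0) + weight
    return {term: total / len(vectors) for term, total in totals.items()}

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype weights: beta * mean(positives) - gamma * mean(negatives)."""
    pos, neg = mean_vector(positives), mean_vector(negatives)
    return {term: beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
            for term in set(pos) | set(neg)}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(prototypes, document):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], document))

liked_docs = [{"tomato": 1.0, "basil": 1.0}, {"tomato": 1.0}]
disliked_docs = [{"liver": 1.0}]
prototypes = {
    "liked": rocchio_prototype(liked_docs, disliked_docs),
    "disliked": rocchio_prototype(disliked_docs, liked_docs),
}
```

Note that the negative examples push the corresponding term weights below zero, so a document sharing terms only with the negative examples of a class ends up with a negative similarity to that class.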

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data, as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$
• $P(d|c)$: probability of observing the document $d$ given a class $c$
• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c) P(d|c)}{P(d)} \tag{2.7} \]

When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

\[ \arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \tag{2.8} \]

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities the Naive Bayes classifier assumes that P(d|cj) is deter-

mined based on individual word occurrences rather than the document as a whole This simplifica-

tion is needed due to the fact that it is very unlikely to see the exact same document more than once

Without it the observed data would not be enough to generate good probabilities Although this sim-

plification clearly violates the conditional independence assumption since terms in a document are

not theoretically independent from each other experiments show that the Naive Bayes classifier has

very good results when classifying text documents Two different models are commonly used when

working with the Naive Bayes classifier The first typically referred to as the multivariate Bernoulli

event model encodes each word as a binary attribute This encoding relates to the appearance of

words in a document The second typically referred to as the multinomial event model identifies the

number of times the words appear in the document These models see the document as a vector

of values over a vocabulary V and they both lose the information about word order Empirically

the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model

especially for large vocabularies [9] This model is represented by the following equation

$$P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \qquad (2.9)$$

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appeared in document
d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k ∈ V_{d_i}, are
used.
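The multinomial model can be illustrated with a minimal sketch. The toy documents, the class names and the add-one (Laplace) smoothing are assumptions made for the example and are not taken from the text; log-probabilities are used only to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels):
    """Estimate P(c) and P(t|c); add-one smoothing is an assumption of this sketch."""
    vocab = {t for doc in docs for t in doc}
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        term_counts[c].update(doc)
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    cond = {}
    for c in class_docs:
        total = sum(term_counts[c].values())
        cond[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond

def classify(doc, priors, cond):
    """Return the class maximising log P(c) + sum_k N(d, t_k) * log P(t_k | c)."""
    counts = Counter(doc)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for t, n in counts.items():
            if t in cond[c]:  # terms outside the vocabulary are ignored
                score += n * math.log(cond[c][t])
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical training data: each document is a list of terms.
docs = [["fish", "rice", "soy"], ["fish", "lemon"],
        ["beef", "potato"], ["beef", "rice", "beef"]]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
priors, cond = train_multinomial_nb(docs, labels)
predicted = classify(["fish", "fish", "rice"], priors, cond)
print(predicted)
```

The term count N(d_i, t_k) appears as the multiplier `n`, so a term occurring twice contributes twice to the class score.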

Decision trees and nearest neighbor methods are other examples of important learning algorithms used
in content-based recommendation systems. Decision tree learners build a decision tree by recursively
partitioning training data into subgroups, until those subgroups contain only instances of a single
class. In the case of a document, the tree's internal nodes are labelled by terms. Branches
originating from them are labelled according to tests done on the weight that the term has in the
document. Leaves are then labelled by categories. Instead of using weights, a partition can also be
formed based on the presence or absence of individual words. The attribute selection criterion for
learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new
unlabeled item, the algorithm compares it to all stored items using a similarity function and then
determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working
with structured data. For items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important
drawback is their inefficiency at classification time: since they do not have a training phase, all
the computation is made when an item is classified.
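As a rough illustration of the nearest neighbor procedure with cosine similarity, consider the sketch below. The sparse term-weight vectors, the labels and the majority-vote rule are assumptions chosen for the example.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored items."""
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored items: (term-weight vector, class label).
training = [
    ({"fish": 1.0, "rice": 0.5}, "liked"),
    ({"fish": 0.8, "lemon": 0.6}, "liked"),
    ({"beef": 1.0, "potato": 0.7}, "disliked"),
    ({"beef": 0.9, "rice": 0.2}, "disliked"),
]
label = knn_classify({"fish": 0.9, "rice": 0.1}, training, k=3)
print(label)
```

Every call to `knn_classify` scans the whole training set, which is the classification-time cost mentioned above.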

These algorithms represent some of the most important methods used in content-based recommendation
systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed
automatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user
profiles, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a
particular user based on the items previously rated by other users. This approach is also known as
the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and
they have been widely used by large commercial websites. With the existence of various algorithms
and variations, these methods are very well understood and applicable in many domains, since a
change in item characteristics does not affect the method used to perform the recommendation. These
methods can be grouped into two general classes [11], namely memory-based (or heuristic-based)
approaches and model-based methods. Memory-based algorithms are essentially heuristics that make
rating predictions based on the entire collection of items previously rated by users. In
user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a
set of ratings S is used. This set contains ratings for item p obtained from other users who have
already rated that item, usually the N users most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5.
Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The
set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4
to Item5. These values will be used to predict the rating that Alice would give to Item5. In the
simplest case, the predicted rating is computed as the average of the values contained in set S.
However, the most common approach is to use a weighted sum, where the level of similarity between
users defines the weight to use when computing the rating. For example, the rating given by the user
most similar to Alice will have the highest weight when computing the prediction. The similarity
measure between users is used to simplify the rating estimation procedure [12]. Two users have a
high similarity value when they both rate the same group of items in an identical way. With the
cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m
represents the number of rated items in common. The similarity measure results from computing the
cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\;\sqrt{\sum_{s \in S} r_{bs}^{2}}} \qquad (2.10)$$



In the formula, r_{as} is the rating that user a gave to item s and r_{bs} is the rating that user b
gave to the same item, with S here denoting the set of items rated by both users. However, this
measure does not take into consideration an important factor, namely the differences in rating
behaviour between users.

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a
similar way: the difference in rating values between the four items is practically constant. With
the cosine similarity measure, these users are considered highly similar, which may not always be
the case, since only the items common to both are considered. In fact, if Alice usually rates items
with low values, we can conclude that these four items are amongst her favourites. On the other
hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is
then clear that the average rating of each user should be analyzed in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in
user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2}\, \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \qquad (2.11)$$

In the formula, r̄_a and r̄_b are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \qquad (2.12)$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users
most similar to user a that rated item p. This function determines whether the neighbors' ratings
for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined
using the similarity scores as weights, and the resulting value is added to or subtracted from
Alice's average rating. The value obtained through this procedure corresponds to the predicted
rating.
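The procedure can be traced on the ratings of Table 2.1 with a short sketch. Several choices below are assumptions, since the text leaves them open: the means inside the Pearson correlation are taken over the co-rated items, the means in the prediction are taken over all of a user's ratings, and the neighborhood is limited to the two most similar users.

```python
import math

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over co-rated items (means over co-rated items: an assumption)."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if not common:
        return 0.0
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, n_neighbors=2):
    """User mean plus the similarity-weighted deviations of the nearest neighbours."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:n_neighbors]
    num = sum(pearson(a, b) * (ratings[b][item] - mean(ratings[b].values()))
              for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    base = mean(ratings[a].values())
    return base + num / den if den else base

similarities = {b: round(pearson("Alice", b), 2) for b in ratings if b != "Alice"}
prediction = predict("Alice", "Item5")
print(similarities, round(prediction, 2))
```

Under these assumptions User1 and User2 are Alice's nearest neighbors, and because both rated Item5 above their own average the prediction lands above Alice's mean rating of 4.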

Different recommendation systems may take different approaches in order to implement user similarity
calculations and rating estimations as efficiently as possible. According to [12], one common
strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand using the precomputed similarities. Many other performance-improving modifications have been
proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users.
Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute
similarities between items instead, later computing ratings from them [15]. Empirical evidence has
been presented suggesting that item-based algorithms can provide, with better computational
performance, results of comparable or better quality than the best available user-based
collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can
be computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical
models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst
others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods.
The system must first learn the user's preferences from previously rated items in order to perform
accurate recommendations. Several techniques have been proposed to address this problem. Most of
them use the hybrid recommendation approach presented in the next section. Other techniques use
strategies based on item popularity, item entropy, user personalization, and combinations of the
above [12, 17, 18]. New items also present a problem in collaborative systems. Until a new item is
rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods
can also address this problem. Data sparsity is another problem that should be considered. The
number of rated items is usually very small when compared to the number of ratings that need to be
predicted. User profile information, like age, gender and other attributes, can also be used when
calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several
limitations. The idea behind hybrid systems [19] is to combine two or more different elements in
order to avoid some shortcomings, and even reach desirable properties not present in the individual
approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs
commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not
architecturally separate. The objective behind this design is to exploit different features or
knowledge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user)
can be associated with content features (e.g., comedies liked by the user, or dramas liked by the
user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components have the same input as if they were working independently. A weighting or a voting scheme
is then applied to obtain the recommendation. Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually, but
complement each other in different situations (e.g., when few ratings exist one should recommend
popular items, otherwise use collaborative methods).
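A weighting scheme of this kind can be sketched in a few lines. The component names, scores and manually assigned weights below are hypothetical, and the sketch assumes that every component scores every candidate item.

```python
def weighted_hybrid(scores_by_recommender, weights):
    """Combine per-item scores from independent recommenders with a weighted average."""
    total_weight = sum(weights.values())
    combined = {}
    for name, scores in scores_by_recommender.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score / total_weight
    return combined

# Hypothetical outputs of two components that received the same input.
scores = {
    "content": {"soup": 4.0, "steak": 2.0},
    "collaborative": {"soup": 3.0, "steak": 5.0},
}
weights = {"content": 0.25, "collaborative": 0.75}  # assigned manually in this sketch
hybrid = weighted_hybrid(scores, weights)
print(hybrid)
```

A switching scheme, such as falling back to popular items when few ratings exist, would replace the weighted average with a rule that selects a single component's output.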

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are
used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and
meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding
recommender only refines the recommendations of its predecessor. In a meta-level hybridization
design, one recommender builds a model that is exploited by the principal component to make
recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms
can be improved by being hybridized with other techniques. It is important that the recommendation
techniques used in the hybrid system complement each other's limitations. For instance,
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of
Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business
perspective, many variables can be and have been studied: the increase in the number of sales,
profits and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent
valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area
of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall.


When using IR measures, the recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended items
disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items
correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (2.13)$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \qquad (2.14)$$
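Both measures follow directly from the sets of recommended and relevant items, as the sketch below suggests. The item identifiers are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here.

```python
def precision_recall(recommended, relevant):
    """Compute Precision and Recall from recommended and relevant item sets."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(precision, recall)
```

In this example two of the four recommended items are relevant, and two of the three relevant items were recommended.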

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very
popular in the evaluation of recommendation systems, capturing the accuracy at the level of the
predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual
ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, p_i the predicted
rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more
emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \qquad (2.16)$$
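The difference between the two error measures is easy to see on a small, hypothetical set of ratings in which a single prediction is far off:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error; squaring makes large deviations weigh more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]  # hypothetical predicted ratings
actual = [4.0, 4.0, 2.0, 2.0]     # hypothetical actual ratings
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
print(mae_value, rmse_value)
```

The absolute errors are 0, 1, 3 and 0, so the MAE is 1.0, while the RMSE is the square root of 2.5 because the single error of 3 dominates the squared sum.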


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be
given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to
Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., recipes searched) and cooking
history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score
highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients I+_k, an equation based on the idea of TF-IDF is used:

$$I^{+}_{k} = FF_k \times IRF_k \qquad (3.1)$$

FF_k is the frequency of use (F_k) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$
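A minimal sketch of this score is given below. The recipe collection, the cooking history and the length of the period are hypothetical, the natural logarithm is assumed because the base is not stated, and each cooked recipe is assumed to belong to the recipe collection.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score I+_k = FF_k * IRF_k for every ingredient the user has cooked with."""
    use_counts = Counter(k for recipe in cooked_recipes for k in set(recipe))    # F_k
    recipe_counts = Counter(k for recipe in all_recipes for k in set(recipe))    # M_k
    total_recipes = len(all_recipes)                                             # M
    scores = {}
    for k, f_k in use_counts.items():
        ff_k = f_k / period_days
        irf_k = math.log(total_recipes / recipe_counts[k])
        scores[k] = ff_k * irf_k
    return scores

# Hypothetical recipe database and cooking history over a 10-day period.
all_recipes = [
    {"salt", "rice", "salmon"},
    {"salt", "potato", "beef"},
    {"salt", "rice", "egg"},
    {"salt", "salmon", "lemon"},
    {"salt", "egg", "potato"},
]
cooked = [all_recipes[0], all_recipes[3]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=10)
top_ingredient = max(scores, key=scores.get)
print(top_ingredient, scores)
```

As with IDF in text retrieval, an ingredient that occurs in every recipe (salt in this example) receives a score of zero no matter how often it is used.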



The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing
history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used
from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million
recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they liked to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients
sorted by I+_k were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3.4)$$


The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top ingredient
(N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However,
the recall was very low, namely only 4.5%. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number of
favourite ingredients per user is 19.2. With N = 20, the highest F-measure was also recorded, with a
value of 60.8%. The authors concluded that, for this specific case, the system should focus on the
top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked
ingredients is not explained here in more detail, because the accuracy values obtained from the
evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This
means that two recipes composed of the same set of ingredients have exactly the same score, even if
they contain different ingredient proportions. This method does not correspond to real eating
habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the
recipe with the higher quantity of k should have a lower score. To improve this method, an extension
of this work was published in 2014 [21], using the same methods to estimate the user's preferences.
When performing a recommendation, the system now also considers the ingredient quantity of a target
recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent; e.g., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the
ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the
standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of
an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^{2}} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the
quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the
previously computed average quantity of the ingredient k in all the recipes in the database).
According to the deviation score, a weight W_k is assigned to the ingredient. The final score of a
recipe R is computed considering the weight W_k and the user's liked and disliked ingredients I_k
(i.e., I+_k and I−_k, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In
Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20],
showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem from collaborative filtering and
significantly reduces the impact that sparse data has on the prediction accuracy. The domain of
movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem.
The movie content information was viewed as a document, and the user ratings between 0 and 5 as one
of six class labels. A naive Bayesian text classifier was implemented and extended to represent
movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature
is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating)
of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbors' means. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This
is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings r_{u,i} provided by the user u, where
available, and those predicted by the content-based method, c_{u,i}, otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has
rated many items, the content-based predictions are significantly better than if he has only a few
rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows
similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The
hybrid correlation weight is explained in more detail in [20].
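The construction of a pseudo user-ratings vector can be sketched as follows. The content-based predictor is replaced here by a hypothetical lookup table, so the sketch only illustrates how actual and predicted ratings are merged, not how the predictions are produced in [20].

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Build v_u: the actual rating where available, the content-based prediction otherwise."""
    return {i: user_ratings[i] if i in user_ratings else content_predict(i) for i in items}

# Hypothetical data: the user rated only one of three movies.
items = ["m1", "m2", "m3"]
user_ratings = {"m1": 4}
content_scores = {"m1": 2.0, "m2": 3.5, "m3": 1.0}  # stand-in for a content-based predictor

v_u = pseudo_ratings_vector(user_ratings, content_scores.get, items)
print(v_u)
```

Stacking one such vector per user yields a matrix without missing entries, which is why the user similarities can then be computed over all items.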

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The
pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059,
respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of
collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a
user's opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they
occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated
prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on
the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks
down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
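The breakdown and reconstruction steps of the content-based strategy can be sketched as follows. The recipes and ratings are hypothetical, and predicting a recipe rating as the mean score of its known ingredients is an assumption of this sketch rather than a detail confirmed by the text.

```python
from collections import defaultdict

def ingredient_scores(recipe_ratings, recipes):
    """Average the user's recipe ratings over every rated recipe containing each ingredient."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipes[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {k: totals[k] / counts[k] for k in totals}

def predict_recipe(recipe, scores, recipes):
    """Predict a recipe rating as the mean score of its known ingredients (an assumption)."""
    known = [scores[k] for k in recipes[recipe] if k in scores]
    return sum(known) / len(known) if known else None

# Hypothetical recipes and one user's ratings.
recipes = {
    "stew": ["beef", "potato"],
    "sushi": ["rice", "salmon"],
    "hash": ["potato", "egg"],
    "poke": ["rice", "salmon", "avocado"],
}
user_ratings = {"stew": 2, "sushi": 5, "hash": 3}

scores = ingredient_scores(user_ratings, recipes)
poke_prediction = predict_recipe("poke", scores, recipes)
print(scores, poke_prediction)
```

Ingredients the user has never encountered (avocado in this example) carry no score and are simply skipped in the reconstruction.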

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple
pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is
then used by the collaborative approach to generate recommendations. These strategies differ from
one another in the approach used to compute user similarity. The hybrid recipe method identifies a
set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is
based on the scores obtained after the recipes are broken down into ingredients. Lastly, an
intelligent strategy was implemented. In this strategy, only the positive ratings for items that
receive mixed ratings are considered, under the assumption that items common to recipes with mixed
ratings are not the cause of the high variation in score. The results of the study are represented
in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender needs
to achieve. As mentioned earlier, there are many other factors that influence a user's rating and
that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the
only ones that matter when calculating a prediction. In some cases, items that are too similar to
others which have already been seen should not be recommended either. This idea is used in
Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping
the user to obtain more knowledge about a news topic, a certain variety should exist when performing
the recommendation. Items too similar to others known by the user probably carry the same
information, and will not help him to gather more information about a particular news topic. These
items are then excluded from the recommendation. On the other hand, items similar in topic but not
similar in content should be great recommendations in the context of this system. Therefore, the use
of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the
nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt
to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a
single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used
to quantify the similarity between two vectors. When computing a prediction for a new story, all the
stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is
closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is
labeled as known, because the system assumes that the user is already aware of the event reported in
it and does not need to recommend a story he already knows. If the story does not have any voters,
it cannot be classified by the short-term model, and is passed to the long-term model, explained in
more detail in [23].
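The voting scheme of the short-term model can be sketched as follows. The threshold values and the returned labels are hypothetical choices made for the example, and the similarities are assumed to have been computed beforehand with the cosine measure.

```python
def short_term_prediction(similarities, scores, t_min=0.3, t_max=0.9):
    """Classify a new story from its similarities to the user's rated stories.

    similarities[i] is the cosine similarity between the new story and rated story i,
    and scores[i] is the score of rated story i. Both thresholds are hypothetical.
    """
    voters = [(s, r) for s, r in zip(similarities, scores) if s >= t_min]
    if not voters:
        return "unclassified", None  # would be handed over to the long-term model
    if any(s >= t_max for s, _ in voters):
        return "known", None         # too close to a story the user has already seen
    weighted = sum(s * r for s, r in voters) / sum(s for s, _ in voters)
    return "scored", weighted

result = short_term_prediction([0.5, 0.4, 0.1], [1.0, 0.5, 0.0])
known = short_term_prediction([0.95, 0.4], [1.0, 0.5])
unclassified = short_term_prediction([0.1], [1.0])
print(result, known, unclassified)
```

The upper threshold is what separates this scheme from a plain nearest neighbor predictor: a near-duplicate of a seen story is filtered out instead of receiving the highest score.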

This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the
specification of the methods used in the different recommendation components. Afterwards, the
datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component, where various approaches are explored to adapt Rocchio's algorithm for personalized food
recommendations. These provide independent recommendations for the same input, in order to evaluate
improvements in the prediction accuracy from the algorithms implemented in the experimental
component. The evaluation module independently evaluates each recommendation component by measuring
the performance of the algorithms using different metrics. The methods used in this module are
explained in detail in the following chapter. The programming language used to develop these
components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative
approach [24]. This approach is very similar to the user-to-user approach explained in detail in
Section 2.1.2.

In user-to-user the similarity value between a pair of users is measured by the way both users

rate the same set of items where in item-to-item approach the similarity value between a pair items

is measured from the way they are rated by a shared set of users In other words in user-to-user

two users are considered similar if they rate the same set of items in a similar way where in the

item-to-item approach two items are considered similar if they were rated in a similar way by the



Figure 41 System Architecture

Figure 42 Item-to-item collaborative recommendation2

same group of users

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}
\]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}
\]

In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
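The item-to-item computation described above can be illustrated with a minimal Python sketch. It is not the YoLP implementation: the function names, the dictionary-based data layout, and the choice to return 0 when the correlation is undefined are all assumptions made for illustration. The prediction function uses the common textbook form, a similarity-weighted average of the user's raw ratings, which is also an assumption.

```python
from math import sqrt

def pearson_item_similarity(ratings_a, ratings_b, mean_a, mean_b):
    """Pearson correlation between two recipes, in the form of Eq. 4.1.

    ratings_a / ratings_b map user id -> rating for recipes a and b.
    mean_a / mean_b are the average ratings of the two recipes.
    """
    shared = ratings_a.keys() & ratings_b.keys()  # P: users who rated both
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared)
    den = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared)
               * sum((ratings_b[p] - mean_b) ** 2 for p in shared))
    return num / den if den else 0.0  # assumption: undefined similarity -> 0


def predict_rating(user_ratings, similarity_to_target):
    """Similarity-weighted average of the user's own ratings (assumed form).

    user_ratings: recipe id -> rating given by the user (the set N).
    similarity_to_target: recipe id -> sim(a, b) for the target recipe a.
    """
    num = sum(similarity_to_target[b] * r for b, r in user_ratings.items())
    den = sum(similarity_to_target[b] for b in user_ratings)
    return num / den if den else None  # None: no usable neighbours
```

In practice, negative similarities are often discarded before the weighted average is taken, since they can push the denominator towards zero; whether YoLP does so is not stated in this work.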

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
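As a rough illustration of the binary profile and the cosine comparison described above, the following Python sketch stores each sparse binary vector as a set of active features. The feature labels (such as "ing:tomato") and the function names are hypothetical and only serve the example; the actual YoLP vector layout is not given in this work.

```python
from math import sqrt

POSITIVE_RATINGS = {4, 5}  # ratings treated as positive, as stated in the text

def build_binary_profile(rated_recipes):
    """Union of the features of every positively rated recipe.

    rated_recipes: iterable of (feature_set, rating) pairs.
    A set of active features is equivalent to a sparse binary vector.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating in POSITIVE_RATINGS:
            profile |= features
    return profile

def cosine_binary(profile, recipe_features):
    """Cosine similarity between two binary vectors stored as sets."""
    if not profile or not recipe_features:
        return 0.0
    overlap = len(profile & recipe_features)
    return overlap / sqrt(len(profile) * len(recipe_features))
```

A recommendation list could then be obtained by sorting the restaurant's recipes by `cosine_binary` in descending order.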

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}
\]

where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

\[
IRF_k = \log \frac{M}{M_k} \tag{4.4}
\]

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
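The IRF weighting can be sketched in a few lines of Python. This is an illustrative sketch only: the data layout (a dictionary from recipe id to a set of features) and the function names are assumptions, and the natural logarithm is used because the base of the logarithm is not specified here.

```python
from collections import Counter
from math import log

def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k) for every feature k.

    recipes: dict mapping recipe id -> set of features in that recipe.
    Returns a dict mapping feature -> IRF weight.
    """
    total = len(recipes)  # M: total number of recipes
    containing = Counter(f for feats in recipes.values() for f in feats)  # M_k
    return {f: log(total / count) for f, count in containing.items()}

def recipe_vector(features, irf):
    """Sparse recipe vector; with the feature frequency fixed at 1,
    each weight reduces to the feature's IRF."""
    return {f: irf[f] for f in features if f in irf}
```

A feature that appears in every recipe receives a weight of zero, so very common ingredients contribute nothing to the similarity, which mirrors the behaviour of IDF for very common words.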

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
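The two observation rules and the add/subtract update can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the fixed-threshold rule is expressed through the midpoint of the rating scale (which reproduces the 1-4 and 1-5 cases described above), and the prototype is stored as a sparse dictionary.

```python
def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule from the text.

    On a 1-4 scale: 1-2 negative, 3-4 positive.
    On a 1-5 scale: 1-2 negative, 4-5 positive, 3 neutral (returns None).
    """
    midpoint = (1 + scale_max) / 2
    if rating == midpoint:
        return None
    return rating > midpoint

def is_positive_user_avg(rating, user_avg):
    """User-average rule: below the average is negative, otherwise positive."""
    return rating >= user_avg

def build_prototype(rated_recipes, irf, classify):
    """Add IRF weights for positive observations, subtract for negative ones.

    rated_recipes: iterable of (feature_set, rating) pairs from the training set.
    irf: dict feature -> IRF weight.
    classify: function rating -> True / False / None (None is ignored).
    """
    prototype = {}
    for features, rating in rated_recipes:
        label = classify(rating)
        if label is None:
            continue
        sign = 1.0 if label else -1.0  # both observation types weigh 1
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf.get(f, 0.0)
    return prototype
```

For example, `build_prototype(history, irf, lambda r: is_positive_fixed(r, 5))` would apply the fixed-threshold rule on a 1 to 5 scale, while `lambda r: is_positive_user_avg(r, avg)` would apply the user-average rule.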

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \ast (D - C) + C \tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
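The per-user Min-Max mapping could look like the following Python sketch. The function and parameter names are hypothetical, and treating a zero-width similarity interval as the "not enough ratings" case is an assumption about how the fallback is triggered.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max,
                   user_average):
    """Map a similarity into the user's own rating range (form of Eq. 4.5).

    sim_min / sim_max: the user's observed similarity range (values of A).
    rating_min / rating_max: the user's observed rating range [C, D].
    Falls back to the user's average when the similarity interval is empty.
    """
    interval = sim_max - sim_min
    if interval <= 0:
        return user_average
    scaled = (similarity - sim_min) / interval
    return scaled * (rating_max - rating_min) + rating_min
```

For a user whose similarities span 0 to 1 and whose ratings span 1 to 5, a similarity of 0.5 maps to a predicted rating of 3.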

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}
\]

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
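The three-case rule can be written as a small Python function. The sketch below is illustrative: the names are hypothetical, the default thresholds are the initial values reported in this work (0.75 and 0.25), and combining the user and item standard deviations with a plain mean is an assumption, since only the combined average is spelled out in the text.

```python
def threshold_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: move one standard deviation above or below the
    average depending on where the similarity falls relative to U and L."""
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average

def combined(user_value, item_value):
    """Assumed combination: plain mean of the user and item statistics."""
    return (user_value + item_value) / 2
```

The user-based, recipe-based and combined variants differ only in which `average` and `std_dev` are passed in; for the combined variant one would pass `combined(user_avg, item_avg)` and `combined(user_std, item_std)`.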

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                          Food.com    Epicurious
Number of users                              24741          8117
Number of food items                        226025         14976
Number of rating events                     956826         86574
Number of ratings above avg                 726467         46588
Number of groups                               108            68
Number of ingredients                         5074           338
Number of categories                            28            14
Sparsity on the ratings matrix               0.02%         0.07%
Avg rating values                             4.68          3.34
Avg number of ratings per user               38.67         10.67
Avg number of ratings per item                4.23          5.78
Avg number of ingredients per item            8.57          3.71
Avg number of categories per item             2.33          0.60
Avg number of food groups per item            0.87          0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered. All recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use this segment to evaluate the predictions made by the model trained on the remaining data. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a leave-p-out style of cross-validation was used, holding out p observations as the validation set and using the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds and the process was repeated 5 times, each time with a different fold as the validation set, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
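The 5-fold split can be sketched in Python as below. This is a minimal illustration rather than the evaluation module itself: the function name, the tuple layout of a rating event, and the fixed shuffle seed (used only for reproducibility) are assumptions.

```python
import random

def k_fold_splits(rating_events, k=5, seed=0):
    """Yield (training, validation) lists for each of the k folds.

    rating_events: list of (user_id, item_id, rating) tuples.
    The seed only makes the shuffle reproducible.
    """
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]  # k near-equal partitions
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation list holds about 20% of the rating events and each training list the remaining 80%, and every event is used for validation exactly once.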

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
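For reference, both error measures can be computed with the short Python sketch below, using their standard definitions. The function names and the list-based inputs are choices made for this illustration.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; squaring penalizes large deviations more."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

In a cross-validation run, these would be evaluated on each fold's validation set and then averaged over the folds.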

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious            Food.com
                                    MAE       RMSE        MAE       RMSE
YoLP Content-based component        0.6389    0.8279      0.3590    0.6536
YoLP Collaborative component        0.6454    0.8678      0.3761    0.6834
User Average                        0.6315    0.8338      0.4077    0.6207
Item Average                        0.7701    1.0930      0.4385    0.7043
Combined Average                    0.6628    0.8572      0.4180    0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                  0.8217    1.0606            0.7759    1.0283
Item Avg + Item Standard Deviation                  0.8914    1.1550            0.8388    1.1106
User/Item Avg + User and Item Standard Deviation    0.8304    1.0296            0.7824    0.9927
Min-Max                                             0.8539    1.1533            0.7721    1.0705

Food.com
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                  0.4448    0.6812            0.4287    0.6624
Item Avg + Item Standard Deviation                  0.4561    0.7251            0.4507    0.7207
User/Item Avg + User and Item Standard Deviation    0.4390    0.6506            0.4324    0.6449
Min-Max                                             0.6648    0.9847            0.6303    0.9384
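The three average-based baselines described above can be sketched as follows. The function names and the tuple layout of a rating event are assumptions made for this illustration; the combined prediction follows the (UserAvg + ItemAvg)/2 formula given in the text.

```python
from collections import defaultdict

def average_baselines(training):
    """Per-user and per-item mean ratings learned from the training set.

    training: iterable of (user_id, item_id, rating) tuples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def predict_combined(user, item, user_avg, item_avg):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2
```

This sketch does not handle users or recipes that are absent from the training fold; how such cases were treated in the experiments is not stated here.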

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious            Food.com
                                      MAE       RMSE        MAE       RMSE
Ingredients + Cuisine + Dietaries     0.7824    0.9927      0.4324    0.6449
Ingredients + Cuisine                 0.7915    1.0012      0.4384    0.6502
Ingredients + Dietary                 0.7874    0.9986      0.4342    0.6468
Cuisine + Dietary                     0.8266    1.0616      0.4324    0.7087
Ingredients                           0.7932    1.0054      0.4411    0.6537
Cuisine                               0.8553    1.0810      0.5357    0.7431
Dietary                               0.8772    1.0807      0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
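The idea of storing one prototype vector per feature group and merging them on demand can be sketched as follows. The group names, the prefixed feature keys and the function names are hypothetical; the sketch also assumes that feature keys are unique across groups, so that merging is a simple dictionary update.

```python
from math import sqrt

def merge_vectors(stored, selected_groups):
    """Combine the per-group prototype vectors chosen for one feature test.

    stored: dict group name -> sparse vector (dict feature -> weight).
    Feature keys are assumed to be unique across groups.
    """
    merged = {}
    for group in selected_groups:
        merged.update(stored[group])
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```

For a test such as "Ingredients + Cuisine", the prototype would be `merge_vectors(stored, ["ingredients", "cuisine"])`, and the recipe vector would be restricted to the same groups before the comparison.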

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since we have more information available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to:

\[
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
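The opposite movement of MAE and RMSE can be seen in a small worked example. The numbers below are invented purely for illustration and are not taken from the experiments. Suppose four predictions are compared against actual ratings. Predictor A misses every rating by 0.5, while predictor B is exact three times and misses once by 1.5:

```latex
\[
\text{A: } |e| = (0.5,\ 0.5,\ 0.5,\ 0.5) \;\Rightarrow\;
MAE = \frac{2.0}{4} = 0.5, \qquad
RMSE = \sqrt{\frac{4 \times 0.25}{4}} = 0.5
\]
\[
\text{B: } |e| = (0,\ 0,\ 0,\ 1.5) \;\Rightarrow\;
MAE = \frac{1.5}{4} = 0.375, \qquad
RMSE = \sqrt{\frac{2.25}{4}} = 0.75
\]
```

Predictor B has the lower MAE because it is exact more often, but the higher RMSE because its single miss is large and is squared before averaging.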

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the userrsquos preferences

and the recipe features Since the userrsquos preferences in a real system are built over time the objec-

tive of this test is to simulate the continuous learning of the algorithm using the datasets studied in

this work and analyse if the recommendation error starts to converge after a determined amount of

reviews are made In order to perform this test first the datasets were analysed to find a group of

users with enough recipes rated to study the improvements in the recommendation The Epicurious

dataset contains 71 users that rated over 40 recipes This number of recipes rated was the highest

chosen threshold for this dataset in order to maintain a considerable amount of users to average the

recommendation errors from see Fig 58 In Foodcom 1571 users were found that rated over 100

recipes and since the results of this experiment showed a consistent drop in the errors measured

as seen in Fig 59 another test was made using the 269 users that rated over 500 recipes as seen

in Fig 510

The training set represents the recipes that are used to build the usersrsquo prototype vectors so for

each round an additional review is added to the training set and removed from the validation set

in order to simulate the learning process In Fig 58 it is possible to see a steady decrease in


Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes

error and after 25 recipes rated the error fluctuates around the same values Although it would be

interesting to perform this experiment with a higher number of rated recipes too see the progress

of the recommendation error due to the small dimension of the dataset there are not enough users

with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant

improvement in the recommendation although there is not a clear number of rated recipes that marks


Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes

a threshold where the recommendation error stagnates
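The round-by-round procedure described in this section can be sketched as follows. This is only a minimal illustration for a single user, not the implementation used in the experiments: the names `learning_curve`, `fit` and `predict` are hypothetical, and a mean-rating predictor stands in for the Rocchio-based component. In the experiments, the per-round errors would additionally be averaged over all selected users.

```python
def learning_curve(reviews, fit, predict, max_rounds):
    """Return the mean absolute error per round for one user.

    reviews: list of (recipe_features, rating) pairs, in the order they are learned.
    fit / predict: hypothetical callables standing in for the recommender.
    """
    errors = []
    for k in range(1, max_rounds + 1):
        # Round k: the first k reviews form the training set, the rest the validation set.
        training, validation = reviews[:k], reviews[k:]
        if not validation:
            break
        model = fit(training)
        abs_errors = [abs(predict(model, item) - rating) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


# Stand-in predictor (assumption): always predicts the user's mean training rating.
def fit_mean(training):
    return sum(rating for _, rating in training) / len(training)


def predict_mean(model, item):
    return model


example_reviews = [({}, 4), ({}, 2), ({}, 3), ({}, 3)]
curve = learning_curve(example_reviews, fit_mean, predict_mean, max_rounds=3)
```

Plotting `curve` against the round number gives a learning curve of the kind shown in Figs. 5.8 to 5.10.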



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breakdown of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.

Bibliography




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-

proved recommendations In Proceedings of the Eighteenth National Conference on Artificial

Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936

[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by

Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings

of the International MultiConference of Engineers and Computer Scientists pages 519ndash523

2014 ISBN 9789881925251

[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-

ents In Proceedings of the 18th International Conference on User Modeling Adaptation

and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi

101007978-3-642-13470-8 36

[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling

and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A


[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-

ommendation volume 8444 2014

[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-

lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN

10450823 doi 101067mod2000109031




Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
  1.1 Dissertation Structure
2 Fundamental Concepts
  2.1 Recommendation Systems
    2.1.1 Content-Based Methods
    2.1.2 Collaborative Methods
    2.1.3 Hybrid Methods
  2.2 Evaluation Methods in Recommendation Systems
3 Related Work
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  3.2 Content-Boosted Collaborative Recommendation
  3.3 Recommending Food: Reasoning on Recipes and Ingredients
  3.4 User Modeling for Adaptive News Access
4 Architecture
  4.1 YoLP Collaborative Recommendation Component
  4.2 YoLP Content-Based Recommendation Component
  4.3 Experimental Recommendation Component
    4.3.1 Rocchio's Algorithm using FF-IRF
    4.3.2 Building the Users' Prototype Vector
    4.3.3 Generating a rating value from a similarity value
  4.4 Database and Datasets
5 Validation
  5.1 Evaluation Metrics and Cross Validation
  5.2 Baselines and First Results
  5.3 Feature Testing
  5.4 Similarity Threshold Variation
  5.5 Standard Deviation Impact in Recommendation Error
  5.6 Rocchio's Learning Curve
6 Conclusions
  6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features

List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that better fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general. However, new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on their consumer behaviour, recommendations specifically adjusted to their personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements, and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching up the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)$$

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed from all keywords z in the document j.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)$$

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

$$w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)$$

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
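As an illustration of Eqs. (2.1) to (2.3), the following minimal sketch computes TF-IDF weights for a small collection of keyword lists. The function name `tf_idf` and the toy data are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated above.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {keyword: weight} dict per document (each document is a list of keywords)."""
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            # TF (count / max count in the document) times IDF (log N / n_i)
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["tomato", "basil", "tomato"], ["tomato", "rice"], ["rice", "egg"]]
weights = tf_idf(docs)
```

Note that a keyword occurring in every document receives a weight of zero, which matches the observation that such keywords do not help in distinguishing relevance levels.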

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \qquad (2.4)$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\text{Similarity}(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \qquad (2.5)$$
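The cosine normalization of Eq. (2.4) and the cosine similarity of Eq. (2.5) can be sketched as below. Vectors are stored as sparse dictionaries from keyword to weight; this representation and the function names are assumptions made for the example, as is the choice of returning a similarity of zero when one of the vectors is empty.

```python
import math


def l2_normalize(vector):
    """Cosine normalization: divide each weight by the Euclidean norm of the vector."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)
    return {k: w / norm for k, w in vector.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors; missing keywords count as 0."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


recipe = {"tomato": 0.4, "basil": 0.3}
profile = {"tomato": 0.8, "rice": 0.6}
similarity = cosine_similarity(recipe, profile)
```

Since the similarity divides by both norms, it gives the same value whether or not the vectors were normalized beforehand.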



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
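A minimal sketch of the prototype construction in Eq. (2.6), followed by classification with cosine similarity, is given below. The function names, the sparse dictionary representation, the toy data and the default values of β and γ are all assumptions made for illustration; they are not the settings used in this work.

```python
import math


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Build a class prototype from positive and negative example vectors (Eq. 2.6)."""
    terms = {t for doc in positives + negatives for t in doc}
    prototype = {}
    for t in terms:
        # Average weight of term t over the positive and the negative examples
        pos = sum(d.get(t, 0.0) for d in positives) / len(positives) if positives else 0.0
        neg = sum(d.get(t, 0.0) for d in negatives) / len(negatives) if negatives else 0.0
        prototype[t] = beta * pos - gamma * neg
    return prototype


def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify(document, prototypes):
    """Assign the document to the class whose prototype vector is most similar."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], document))


liked = [{"garlic": 1.0}, {"garlic": 0.5, "onion": 0.5}]
disliked = [{"onion": 1.0}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
label = classify({"garlic": 1.0}, prototypes)
```

With these toy vectors, the "liked" prototype gives garlic a positive weight and onion a negative one, so a garlic-only document is closer to it than to the "disliked" prototype.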

Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes Classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7)$$

When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8)$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated in practice, since terms in a document are not independent of each other, experiments show that the Naive Bayes classifier obtains very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, records the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \tag{2.9} \]

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
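The multinomial model of Eq. (2.9) can be illustrated with a minimal Python sketch. The toy recipe documents, the function names and the use of Laplace (add-one) smoothing are assumptions made for this illustration and are not part of the cited works; log-probabilities are used only to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs):
    """docs is a list of (tokens, label) pairs; returns priors, likelihoods, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Laplace smoothing is an assumption of this sketch, not stated in the text.
        likelihoods[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, likelihoods, vocab

def classify(tokens, priors, likelihoods, vocab):
    """Return the class maximising log P(c) + sum_k N(d, t_k) * log P(t_k | c)."""
    counts = Counter(t for t in tokens if t in vocab)  # N(d, t_k) for known terms only
    scores = {
        c: math.log(priors[c])
        + sum(n * math.log(likelihoods[c][t]) for t, n in counts.items())
        for c in priors
    }
    return max(scores, key=scores.get)

# Hypothetical training data: recipe descriptions labelled by a user.
training = [
    (["grilled", "salmon", "lemon"], "relevant"),
    (["salmon", "rice", "salmon"], "relevant"),
    (["fried", "pork", "bacon"], "irrelevant"),
    (["bacon", "burger", "fried"], "irrelevant"),
]
model = train_multinomial_nb(training)
prediction = classify(["salmon", "lemon", "rice"], *model)
```

Because P(d) is the same for every class, the sketch compares only the numerators, as described above.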

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms, the branches originating from them are labelled according to tests done on the weight that the term has in the document, and the leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while cosine similarity is commonly adopted for items represented using the VSM. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: because they have no training phase, all the computation is performed when an item is classified.

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ \mathrm{Similarity}(a,b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\;\sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10} \]



In the formula, $r_{a,s}$ is the rating that user a gave to item s, and $r_{b,s}$ is the rating that user b gave to the same item s; here S denotes the set of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
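As a small worked illustration of the cosine measure on the ratings of Table 2.1, the following Python sketch computes the similarity over the items that both users have rated. The dictionary layout and the function name are choices made for this sketch only.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between the rating vectors of users a and b,
    restricted to the items rated by both."""
    common = ratings[a].keys() & ratings[b].keys()
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b)

sim_user1 = cosine_similarity("Alice", "User1")  # roughly 0.975
sim_user4 = cosine_similarity("Alice", "User4")  # roughly 0.80
```

Since all ratings are positive, the cosine stays high even for User4, whose preferences appear to run opposite to Alice's, which illustrates why differences in rating behaviour matter.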

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items in common between them are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour.

Figure 2.2: Comparing user ratings [2]

The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:

\[ \mathrm{sim}(a,b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^2}\;\sqrt{\sum_{s \in S}(r_{b,s} - \bar{r}_b)^2}} \tag{2.11} \]

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ \mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
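The Pearson similarity and the prediction function can be traced on the ratings of Table 2.1 with the short Python sketch below. Several details are assumptions of this sketch rather than statements of the text: the averages inside the similarity are taken over the co-rated items, the averages inside the prediction are taken over all ratings of each user, and only the two most similar neighbors with positive correlation are used.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation between users a and b over their co-rated items."""
    common = ratings[a].keys() & ratings[b].keys()
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)) * math.sqrt(
        sum((ratings[b][s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0

def predict(a, item, k=2):
    """Mean rating of user a plus the similarity-weighted deviations of the
    k most similar neighbors that rated the item."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    mean_a = mean(ratings[a].values())
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return mean_a + num / den if den else mean_a

sim_alice_user1 = pearson("Alice", "User1")   # about 0.85
sim_alice_user4 = pearson("Alice", "User4")   # negative correlation
predicted_item5 = predict("Alice", "Item5")   # about 4.87
```

Under these assumptions, User1 and User2 are selected as neighbors, and the predicted rating of roughly 4.87 lies above Alice's average of 4 because both neighbors rated Item5 above their own averages.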

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [15, 16].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information such as age, gender and other attributes can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, and otherwise use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

\[ \mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ \mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
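A minimal Python sketch of how Precision and Recall could be computed from a list of recommended items and the set of items the user actually likes is given below. The item names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(recommended, relevant):
    """Compute Precision and Recall from recommended and relevant item sets."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: 4 recommended dishes, 5 dishes the user likes.
precision, recall = precision_recall(
    ["soup", "pizza", "salad", "steak"],
    ["pizza", "salad", "sushi", "curry", "ramen"],
)
```

In this example two of the four recommended dishes are relevant, giving a Precision of 0.5, and two of the five relevant dishes were recommended, giving a Recall of 0.4.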

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:


\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
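The difference between the two error measures can be seen in a small Python sketch. The predicted and actual ratings below are made-up values chosen for illustration.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted))

# Hypothetical ratings: absolute errors are 0, 1, 2 and 0.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 3.0, 2.0]

mae_value = mae(predicted, actual)    # 0.75
rmse_value = rmse(predicted, actual)  # about 1.118
```

The single error of two rating points contributes four squared units to the RMSE, which is why the RMSE is noticeably larger than the MAE here.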


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

\[ I^+_k = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

\[ IRF_k = \log\frac{M}{M_k} \tag{3.3} \]



The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
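The favourite-ingredient score $I^+_k = FF_k \times IRF_k$ can be sketched in Python as follows. The sketch assumes that $F_k$ counts the cooked recipes containing ingredient k within a period of D days and that the logarithm is natural; the recipes, the function name and the seven-day period are hypothetical.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each ingredient the user cooked with as (F_k / D) * log(M / M_k)."""
    usage = Counter(k for recipe in cooked_recipes for k in set(recipe))    # F_k
    containing = Counter(k for recipe in all_recipes for k in set(recipe))  # M_k
    m = len(all_recipes)                                                    # M
    return {
        k: (f / period_days) * math.log(m / containing[k])
        for k, f in usage.items()
        if containing[k]
    }

# Hypothetical recipe database, each recipe given as a set of ingredients.
all_recipes = [
    {"salt", "chicken", "rice"},
    {"salt", "beef", "potato"},
    {"salt", "chicken", "lemon"},
    {"salt", "tofu", "rice"},
    {"salt", "beef", "lemon"},
]
cooked = [all_recipes[0], all_recipes[2]]  # recipes cooked during the period
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
favourite = max(scores, key=scores.get)
```

In this toy data, salt is used in every cooked recipe but also appears in every recipe of the database, so its inverse recipe frequency is zero and chicken is ranked as the favourite ingredient.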

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

\[ \mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
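The construction of a pseudo user-ratings vector, as in Eq. (3.7), can be sketched in a few lines of Python. The movie identifiers and rating values are hypothetical, and the content-based predictions are simply given as a dictionary rather than produced by an actual classifier; the hybrid correlation weight of [20] is not reproduced here.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Use the actual rating r_{u,i} when the user rated item i,
    and the content-based prediction c_{u,i} otherwise."""
    return [
        user_ratings[i] if i in user_ratings else content_predictions[i]
        for i in items
    ]

# Hypothetical data for one user u.
items = ["m1", "m2", "m3", "m4"]
user_ratings = {"m1": 4, "m3": 2}                                 # r_{u,i}
content_predictions = {"m1": 3.6, "m2": 4.2, "m3": 2.5, "m4": 1.8}  # c_{u,i}

v_u = pseudo_ratings_vector(user_ratings, content_predictions, items)
```

Stacking the vectors of all users row by row would give the dense pseudo ratings matrix V on which the user similarities are then computed.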

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
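The content-based strategy just described, breaking recipes down into ingredients and reconstructing a recipe score from them, can be sketched in Python as follows. Ingredient scores are the average of the ratings of the recipes in which they occur, as stated above; averaging the ingredient scores to obtain the recipe prediction is an assumption of this sketch, and the recipes and ratings are hypothetical.

```python
from collections import defaultdict

def ingredient_scores(rated_recipes, recipe_ingredients):
    """Average, per ingredient, the user's ratings of the recipes containing it."""
    collected = defaultdict(list)
    for recipe, rating in rated_recipes.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {k: sum(v) / len(v) for k, v in collected.items()}

def predict_recipe(recipe, scores, recipe_ingredients):
    """Predict a recipe rating as the mean score of its known ingredients."""
    known = [scores[k] for k in recipe_ingredients[recipe] if k in scores]
    return sum(known) / len(known) if known else None

# Hypothetical recipes and one user's ratings.
recipe_ingredients = {
    "carbonara": ["pasta", "bacon", "egg"],
    "omelette": ["egg", "cheese"],
    "mac_and_cheese": ["pasta", "cheese"],
}
rated = {"carbonara": 5, "omelette": 3}

scores = ingredient_scores(rated, recipe_ingredients)
prediction = predict_recipe("mac_and_cheese", scores, recipe_ingredients)
```

Here egg receives a score of 4 because it occurs in both rated recipes, while the unrated recipe is predicted at 4.0 from its pasta and cheese scores.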

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information, and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
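The short-term model described above can be sketched in Python as follows. This is only an illustration of the voting scheme: the threshold values, the sparse dictionary representation of TF-IDF vectors, the function names and the example stories are all assumptions and are not taken from [23].

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.95):
    """Return a predicted score, the label "known", or None when no voter exists.

    t_min and t_max are hypothetical threshold values chosen for this sketch.
    """
    sims = [(cosine(story, vector), score) for vector, score in rated_stories]
    voters = [(sim, score) for sim, score in sims if sim >= t_min]
    if not voters:
        return None       # would be passed on to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return "known"    # too close to a story the user has already seen
    return sum(sim * score for sim, score in voters) / sum(sim for sim, _ in voters)

# Hypothetical rated stories: (TF-IDF vector, user score).
rated = [
    ({"election": 1.0, "vote": 1.0}, 0.9),
    ({"football": 1.0, "goal": 1.0}, 0.2),
]
follow_up = short_term_predict({"election": 1.0, "debate": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_predict({"recipe": 1.0}, rated)
```

The same pattern could be transferred to food recommendation by treating dishes as stories, so that dishes nearly identical to recently eaten ones are filtered out instead of being recommended.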

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}
$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
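As an illustration of the item-to-item Pearson correlation, the following Python sketch computes the similarity between two recipes from their ratings. The function name and the dictionary representation (user identifier mapped to rating) are assumptions made for the example; the averages are taken over all ratings of each recipe, which is one possible reading of the definition above.

```python
import math

def pearson_item_similarity(ratings_a, ratings_b):
    """ratings_a, ratings_b: dicts mapping user id -> rating for recipes a and b.

    The sums run over P, the users that rated both recipes. The recipe
    averages are computed over all ratings of each recipe (an assumption).
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    den_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    den = math.sqrt(den_a * den_b)
    # A zero denominator means one recipe has no rating variance over P.
    return num / den if den else 0.0
```

Returning 0 when there are no common raters or no variance is a design choice of this sketch; the text does not state how such cases are handled.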


After the similarity is computed, the rating prediction is calculated using the following equation:

$$
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}
$$

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
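The prediction step can be sketched in Python as follows. The sketch follows the prediction formula term by term, under the assumption that the rating used for each item b is the one given by the target user u. As written, the expression yields a similarity-weighted average of mean-centred ratings; how that value is mapped back onto the rating scale is not stated in the text, so the sketch stops there. All names are hypothetical.

```python
def predict_centered(target_item, user_ratings, item_means, sim):
    """Similarity-weighted average of the user's mean-centred ratings.

    user_ratings: dict item id -> rating given by the target user (the set N).
    item_means:   dict item id -> average rating of that item.
    sim:          callable (item_a, item_b) -> similarity value.
    """
    num = 0.0
    den = 0.0
    for item_b, rating in user_ratings.items():
        s = sim(target_item, item_b)
        num += s * (rating - item_means[item_b])
        den += s
    # With no similar rated items the prediction is undefined; 0.0 is a
    # placeholder choice for this sketch.
    return num / den if den else 0.0
```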

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
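A minimal Python sketch of this profile construction is given below. It is not the YoLP implementation: binary sparse vectors are represented as sets of active features, the field names of the recipe dictionary are hypothetical, and the context features (temperature, period of the day, season) are left out.

```python
import math

def recipe_tokens(recipe):
    """Map a recipe to the set of its active binary features."""
    tokens = {
        "category:" + recipe["category"],
        "region:" + recipe["region"],
        "restaurant:" + str(recipe["restaurant_id"]),
    }
    tokens.update("ingredient:" + name for name in recipe["ingredients"])
    return tokens

def build_profile(rated_recipes):
    """rated_recipes: iterable of (recipe, rating); ratings of 4 or 5 are positive."""
    profile = set()
    for recipe, rating in rated_recipes:
        if rating >= 4:
            profile |= recipe_tokens(recipe)
    return profile

def cosine_binary(a, b):
    """Cosine similarity of two binary vectors given as sets of active features."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Ranking the restaurant's recipes then amounts to sorting them by `cosine_binary(profile, recipe_tokens(recipe))` in descending order.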

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$
Rating = \begin{cases}
avgTotal + 0.5 & \text{if } similarity > 0.8 \\
avgTotal & \text{otherwise}
\end{cases} \tag{4.3}
$$

where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
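The conversion of a similarity into a rating used for the YoLP content-based component can be written as a short Python function. The function and parameter names are hypothetical; the combined average is taken as (UserAvg + ItemAvg)/2, as defined for the baselines in Chapter 5.

```python
def yolp_similarity_to_rating(similarity, user_avg, item_avg,
                              threshold=0.8, bonus=0.5):
    """Return avgTotal + 0.5 when the similarity exceeds 0.8, else avgTotal."""
    avg_total = (user_avg + item_avg) / 2  # combined user and item average
    return avg_total + bonus if similarity > threshold else avg_total
```

For example, with a user average of 4.0 and an item average of 3.0, a similarity of 0.9 gives 4.0, while a similarity of exactly 0.8 gives 3.5 because the comparison is strict.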

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$
IRF_k = \log \frac{M}{M_k} \tag{4.4}
$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.
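The computation of the IRF weights over the complete dataset can be sketched as follows. The function name and the representation of recipes as sets of features are assumptions, and the natural logarithm is used because the base of the logarithm is not specified in the text.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """recipes: list of feature collections, one per recipe.

    Returns a dict feature -> log(M / M_k), where M is the number of
    recipes and M_k the number of recipes containing feature k.
    """
    m = len(recipes)
    counts = Counter(feature for recipe in recipes for feature in set(recipe))
    return {feature: math.log(m / m_k) for feature, m_k in counts.items()}
```

Features that appear in almost every recipe receive a weight close to zero, while rare features receive the largest weights.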

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
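The two labeling approaches and the construction of the prototype vector can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the experiments: the function names are hypothetical, the prototype vector is stored as a dictionary, and positive and negative observations both have weight 1 as described above.

```python
def fixed_threshold_label(rating, scale_max):
    """True = positive, False = negative, None = neutral (ignored)."""
    if scale_max == 4:   # Epicurious: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:      # Food.com: a rating of 3 is neutral
        return None
    return rating > 3    # Food.com: 1-2 negative, 4-5 positive

def user_average_label(rating, user_avg):
    """Positive when the rating is equal to or above the user's average."""
    return rating >= user_avg

def build_prototype(rated, irf, label):
    """rated: iterable of (features, rating); irf: dict feature -> weight;
    label: callable rating -> True / False / None."""
    prototype = {}
    for features, rating in rated:
        positive = label(rating)
        if positive is None:
            continue
        sign = 1.0 if positive else -1.0
        for feature in features:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf.get(feature, 0.0)
    return prototype
```

The labeling rule is passed as a callable, for example `lambda r: fixed_threshold_label(r, 5)` for Food.com or `lambda r: user_average_label(r, avg)` for the second approach, so both variants share the same construction loop.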

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in Section 4.4, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ which fits in the range $[C, D]$, as shown in the following formula:

$$
B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}
$$

where $\min(A)$ and $\max(A)$ are the minimum and maximum values of $A$.

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute the similarity range of each user from the validation set, and compute the rating range of each user from the training set. At this point the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
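The per-user Min-Max mapping can be sketched in Python as shown below. The function names are hypothetical, and treating a similarity interval of zero width (or fewer than two similarity values) as the "not enough ratings" case is an assumption of this sketch.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the interval [a_min, a_max] into the interval [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def predict_rating_min_max(similarity, user_similarities, user_ratings, user_avg):
    """user_similarities: the user's similarity values (validation set).
    user_ratings: the user's ratings (training set).
    Falls back to the user average when the similarity interval is empty."""
    if len(user_similarities) < 2 or not user_ratings:
        return user_avg
    a_min, a_max = min(user_similarities), max(user_similarities)
    if a_max == a_min:
        return user_avg
    return min_max(similarity, a_min, a_max, min(user_ratings), max(user_ratings))
```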

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value the following formula was used:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \tag{4.6}
$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
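The three-case rule can be expressed as a small Python function. The names are hypothetical; the default thresholds of 0.75 and 0.25 are the initial values of U and L reported in this work. The same function covers the three tested variants by passing the user's, the recipe's, or the combined average and standard deviation (how the two standard deviations are combined is not detailed in the text, so that step is left to the caller).

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating using an average and a standard deviation."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```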

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61

4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model learned from the remaining data. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the data is split into a validation set and a training set, and to reduce variability this process is repeated multiple times using different observations as the validation set. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the data was divided into 5 parts, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
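The 5-fold procedure can be illustrated with the following Python sketch, which splits a list of rating events into five disjoint validation sets of 20% each. The function name, the shuffling step, and the fixed seed are assumptions of the example; the text does not describe how the folds were drawn.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th event starting at position i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The error measures are computed on each validation set and then averaged over the five folds.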

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
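For reference, the two error measures can be computed as in the following Python sketch, which uses the standard definitions of MAE and RMSE; the function names are chosen for illustration.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; larger deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For the predictions [3, 4, 5] against the actual ratings [3, 3, 3], the absolute errors are 0, 1 and 2, so the MAE is 1.0 while the RMSE is the square root of 5/3, about 1.29, which shows the extra weight given to the largest deviation.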

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious           Food.com
                                   MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                      Epicurious                              Food.com
                                            Observation        Observation          Observation        Observation
                                            User Average       Fixed Threshold      User Average       Fixed Threshold
                                            MAE      RMSE      MAE      RMSE        MAE      RMSE      MAE      RMSE
User Avg + User Standard Deviation         0.8217   1.0606    0.7759   1.0283      0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard Deviation         0.8914   1.1550    0.8388   1.1106      0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and Item Std. Dev.    0.8304   1.0296    0.7824   0.9927      0.4390   0.6506    0.4324   0.6449
Min-Max                                    0.8539   1.1533    0.7721   1.0705      0.6648   0.9847    0.6303   0.9384
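The three average-based baselines can be sketched in Python as follows. The function names and the tuple layout of the rating events are assumptions for illustration, and the sketch does not cover users or recipes that are absent from the training set.

```python
from collections import defaultdict

def average_baselines(training):
    """training: iterable of (user_id, recipe_id, rating) tuples.

    Returns two dicts: user id -> average rating, recipe id -> average rating.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def combined_average(user_id, recipe_id, user_avg, item_avg):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```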

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values, with the lowest RMSE on both datasets and MAE values close to the best ones.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                       Epicurious           Food.com
                                      MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
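The idea of storing one vector per feature group and merging only the groups under test can be sketched in Python as follows. The data layout (a dictionary of per-group weight dictionaries) and the function names are assumptions; the stored vectors in this work live in the database.

```python
import math

def merge_vectors(stored, groups):
    """stored: dict group name -> {feature: weight}; groups: groups to include.

    Keys are (group, feature) pairs so equal names in different groups
    do not collide."""
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            merged[(group, feature)] = weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination such as Ingredients + Cuisine then only requires calling `merge_vectors` with `["ingredients", "cuisine"]` for both the user and the recipe before computing the similarity, without rebuilding any prototype vector.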

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \tag{4.6}
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } similarity < U
\end{cases} \tag{5.1}
$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.
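The upper threshold variation test can be sketched in Python as a simple sweep. The sketch assumes that, for every validation event, the similarity, the average rating, the standard deviation and the actual rating have already been computed; the function name and this tuple layout are hypothetical, and the cross-validation loop itself is omitted.

```python
import math

def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, average_rating, standard_deviation, actual_rating).

    Returns a list of (U, MAE, RMSE) tuples, one per tested threshold."""
    results = []
    for upper in thresholds:
        errors = []
        for similarity, avg, std, actual in samples:
            # Two-case rule: add the standard deviation only above the threshold.
            predicted = avg + std if similarity >= upper else avg
            errors.append(predicted - actual)
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results
```

Plotting the returned MAE and RMSE values against U gives curves of the kind shown in the upper threshold variation figures.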

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

56 Rocchiorsquos Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since in a real system the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes. Since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
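The round-by-round protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: `learning_curve`, `fit_user_mean` and the data layout are hypothetical names, and the stand-in predictor simply returns the mean of the training ratings rather than the Rocchio-based prediction used in this work.

```python
def learning_curve(user_reviews, fit, max_rounds):
    """Average absolute error after 1, 2, ..., max_rounds training reviews.

    user_reviews: {user: [(recipe_id, rating), ...]} in the order reviews were made.
    fit: callable that takes a training list and returns a predict(recipe_id) function.
    """
    curve = []
    for n in range(1, max_rounds + 1):
        errors = []
        for reviews in user_reviews.values():
            # In round n, the first n reviews train the model;
            # the remaining reviews form the validation set.
            train, validation = reviews[:n], reviews[n:]
            if not validation:
                continue
            predict = fit(train)
            errors.extend(abs(predict(recipe) - rating) for recipe, rating in validation)
        curve.append(sum(errors) / len(errors) if errors else float("nan"))
    return curve

def fit_user_mean(train):
    """Stand-in predictor (assumption): always predicts the user's mean training rating."""
    mean = sum(rating for _, rating in train) / len(train)
    return lambda recipe: mean

toy_reviews = {"u1": [("a", 4), ("b", 2), ("c", 3)]}
curve = learning_curve(toy_reviews, fit_user_mean, max_rounds=2)
```

Plotting `curve` against the round number gives a learning curve of the kind shown in Figures 5.8 to 5.10; swapping `fit_user_mean` for a content-based predictor would reproduce the experiment's structure.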



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail from both the recipes and the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews with which the studied approaches could be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g. winter/fall or summer/spring), time of the day (e.g. lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
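The idea of one class vector per rating value could be sketched as below. This is a speculative illustration of the proposal, not an implemented component: the names `rating_class_vectors` and `predict_rating` are hypothetical, and averaging the feature weights of all recipes that received the same rating is an assumption about how each class vector would be built.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rating_class_vectors(rated_recipes):
    """Build one vector per rating value for a single user.

    rated_recipes: [(feature_vector, rating), ...]
    Assumption: a class vector is the average of the recipe vectors with that rating.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in features.items():
            sums[rating][feature] += weight
    return {r: {f: w / counts[r] for f, w in vec.items()} for r, vec in sums.items()}

def predict_rating(features, class_vectors):
    """The predicted rating is the rating whose class vector is most similar."""
    return max(class_vectors, key=lambda r: cosine(features, class_vectors[r]))

rated = [
    ({"tomato": 1.0}, 5),
    ({"tomato": 0.8, "basil": 0.4}, 5),
    ({"cream": 1.0}, 2),
]
classes = rating_class_vectors(rated)
```

With this layout no similarity-to-rating transformation is needed, since `predict_rating` returns a rating value directly.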




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.





Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
  1.1 Dissertation Structure
2 Fundamental Concepts
  2.1 Recommendation Systems
    2.1.1 Content-Based Methods
    2.1.2 Collaborative Methods
    2.1.3 Hybrid Methods
  2.2 Evaluation Methods in Recommendation Systems
3 Related Work
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  3.2 Content-Boosted Collaborative Recommendation
  3.3 Recommending Food: Reasoning on Recipes and Ingredients
  3.4 User Modeling for Adaptive News Access
4 Architecture
  4.1 YoLP Collaborative Recommendation Component
  4.2 YoLP Content-Based Recommendation Component
  4.3 Experimental Recommendation Component
    4.3.1 Rocchio's Algorithm using FF-IRF
    4.3.2 Building the Users' Prototype Vector
    4.3.3 Generating a rating value from a similarity value
  4.4 Database and Datasets
5 Validation
  5.1 Evaluation Metrics and Cross Validation
  5.2 Baselines and First Results
  5.3 Feature Testing
  5.4 Similarity Threshold Variation
  5.5 Standard Deviation Impact in Recommendation Error
  5.6 Rocchio's Learning Curve
6 Conclusions
  6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation

4.1 Statistical characterization for the datasets used in the experiments

5.1 Baselines

5.2 Test Results

5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

2.2 Comparing user ratings [2]

2.3 Monolithic hybridization design [2]

2.4 Parallelized hybridization design [2]

2.5 Pipelined hybridization designs [2]

2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.7 Evaluating recommended items [2]

3.1 Recipe - ingredient breakdown and reconstruction

3.2 Normalized MAE score for recipe recommendation [22]

4.1 System Architecture

4.2 Item-to-item collaborative recommendation

4.3 Distribution of Epicurious rating events per rating values

4.4 Distribution of Food.com rating events per rating values

4.5 Epicurious distribution of the number of ratings per number of users

5.1 10 Fold Cross-Validation example

5.2 Lower similarity threshold variation test using Epicurious dataset

5.3 Lower similarity threshold variation test using Food.com dataset

5.4 Upper similarity threshold variation test using Epicurious dataset

5.5 Upper similarity threshold variation test using Food.com dataset

5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset

5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes

5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes

5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular on e-commerce websites and on online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements, and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1} \]


where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords $z$ in document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]




In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

\[ w_{ij} = TF_{ij} \times IDF_i \tag{2.3} \]

It is important to note that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
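The TF-IDF weighting of Eqs. (2.1) to (2.3) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `tf_idf`, the toy documents, and the use of the natural logarithm are choices made here, since the text does not fix a logarithm base.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists. Weights follow w_ij = TF_ij * IDF_i,
    with TF normalized by the most frequent term in the document.
    """
    n_docs = len(documents)
    # n_i: number of documents in which term i occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["pasta", "cream"],
    ["basil", "cream", "garlic"],
]
weights = tf_idf(docs)
```

In this toy corpus "tomato" occurs only in the first document, so it receives a higher weight there than "pasta", which also appears in the second document. A term that occurs in every document would receive a weight of zero.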

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, cosine normalization is usually employed:


\[ w_{ij} = \frac{\mathrm{TF\text{-}IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\mathrm{TF\text{-}IDF}_{zj})^2}} \tag{2.4} \]

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \tag{2.5} \]
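Cosine normalization and cosine similarity over sparse weight vectors can be sketched as follows. The dictionary representation and the function names `normalize` and `cosine_similarity` are assumptions made for this illustration; returning 0 for an all-zero vector is also a convention chosen here.

```python
import math

def normalize(vector):
    """Scale a {term: weight} vector to unit length (cosine normalization)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    return {t: w / norm for t, w in vector.items()} if norm else dict(vector)

def cosine_similarity(a, b):
    """Cosine of the angle between two {term: weight} vectors."""
    # Terms missing from one vector contribute a weight of zero.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

unit = normalize({"tomato": 3.0, "basil": 4.0})
same_direction = cosine_similarity({"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0})
no_overlap = cosine_similarity({"tomato": 1.0}, {"cream": 1.0})
```

Because the measure depends only on direction, two vectors that are multiples of each other are maximally similar, while vectors with no terms in common have a similarity of zero.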



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the use of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

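\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6} \]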




In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
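A minimal sketch of the prototype computation in Eq. (2.6), followed by classification with cosine similarity, is given below. The function names, the toy vectors and the default values β = 1.0 and γ = 0.5 are assumptions made for illustration; the text does not prescribe parameter values.

```python
import math
from collections import defaultdict

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector for one class from positive and negative example vectors.

    beta and gamma are illustrative defaults, not values taken from the text.
    """
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, prototypes):
    """Assign the document to the class whose prototype is most similar."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.5, "garlic": 1.0}]
disliked = [{"cream": 1.0, "butter": 0.5}]
prototypes = {
    "like": rocchio_prototype(liked, disliked),
    "dislike": rocchio_prototype(disliked, liked),
}
label = classify({"tomato": 1.0, "garlic": 0.2}, prototypes)
```

Terms that occur mainly in the negative examples end up with negative weights in the prototype, which pushes documents containing them away from that class.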

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c) P(d|c)}{P(d)} \tag{2.7} \]


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

\[ \arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \tag{2.8} \]

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated in practice, since terms in a document are not independent of each other, experiments show that the Naive Bayes classifier obtains very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, records the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9)$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
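The multinomial model of Eq. (2.9) can be illustrated with a minimal Python sketch. It is not the implementation used in the cited works: the function names, the toy data, the use of log-probabilities and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labelled_docs):
    """Estimate P(c) and P(t|c) from (tokens, label) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    n_docs = sum(class_counts.values())
    priors = {c: class_counts[c] / n_docs for c in class_counts}
    likelihoods = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Add-one smoothing is an assumption; Eq. (2.9) does not prescribe it.
        likelihoods[c] = {
            t: (term_counts[c][t] + 1) / (total + len(vocabulary))
            for t in vocabulary
        }
    return priors, likelihoods


def classify(tokens, priors, likelihoods):
    """Return the class maximising log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for term, n in Counter(tokens).items():
            if term in likelihoods[c]:  # terms outside the vocabulary are ignored
                score += n * math.log(likelihoods[c][term])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: recipe descriptions labelled as relevant or irrelevant.
training = [
    (["pasta", "tomato", "basil"], "relevant"),
    (["pasta", "cheese"], "relevant"),
    (["liver", "onion"], "irrelevant"),
]
priors, likelihoods = train_multinomial_nb(training)
```

Working with sums of logarithms instead of the product in Eq. (2.9) avoids numerical underflow for long documents and does not change which class obtains the highest score.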

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback is their inefficiency at classification time, since they do not have a training phase and all the computation is made during classification.
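The nearest neighbor procedure described above can be sketched as follows. This is only an illustrative sketch: items are assumed to be sparse VSM vectors stored as term-to-weight dictionaries, the label is chosen by majority vote among the k nearest items, and all names and data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (dicts term -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(item, training_items, k=3):
    """Label an item by majority vote among its k most similar stored items."""
    # No training phase: every stored item is compared at classification time.
    ranked = sorted(
        training_items,
        key=lambda pair: cosine_similarity(item, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored items with their class labels.
stored = [
    ({"tomato": 1, "pasta": 1}, "liked"),
    ({"tomato": 1, "basil": 1}, "liked"),
    ({"liver": 1, "onion": 1}, "disliked"),
]
```

The full scan over the stored items inside `knn_classify` is what makes classification expensive for large collections, which is the drawback mentioned in the text.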

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\, \sqrt{\sum_{s \in S} r_{b,s}^2}}$$

In the formula, $r_{a,s}$ is the rating that user a gave to item s and $r_{b,s}$ is the rating that user b gave to the same item s, with s ranging over the items rated by both users. However, this measure does not take into account an important factor, namely the differences in rating behaviour between users.

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2 \sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}} \qquad (2.11)$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)$$

In the formula, pred(a, p) is the prediction value to user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than the average. The rating differences are combined using the similarity scores as a weight, and the value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
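The procedure can be traced on the ratings of Table 2.1 with the following sketch of Eqs. (2.11) and (2.12). Several choices here are assumptions rather than part of the original description: the averages inside the Pearson correlation are taken over the co-rated items, the averages in the prediction are taken over all items a user has rated, and the neighborhood N contains the two most similar users.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Eq. (2.11), with averages taken over the co-rated items (assumption)."""
    common = [i for i in ratings[a] if i in ratings[b]]
    if not common:
        return 0.0
    mean_a = mean(ratings[a][i] for i in common)
    mean_b = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = math.sqrt(
        sum((ratings[a][i] - mean_a) ** 2 for i in common)
        * sum((ratings[b][i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(a, item, n_neighbours=2):
    """Eq. (2.12), using the n_neighbours most similar users that rated item."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted(candidates, reverse=True)[:n_neighbours]
    den = sum(sim for sim, _ in neighbours)
    if den == 0:
        return mean(ratings[a].values())  # fall back to the user's average
    num = sum(
        sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbours
    )
    return mean(ratings[a].values()) + num / den


alice_item5 = predict("Alice", "Item5")
```

Under these assumptions, User1 and User2 are Alice's nearest neighbors, and the predicted rating for Item5 is approximately 4.87, above Alice's average rating of 4.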

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies liked by a user or dramas liked by a user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, else use collaborative methods).
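A parallelized design with a weighting scheme, and with the switching rule mentioned in the example above, might look like the sketch below. The component names, the weights and the threshold of five ratings are hypothetical values chosen only for illustration.

```python
def weighted_hybrid(predictions, weights):
    """Combine the scores of independent recommenders with a weighted average.

    predictions, weights: dicts keyed by recommender name (hypothetical names).
    """
    total = sum(weights[name] for name in predictions)
    return sum(weights[name] * score for name, score in predictions.items()) / total


def switching_hybrid(n_user_ratings, popularity_score, collaborative_score,
                     min_ratings=5):
    """Use item popularity when few ratings exist, else the collaborative score.

    min_ratings is an assumed threshold, not a value taken from the text.
    """
    if n_user_ratings < min_ratings:
        return popularity_score
    return collaborative_score


combined = weighted_hybrid(
    {"collaborative": 4.0, "content": 3.0},
    {"collaborative": 0.75, "content": 0.25},
)
```

Because each component receives the same input and is scored independently, either function can be added on top of existing recommenders without modifying them.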

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \qquad (2.13)$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \qquad (2.14)$$
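As a small worked example of Eqs. (2.13) and (2.14), the sketch below computes both measures from a list of recommended items and the set of items the user actually likes. The function name and the item identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Four items recommended, three items actually liked, two in common.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here two of the four recommended items are relevant, so the precision is 0.5, while two of the three relevant items were recommended, so the recall is 2/3.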

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)$$
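The two error measures of Eqs. (2.15) and (2.16) can be compared on a small set of hypothetical predictions, as in the sketch below.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error, Eq. (2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error, Eq. (2.16)."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual)
    )


# Hypothetical predicted and actual ratings for four items.
predicted = [4, 3, 5, 1]
actual = [5, 3, 3, 1]
```

For these values the absolute errors are 1, 0, 2 and 0, so the MAE is 0.75, while the RMSE is the square root of 1.25 (about 1.12). The single error of 2 contributes four times as much as the error of 1 to the RMSE, which is the emphasis on larger deviations mentioned above.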


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^+$, an equation based on the idea of TF-IDF is used:

$$I_k^+ = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D}$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k}$$
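The favourite-ingredient score of Eq. (3.1) can be sketched as follows. This is not the implementation of [3]: $F_k$ is assumed to be the number of cooked recipes containing ingredient k, the period D is assumed to be measured in days, the natural logarithm is used, and all names and data are hypothetical.

```python
import math
from collections import Counter


def favourite_ingredient_scores(cooked_recipes, period_days, all_recipes):
    """Compute I_k^+ = FF_k * IRF_k for every ingredient the user cooked with.

    cooked_recipes: ingredient sets of the recipes cooked during the period.
    all_recipes: ingredient sets of every recipe in the database.
    """
    total_recipes = len(all_recipes)                                      # M
    recipe_freq = Counter(k for recipe in all_recipes for k in set(recipe))   # M_k
    use_freq = Counter(k for recipe in cooked_recipes for k in set(recipe))   # F_k
    scores = {}
    for k, f_k in use_freq.items():
        if recipe_freq[k] == 0:
            continue  # ingredient absent from the database: IRF_k is undefined
        ff_k = f_k / period_days
        irf_k = math.log(total_recipes / recipe_freq[k])
        scores[k] = ff_k * irf_k
    return scores


# Hypothetical database of four recipes and a cooking history of three recipes.
database = [
    {"salt", "tomato"},
    {"salt", "pasta"},
    {"salt", "tofu"},
    {"salt", "tomato", "basil"},
]
history = [{"salt", "tomato"}, {"salt", "tomato", "basil"}, {"salt", "tomato"}]
scores = favourite_ingredient_scores(history, period_days=10, all_recipes=database)
```

In this toy example salt obtains a score of zero even though it was used in every cooked recipe, because it appears in all recipes of the database and therefore says nothing about the user's preferences. This is the same effect that IDF has on very common words in text retrieval.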



The user's disliked ingredients $I_k^-$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^+$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^+$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^+$ and $I_k^-$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$
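The construction of the pseudo user-ratings vector in Eq. (3.7) can be sketched as shown below. The content-based predictor is represented by a stand-in function that returns a constant value; in [20] this role is played by the naive Bayesian text classifier. All names and data are hypothetical.

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Build v_u: actual ratings where available, content-based ones otherwise.

    user_ratings: dict item -> rating given by the user.
    content_predict: callable item -> rating predicted from content (stand-in).
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }


def pseudo_ratings_matrix(all_ratings, predictors, items):
    """Build the dense matrix V, one pseudo user-ratings vector per user."""
    return {
        user: pseudo_ratings_vector(all_ratings[user], predictors[user], items)
        for user in all_ratings
    }


# Hypothetical data: two users, three movies, constant stand-in predictors.
movies = ["m1", "m2", "m3"]
observed = {"u1": {"m1": 4}, "u2": {"m2": 2, "m3": 5}}
predictors = {"u1": lambda item: 3.0, "u2": lambda item: 1.0}
V = pseudo_ratings_matrix(observed, predictors, movies)
```

Every entry of the resulting matrix is filled, so user similarities can be computed over all items instead of only the few co-rated ones.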

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.
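The breakdown and reconstruction step of Figure 3.1 can be illustrated with the sketch below. The breakdown follows the description above (an ingredient's score is the average rating of the recipes in which it occurs). The reconstruction step, which averages the scores of the known ingredients of an unseen recipe, is an assumption made for this example, and all names and data are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Score each ingredient as the average rating of the recipes containing it."""
    collected = defaultdict(list)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {ing: sum(vals) / len(vals) for ing, vals in collected.items()}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean score of its known ingredients.

    Averaging is an assumed reconstruction rule; returns None when no
    ingredient of the recipe has been scored for this user.
    """
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


# Hypothetical user who rated two recipes.
recipe_ingredients = {"r1": ["tomato", "pasta"], "r2": ["tomato", "tofu"]}
user_ratings = {"r1": 5, "r2": 3}
scores = ingredient_scores(user_ratings, recipe_ingredients)
```

In this example tomato receives a score of 4.0 (the average of 5 and 3), so an unseen recipe made of pasta and tofu would be predicted at 4.0 under the assumed averaging rule.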

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
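The short-term model described above can be sketched as follows. The threshold values, the function names and the sparse vector representation are assumptions made for illustration; they are not taken from [23].

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse TF-IDF vectors (dicts term -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_prediction(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with the voting scheme of the short-term model.

    rated_stories: list of (vector, score) pairs already seen by the user.
    t_min, t_max: hypothetical minimum and maximum similarity thresholds.
    Returns ("known", None), ("unclassified", None) or ("predicted", score).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine_similarity(new_story, vector)
        if sim >= t_max:
            return "known", None       # too similar: the user already knows it
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return "unclassified", None    # would be passed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "predicted", sum(sim * score for sim, score in voters) / total


# Hypothetical rated stories with scores in [0, 1].
seen = [({"a": 1, "c": 1}, 0.8), ({"b": 1, "d": 1}, 0.4)]
label, score = short_term_prediction({"a": 1, "b": 1}, seen)
```

The same two-threshold idea could be transferred to food recommendation by treating dishes that exceed the upper threshold as too similar to recently eaten ones.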

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.
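As an illustration of Eq. 4.1, the similarity between two recipes could be computed as in the sketch below. The function name and the dictionary layout (user id mapped to rating) are assumptions of this example, as is the choice of computing each recipe average over all of its ratings rather than only over the users in P.

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation (Eq. 4.1) between two recipes.

    Each argument maps a user id to the rating that user gave the recipe.
    """
    common = set(ratings_a) & set(ratings_b)  # P: users who rated both
    if not common:
        return 0.0
    # Assumption: recipe averages are taken over all ratings of the recipe.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    den_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # undefined correlation, treated here as no similarity
    return num / math.sqrt(den_a * den_b)
```

Returning 0 when the correlation is undefined is a convenience of this sketch; the source does not state how such cases are handled.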


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
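The weighting step described in the text can be sketched as a similarity-weighted average of the user's ratings. This is a simplified illustration, not the YoLP code: the mean-centering term of the equation is left out, items with non-positive similarity are skipped so that the denominator stays positive, and the function and argument names are hypothetical.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings given by one user.

    user_ratings maps item id -> rating for the items in N;
    similarities maps item id -> sim(a, b) for the target item a.
    """
    num = 0.0
    den = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item, 0.0)
        if sim > 0.0:  # assumption: ignore dissimilar items
            num += sim * rating
            den += sim
    # None signals that no rated item is similar to the target item.
    return num / den if den > 0.0 else None
```

For instance, a user who rated two recipes 5 and 3, with similarities 0.75 and 0.25 to the target recipe, obtains a prediction of 4.5.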

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
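A minimal sketch of this profile construction and ranking is given below. It is not the YoLP implementation: binary sparse vectors are represented here as Python sets of feature labels, and the function names and the labels used in the usage note are hypothetical.

```python
import math


def build_profile(rated_recipes, min_rating=4):
    """Binary user profile: features of every recipe rated min_rating or more."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_rating:
            profile |= set(features)
    return profile


def binary_cosine(features_a, features_b):
    """Cosine similarity between two binary vectors given as feature sets."""
    a, b = set(features_a), set(features_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_recipes(profile, candidates):
    """Order candidate recipes (name -> features) from most to least similar."""
    scored = [(binary_cosine(profile, feats), name) for name, feats in candidates.items()]
    # Sort by decreasing similarity; ties are broken by name for determinism.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

With a profile built from a fish dish rated 5 and a meat dish rated 2, only the fish features enter the profile, so a candidate recipe sharing those features is ranked ahead of one sharing none.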

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
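For illustration, Eq. 4.3 can be written as a small function. The function name is hypothetical, and the combined average is assumed to be the arithmetic mean of the user and item averages, i.e., (UserAvg + ItemAvg)/2.

```python
def similarity_to_rating(similarity, user_avg, item_avg):
    """Eq. 4.3: combined average, raised by 0.5 when similarity exceeds 0.8."""
    avg_total = (user_avg + item_avg) / 2.0  # combined user and item average
    if similarity > 0.8:
        return avg_total + 0.5
    return avg_total
```

Since only two output values are possible for each user and item pair, the conversion is coarse, which is consistent with the approximation error discussed above.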

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the Frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
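The IRF weights can be obtained in a single pass over the dataset, as in the following sketch. The function name and input layout are assumptions of this example, and the natural logarithm is used because the base of the logarithm is not specified here.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k) for every feature k found in the recipes.

    recipes maps a recipe id to an iterable with the features of that recipe.
    """
    total = len(recipes)  # M: total number of recipes
    counts = Counter()
    for features in recipes.values():
        counts.update(set(features))  # M_k: count each recipe once per feature
    return {k: math.log(total / m_k) for k, m_k in counts.items()}
```

A feature that appears in every recipe receives a weight of 0, while rare features receive the highest weights, which is the same behaviour as IDF for words in documents.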

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
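The two ways of labelling observations and the resulting update of the prototype vector can be sketched as follows. The function names and the dictionary representation of the vector are assumptions made for this example; both observation types use a weight of 1, as in the experiments.

```python
def fixed_classifier(negative, positive):
    """First approach: fixed rating values, e.g. {1, 2} versus {4, 5}."""
    def classify(rating):
        if rating in positive:
            return 1
        if rating in negative:
            return -1
        return 0  # neutral observation, ignored
    return classify


def user_average_classifier(user_avg):
    """Second approach: compare each rating with the user's average rating."""
    def classify(rating):
        return 1 if rating >= user_avg else -1
    return classify


def build_prototype(rated_recipes, irf, classify):
    """Add (positive) or subtract (negative) the IRF weight of each feature.

    rated_recipes is a list of (features, rating) pairs from the training set.
    """
    prototype = {}
    for features, rating in rated_recipes:
        sign = classify(rating)
        for k in set(features):
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype
```

For the Food.com dataset, for example, `fixed_classifier({1, 2}, {4, 5})` leaves ratings equal to 3 without effect, while `fixed_classifier({1, 2}, {3, 4})` corresponds to the Epicurious setting.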

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's range of similarity values from the validation set, and compute each user's range of rating values from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
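A per-user version of this mapping could look like the sketch below. The function and argument names are hypothetical; the fallback to the user average is applied here whenever the similarity interval is empty or has zero width, which is one possible reading of "not enough user ratings".

```python
def min_max(value, old_min, old_max, new_min, new_max):
    """Eq. 4.5: map value from [old_min, old_max] into [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def predict_with_min_max(similarity, user_similarities, user_ratings):
    """Translate a similarity into a rating on the user's own rating scale.

    user_similarities: similarity values observed for this user;
    user_ratings: non-empty list of ratings given by this user (training set).
    """
    user_avg = sum(user_ratings) / len(user_ratings)
    if len(user_similarities) < 2:
        return user_avg  # not enough data to define a similarity interval
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        return user_avg  # zero-width interval, Eq. 4.5 is undefined
    return min_max(similarity, sim_min, sim_max, min(user_ratings), max(user_ratings))
```

For a user whose similarities range from 0.2 to 0.6 and whose ratings range from 2 to 5, a similarity of 0.4 lies halfway through the interval and is mapped to a rating of 3.5.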

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \qquad (4.6)$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61
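The three-case rule of this method can be sketched as follows, using the initial thresholds of 0.75 and 0.25. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def mean_and_std(ratings):
    """Average and (population) standard deviation of a list of ratings."""
    return statistics.mean(ratings), statistics.pstdev(ratings)


def rating_from_similarity(similarity, average, deviation, upper=0.75, lower=0.25):
    """Three-case rule: shift the average by one standard deviation."""
    if similarity >= upper:
        return average + deviation
    if similarity < lower:
        return average - deviation
    return average
```

The `average` and `deviation` arguments can be taken from the user's ratings, from the recipe's ratings, or from a combination of both, which corresponds to the three variants that were tested.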


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

In Figures 4.3, 4.4 and 4.5, some statistical data on the datasets is presented graphically. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model learned during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a leave-p-out style of cross-validation was used, taking a subset p of the observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 segments, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
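The 5-fold partitioning can be sketched as follows. The function name, the shuffling step and the fixed seed are assumptions of this example; how the rating events were actually assigned to folds is not described here.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists for each of the k folds."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, every rating event appears in exactly one validation set, and each fold uses roughly 20% of the data for validation and the remaining 80% for training.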

Figure 5.1: 10 Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
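For reference, the two measures can be computed as in the sketch below, which uses the standard definitions of MAE and RMSE. The function names are chosen for this example only.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

In both cases, lower values indicate predictions that are closer to the ratings found in the validation set.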

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious            Food.com
                                    MAE       RMSE        MAE       RMSE
YoLP Content-based component        0.6389    0.8279      0.3590    0.6536
YoLP Collaborative component        0.6454    0.8678      0.3761    0.6834
User Average                        0.6315    0.8338      0.4077    0.6207
Item Average                        0.7701    1.0930      0.4385    0.7043
Combined Average                    0.6628    0.8572      0.4180    0.6250

Table 5.2: Test Results

                                                    Epicurious                                    Food.com
                                                    Observation          Observation              Observation          Observation
                                                    User Average         Fixed Threshold          User Average         Fixed Threshold
                                                    MAE       RMSE       MAE       RMSE           MAE       RMSE       MAE       RMSE
User Avg + User Standard Deviation                  0.8217    1.0606     0.7759    1.0283         0.4448    0.6812     0.4287    0.6624
Item Avg + Item Standard Deviation                  0.8914    1.1550     0.8388    1.1106         0.4561    0.7251     0.4507    0.7207
User/Item Avg + User and Item Standard Deviation    0.8304    1.0296     0.7824    0.9927         0.4390    0.6506     0.4324    0.6449
Min-Max                                             0.8539    1.1533     0.7721    1.0705         0.6648    0.9847     0.6303    0.9384
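The three average-based baselines of Table 5.1 can be sketched as follows. The function name and the tuple layout of the training events are assumptions of this example; the handling of users or recipes that do not appear in the training set is not described here, so the sketch simply raises a KeyError in that case.

```python
from collections import defaultdict


def average_baselines(training):
    """Build the user, item and combined average predictors.

    training is a list of (user_id, recipe_id, rating) tuples.
    """
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    for user, recipe, rating in training:
        by_user[user].append(rating)
        by_item[recipe].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}

    def combined(user, recipe):
        # (UserAvg + ItemAvg) / 2; unseen ids raise KeyError in this sketch
        return (user_avg[user] + item_avg[recipe]) / 2

    return user_avg, item_avg, combined
```

Each baseline ignores the recipe content entirely, which makes these predictors a useful lower bound for the content-based methods.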

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious            Food.com
                                      MAE       RMSE        MAE       RMSE
Ingredients + Cuisine + Dietaries     0.7824    0.9927      0.4324    0.6449
Ingredients + Cuisine                 0.7915    1.0012      0.4384    0.6502
Ingredients + Dietary                 0.7874    0.9986      0.4342    0.6468
Cuisine + Dietary                     0.8266    1.0616      0.4324    0.7087
Ingredients                           0.7932    1.0054      0.4411    0.6537
Cuisine                               0.8553    1.0810      0.5357    0.7431
Dietary                               0.8772    1.0807      0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
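The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the function names and the dictionary representation are assumptions of this example, not the stored format used in the experiments.

```python
import math


def merge_groups(vectors_by_group, groups):
    """Merge the selected per-group vectors into a single sparse vector.

    Keys are (group, feature) pairs, so equal labels in different groups
    never collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity restricted to the chosen feature groups."""
    return cosine(merge_groups(user_vectors, groups), merge_groups(recipe_vectors, groups))
```

Testing a feature combination then only requires changing the `groups` argument, for example `["ingredients", "cuisine"]`, without rebuilding any prototype vector.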

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \qquad (5.1)$$

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

Figure 5.5: Upper similarity threshold variation test using Food.com dataset
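The threshold variation test can be sketched as a simple sweep over candidate values of U. The function names and the layout of the `samples` tuples are assumptions of this example; in the experiments each point was obtained from a full cross-validation run rather than from a fixed list of samples.

```python
import math


def two_case_rating(similarity, average, deviation, upper):
    """Two-case rule: add one standard deviation at or above the threshold."""
    return average + deviation if similarity >= upper else average


def sweep_upper_threshold(samples, thresholds):
    """Return a list of (upper, MAE, RMSE), one entry per tested threshold.

    samples is a list of (similarity, average, deviation, actual_rating).
    """
    results = []
    for upper in thresholds:
        errors = [two_case_rating(s, avg, dev, upper) - actual
                  for s, avg, dev, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results
```

Plotting the returned MAE and RMSE values against the tested thresholds gives curves of the kind shown in the threshold variation figures.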

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
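The opposite movement of MAE and RMSE discussed in this section can be illustrated with a small numeric example. The ratings and predictions below are hypothetical values chosen only to show the effect and are not taken from the experiments.

```python
import math


def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


actual = [4.0, 4.0, 4.0, 3.0]

# Cautious predictor: always slightly off, never far off.
cautious = [3.5, 3.5, 3.5, 3.5]
# Bold predictor: exact three times out of four, but with one large miss.
bold = [4.0, 4.0, 4.0, 4.5]

print(mae(cautious, actual), rmse(cautious, actual))  # 0.5 0.5
print(mae(bold, actual), rmse(bold, actual))          # 0.375 0.75
```

The bold predictor is exact more often and therefore has the lower MAE, but its single large deviation is squared in the RMSE, which ends up higher than that of the cautious predictor.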

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, as that would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
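The round-by-round protocol described in this section can be sketched for a single user as follows. This is only an illustrative sketch: the function names are hypothetical, and the predictor used here (the mean of the ratings seen so far) is a stand-in assumption, not the Rocchio-based predictor evaluated in this work. In the actual experiment the per-round errors would additionally be averaged over all selected users.

```python
def learning_curve(reviews, predict, max_rounds):
    """Mean absolute error per round for one user.

    reviews: ordered list of (item, rating) pairs.
    predict: function (training_reviews, item) -> predicted rating.
    In round n the first n reviews form the training set and the
    remaining ones form the validation set.
    """
    errors = []
    for n in range(1, max_rounds + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        abs_errors = [abs(predict(training, item) - rating)
                      for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def mean_rating_predictor(training, item):
    # Placeholder predictor (assumption): ignores the item and returns
    # the user's average rating over the training reviews.
    return sum(rating for _, rating in training) / len(training)


reviews = [("r1", 4), ("r2", 2), ("r3", 3), ("r4", 3), ("r5", 3)]
curve = learning_curve(reviews, mean_rating_predictor, max_rounds=4)
```

Plotting `curve` against the round number gives the kind of learning curve shown in the figures of this section.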



Chapter 6

Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breakdown of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent that user's preferences. From the recipes rated by the user, each class would contain the feature weights related to the observations of a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
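One possible reading of this idea is sketched below. It is a hypothetical illustration rather than an implemented component: the function names are invented for the example, and building each class vector as the average of the feature vectors of the recipes that received that rating is an assumption made here, as is the use of cosine similarity for the comparison.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rating_class_vectors(rated_recipes):
    """Build one class vector per rating value for a single user.

    rated_recipes: list of (feature_weights, rating) pairs.
    Assumption: a class vector is the average of the feature vectors
    of all recipes the user rated with that value.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in features.items():
            sums[rating][feature] += weight
    return {rating: {f: s / counts[rating] for f, s in feats.items()}
            for rating, feats in sums.items()}


def predict_rating(features, class_vectors):
    # The predicted rating is the rating value of the most similar class.
    return max(class_vectors,
               key=lambda rating: cosine(features, class_vectors[rating]))


rated = [
    ({"tomato": 1.0, "basil": 0.5}, 5),
    ({"tomato": 0.8}, 5),
    ({"anchovy": 1.0}, 1),
]
classes = rating_class_vectors(rated)
```

With these toy ratings, a recipe described only by `tomato` is closest to the class built from the 5-star recipes, and a recipe dominated by `anchovy` is closest to the 1-star class.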

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007…

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http…

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link…

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl…

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download…

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1…

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+…

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle…

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http…

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770…

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868. doi: 10.1109/DICTAP.2012…

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A…

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770…

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.




Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Dissertation Structure
2 Fundamental Concepts
2.1 Recommendation Systems
2.1.1 Content-Based Methods
2.1.2 Collaborative Methods
2.1.3 Hybrid Methods
2.2 Evaluation Methods in Recommendation Systems
3 Related Work
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
3.2 Content-Boosted Collaborative Recommendation
3.3 Recommending Food: Reasoning on Recipes and Ingredients
3.4 User Modeling for Adaptive News Access
4 Architecture
4.1 YoLP Collaborative Recommendation Component
4.2 YoLP Content-Based Recommendation Component
4.3 Experimental Recommendation Component
4.3.1 Rocchio's Algorithm using FF-IRF
4.3.2 Building the Users' Prototype Vector
4.3.3 Generating a rating value from a similarity value
4.4 Database and Datasets
5 Validation
5.1 Evaluation Metrics and Cross Validation
5.2 Baselines and First Results
5.3 Feature Testing
5.4 Similarity Threshold Variation
5.5 Standard Deviation Impact in Recommendation Error
5.6 Rocchio's Learning Curve
6 Conclusions
6.1 Future Work
Bibliography


List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way according to user preferences can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on their consumer behaviour, recommendations specifically adjusted to their personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)$$

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
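The weighting scheme defined by the TF, IDF and TF-IDF formulas above can be sketched in a few lines. This is a minimal illustration under stated assumptions: documents are given as lists of keywords, the function name `tf_idf` is chosen for the example, and the natural logarithm is used because the text does not fix the base.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {keyword: weight} dictionary per document.

    documents: list of documents, each a list of keywords.
    TF is the keyword count divided by the highest count in the same
    document; IDF is log(N / n_i) with the natural logarithm (assumption).
    """
    n_docs = len(documents)
    # n_i: number of documents in which each keyword occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_freq = max(counts.values()) if counts else 1
        weights.append({
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in counts.items()
        })
    return weights


docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["tomato", "rice"],
    ["basil", "chicken"],
]
w = tf_idf(docs)
```

In this toy collection, `pasta` occurs in a single document and therefore receives a higher weight in the first document than `basil`, which has the same frequency there but occurs in two documents.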

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \qquad (2.4)$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \qquad (2.5)$$
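Cosine normalization and cosine similarity can be illustrated with a short sketch. It assumes sparse vectors stored as dictionaries that map keywords to weights; the function names are chosen for the example only.

```python
import math


def cosine_normalize(weights):
    """Scale a {keyword: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {k: v / norm for k, v in weights.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two {keyword: weight} vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


profile = {"tomato": 0.8, "basil": 0.6}
recipe_a = {"tomato": 0.4, "basil": 0.3}   # same direction as the profile
recipe_b = {"anchovy": 1.0}                # no keyword in common
```

Because the similarity depends only on the direction of the vectors, `recipe_a` is maximally similar to `profile` even though its weights are smaller, while `recipe_b` shares no keyword with the profile and has similarity zero.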



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
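The prototype weights described above (a β-weighted average over the positive examples minus a γ-weighted average over the negative examples) can be sketched as follows. The function name and the default values of `beta` and `gamma` are illustrative assumptions only, not the settings used in the experiments of this work.

```python
from collections import defaultdict


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Build a prototype vector from positive and negative examples.

    positives, negatives: lists of {term: weight} document vectors.
    Each term weight is beta times its average weight over the positive
    examples minus gamma times its average weight over the negative ones.
    """
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)


liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.5}]
disliked = [{"anchovy": 1.0, "tomato": 0.5}]
prototype = rocchio_prototype(liked, disliked)
```

Terms that appear mostly in positive examples end up with positive weights, while terms that appear only in negative examples, such as `anchovy` here, receive negative weights and lower the similarity of documents that contain them.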

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$
• $P(d|c)$: probability of observing the document $d$ given a class $c$
• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c) P(d|c)}{P(d)} \qquad (2.7)$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$c = \arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \qquad (2.8)$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probability estimates, the Naive Bayes classifier assumes that P(d|c_j) is determined by individual word occurrences rather than by the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good estimates. Although the underlying conditional independence assumption clearly does not hold in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, records the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k ∈ V_{d_i}, are used.
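The multinomial model can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the toy training documents and the use of Laplace (add-one) smoothing are not taken from the text, and scores are accumulated in log space to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents):
    """Estimate P(c) and P(t|c) from a list of (tokens, class_label) pairs."""
    class_counts = Counter(label for _, label in documents)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        term_counts[label].update(tokens)
        vocabulary.update(tokens)

    priors = {c: n / len(documents) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Add-one smoothing is an assumption made here so that unseen
        # (term, class) pairs do not produce a zero probability.
        likelihoods[c] = {
            t: (term_counts[c][t] + 1) / (total + len(vocabulary))
            for t in vocabulary
        }
    return priors, likelihoods


def classify(tokens, priors, likelihoods):
    """Return the class maximising log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for term, count in Counter(tokens).items():
            if term in likelihoods[c]:  # terms outside the vocabulary are ignored
                score += count * math.log(likelihoods[c][term])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: recipe titles labelled as relevant or irrelevant.
training = [
    ("grilled chicken salad".split(), "relevant"),
    ("chicken soup".split(), "relevant"),
    ("chocolate cake".split(), "irrelevant"),
    ("cake with cream".split(), "irrelevant"),
]
priors, likelihoods = train_multinomial_nb(training)
predicted = classify("chicken salad".split(), priors, likelihoods)
print(predicted)
```

Dropping P(d) from the computation, as discussed above, is what allows the classes to be compared using only the prior and the per-term likelihoods.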

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is made when an item is classified.
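The nearest neighbor procedure can be sketched as follows. This is a minimal illustration, assuming items are sparse term-weight dictionaries and that the label is decided by majority vote among the k nearest neighbors; the data and names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored items."""
    # No training phase: every stored item is compared at classification time.
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored items (term weights) with their class labels.
training = [
    ({"tomato": 1.0, "basil": 0.5}, "liked"),
    ({"tomato": 0.8, "cheese": 0.6}, "liked"),
    ({"liver": 1.0, "onion": 0.4}, "disliked"),
    ({"liver": 0.7, "bacon": 0.7}, "disliked"),
]
label = knn_classify({"tomato": 0.9, "cheese": 0.2}, training, k=3)
print(label)
```

The full scan over the stored items inside `knn_classify` is the classification-time cost mentioned above.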

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice   5      3      4      4      ?
User1   3      1      2      3      3
User2   4      3      4      3      5
User3   3      3      1      5      4
User4   1      5      5      2      1

Table 2.1 contains a set of users with five items in common, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S mentioned above contains the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is the average of the values in S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight of each rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items in common. The similarity is the cosine of the angle between the two vectors:

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\;\sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10}$$

In the formula, S here denotes the set of items rated by both users, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

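$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2}\;\sqrt{\sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.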

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
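The procedure can be traced on the ratings of Table 2.1 with a short Python sketch. Some choices below are assumptions rather than details given in the text: the neighborhood contains the two most similar users, the means inside the similarity are taken over the co-rated items, and the means inside the prediction are taken over all items each user has rated.

```python
from math import sqrt

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between users a and b over their co-rated items."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if not common:
        return 0.0
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)) * sqrt(
        sum((ratings[b][s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


def predict(a, item, k=2):
    """Mean-centred weighted prediction using the k most similar raters of `item`."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted(candidates, reverse=True)[:k]
    mean_a = mean(ratings[a].values())
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return mean_a + num / den if den else mean_a


for user in ("User1", "User2", "User3", "User4"):
    print(user, round(pearson("Alice", user), 2))
print("Predicted rating:", round(predict("Alice", "Item5"), 2))
```

With these choices User1 and User2 are the nearest neighbors, and since both rated Item5 above their own average, the prediction ends up above Alice's average rating of 4.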

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [15, 16].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
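Both measures can be computed directly from the set of recommended items and the set of items the user actually likes. The following Python sketch assumes both are available as collections of item identifiers; the function name and the example items are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a set of recommended and relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(
    recommended=["soup", "salad", "steak", "cake"],
    relevant=["salad", "cake", "pasta"],
)
print(precision, recall)
```

In this example two of the four recommended items are relevant, and two of the three relevant items were recommended.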

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
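The difference between the two error measures is easy to see on a small example. The sketch below assumes that predicted and actual ratings are given as two aligned lists; the values are hypothetical.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between aligned lists of predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between aligned lists of predicted and actual ratings."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 3.0, 2.0]
print(mae(predicted, actual), rmse(predicted, actual))
```

The single deviation of two rating points contributes four times as much as the deviation of one point to the squared sum, which is why the RMSE of this example is larger than its MAE.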


The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing history (i.e., recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly with respect to the user's favourite and disliked ingredients. To estimate the user's favourite ingredients, I_k^+, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

FF_k is the frequency of use (F_k) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$


The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$



The user's disliked ingredients, I_k^-, are estimated by considering the ingredients in the browsing history with which the user has never cooked.
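The estimation of favourite ingredients can be sketched in a few lines of Python. The sketch rests on assumptions that the text does not fix: recipes are plain lists of ingredient names, F_k counts the cooked recipes that contain ingredient k, the period D is measured in days, and the logarithm is natural. Function and variable names are hypothetical.

```python
from collections import Counter
from math import log


def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k with I_k+ = FF_k * IRF_k."""
    total_recipes = len(all_recipes)                                           # M
    use_count = Counter(k for recipe in cooked_recipes for k in set(recipe))   # F_k
    recipe_count = Counter(k for recipe in all_recipes for k in set(recipe))   # M_k
    scores = {}
    for k, f_k in use_count.items():
        if recipe_count[k] == 0:
            continue  # ingredient never seen in the recipe collection
        ff_k = f_k / period_days                      # frequency of use during D
        irf_k = log(total_recipes / recipe_count[k])  # inverse recipe frequency
        scores[k] = ff_k * irf_k
    return scores


# Hypothetical recipe collection and 30-day cooking history.
all_recipes = [
    ["rice", "chicken"],
    ["rice", "tofu"],
    ["rice", "salmon"],
    ["pasta", "chicken"],
]
cooked = [["rice", "chicken"], ["pasta", "chicken"], ["rice", "salmon"]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=30)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

As with TF-IDF, an ingredient that appears in most recipes (rice in this example) receives a lower score than an equally used ingredient that is rarer in the collection (chicken).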

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by I_k^+ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
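As a quick consistency check of the reported figures, and reading the N = 20 precision and recall as 0.607 and 0.61, the F-measure definition gives:

```latex
\text{F-measure} = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61}
                 = \frac{0.74054}{1.217}
                 \approx 0.608
```

This agrees with the highest F-measure reported for N = 20.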

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The final score of a recipe R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I_k^+ and I_k^-, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items have been rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
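The construction of the dense pseudo ratings matrix V can be sketched as follows. The content-based predictor is represented by a hypothetical callable `content_predict(user, item)`; the stand-in used in the example simply returns the user's mean rating and is not the naive Bayesian classifier of [20]. The hybrid correlation weight is not covered here.

```python
def pseudo_ratings_matrix(ratings, items, content_predict):
    """Build V: keep actual ratings and fill the gaps with content-based predictions.

    ratings: {user: {item: rating}} with the sparse, observed ratings
    items: iterable with every item in the catalogue
    content_predict: callable (user, item) -> predicted rating (hypothetical)
    """
    return {
        user: {
            item: user_ratings[item] if item in user_ratings
            else content_predict(user, item)
            for item in items
        }
        for user, user_ratings in ratings.items()
    }


# Hypothetical sparse ratings and a placeholder content-based predictor.
ratings = {"u1": {"m1": 4, "m2": 2}, "u2": {"m2": 5}}
items = ["m1", "m2", "m3"]


def mean_rating_predictor(user, item):
    # Placeholder only: predicts the user's mean observed rating.
    values = ratings[user].values()
    return sum(values) / len(values)


V = pseudo_ratings_matrix(ratings, items, mean_rating_predictor)
print(V)
```

Every row of V now has a value for every item, so user similarities can be computed over all items rather than only over co-rated ones.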

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information, and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
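The short-term model described above can be sketched in Python as follows. The threshold values, the returned status labels and the toy story vectors are hypothetical and chosen only for illustration; the long-term model is not covered.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_score(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ("known", None), ("unclassified", None) or ("scored", value).

    rated_stories: list of (story_vector, user_score) pairs.
    t_min and t_max are illustrative threshold values, not taken from [23].
    """
    voters = []
    for story, score in rated_stories:
        sim = cosine(new_story, story)
        if sim >= t_max:
            return ("known", None)  # too similar: the user already knows this event
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return ("unclassified", None)  # would be handed to the long-term model
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


# Hypothetical rated stories with scores in [0, 1].
rated = [
    ({"election": 1.0, "vote": 1.0}, 0.8),
    ({"election": 1.0, "economy": 1.0}, 0.4),
    ({"football": 1.0}, 0.1),
]
print(short_term_score({"election": 1.0, "debate": 1.0}, rated))
print(short_term_score({"football": 1.0}, rated))
print(short_term_score({"weather": 1.0}, rated))
```

The three calls exercise the three outcomes: a story scored by its voters, a story labelled as known, and a story without voters.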

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
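The Pearson similarity of Eq. 4.1 can be sketched as follows. This is only an illustration: the dictionary layout and the function name `pearson_item_similarity` are assumptions, not taken from the YoLP code, and the recipe averages are assumed to be computed over all users who rated each recipe.

```python
from math import sqrt


def pearson_item_similarity(ratings, a, b):
    """Pearson correlation between recipes a and b (Eq. 4.1).

    ratings: dict mapping user -> dict mapping recipe -> rating
    (hypothetical layout chosen for this sketch).
    """
    def item_mean(item):
        # Assumption: the recipe average uses every rating of that recipe.
        values = [r[item] for r in ratings.values() if item in r]
        return sum(values) / len(values)

    # P: users that rated both recipes.
    shared = [u for u, r in ratings.items() if a in r and b in r]
    if not shared:
        return 0.0

    mean_a, mean_b = item_mean(a), item_mean(b)
    num = sum((ratings[p][a] - mean_a) * (ratings[p][b] - mean_b) for p in shared)
    den = sqrt(
        sum((ratings[p][a] - mean_a) ** 2 for p in shared)
        * sum((ratings[p][b] - mean_b) ** 2 for p in shared)
    )
    # A zero denominator means at least one recipe has no rating variance.
    return num / den if den else 0.0
```

Returning 0.0 when there are no shared users or no variance is a design choice of this sketch; the source does not state how those cases are handled.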


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
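The prediction step can be sketched as a similarity-weighted average over the recipes the user has already rated. The names below are hypothetical. As printed, the formula yields a weighted deviation from the recipe averages; the optional `baseline` argument, which shifts the result back onto the rating scale, is an assumption of this sketch and not something stated in the text.

```python
def predict_rating(user_ratings, item_means, sims, target, baseline=0.0):
    """Similarity-weighted prediction for one user and one target recipe.

    user_ratings: dict recipe -> rating given by the user (the set N)
    item_means:   dict recipe -> average rating of that recipe
    sims:         dict (target, recipe) -> similarity value
    baseline:     assumed offset added to the weighted deviation
    """
    num = 0.0
    den = 0.0
    for b, rating in user_ratings.items():
        s = sims.get((target, b), 0.0)
        num += s * (rating - item_means[b])
        den += s
    if den == 0:
        # No usable neighbours: fall back to the baseline alone.
        return baseline
    return baseline + num / den
```

Passing the target recipe's average rating as `baseline` is one common way to obtain a value comparable with the actual ratings.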

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
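A minimal sketch of this profile construction and ranking is given below. It assumes that binary sparse vectors are stored as Python sets of feature labels; that representation, the label format and all function names are illustrative rather than the YoLP implementation. For binary vectors the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes.

```python
from math import sqrt


def build_profile(rated_recipes, min_rating=4):
    """Union of the features of all recipes rated min_rating or higher.

    rated_recipes: list of (feature_set, rating) pairs.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_rating:
            profile |= set(features)
    return profile


def cosine_binary(x, y):
    """Cosine similarity between two binary vectors stored as sets."""
    if not x or not y:
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))


def rank_recipes(profile, recipes):
    """Order recipe names from most to least similar to the profile.

    recipes: dict mapping recipe name -> feature set.
    """
    return sorted(
        recipes,
        key=lambda name: cosine_binary(profile, recipes[name]),
        reverse=True,
    )
```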

YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
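The conversion of Eq. 4.3 is small enough to express directly. The sketch below assumes that the combined average is the mean of the user and item averages, as in the (UserAvg + ItemAvg)/2 baseline of Chapter 5; the function name and the keyword arguments are illustrative.

```python
def yolp_similarity_to_rating(similarity, user_avg, item_avg,
                              threshold=0.8, bonus=0.5):
    """Map a cosine similarity to a rating following Eq. 4.3."""
    # Assumption: avgTotal is the mean of the user and item averages.
    avg_total = (user_avg + item_avg) / 2
    return avg_total + bonus if similarity > threshold else avg_total
```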

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as defined in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
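Computing the Inverse Recipe Frequency over a whole dataset can be sketched as below. The input layout and the function name are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated in the text.

```python
from math import log


def inverse_recipe_frequency(recipes):
    """Return IRF_k = log(M / M_k) for every feature k (Eq. 4.4).

    recipes: dict mapping recipe id -> set of features.
    """
    total = len(recipes)  # M: total number of recipes
    counts = {}           # M_k: number of recipes containing feature k
    for features in recipes.values():
        for k in features:
            counts[k] = counts.get(k, 0) + 1
    # Assumption: natural logarithm.
    return {k: log(total / m_k) for k, m_k in counts.items()}
```

A feature present in every recipe receives a weight of zero, so it contributes nothing to the prototype vectors built from these weights.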

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations the feature weights are subtracted.
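The construction of a prototype vector under both labelling approaches can be sketched as follows. All names are hypothetical, and the vector is stored as a dictionary from feature to accumulated weight, which is an assumption about the representation. A labelling rule returns `True` for a positive observation, `False` for a negative one and `None` for a neutral one.

```python
def build_prototype(rated, recipes, irf, label):
    """Accumulate signed IRF weights over a user's rated recipes.

    rated:   dict recipe id -> rating given by the user (training set)
    recipes: dict recipe id -> set of features
    irf:     dict feature -> IRF weight
    label:   function rating -> True, False or None
    """
    prototype = {}
    for recipe_id, rating in rated.items():
        observation = label(rating)
        if observation is None:
            continue  # neutral rating: ignored
        sign = 1.0 if observation else -1.0  # equal weight of 1 for both kinds
        for k in recipes[recipe_id]:
            prototype[k] = prototype.get(k, 0.0) + sign * irf[k]
    return prototype


def fixed_threshold_epicurious(rating):
    """Ratings 1-2 are negative, 3-4 are positive."""
    return rating >= 3


def fixed_threshold_foodcom(rating):
    """Ratings 1-2 are negative, 4-5 are positive, 3 is neutral."""
    if rating == 3:
        return None
    return rating > 3


def user_average_rule(user_avg):
    """Ratings equal to or above the user's average are positive."""
    return lambda rating: rating >= user_avg
```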

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set and compute each user's rating range from the training set. At this point the similarity scale of each user is mapped into that user's rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
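The per-user Min-Max mapping can be sketched as below, assuming the user's similarity values and training ratings are available as plain lists. The function names are illustrative, and the fallback is triggered here when the similarity interval has zero width, which is one reading of "not enough user ratings".

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the range [a_min, a_max] into [c, d] (Eq. 4.5)."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def min_max_rating(similarity, user_similarities, user_train_ratings):
    """Predict a rating by mapping a similarity into the user's rating range.

    user_similarities:  similarity values computed for this user
    user_train_ratings: ratings given by this user in the training set
    """
    user_avg = sum(user_train_ratings) / len(user_train_ratings)
    s_min, s_max = min(user_similarities), max(user_similarities)
    if s_max == s_min:
        # Similarity interval cannot be computed: default to the user average.
        return user_avg
    return min_max(
        similarity, s_min, s_max,
        min(user_train_ratings), max(user_train_ratings),
    )
```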

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \qquad (4.6)$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
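The piecewise rule of Eq. 4.6 and the combined variant can be sketched as follows. The default thresholds are the initial values 0.75 and 0.25 reported in this work. How the user and recipe standard deviations are combined is not specified in the text, so averaging the two population standard deviations is purely an assumption of this sketch, as are the function names.

```python
from statistics import mean, pstdev


def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Apply Eq. 4.6 with upper threshold U and lower threshold L."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_ratings, recipe_ratings):
    """Combined user/recipe average and standard deviation.

    Assumption: both quantities are simple means of the user and
    recipe values, using the population standard deviation.
    """
    avg = (mean(user_ratings) + mean(recipe_ratings)) / 2
    std = (pstdev(user_ratings) + pstdev(recipe_ratings)) / 2
    return avg, std
```

For example, `rating_from_similarity(sim, *combined_stats(user_ratings, recipe_ratings))` gives the third approach, while passing the user's or the recipe's own average and deviation gives the first two.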

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                              Food.com   Epicurious
Number of users                           24741         8117
Number of food items                     226025        14976
Number of rating events                  956826        86574
Number of ratings above avg              726467        46588
Number of groups                            108           68
Number of ingredients                      5074          338
Number of categories                         28           14
Sparsity on the ratings matrix            0.02%        0.07%
Avg rating values                          4.68         3.34
Avg number of ratings per user            38.67        10.67
Avg number of ratings per item             4.23         5.78
Avg number of ingredients per item         8.57         3.71
Avg number of categories per item          2.33         0.60
Avg number of food groups per item         0.87         0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a scheme based on leave-p-out cross-validation was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set; ideally, it is repeated until all possible combinations of p observations have been tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the data was partitioned into 5 disjoint folds, so the process is repeated 5 times; this is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
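A 5-fold partition of the rating events can be sketched with the standard library alone. The function name, the shuffling and the fixed seed are assumptions of this sketch; the source does not describe how the folds were drawn.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    events: sequence of rating events, e.g. (userID, itemID, rating).
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment to folds
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation fold holds roughly 20% of the events and every event is used for validation exactly once.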

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
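Both error measures follow their standard definitions and can be computed as shown in this short sketch, where the two lists are assumed to be aligned so that each prediction corresponds to the actual rating at the same position.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```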

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                Epicurious                              Food.com
                                         Observation      Observation        Observation      Observation
                                         User Average     Fixed Threshold    User Average     Fixed Threshold
Method                                   MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation       0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation       0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Std Dev    0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                                  0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
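The three average baselines can be sketched as below. The function name and the tuple layout are illustrative, and falling back to the global average for users or recipes absent from the training set is an assumption of this sketch, since the source does not state how such cases were handled.

```python
from collections import defaultdict


def average_baselines(train):
    """Build a predictor returning the user, item and combined averages.

    train: iterable of (userID, recipeID, rating) tuples.
    """
    train = list(train)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in train:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)

    global_avg = sum(r for _, _, r in train) / len(train)
    user_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_avg = {i: sum(v) / len(v) for i, v in by_item.items()}

    def predict(user_id, recipe_id):
        # Assumed fallback: global average for unseen users or recipes.
        u = user_avg.get(user_id, global_avg)
        i = item_avg.get(recipe_id, global_avg)
        return {"user": u, "item": i, "combined": (u + i) / 2}

    return predict
```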

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
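The idea of storing one vector per feature group and merging only the groups under test can be sketched as follows. The dictionary-of-dictionaries layout, the group names and the function names are assumptions made for illustration.

```python
from math import sqrt


def merge_groups(vectors, groups):
    """Merge the stored per-group vectors selected for a feature test.

    vectors: dict group name -> dict feature -> weight
    groups:  iterable of group names to include
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors[group].items():
            # Key by (group, feature) so equal labels in different groups stay apart.
            merged[(group, feature)] = weight
    return merged


def cosine(x, y):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    norm_x = sqrt(sum(w * w for w in x.values()))
    norm_y = sqrt(sum(w * w for w in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity between user and recipe using only the chosen groups."""
    return cosine(
        merge_groups(user_vectors, groups),
        merge_groups(recipe_vectors, groups),
    )
```

Each row of a feature test then corresponds to one choice of `groups`, for example `("ingredients", "cuisine")`, without rebuilding any prototype vector.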

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \qquad (4.6)$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The pronounced drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \qquad (5.1)$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
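The threshold variation test can be sketched as a sweep over candidate values of U, applying the two-case rule of Eq. 5.1 and recording both error measures for each value. The function names and the tuple layout of `samples` are hypothetical; in the experiments each sample would come from a cross-validation fold.

```python
from math import sqrt


def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for each candidate upper threshold U.

    samples: list of (similarity, average, std_dev, actual_rating) tuples.
    """
    actual = [s[3] for s in samples]
    results = {}
    for u in thresholds:
        # Eq. 5.1: add the standard deviation only when similarity >= U.
        predicted = [avg + std if sim >= u else avg for sim, avg, std, _ in samples]
        results[u] = (mae(predicted, actual), rmse(predicted, actual))
    return results
```

On a toy input such as `[(0.9, 4, 1, 5), (0.6, 4, 1, 5), (0.6, 4, 1, 5), (0.6, 4, 1, 3)]`, lowering U from 0.75 to 0.5 makes three of the four predictions exact but turns the remaining miss into a larger one, so the MAE falls while the RMSE rises, which is the same trade-off described above.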

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still maintained a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
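The round-by-round procedure of this test can be sketched as follows. This is a minimal illustration under stated assumptions: `learning_curve`, `fit` and `predict` are hypothetical names, and the stand-in model simply predicts the mean of the training ratings, so it is not the Rocchio-based component evaluated in this work.

```python
from statistics import mean

def learning_curve(user_reviews, fit, predict, max_rounds):
    """Average absolute error per round; round n trains on each user's first n reviews."""
    curve = []
    for n in range(1, max_rounds + 1):
        round_errors = []
        for reviews in user_reviews.values():
            train, validation = reviews[:n], reviews[n:]
            if not validation:
                continue  # this user has no reviews left to validate on
            model = fit(train)
            round_errors.extend(abs(rating - predict(model, item))
                                for item, rating in validation)
        if not round_errors:
            break  # no user has enough reviews for this round
        curve.append(mean(round_errors))
    return curve

# Stand-in model (an assumption, NOT Rocchio): predict the mean training rating.
def fit_mean(train):
    return mean(rating for _, rating in train)

def predict_mean(model, item):
    return model

# Toy review history of (recipe id, rating) pairs, invented for illustration.
reviews = {"u1": [("r1", 4), ("r2", 4), ("r3", 2), ("r4", 4)]}
curve = learning_curve(reviews, fit_mean, predict_mean, max_rounds=3)
```

Plotting `curve` against the round number gives a learning curve of the kind shown in the figures of this section. Passing a different `fit` and `predict` pair would let the same loop drive any other recommender.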



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breakdown of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews against which the studied approaches could be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
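A minimal sketch of this idea is given below, assuming that each class vector is the plain average of the feature vectors of the recipes that received that rating. The averaging rule, the function names and the toy weights are assumptions made for illustration, since the proposal above does not fix them.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_rating_classes(rated_recipes):
    """One averaged feature vector per rating value given by the user."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in features.items():
            sums[rating][feature] += weight
    return {rating: {f: w / counts[rating] for f, w in vector.items()}
            for rating, vector in sums.items()}

def predict_rating(classes, features):
    """The rating whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(classes[rating], features))

# Toy user history of (recipe feature vector, rating) pairs, invented for illustration.
history = [
    ({"chicken": 1.0, "garlic": 0.5}, 5),
    ({"chicken": 0.8, "lemon": 0.4}, 5),
    ({"liver": 1.0, "onion": 0.6}, 1),
]
classes = build_rating_classes(history)
predicted = predict_rating(classes, {"chicken": 1.0, "lemon": 0.2})
```

In this toy history the chicken recipe is closest to the class built from the 5-star reviews, so the predicted rating is read directly from the winning class.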

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031



4.4 Database and Datasets 31

5 Validation 35

5.1 Evaluation Metrics and Cross Validation 35

5.2 Baselines and First Results 36

5.3 Feature Testing 38

5.4 Similarity Threshold Variation 39

5.5 Standard Deviation Impact in Recommendation Error 42

5.6 Rocchio's Learning Curve 43

6 Conclusions 47

6.1 Future Work 48

Bibliography 49


List of Tables

2.1 Ratings database for collaborative recommendation 10

4.1 Statistical characterization for the datasets used in the experiments 31

5.1 Baselines 37

5.2 Test Results 37

5.3 Testing features 38



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4

2.2 Comparing user ratings [2] 11

2.3 Monolithic hybridization design [2] 13

2.4 Parallelized hybridization design [2] 13

2.5 Pipelined hybridization designs [2] 13

2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14

2.7 Evaluating recommended items [2] 15

3.1 Recipe - ingredient breakdown and reconstruction 21

3.2 Normalized MAE score for recipe recommendation [22] 22

4.1 System Architecture 26

4.2 Item-to-item collaborative recommendation 26

4.3 Distribution of Epicurious rating events per rating values 32

4.4 Distribution of Food.com rating events per rating values 32

4.5 Epicurious distribution of the number of ratings per number of users 33

5.1 10 Fold Cross-Validation example 36

5.2 Lower similarity threshold variation test using Epicurious dataset 39

5.3 Lower similarity threshold variation test using Food.com dataset 40

5.4 Upper similarity threshold variation test using Epicurious dataset 40

5.5 Upper similarity threshold variation test using Food.com dataset 41

5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42

5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43

5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes 44

5.9 Learning Curve using the Food.com dataset up to 100 rated recipes 44

5.10 Learning Curve using the Food.com dataset up to 500 rated recipes 45




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can be applied to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The new variations of content-based methods adapted to food recommendation are validated using performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes and provides the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and features that are interesting in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile, and recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)$$


where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log\frac{N}{n_i} \qquad (2.2)$$




In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \qquad (2.3)$$
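The three definitions above can be combined into a short computation. The sketch below assumes documents are non-empty lists of tokens and uses the natural logarithm, since the base of the logarithm is not fixed by the formula. The function name and the toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of non-empty token lists. Returns one {keyword: weight} dict per document."""
    n_docs = len(documents)
    # n_i: number of documents in which each keyword occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values())  # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy documents, invented for illustration.
docs = [
    ["tomato", "basil", "tomato", "salt"],
    ["chicken", "salt"],
    ["basil", "chicken", "salt"],
]
w = tf_idf(docs)
```

In this toy collection the keyword `salt` occurs in every document, so its IDF is log(1) = 0 and it receives no weight, while `tomato`, which occurs in a single document, receives the highest weight in the first document.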

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \qquad (2.4)$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_k w_{ka}\, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \qquad (2.5)$$
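As a small illustration of the cosine normalization and of the cosine similarity, the sketch below represents each vector as a sparse dictionary of keyword weights. The function names and the toy vectors are assumptions made for the example.

```python
import math

def normalize(weights):
    """Cosine-normalize a {keyword: weight} vector so that its length is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {k: w / norm for k, w in weights.items()} if norm else dict(weights)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy vectors, invented for illustration.
profile = {"tomato": 3.0, "basil": 4.0}
recipe = {"tomato": 1.0, "chicken": 1.0}
unit = normalize(profile)
sim = cosine_similarity(profile, recipe)
```

Keywords missing from a dictionary are treated as having weight zero, which is why only the shared keyword `tomato` contributes to the numerator.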



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)$$




In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
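The prototype construction and the classification step can be sketched as follows. The sketch assumes documents are sparse dictionaries of TF-IDF weights. The values chosen for β and γ, the class names and the toy documents are arbitrary choices for illustration and are not taken from this work.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_prototype(positive, negative, beta=1.0, gamma=0.5):
    """w_ki = beta * mean weight of term k in POS_i - gamma * mean weight in NEG_i."""
    terms = {term for doc in positive + negative for term in doc}
    prototype = {}
    for term in terms:
        pos = sum(d.get(term, 0.0) for d in positive) / len(positive) if positive else 0.0
        neg = sum(d.get(term, 0.0) for d in negative) / len(negative) if negative else 0.0
        prototype[term] = beta * pos - gamma * neg
    return prototype

def classify(prototypes, document):
    """Assign the document to the class with the most similar prototype vector."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], document))

# Toy TF-IDF vectors, invented for illustration.
liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8}]
disliked = [{"liver": 1.0}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
label = classify(prototypes, {"tomato": 1.0})
```

Terms that appear only in the negative examples end up with negative weights in the prototype, which lowers the similarity of documents that contain them.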

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data, as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7)$$


When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8)$$





The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, records the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \qquad (2.9)$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
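A minimal sketch of the multinomial model follows. Two choices in it are assumptions that go beyond the text: the term probabilities use add-one (Laplace) smoothing so that unseen terms do not produce a zero probability, and the product is evaluated as a sum of logarithms to avoid numerical underflow. All names and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (token list, class label) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, term_counts, vocabulary

def classify(model, tokens):
    """Return the class c_j that maximizes log P(c_j) + sum_k N(d, t_k) * log P(t_k | c_j)."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log P(c_j)
        for term, n in Counter(tokens).items():
            if term not in vocabulary:
                continue  # terms never seen in training are ignored
            # Add-one smoothing (an assumption of this sketch)
            p = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
training = [
    (["tomato", "basil", "tomato"], "relevant"),
    (["liver", "onion"], "irrelevant"),
    (["basil", "garlic"], "relevant"),
]
model = train_multinomial_nb(training)
label = classify(model, ["tomato", "tomato", "garlic"])
```

The exponent $N(d_i, t_k)$ of the equation appears here as the multiplier `n` of the log-probability, so a term that occurs twice in the document counts twice.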

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled with terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled with categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they have no training phase, all the computation is done when an item is classified.
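A minimal sketch of a k nearest neighbors classifier with cosine similarity is given below. The sparse term-weight dictionaries, the labels and the majority vote used to derive the class are illustrative assumptions; the text does not prescribe a particular voting rule.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar training items."""
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (term weights, class label).
training = [
    ({"salmon": 1.0, "lemon": 0.5}, "like"),
    ({"salmon": 0.8, "rice": 0.6}, "like"),
    ({"cake": 1.0, "cream": 0.7}, "dislike"),
    ({"cake": 0.9, "sugar": 0.4}, "dislike"),
]
print(knn_classify({"salmon": 0.9, "rice": 0.2}, training, k=3))  # like
```

There is no training step: every call compares the new item against all stored items, which is the classification-time cost mentioned above.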

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, is described next.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \, \sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10)$$
In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
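As a worked illustration, the cosine measure can be applied to the ratings of Table 2.1. The dictionary layout and the function name are assumptions made for this sketch; similarities are computed only over the items that both users rated.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def cosine_similarity(ratings, a, b):
    """Cosine of the angle between the rating vectors of users a and b."""
    common = set(ratings[a]) & set(ratings[b])  # items rated by both users
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

for user in ("User1", "User2", "User3", "User4"):
    print(user, round(cosine_similarity(ratings, "Alice", user), 3))
```

Because all ratings are positive, every neighbour obtains a similarity above 0.75 here, including User4, whose ratings move in the opposite direction to Alice's. This is one symptom of ignoring differences in rating behaviour.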

In Figure 2.2 it can be observed that Alice and User1 rated the same group of items in a similar way. The difference in rating values across the four items is practically constant. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \qquad (2.11)$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
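The Pearson correlation and the prediction function can be combined into a short worked example on the ratings of Table 2.1. Two choices in this sketch are assumptions rather than something fixed by the text: the averages inside the correlation are taken over the co-rated items only, while the averages inside the prediction are taken over all ratings of each user, and the neighbourhood N is limited to the two most similar users.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ratings, a, b):
    """Pearson correlation over the items rated by both users (co-rated means assumed)."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(ratings, a, item, n_neighbors=2):
    """Mean-centred weighted sum over the most similar users who rated `item`."""
    candidates = [(pearson(ratings, a, b), b)
                  for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, reverse=True)[:n_neighbors]
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values()))
              for sim, b in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return mean(ratings[a].values()) + (num / den if den else 0.0)

print(round(pearson(ratings, "Alice", "User1"), 2))  # 0.85
print(round(predict(ratings, "Alice", "Item5"), 2))  # 4.87
```

With these assumptions User1 and User2 are selected as neighbours. Both rated Item5 above their own averages, so the prediction lies above Alice's average rating of 4.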

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
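The parallelized design can be illustrated with a small sketch that covers both the weighting scheme and the switching example given in the text. The function names, the manually assigned weights (0.8 and 0.2) and the threshold of five ratings are hypothetical values chosen only for illustration.

```python
def weighted_hybrid(score_lists, weights):
    """Combine per-item scores from several recommenders with a weighted average."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    total = sum(weights)
    return {item: value / total for item, value in combined.items()}

def recommend(collab_scores, popularity_scores, n_user_ratings, min_ratings=5):
    """Fall back to popularity when the user has few ratings, otherwise blend both."""
    if n_user_ratings < min_ratings:
        return popularity_scores
    return weighted_hybrid([collab_scores, popularity_scores], [0.8, 0.2])

# Hypothetical scores produced independently by two components for the same input.
collab = {"soup": 4.0, "steak": 2.0}
popular = {"soup": 3.0, "steak": 5.0}
print(recommend(collab, popular, n_user_ratings=2))   # popularity scores only
print(recommend(collab, popular, n_user_ratings=20))  # weighted blend
```

Both components receive the same input and remain unaware of each other; only the final combination step knows that more than one recommender exists. The sketch assumes every recommender scores every item.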

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be and have been studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.
When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (2.13)$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \qquad (2.14)$$
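Both measures can be computed directly from the set of recommended items and the set of items the user actually likes. The helper below is a minimal sketch; the function name and the recipe identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended and a list of relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

precision, recall = precision_recall(
    recommended=["r1", "r2", "r3", "r4"],
    relevant=["r2", "r4", "r7", "r8", "r9"],
)
print(precision, recall)  # 0.5 0.4
```

Two of the four recommended recipes are relevant (precision 0.5), but they cover only two of the five relevant recipes (recall 0.4).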

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)$$
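The difference between the two error measures is easy to see on a small example. The rating values below are hypothetical and serve only to show how a single large deviation affects each measure.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]
print(mae(predicted, actual))             # 1.0
print(round(rmse(predicted, actual), 3))  # 1.225
```

The absolute errors are 1, 0, 1 and 2. The single error of 2 contributes 4 to the squared sum, which is why the RMSE exceeds the MAE on the same data.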


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$

The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
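The favourite-ingredient score can be sketched as follows. The data layout is an assumption: recipes are modelled as sets of ingredient names, the cooked recipes are assumed to belong to the recipe collection, the period D is given in days and the natural logarithm is used; none of these details are stated in the text.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each used ingredient k as FF_k * IRF_k."""
    total_recipes = len(all_recipes)                                  # M
    containing = Counter(k for recipe in all_recipes for k in recipe)  # M_k
    used = Counter(k for recipe in cooked_recipes for k in recipe)     # F_k
    scores = {}
    for k, f_k in used.items():
        ff_k = f_k / period_days                        # frequency of use in period D
        irf_k = math.log(total_recipes / containing[k])  # inverse recipe frequency
        scores[k] = ff_k * irf_k
    return scores

# Hypothetical recipe collection and cooking history.
all_recipes = [
    {"salt", "salmon"},
    {"salt", "rice"},
    {"salt", "salmon", "lemon"},
    {"salt", "cake"},
    {"salt", "lemon", "chicken"},
]
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=30)
print(max(scores, key=scores.get))  # salmon
```

Salt is used in every cooked recipe but also appears in every recipe of the collection, so its inverse recipe frequency is zero and it does not count as a favourite, mirroring the role of IDF for common words.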

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3.4)$$

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20 the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
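The reported F-measure can be checked against the reported precision and recall with a few lines of code. The function name is an assumption; the input values are the percentages quoted above for N = 20 and N = 1, written as fractions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.607, 0.61), 3))   # 0.608, the value reported for N = 20
print(round(f_measure(0.833, 0.045), 3))  # 0.085, dominated by the low recall at N = 1
```

The harmonic mean stays close to the smaller of the two values, which explains why the balanced result at N = 20 yields the highest F-measure despite its lower precision.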

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$
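The quantity-aware score can be sketched as below. The text does not specify how the weight $W_k$ is derived from the deviation score, so the `weight` function here is a purely hypothetical choice (1 at the usual quantity, shifted by the z-score and clipped at zero). Only the standard deviation and the final sum follow the formulas above.

```python
import math

def quantity_stats(quantities):
    """Mean and standard deviation of an ingredient's quantity across recipes."""
    n = len(quantities)
    mean = sum(quantities) / n
    sigma = math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)
    return mean, sigma

def weight(quantity, mean, sigma):
    """Hypothetical W_k: 1 at the usual quantity, shifted by the z-score, never negative."""
    if sigma == 0:
        return 1.0
    return max(0.0, 1.0 + (quantity - mean) / sigma)

def recipe_score(recipe, preferences, stats):
    """Score(R) as the sum over ingredients k in R of I_k * W_k."""
    return sum(preferences.get(k, 0.0) * weight(q, *stats[k])
               for k, q in recipe.items())

# Hypothetical quantities (grams) observed across the recipe database.
stats = {
    "pepper": quantity_stats([1, 2, 3]),
    "potato": quantity_stats([100, 200, 300]),
}
preferences = {"pepper": -1.0, "potato": 0.5}  # I_k: disliked < 0 < liked
mild = {"pepper": 2, "potato": 200}
spicy = {"pepper": 3, "potato": 200}
print(recipe_score(mild, preferences, stats))   # -0.5
print(recipe_score(spicy, preferences, stats) < recipe_score(mild, preferences, stats))  # True
```

One extra gram of pepper is a large deviation relative to its usual quantity, so the recipe with more of the disliked ingredient receives the lower score, as argued in the text.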

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \qquad (3.7)$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
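The construction of the pseudo ratings matrix can be sketched as follows. Only the filling step is shown; the hybrid correlation weight is not reproduced. The per-user content-based predictors are stand-in functions returning fixed values, which is an assumption made to keep the example self-contained.

```python
def pseudo_ratings_vector(user_ratings, items, content_predict):
    """v_ui = r_ui if the user rated item i, otherwise the content-based prediction c_ui."""
    return {i: user_ratings[i] if i in user_ratings else content_predict(i)
            for i in items}

def pseudo_ratings_matrix(all_ratings, items, predictors):
    """Dense matrix V: one pseudo user-ratings vector per user."""
    return {u: pseudo_ratings_vector(r, items, predictors[u])
            for u, r in all_ratings.items()}

items = ["m1", "m2", "m3"]
all_ratings = {"ann": {"m1": 5}, "bob": {"m2": 2, "m3": 4}}
# Hypothetical content-based predictors, one per user (e.g. a trained text classifier).
predictors = {"ann": lambda item: 3.0, "bob": lambda item: 4.0}
V = pseudo_ratings_matrix(all_ratings, items, predictors)
print(V["ann"])  # {'m1': 5, 'm2': 3.0, 'm3': 3.0}
```

Every user now has a value for every item, so a neighbourhood-based method can compute similarities over all items instead of only the few co-rated ones.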

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
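The breakdown and reconstruction steps of the content-based strategy can be sketched as follows. The breakdown follows the averaging described in the text; computing the recipe prediction as the plain average of its known ingredient scores is an assumption of this sketch, and the recipes and ratings are hypothetical.

```python
from collections import defaultdict

def ingredient_scores(recipe_ratings, recipes):
    """Score each ingredient as the mean rating of the rated recipes that contain it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipes[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {k: totals[k] / counts[k] for k in totals}

def predict_recipe(recipe, scores, recipes):
    """Predict a recipe rating as the mean score of its ingredients with known scores."""
    known = [scores[k] for k in recipes[recipe] if k in scores]
    return sum(known) / len(known) if known else None

recipes = {
    "salmon_rice": ["salmon", "rice"],
    "salmon_salad": ["salmon", "lettuce"],
    "rice_pudding": ["rice", "milk"],
    "salmon_risotto": ["salmon", "rice", "cheese"],
}
user_ratings = {"salmon_rice": 5, "salmon_salad": 4, "rice_pudding": 2}
scores = ingredient_scores(user_ratings, recipes)
print(scores["salmon"], scores["rice"])                     # 4.5 3.5
print(predict_recipe("salmon_risotto", scores, recipes))    # 4.0
```

Cheese has no score because the user never rated a recipe containing it, so it is left out of the prediction. Filling such gaps is what the hybrid strategies described next aim to do.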

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
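The short-term model described above can be sketched as follows. The threshold values (0.3 and 0.9), the function names and the toy vectors are hypothetical; the text only states that a minimum and a maximum similarity threshold exist.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ("known", None), ("predicted", score) or ("no_voters", None)."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:        # too similar: the user already knows this event
            return "known", None
        if sim >= t_min:        # similar enough to vote
            voters.append((sim, score))
    if not voters:              # hand over to the long-term model
        return "no_voters", None
    total = sum(sim for sim, _ in voters)
    return "predicted", sum(sim * score for sim, score in voters) / total

# Hypothetical rated stories: (TF-IDF vector, user score).
rated = [({"election": 1.0, "poll": 1.0}, 0.8), ({"football": 1.0}, 0.2)]
print(short_term_predict({"election": 1.0, "vote": 1.0}, rated))
print(short_term_predict({"election": 1.0, "poll": 1.0}, rated))  # ('known', None)
print(short_term_predict({"recipe": 1.0}, rated))                 # ('no_voters', None)
```

The same structure could be reused for dishes, with the upper threshold filtering out recommendations that are too close to what the user has recently eaten.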

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$ sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} $$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
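As an illustration of Eq. 4.1, the following is a minimal Python sketch, not the YoLP implementation. The function name and the dictionary-based rating layout are hypothetical, and it assumes that each recipe average is taken over all ratings of that recipe and that an undefined correlation is treated as zero similarity.

```python
from math import sqrt


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes, following Eq. 4.1.

    ratings_a and ratings_b map a user id to that user's rating of
    recipe a and recipe b (hypothetical data layout).
    """
    common_users = set(ratings_a) & set(ratings_b)  # the set P
    if not common_users:
        return 0.0
    # Assumption: recipe averages use every rating the recipe received.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum(
        (ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common_users
    )
    spread_a = sum((ratings_a[p] - mean_a) ** 2 for p in common_users)
    spread_b = sum((ratings_b[p] - mean_b) ** 2 for p in common_users)
    if spread_a == 0 or spread_b == 0:
        return 0.0  # assumption: undefined correlation counts as no similarity
    return numerator / (sqrt(spread_a) * sqrt(spread_b))
```

Two recipes that the same users rate with the same pattern obtain a similarity close to 1, while opposite rating patterns give a value close to -1.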


After the similarity is computed, the rating prediction is calculated using the following equation:

$$ pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \tag{4.2} $$

In the formula, $pred(u,a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.

An item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
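The profile construction and comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the binary sparse vectors are stored as Python sets of active feature names, and the function names and feature labels are hypothetical rather than taken from YoLP.

```python
from math import sqrt


def build_binary_profile(rated_recipes, recipe_features, min_positive=4):
    """Collect the features of every recipe rated min_positive or higher.

    rated_recipes maps recipe id -> rating; recipe_features maps
    recipe id -> set of feature names (hypothetical layout).
    """
    profile = set()
    for recipe_id, rating in rated_recipes.items():
        if rating >= min_positive:
            profile |= recipe_features[recipe_id]
    return profile


def cosine_binary(features_a, features_b):
    """Cosine similarity of two binary vectors stored as sets.

    For 0/1 vectors the dot product is the size of the intersection
    and each norm is the square root of the number of active features.
    """
    if not features_a or not features_b:
        return 0.0
    shared = len(features_a & features_b)
    return shared / sqrt(len(features_a) * len(features_b))
```

Ranking the restaurant's recipes by `cosine_binary(profile, recipe_features[r])` in descending order would then give the ordered recommendation list.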

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$ Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} $$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating likely introduces a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the dataset reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as defined in [3]:

$$ IRF_k = \log \frac{M}{M_k} \tag{4.4} $$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
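A minimal sketch of how the IRF weights of Eq. 4.4 could be computed over a complete dataset is shown below. The function name and the data layout are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
from math import log


def inverse_recipe_frequency(recipes):
    """Return a dict feature -> log(M / M_k), as in Eq. 4.4.

    recipes maps recipe id -> set of features (hypothetical layout).
    M is the number of recipes, M_k the number of recipes with feature k.
    """
    total_recipes = len(recipes)  # M
    recipes_with_feature = {}     # M_k for every feature k
    for features in recipes.values():
        for feature in features:
            recipes_with_feature[feature] = recipes_with_feature.get(feature, 0) + 1
    # Assumption: natural logarithm.
    return {
        feature: log(total_recipes / count)
        for feature, count in recipes_with_feature.items()
    }
```

A feature present in every recipe receives weight 0, so it does not influence the prototype vectors, while rare features receive the largest weights.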

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
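The two labelling approaches and the add/subtract update can be sketched as below. This is an illustrative sketch, not the code used in the experiments: all names are hypothetical, sparse vectors are stored as dictionaries, and the fixed-threshold rule is expressed through the midpoint of the rating scale, which reproduces the 1-4 and 1-5 cases described above.

```python
def label_fixed(rating, scale_max):
    """First approach: +1 for high ratings, -1 for low ratings, 0 for neutral.

    On a 1-4 scale the midpoint is 2.5, so 1 and 2 are negative and
    3 and 4 are positive. On a 1-5 scale a rating of 3 is neutral.
    """
    midpoint = (1 + scale_max) / 2
    if rating > midpoint:
        return 1
    if rating < midpoint:
        return -1
    return 0


def label_user_average(rating, user_average):
    """Second approach: positive if the rating is at or above the user's average."""
    return 1 if rating >= user_average else -1


def build_prototype(rated_recipes, recipe_features, irf, label):
    """Add IRF weights for positive observations, subtract them for negative ones.

    Both observation types have weight 1; neutral observations are skipped.
    """
    prototype = {}
    for recipe_id, rating in rated_recipes.items():
        sign = label(rating)
        if sign == 0:
            continue
        for feature in recipe_features[recipe_id]:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf[feature]
    return prototype
```

For example, `build_prototype(ratings, features, irf, lambda r: label_fixed(r, 5))` would apply the fixed-threshold rule on a 1-5 scale.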

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula³:

$$ B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5} $$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
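The per-user mapping can be sketched as follows. The function and parameter names are hypothetical, and treating a zero-width similarity interval as the "not enough ratings" case is an assumption made for this sketch.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_average):
    """Map a similarity into the user's rating range with Min-Max (Eq. 4.5).

    sim_min and sim_max bound the user's similarity values, rating_min and
    rating_max bound the ratings the user gave in the training set.
    """
    if sim_max == sim_min:
        # Assumption: without a usable similarity interval, fall back
        # to the user's average rating.
        return user_average
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For instance, a similarity halfway through the user's similarity interval maps to the middle of that user's rating range.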

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} $$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com   Epicurious
Number of users                            24741         8117
Number of food items                      226025        14976
Number of rating events                   956826        86574
Number of ratings above avg               726467        46588
Number of groups                             108           68
Number of ingredients                       5074          338
Number of categories                          28           14
Sparsity on the ratings matrix             0.02%        0.07%
Avg rating values                           4.68         3.34
Avg number of ratings per user             38.67        10.67
Avg number of ratings per item              4.23         5.78
Avg number of ingredients per item          8.57         3.71
Avg number of categories per item           2.33         0.60
Avg number of food groups per item          0.87         0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
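The three-case rule of Eq. 4.6 with the initial thresholds can be sketched in a few lines. The function and parameter names are hypothetical; the average and standard deviation passed in may be the user's, the recipe's, or their combination, depending on which of the three approaches is being tested.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Turn a similarity into a rating following Eq. 4.6.

    upper and lower correspond to the thresholds U and L; the defaults
    are the initial values 0.75 and 0.25.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average
```

With an average of 3.5 and a standard deviation of 0.5, a similarity of 0.8 would be translated into 4.0, a similarity of 0.5 into 3.5, and a similarity of 0.1 into 3.0.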


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model trained on the remaining data. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k segments (folds), one fold is used as the validation set, and the remaining folds are used as the training set. To reduce variability, this process is repeated k times, each time using a different fold as the validation set, so that every observation is used for validation exactly once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
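A possible way to generate such 80%/20% splits is sketched below. It is an illustration only: the function name is hypothetical, and the random shuffle with a fixed seed is an assumption about how rating events could be assigned to folds.

```python
import random


def k_fold_splits(rating_events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    Every rating event appears in exactly one validation set.
    """
    events = list(rating_events)
    random.Random(seed).shuffle(events)  # assumption: random fold assignment
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [event for j, fold in enumerate(folds) if j != i for event in fold]
        yield training, validation
```

The error measures would then be computed on each validation fold and averaged over the k rounds.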

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
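For reference, the two error measures can be computed from paired lists of actual and predicted ratings as in the following sketch, which uses the standard definitions of MAE and RMSE; the function names are illustrative.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; squaring penalizes large deviations more."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Both functions expect the two lists to have the same length and to be aligned by (userID, itemID) pair.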

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                  MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                    Epicurious                              Food.com
                                                    Observation:       Observation:         Observation:       Observation:
                                                    User Average       Fixed Threshold      User Average       Fixed Threshold
                                                    MAE      RMSE      MAE      RMSE        MAE      RMSE      MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606    0.7759   1.0283      0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard Deviation                  0.8914   1.1550    0.8388   1.1106      0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296    0.7824   0.9927      0.4390   0.6506    0.4324   0.6449
Min-Max                                             0.8539   1.1533    0.7721   1.0705      0.6648   0.9847    0.6303   0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
                                      MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
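One way the three stored vectors could be combined on demand is sketched below. The storage layout (one dictionary per feature group) and all names are hypothetical, chosen only to illustrate how a feature combination can be selected without rebuilding the prototype.

```python
def merge_prototype(stored_vectors, selected_groups):
    """Merge the per-group prototype vectors of one user.

    stored_vectors maps a group name ("ingredients", "cuisine", "dietary")
    to a dict feature -> weight. Keys of the merged vector are
    (group, feature) pairs, so equal names in different groups cannot clash.
    """
    merged = {}
    for group in selected_groups:
        for feature, weight in stored_vectors[group].items():
            merged[(group, feature)] = weight
    return merged
```

Testing the "Ingredients + Dietary" row of Table 5.3, for example, would correspond to `merge_prototype(stored, ["ingredients", "dietary"])`, with the recipe vector restricted to the same groups before the cosine similarity is computed.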

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} $$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \tag{5.1} $$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
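The opposite movement of MAE and RMSE observed in the threshold test can be illustrated with a small worked example. The numbers below are invented for illustration only and are not taken from the experiments. Suppose four validation ratings are all equal to 4. A "steady" predictor returns 3.5 every time, while an "exact but risky" predictor returns 4 three times and 2.5 once:

$$ MAE_{steady} = \frac{0.5 + 0.5 + 0.5 + 0.5}{4} = 0.5, \qquad RMSE_{steady} = \sqrt{\frac{4 \cdot 0.5^2}{4}} = 0.5 $$

$$ MAE_{risky} = \frac{0 + 0 + 0 + 1.5}{4} = 0.375, \qquad RMSE_{risky} = \sqrt{\frac{1.5^2}{4}} = \sqrt{0.5625} = 0.75 $$

The second predictor is exact more often and has the lower MAE, but its single large miss dominates the squared error, so its RMSE is higher.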

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, as this would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set contains the recipes used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
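The round-by-round procedure described in this section can be sketched as follows. This is only a minimal illustration, not the code used in the experiments: the names `learning_curve`, `fit` and `predict` are hypothetical, the reviews of each user are assumed to be ordered in time, and the error is assumed to be the mean absolute error averaged over users. Any predictor can be plugged in through `fit` and `predict`.

```python
from statistics import mean


def learning_curve(user_reviews, fit, predict, max_train):
    """Return a list of (training_size, mean_absolute_error) pairs.

    user_reviews: dict mapping a user id to a list of (recipe, rating) pairs,
                  assumed to be in the order in which they were written.
    fit:          function(list of (recipe, rating)) -> user model
    predict:      function(user model, recipe) -> predicted rating
    """
    curve = []
    for n in range(1, max_train + 1):
        per_user_errors = []
        for reviews in user_reviews.values():
            # The first n reviews train the model, the rest validate it.
            train, validation = reviews[:n], reviews[n:]
            if not validation:
                continue  # this user has no reviews left to validate on
            model = fit(train)
            per_user_errors.append(
                mean(abs(predict(model, recipe) - rating)
                     for recipe, rating in validation)
            )
        if per_user_errors:
            curve.append((n, mean(per_user_errors)))
    return curve


# Stand-in predictor (an assumption for the example only): the user's mean rating.
def fit_mean(train):
    return mean(rating for _, rating in train)


def predict_mean(model, recipe):
    return model
```

In an actual run, `fit` would build the user's prototype vector and `predict` would map the similarity value to a rating; the stand-in above only makes the sketch executable.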



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the idea, presented in [22], of breaking recipes down into ingredients, and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, which allowed the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
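One possible reading of this proposal is sketched below. It is an illustration under stated assumptions rather than a tested design: the function names are hypothetical, each class vector is assumed to be the plain average of the feature vectors of the recipes that received a given rating, and cosine similarity is assumed as the comparison measure.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rating_class_vectors(rated_recipes):
    """Build one averaged feature vector per rating value for a single user.

    rated_recipes: list of (feature_weights: dict, rating) pairs.
    """
    groups = defaultdict(list)
    for features, rating in rated_recipes:
        groups[rating].append(features)
    vectors = {}
    for rating, recipes in groups.items():
        summed = defaultdict(float)
        for features in recipes:
            for name, weight in features.items():
                summed[name] += weight
        vectors[rating] = {k: w / len(recipes) for k, w in summed.items()}
    return vectors


def predict_rating(recipe, vectors):
    """The predicted rating is the rating class most similar to the recipe."""
    return max(vectors, key=lambda rating: cosine(recipe, vectors[rating]))
```

With this arrangement the predicted rating is read directly from the winning class, so no similarity-to-rating mapping is required.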




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



List of Tables

2.1 Ratings database for collaborative recommendation

4.1 Statistical characterization for the datasets used in the experiments

5.1 Baselines

5.2 Test Results

5.3 Testing features



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

2.2 Comparing user ratings [2]

2.3 Monolithic hybridization design [2]

2.4 Parallelized hybridization design [2]

2.5 Pipelined hybridization designs [2]

2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.7 Evaluating recommended items [2]

3.1 Recipe - ingredient breakdown and reconstruction

3.2 Normalized MAE score for recipe recommendation [22]

4.1 System Architecture

4.2 Item-to-item collaborative recommendation

4.3 Distribution of Epicurious rating events per rating values

4.4 Distribution of Food.com rating events per rating values

4.5 Epicurious distribution of the number of ratings per number of users

5.1 10 Fold Cross-Validation example

5.2 Lower similarity threshold variation test using Epicurious dataset

5.3 Lower similarity threshold variation test using Food.com dataset

5.4 Upper similarity threshold variation test using Epicurious dataset

5.5 Upper similarity threshold variation test using Food.com dataset

5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset

5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes

5.9 Learning Curve using the Food.com dataset up to 100 rated recipes

5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose but, instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can be applied to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The new variations of content-based methods adapted to food recommendation are validated with performance metrics that capture the accuracy of the predicted ratings. To validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on their consumer behaviour, recommendations specifically adjusted to their personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described, the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, and recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that keyword to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency (IDF) measure is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log \frac{N}{n_i} \tag{2.2}$$




In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \tag{2.3}$$
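The three definitions above can be combined in a few lines of code. The sketch below is only illustrative: the function name `tf_idf` is hypothetical, documents are assumed to be non-empty lists of keywords, and the natural logarithm is assumed since the text does not fix the base.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of documents.

    documents: list of non-empty lists of keywords.
    Returns one dict per document, mapping keyword -> weight.
    """
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(k for doc in documents for k in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)              # f_ij
        max_count = max(counts.values())   # max_z f_zj
        weights.append({
            k: (c / max_count) * math.log(n_docs / doc_freq[k])
            for k, c in counts.items()
        })
    return weights
```

Note that a keyword occurring in every document receives a weight of zero, which matches the intuition that such keywords do not help to distinguish documents.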

It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
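As an illustration of the cosine normalization and the cosine similarity metric, the following sketch operates on sparse vectors stored as dictionaries. The function names are hypothetical, and returning 0 for an all-zero vector is a convention chosen for the example, not something prescribed by the text.

```python
import math


def normalize(vector):
    """Cosine-normalize a sparse weight vector (dict keyword -> weight)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)  # nothing to scale in an all-zero vector
    return {k: w / norm for k, w in vector.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)
```

Keywords missing from a dictionary are treated as having weight zero, so both vectors implicitly share the same keyword set, as the metric requires.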



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$




In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
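A minimal sketch of the prototype computation and the classification step is given below. The function names are hypothetical, and the default values of `beta` and `gamma` are arbitrary illustrative choices, since the text does not prescribe any. Document vectors are assumed to be sparse dictionaries of TF-IDF weights.

```python
import math


def prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Rocchio prototype vector for one class.

    positives, negatives: lists of document vectors (dict term -> weight).
    beta, gamma: illustrative defaults, not values taken from the text.
    """
    terms = {t for doc in positives + negatives for t in doc}
    proto = {}
    for t in terms:
        # Average weight of term t over the positive and negative examples.
        pos = sum(d.get(t, 0.0) for d in positives) / len(positives) if positives else 0.0
        neg = sum(d.get(t, 0.0) for d in negatives) / len(negatives) if negatives else 0.0
        proto[t] = beta * pos - gamma * neg
    return proto


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], document))
```

In a two-class setting, the positive examples of one class can serve as the negative examples of the other, which is how the test data for this sketch is arranged.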

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$





The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification clearly does not hold, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that records whether the word appears in the document. The second, typically referred to as the multinomial event model, records the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary $V$, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
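The multinomial model can be sketched as below. Two choices in this sketch are assumptions that the text does not specify: term probabilities are estimated with add-one (Laplace) smoothing, and scores are accumulated as sums of logarithms to avoid numerical underflow. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs):
    """Collect the counts needed by a multinomial Naive Bayes classifier.

    labelled_docs: list of (list of terms, class label) pairs.
    """
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in labelled_docs:
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary


def classify_nb(terms, model):
    """Return the class with the highest (log) posterior score."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    # N(d_i, t_k): only vocabulary terms that appear in the document count.
    doc_counts = Counter(t for t in terms if t in vocabulary)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log P(c_j)
        for term, count in doc_counts.items():
            # Add-one smoothing (an assumption of this sketch).
            p = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space turns the product over terms into a sum, with each term's log-probability multiplied by its count in the document.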

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled with terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled with categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
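To make the selection criterion concrete, the sketch below computes the expected information gain of splitting a set of labelled documents on the presence or absence of one word. The function names are hypothetical and entropy is measured in bits (base-2 logarithm), which is a common convention rather than something stated in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, term):
    """Expected information gain of splitting on the presence of `term`.

    docs: non-empty list of (set of terms, class label) pairs.
    """
    labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]
    # Weighted entropy of the two partitions (empty partitions contribute nothing).
    remainder = sum(len(part) / len(docs) * entropy(part)
                    for part in (with_term, without_term) if part)
    return entropy(labels) - remainder
```

A tree learner would evaluate this quantity for every candidate word and split on the one with the highest gain, then repeat recursively on each partition.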

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when classifying.
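A k-nearest-neighbor classifier over VSM vectors can be sketched as follows. The function names are hypothetical, cosine similarity is used as the similarity function, and the class label is derived by simple majority vote among the neighbors, which is one common choice among several.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict term -> weight)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored items.

    training: list of (vector, class label) pairs kept in memory.
    """
    # All the work happens here, at classification time: every stored
    # item is compared with the new one.
    neighbours = sorted(training,
                        key=lambda pair: cosine(item, pair[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

The full scan over `training` for each new item illustrates the classification-time cost mentioned above.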

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. Many algorithms and variations exist, so these methods are very well understood, and they are applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}}$$



In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item s. However, this measure ignores an important factor, namely the differences in rating behaviour between users.
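As an illustration, the cosine measure can be applied to the ratings of Table 2.1. This is a minimal sketch that assumes the similarity is computed only over the items both users have rated; the dictionary layout and the function name are choices made for this example.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between the rating vectors of users a and b,
    restricted to the items rated by both."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b)

sim_alice_user1 = cosine_similarity("Alice", "User1")  # roughly 0.975
sim_alice_user4 = cosine_similarity("Alice", "User4")  # roughly 0.80
```

Because all ratings are positive, even User4, whose preferences look opposite to Alice's, obtains a fairly high cosine value, which hints at why rating behaviour needs to be taken into account.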

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \, \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.
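The Pearson correlation can be sketched on the same Table 2.1 data. The sketch assumes that each user's average is taken over the co-rated items only, which is one common convention; the text does not state which items the averages cover, so this is an assumption.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Pearson correlation between users a and b over their co-rated items."""
    common = [s for s in ratings[a] if s in ratings[b]]
    # Assumption: the user averages are computed over the co-rated items.
    mean_a = sum(ratings[a][s] for s in common) / len(common)
    mean_b = sum(ratings[b][s] for s in common) / len(common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

similarities = {u: pearson("Alice", u) for u in ratings if u != "Alice"}
```

Under this assumption User1 and User2 come out as Alice's closest neighbours, User3 is uncorrelated with her, and User4 is negatively correlated.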

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the prediction value to user a for item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
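A minimal sketch of this prediction function for Alice and Item5 follows. It assumes the two most similar users (User1 and User2) as the neighborhood N, uses rounded Pearson similarities computed over co-rated items (0.85 and 0.71), and takes each neighbor's average over all five of their ratings; these choices are assumptions of the example.

```python
def predict_rating(target_mean, neighbours):
    """Mean-centred weighted prediction.

    neighbours: list of (similarity, neighbour_rating_for_item, neighbour_mean).
    """
    weighted = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    total_sim = sum(sim for sim, _, _ in neighbours)
    return target_mean + weighted / total_sim

alice_mean = (5 + 3 + 4 + 4) / 4              # 4.0, from Table 2.1
neighbours = [
    (0.85, 3, (3 + 1 + 2 + 3 + 3) / 5),       # User1: rated Item5 with 3, mean 2.4
    (0.71, 5, (4 + 3 + 4 + 3 + 5) / 5),       # User2: rated Item5 with 5, mean 3.8
]
prediction = predict_rating(alice_mean, neighbours)  # about 4.87
```

Both neighbors rated Item5 above their own average, so the prediction ends up above Alice's average of 4.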

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, and comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information like age, gender and other attributes can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
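The two flavours mentioned above, a weighting scheme and a switch between components, can be sketched as follows. The recommender names, the weights and the `min_ratings` threshold are hypothetical values chosen only for illustration.

```python
def weighted_hybrid(scores_by_recommender, weights):
    """Combine per-item scores from independent recommenders with a weighted sum
    and return the items ranked from highest to lowest combined score."""
    combined = {}
    for name, scores in scores_by_recommender.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=combined.get, reverse=True)

def switching_hybrid(n_user_ratings, popular_items, collaborative_items, min_ratings=5):
    """Fall back to popular items while the user has too few ratings."""
    return collaborative_items if n_user_ratings >= min_ratings else popular_items

# Hypothetical scores produced by two components for the same input.
scores = {
    "content": {"soup": 0.9, "salad": 0.4},
    "collaborative": {"soup": 0.2, "salad": 0.8},
}
ranking = weighted_hybrid(scores, {"content": 0.5, "collaborative": 0.5})
cold_start_list = switching_hybrid(2, ["pizza"], ["sushi"])
```

With equal weights the combined score of "salad" (0.6) exceeds that of "soup" (0.55), so neither component alone decides the final order.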

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

Figure 2.7: Evaluating recommended items [2]

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
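A short sketch of both measures follows, assuming the recommended and the relevant items are available as plain lists of identifiers; the item names are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Two of the four recommended items are relevant; one relevant item is missed.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here precision is 2/4 and recall is 2/3, which shows how a longer list can raise recall while lowering precision.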

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
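Both error measures can be sketched in a few lines, assuming two equally long lists of predicted and actual ratings; the sample values are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error; squaring penalises large deviations more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5, 3, 4, 4]
errors = (mae(predicted, actual), rmse(predicted, actual))
```

For these values the absolute errors are 1, 0, 1 and 2, giving an MAE of 1.0, while the single deviation of 2 pushes the RMSE up to the square root of 1.5.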


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D}$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k}$$



The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
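The favourite-ingredient score can be sketched as below. The sketch assumes that recipes are sets of ingredient names, that $FF_k$ is the usage count divided by the length of the period in days, that the natural logarithm is used (the base is not stated in the text), and that every cooked recipe also belongs to the full recipe collection. All data and names are hypothetical.

```python
import math

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each ingredient the user cooked with as FF_k * IRF_k."""
    M = len(all_recipes)
    used = {k for recipe in cooked_recipes for k in recipe}
    scores = {}
    for k in used:
        F_k = sum(1 for recipe in cooked_recipes if k in recipe)  # times used
        M_k = sum(1 for recipe in all_recipes if k in recipe)     # recipes containing k
        scores[k] = (F_k / period_days) * math.log(M / M_k)
    return scores

# Hypothetical recipe collection; the user cooked the first and third recipes.
all_recipes = [
    {"salt", "tofu"},
    {"salt", "rice"},
    {"salt", "tofu", "miso"},
    {"salt", "beef"},
    {"rice", "egg"},
]
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
top = max(scores, key=scores.get)
```

As with TF-IDF, an ingredient such as salt that appears in almost every recipe is discounted, even though the user cooked with it as often as with tofu.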

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
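The two formulas can be sketched as follows. The text does not specify how the weight $W_k$ is derived from the deviation score, so the sketch simply takes the weights as given; the preference values, weights and quantities are hypothetical.

```python
import math

def ingredient_std(quantities):
    """Standard deviation of an ingredient's quantity over the n recipes
    that contain it, as in Eq. (3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients of a recipe, as in Eq. (3.6).
    Ingredients without a known preference contribute nothing."""
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in recipe_ingredients)

# Hypothetical quantities (grams) of one ingredient across eight recipes.
sigma = ingredient_std([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical preferences (positive = liked, negative = disliked) and weights.
preference = {"potato": 0.6, "pepper": -0.4}
weight = {"potato": 1.0, "pepper": 2.0}   # assumed to come from the deviation score
score = recipe_score(["potato", "pepper", "water"], preference, weight)
```

In this example the unusually large amount of a disliked ingredient (weight 2.0) outweighs the liked one, so the recipe receives a negative score.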

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7}$$
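Building a pseudo user-ratings vector amounts to a simple fallback, sketched below. The sketch assumes the content-based predictions are already available as a dictionary; the movie identifiers and values are hypothetical.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Use the actual rating where the user rated the item, and the
    content-based prediction otherwise."""
    return [user_ratings[i] if i in user_ratings else content_predictions[i]
            for i in items]

items = ["m1", "m2", "m3"]
user_ratings = {"m1": 4}                                  # only m1 was rated
content_predictions = {"m1": 3.2, "m2": 2.5, "m3": 4.1}   # hypothetical classifier output
v_u = pseudo_ratings_vector(user_ratings, content_predictions, items)
```

Stacking one such vector per user yields a matrix without missing entries, on which the usual neighborhood-based collaborative filtering can then operate.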

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
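The breakdown and reconstruction steps of the content-based strategy can be sketched as follows. Ingredient scores are the average of the ratings of the recipes in which they occur, as described above; predicting a recipe as the plain average of its known ingredient scores is an assumption of this sketch, and the recipes and ratings are hypothetical.

```python
from collections import defaultdict

def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Break rated recipes down: each ingredient gets the average rating
    of the rated recipes in which it occurs."""
    collected = defaultdict(list)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {ing: sum(r) / len(r) for ing, r in collected.items()}

def predict_recipe(recipe, scores, recipe_ingredients):
    """Reconstruct a recipe prediction from its ingredients
    (assumption: plain average of the known ingredient scores)."""
    known = [scores[i] for i in recipe_ingredients[recipe] if i in scores]
    return sum(known) / len(known) if known else None

recipe_ingredients = {
    "carbonara": ["pasta", "egg", "bacon"],
    "bolognese": ["pasta", "beef", "tomato"],
    "omelette": ["egg", "tomato", "cheese"],
}
user_ratings = {"carbonara": 5, "bolognese": 3}
scores = ingredient_scores(user_ratings, recipe_ingredients)
prediction = predict_recipe("omelette", scores, recipe_ingredients)
```

The unrated omelette receives a prediction of 4.0 from the scores of egg (5) and tomato (3); cheese has no score yet and is ignored.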

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
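The voting procedure of the short-term model can be sketched as below. The sketch assumes the cosine similarities between the new story and the stored stories are already computed; the threshold values `t_min` and `t_max` and the status labels are hypothetical, chosen only to make the three outcomes visible.

```python
def short_term_prediction(similarities_and_scores, t_min=0.3, t_max=0.9):
    """Classify a new story from (similarity, score) pairs of stored stories.

    Returns ("unclassified", None) if no stored story is similar enough,
    ("known", None) if some voter is nearly identical to the new story,
    and ("scored", value) with the similarity-weighted average otherwise.
    """
    voters = [(sim, score) for sim, score in similarities_and_scores if sim >= t_min]
    if not voters:
        return ("unclassified", None)   # hand over to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return ("known", None)          # user presumably knows this event already
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)

status, value = short_term_prediction([(0.5, 1.0), (0.4, 0.5), (0.1, 0.0)])
```

In the example only the first two stories are similar enough to vote, and the prediction is their weighted average, (0.5 * 1.0 + 0.4 * 0.5) / 0.9.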

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \, \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}} \tag{4.1}$$

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)}$$




In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
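A simplified sketch of the item-to-item prediction step is shown below. It is not the YoLP implementation: it assumes precomputed item similarities, uses the user's raw ratings as the weighted terms (a plain weighted average rather than the mean-centred term of the equation above), and ignores items with non-positive similarity. The recipe names and values are hypothetical.

```python
def predict_item_based(user_ratings, similarities_to_target):
    """Weighted average of the user's own ratings, where each rated item b
    is weighted by its similarity to the target item a."""
    num = den = 0.0
    for item, rating in user_ratings.items():
        sim = similarities_to_target.get(item, 0.0)
        if sim > 0:                      # assumption: skip dissimilar items
            num += sim * rating
            den += sim
    return num / den if den else None

# Hypothetical ratings of one user and similarities of those recipes to "stew".
user_ratings = {"soup": 5, "steak": 2, "salad": 4}
sim_to_stew = {"soup": 0.8, "steak": 0.2, "salad": -0.1}
prediction = predict_item_based(user_ratings, sim_to_stew)
```

Since the target recipe is most similar to the highly rated soup, the prediction (4.4) lies close to the user's rating for that recipe.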

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, and comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
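A minimal sketch of this profile construction and ranking is given below. It assumes that a binary sparse vector can be represented as a Python set of feature names; the names `build_profile`, `cosine_binary` and `rank_recipes` and the feature labels are hypothetical and are not taken from YoLP.

```python
from math import sqrt


def build_profile(rated_recipes, min_positive=4):
    """Union of the features of all recipes rated 4 or 5 (binary profile)."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_positive:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def rank_recipes(profile, recipes):
    """Order recipe ids from most to least similar to the profile."""
    return sorted(recipes, key=lambda rid: cosine_binary(profile, recipes[rid]), reverse=True)
```

For binary vectors the dot product reduces to the size of the intersection, and each norm to the square root of the set size, which is why no explicit vector is needed.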

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathit{avgTotal} + 0.5 & \text{if } \mathit{similarity} > 0.8 \\ \mathit{avgTotal} & \text{otherwise} \end{cases} \qquad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented first. Next, two different approaches to build the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as described in [3]:


$$\mathrm{IRF}_k = \log \frac{M}{M_k} \qquad (4.4)$$


where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
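As an illustration of how these weights could be obtained, the sketch below computes the IRF of every feature from a mapping of recipes to feature lists. The function name and data layout are assumptions, and the natural logarithm is used because the base of the logarithm is not stated in the text.

```python
from collections import Counter
from math import log


def inverse_recipe_frequency(recipes):
    """recipes maps recipe id -> iterable of feature names; returns feature -> IRF."""
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes containing feature k (each recipe counted once)
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {feature: log(total / n) for feature, n in counts.items()}
```

A feature that appears in every recipe receives a weight of 0, while rare features receive the largest weights, mirroring the behaviour of IDF for words in documents.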

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
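The construction described in this section can be sketched as follows. The function names and the dictionary-based sparse vector are assumptions made for illustration; the labelling rules follow the two approaches described above, with both observation types weighted by 1.

```python
from collections import defaultdict


def label_fixed(rating, scale_max):
    """Fixed-threshold labelling: +1 positive, -1 negative, 0 neutral (ignored)."""
    if scale_max == 4:      # Epicurious-style scale, 1 to 4
        return 1 if rating >= 3 else -1
    if rating == 3:         # Food.com-style scale, 1 to 5: 3 is neutral
        return 0
    return 1 if rating > 3 else -1


def label_user_average(rating, user_average):
    """User-average labelling: equal or higher is positive, lower is negative."""
    return 1 if rating >= user_average else -1


def build_prototype(rated, recipe_features, irf, label):
    """Add IRF weights for positive observations and subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = label(rating)
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)
```

A usage example would pass `lambda r: label_fixed(r, 5)` as `label` for a 1 to 5 scale, or `lambda r: label_user_average(r, avg)` with the user's training-set average.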

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ which fits in the range $[C, D]$, as shown in the following formula³:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point the similarity scale is mapped, for each user, onto the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
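The per-user mapping can be illustrated with the short sketch below. The function and parameter names are hypothetical; the fallback argument stands for the user average that is returned when the similarity interval cannot be computed.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, fallback):
    """Map a similarity from [sim_min, sim_max] onto [rating_min, rating_max]."""
    if sim_max == sim_min:
        # Not enough data to define the similarity interval: use the user average.
        return fallback
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For example, a user whose similarities range over [0, 1] and whose ratings range over [1, 5] would receive a predicted rating of 3 for a similarity of 0.5.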

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathit{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L \end{cases} \qquad (4.6)$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
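The three-case rule can be sketched as below. The default thresholds are the initial values reported later in the text (0.75 for U and 0.25 for L). The `combined` helper is an assumption: the text gives the combined average as the mean of the user and item averages, and the same mean is assumed here for the standard deviations.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case mapping from a similarity value to a predicted rating."""
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average


def combined(user_stat, item_stat):
    """Mean of a user statistic and an item statistic (assumed for both avg and std)."""
    return (user_stat + item_stat) / 2
```

With a user average of 4.0 and a standard deviation of 0.5, a similarity of 0.8 would yield 4.5, a similarity of 0.5 would yield 4.0, and a similarity of 0.1 would yield 3.5.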

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02        0.07
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
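A filter of this kind could look like the sketch below. The text does not say in which order the two conditions were applied or whether the filter was repeated until stable, so a single pass (recipes first, then users) is assumed; the function name and the tuple layout are also hypothetical.

```python
from collections import Counter


def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """events is a list of (user, item, rating) tuples.

    Keeps items rated more than 3 times, then users with more than 5
    remaining ratings (single pass, an assumption).
    """
    item_counts = Counter(item for _, item, _ in events)
    kept = [e for e in events if item_counts[e[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [e for e in kept if user_counts[e[0]] >= min_user_ratings]
```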

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. The procedure follows the same idea as leave-p-out cross-validation, which uses p observations as the validation set and the remaining observations as the training set and, ideally, repeats the process until all possible combinations of p observations are tested. To reduce variability, the process is repeated multiple times using different observations as the validation set, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work the data was divided into 5 folds, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
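A 5-fold split of the rating events can be sketched as follows. This is a generic illustration rather than the evaluation module used in this work: the function name is hypothetical and the shuffling with a fixed seed is an assumption.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The error measures would then be computed on each validation fold and averaged over the 5 repetitions.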

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
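For reference, the two measures can be computed as in the sketch below, using their standard definitions; the function names are illustrative.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For the predictions [3, 4, 5] against the actual ratings [3, 5, 3], the absolute deviations are 0, 1 and 2, so the MAE is 1.0 and the RMSE is the square root of 5/3, approximately 1.29.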

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious         Food.com
                                  MAE     RMSE       MAE     RMSE
YoLP Content-based component      0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component      0.6454  0.8678     0.3761  0.6834
User Average                      0.6315  0.8338     0.4077  0.6207
Item Average                      0.7701  1.0930     0.4385  0.7043
Combined Average                  0.6628  0.8572     0.4180  0.6250

Table 5.2: Test Results

                                                    Epicurious                           Food.com
                                                    Observation       Observation        Observation       Observation
                                                    User Average      Fixed Threshold    User Average      Fixed Threshold
                                                    MAE     RMSE      MAE     RMSE       MAE     RMSE      MAE     RMSE
User Avg + User Standard Deviation                  0.8217  1.0606    0.7759  1.0283     0.4448  0.6812    0.4287  0.6624
Item Avg + Item Standard Deviation                  0.8914  1.1550    0.8388  1.1106     0.4561  0.7251    0.4507  0.7207
User/Item Avg + User and Item Standard Deviation    0.8304  1.0296    0.7824  0.9927     0.4390  0.6506    0.4324  0.6449
Min-Max                                             0.8539  1.1533    0.7721  1.0705     0.6648  0.9847    0.6303  0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious         Food.com
                                      MAE     RMSE       MAE     RMSE
Ingredients + Cuisine + Dietaries     0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine                 0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary                 0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                     0.8266  1.0616     0.4324  0.7087
Ingredients                           0.7932  1.0054     0.4411  0.6537
Cuisine                               0.8553  1.0810     0.5357  0.7431
Dietary                               0.8772  1.0807     0.4579  0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
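The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The data layout (a dictionary of sparse dictionaries per group) and the function names are assumptions for illustration and do not describe the actual storage format.

```python
from math import sqrt


def merge(stored, groups):
    """Merge the selected per-group sparse vectors into a single sparse vector."""
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            merged[(group, feature)] = weight  # the group name keeps keys distinct
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then only requires changing the `groups` argument, for example `["ingredients", "cuisines"]`, without rebuilding any prototype vector.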

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items that the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathit{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

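As a result of these tests, Eq. 4.6 was updated to:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\ \text{average rating} & \text{if } \mathit{similarity} < U \end{cases} \qquad (5.1)$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset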


Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.
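The threshold sweep can be sketched as a simple loop over candidate values of U. The sample layout (similarity, average, standard deviation, actual rating) and the function names are assumptions; in the experiments each point would additionally be averaged over the cross-validation folds.

```python
from math import sqrt


def predict(similarity, average, std_dev, upper):
    """Two-case rule: add the standard deviation only at or above the upper threshold."""
    return average + std_dev if similarity >= upper else average


def sweep_upper_threshold(samples, thresholds):
    """samples holds (similarity, average, std_dev, actual_rating) tuples.

    Returns one (threshold, MAE, RMSE) tuple per candidate threshold.
    """
    results = []
    for upper in thresholds:
        errors = [predict(s, avg, sd, upper) - actual for s, avg, sd, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results
```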

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest error values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
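The opposite movement of MAE and RMSE discussed in this section can be illustrated with two small sets of prediction errors. The numbers below are invented for illustration only and do not come from the experiments.

```python
from math import sqrt


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))


# Every prediction is off by one rating point.
uniform_misses = [1, 1, 1, 1]
# Three exact predictions and one large miss.
few_large_misses = [0, 0, 0, 3]

print(mae(uniform_misses), rmse(uniform_misses))      # 1.0 1.0
print(mae(few_large_misses), rmse(few_large_misses))  # 0.75 1.5
```

The second set has more exact predictions and therefore a lower MAE, but its single large miss raises the RMSE, which is the pattern observed when the upper threshold is lowered.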

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
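The round-by-round simulation can be sketched as below. The `fit` and `predict` callables are hypothetical stand-ins for building a prototype vector and producing a rating from it, and the order in which reviews are moved into the training set is an assumption, since the datasets contain no timestamps.

```python
def learning_curve(user_reviews, max_rounds, fit, predict):
    """user_reviews maps user -> ordered list of (recipe_id, rating).

    In round n the first n reviews of each user form the training set and
    the rest form the validation set. Returns (n, mean absolute error) pairs.
    """
    curve = []
    for n in range(1, max_rounds + 1):
        errors = []
        for reviews in user_reviews.values():
            training, validation = reviews[:n], reviews[n:]
            if not validation:
                continue  # nothing left to validate for this user
            model = fit(training)
            errors.extend(abs(predict(model, recipe) - rating) for recipe, rating in validation)
        if errors:
            curve.append((n, sum(errors) / len(errors)))
    return curve
```

With a trivial stand-in model that predicts the average of the training ratings, a user with ratings 4, 2 and 3 yields an error of 1.5 after one review and 0.0 after two.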



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
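The idea of one class vector per rating value can be sketched as follows. This is only an illustration under stated assumptions, not an implementation evaluated in this work: the function names (`build_rating_classes`, `predict_rating`) are hypothetical, each class vector is assumed to be the plain average of the feature vectors of the recipes that received that rating, and cosine similarity is assumed as the comparison measure.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_rating_classes(rated_recipes):
    """Build one class vector per rating value (assumption: a simple average).

    rated_recipes is a list of (feature_vector, rating) pairs for one user.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vector, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in vector.items():
            sums[rating][feature] += weight
    return {
        rating: {f: w / counts[rating] for f, w in features.items()}
        for rating, features in sums.items()
    }


def predict_rating(recipe, rating_classes):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(rating_classes, key=lambda r: cosine(recipe, rating_classes[r]))


# Hypothetical rating history of a single user.
history = [
    ({"chicken": 1.0, "rice": 0.5}, 5),
    ({"chicken": 0.8}, 5),
    ({"tofu": 1.0}, 2),
]
classes = build_rating_classes(history)
predicted = predict_rating({"chicken": 1.0}, classes)
```

In this sketch the predicted rating is read directly from the winning class, so no separate mapping from similarity to rating is involved.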

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the Users' Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography


List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose but, instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general. However, new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval based methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{i,j}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{z,j}$, which corresponds to the maximum frequency observed from all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{i,j} = TF_{i,j} \times IDF_i \tag{2.3}$$
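The three definitions above can be checked on a toy collection with a few lines of Python. This is a minimal sketch, not the code used in this work: the function name `tf_idf`, the example documents, and the use of the natural logarithm (the base of the logarithm is not stated in the text) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1)-(2.3).

    documents is a list of token lists. The natural logarithm is an assumption.
    """
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            # TF (count / max count in the document) times IDF (log N / n_i)
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy collection: each "document" is a list of keywords.
docs = [
    ["rice", "chicken", "rice", "garlic"],
    ["rice", "beans"],
    ["chicken", "lemon"],
]
weights = tf_idf(docs)
```

In the first document, "rice" has the maximum frequency (TF of 1) but occurs in two of the three documents, while "garlic" has a TF of 0.5 but occurs in only one document, so its higher IDF compensates for the lower frequency. A keyword present in every document receives a weight of zero.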

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{z,j})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{k,a}\, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2}\, \sqrt{\sum_k w_{k,b}^2}} \tag{2.5}$$
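The cosine normalization and the cosine similarity metric can be sketched in Python as shown next. The sketch assumes that vectors are stored as sparse dictionaries mapping keywords to weights; the function names and the example weights are hypothetical.

```python
import math


def normalize(vector):
    """Cosine-normalize a sparse {keyword: weight} vector, as in Eq. (2.4)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)
    return {k: w / norm for k, w in vector.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors, as in Eq. (2.5)."""
    # Keywords missing from one of the vectors contribute zero to the sum.
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical weights for one item and one user profile.
recipe = {"rice": 0.4, "garlic": 0.55}
profile = {"rice": 0.8, "lemon": 0.3}
similarity = cosine_similarity(recipe, profile)
```

Because the metric divides by both vector lengths, normalizing the vectors beforehand does not change the similarity value; it only makes the weights of documents with different lengths comparable.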



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1,i}, \ldots, w_{|T|,i})$ for each class $c_i$, with $T$ being the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{k,i} = \beta \cdot \frac{\sum_{d_j \in POS_i} w_{k,j}}{|POS_i|} - \gamma \cdot \frac{\sum_{d_j \in NEG_i} w_{k,j}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{k,j}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
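A minimal Python sketch of a Rocchio classifier built on the prototype formula is given below. It is an illustration only: the function names, the class labels, the example vectors, and the values chosen for $\beta$ and $\gamma$ are hypothetical, and cosine similarity is assumed as the comparison measure.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector of one class: beta times the mean positive weight
    minus gamma times the mean negative weight, for every term.
    The beta and gamma defaults are illustrative, not tuned values."""
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)


def classify(doc, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))


# Hypothetical training vectors for two classes.
liked = [{"rice": 1.0, "chicken": 0.5}, {"rice": 0.8}]
disliked = [{"beans": 1.0}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
label = classify({"rice": 1.0}, prototypes)
```

Terms that occur only in negative examples end up with negative weights in the prototype, which lowers the similarity of documents containing them.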

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, which indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. These models see the document as a vector of values over a vocabulary $V$, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document ($t_k \in V_{d_i}$) are used.
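The multinomial event model can be sketched in Python as follows. This is a hedged illustration rather than a reference implementation: the function names and the training examples are hypothetical, add-one (Laplace) smoothing is an assumption introduced here to avoid zero probabilities, and log-probabilities are used so that the product in the model becomes a sum.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Estimate log P(c) and log P(t|c) from (tokens, label) pairs.

    Add-one smoothing is an assumption of this sketch.
    """
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    total_docs = len(labelled_docs)
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denominator = sum(term_counts[c].values()) + len(vocabulary)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denominator)
            for t in vocabulary
        }
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum_k N(d, t_k) log P(t_k|c)."""
    counts = Counter(tokens)

    def score(c):
        # Terms never seen during training are ignored.
        return log_prior[c] + sum(
            n * log_likelihood[c][t]
            for t, n in counts.items()
            if t in log_likelihood[c]
        )

    return max(log_prior, key=score)


# Hypothetical training data with two classes.
training = [
    (["rice", "chicken", "rice"], "relevant"),
    (["chicken", "lemon"], "relevant"),
    (["beans", "tofu"], "irrelevant"),
    (["tofu", "beans", "beans"], "irrelevant"),
]
log_prior, log_likelihood = train(training)
label = classify(["rice", "lemon"], log_prior, log_likelihood)
```

Since $P(d)$ is the same for every class, the sketch omits it and compares only the numerators.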

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
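As an illustration of the attribute selection criterion, the sketch below computes the expected information gain of splitting a set of documents on the presence or absence of a single term. The function names and the example data are hypothetical, and entropy is measured in bits (base-2 logarithm), which is an assumption of this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting on the presence of a term.

    docs is a list of sets of terms, aligned with labels.
    """
    with_term = [l for d, l in zip(docs, labels) if term in d]
    without_term = [l for d, l in zip(docs, labels) if term not in d]
    remainder = sum(
        (len(part) / len(labels)) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


# Hypothetical documents, represented by the set of terms they contain.
docs = [{"rice", "chicken"}, {"rice"}, {"beans"}, {"beans", "chicken"}]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
gain_rice = information_gain(docs, labels, "rice")
gain_chicken = information_gain(docs, labels, "chicken")
```

In this toy data, "rice" separates the two classes perfectly, while "chicken" appears equally often in both, so a tree learner using this criterion would test "rice" first.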

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when classifying.
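A k-nearest-neighbor classifier over VSM vectors can be sketched as below. The names and data are hypothetical, cosine similarity is assumed as the similarity function, and a simple majority vote is assumed as the way of deriving the label from the neighbors; other schemes, such as similarity-weighted votes, are equally possible.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar stored items.

    training is a list of (vector, label) pairs kept in memory as-is.
    """
    # All the work happens here, at classification time.
    neighbours = sorted(
        training, key=lambda pair: cosine(item, pair[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical stored items.
training = [
    ({"rice": 1.0, "chicken": 0.5}, "liked"),
    ({"rice": 0.9}, "liked"),
    ({"beans": 1.0}, "disliked"),
    ({"beans": 0.7, "tofu": 0.7}, "disliked"),
]
label = knn_classify({"rice": 1.0, "garlic": 0.2}, training, k=3)
```

Every call compares the new item against the whole training set, which is the classification-time cost mentioned above.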

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generating recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item $p$ for user $c$, a set of ratings $S$ is used. This set contains ratings for item $p$ obtained from other users who have already rated that item, usually the $N$ users most similar to user $c$. A simple example of how to generate a prediction, and of the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$
\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\; \sqrt{\sum_{s \in S} r_{bs}^{2}}}
$$

In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item, with the sums running over the items rated by both users. However, this measure does not take differences in rating behaviour between users into account.
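As a small illustration of the cosine measure, the following Python sketch compares Alice and User1 over the four items that both rated in Table 2.1. The function name and the dictionary layout are illustrative assumptions, not part of any cited system.

```python
from math import sqrt

# Ratings of the four items that Alice and User1 both rated (Table 2.1).
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
user1 = {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3}


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two rating vectors over co-rated items."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    norm_a = sqrt(sum(ratings_a[s] ** 2 for s in common))
    norm_b = sqrt(sum(ratings_b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(alice, user1)  # roughly 0.975
```

The value is close to 1 even though User1 rates every item lower than Alice does, which is the behaviour discussed next.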

In Figure 2.2 it can be observed that Alice and User1 rated the same group of items in a similar way: the difference in rating values across the four items is practically constant. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average rating of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$
\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2}}\; \sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \tag{2.11}
$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$
\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}
$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the result is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
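The procedure can be sketched in Python on the data of Table 2.1. This is a minimal sketch under stated assumptions: the averages inside the Pearson correlation are taken over the co-rated items, the averages in the prediction are taken over all of a user's ratings, and the neighbourhood is limited to the k = 2 most similar users with positive similarity. None of these choices is fixed by the text, and the function names are illustrative.

```python
from math import sqrt

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if not common:
        return 0.0
    # Assumption: averages are computed over the co-rated items only.
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)) * sqrt(
        sum((ratings[b][s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


def predict(a, item, k=2):
    """Prediction function (Eq. 2.12) using the k most similar raters of the item."""
    mean_a = mean(ratings[a].values())
    candidates = [
        (pearson(a, b), b) for b in ratings if b != a and item in ratings[b]
    ]
    # Assumption: only neighbours with positive similarity are used.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return mean_a
    num = sum(
        sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbours
    )
    den = sum(sim for sim, _ in neighbours)
    return mean_a + num / den


prediction = predict("Alice", "Item5")  # roughly 4.87
```

Under these assumptions User1 and User2 are Alice's two nearest neighbours, and both rated Item5 above their own averages, so the prediction ends up above Alice's average of 4.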

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by the user) can be associated with content features (e.g. comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, otherwise use collaborative methods).
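A weighting scheme of this kind could look like the following Python sketch. It is only an illustration: the component names, the manually assigned weights and the rule of renormalising over the components that scored an item are assumptions made here, not details taken from [2].

```python
def weighted_hybrid(scores_by_component, weights):
    """Combine per-item scores of independent recommenders by a weighted average.

    Items missing from a component are averaged over the components that did
    score them (an assumption of this sketch).
    """
    items = set().union(*(scores.keys() for scores in scores_by_component.values()))
    combined = {}
    for item in items:
        num = 0.0
        den = 0.0
        for name, scores in scores_by_component.items():
            if item in scores:
                num += weights[name] * scores[item]
                den += weights[name]
        combined[item] = num / den
    return combined


# Hypothetical predicted ratings from two components working on the same input.
component_scores = {
    "collaborative": {"soup": 4.0, "pasta": 3.0},
    "content": {"soup": 2.0, "salad": 5.0},
}
component_weights = {"collaborative": 0.75, "content": 0.25}

hybrid_scores = weighted_hybrid(component_scores, component_weights)
```

Learning the weights dynamically would replace the fixed `component_weights` dictionary with values fitted on held-out ratings.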

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.

When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$
\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}
$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$
\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}
$$
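Both measures follow directly from the sets of recommended and relevant items, as the short Python sketch below shows. The item identifiers are made up for illustration, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(recommended, relevant):
    """Precision (Eq. 2.13) and Recall (Eq. 2.14) from two item collections."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # recommended and liked
    fp = len(recommended - relevant)  # recommended but not liked
    fn = len(relevant - recommended)  # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: four recommended items, three items the user likes.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here two of the four recommendations are relevant, and two of the three relevant items were recommended.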

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}
$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}
$$
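The difference between the two error measures is easy to see on a small made-up example. The Python sketch below implements Eq. 2.15 and Eq. 2.16 directly; the rating values are illustrative only.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16)."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


# Hypothetical predictions with absolute errors of 1, 0, 1 and 2.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]

mae_value = mae(predicted, actual)    # 1.0
rmse_value = rmse(predicted, actual)  # sqrt(1.5), about 1.22
```

The single error of 2 raises the RMSE above the MAE, which is the emphasis on larger deviations mentioned above.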


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$
I_k^{+} = FF_k \times IRF_k \tag{3.1}
$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$
FF_k = \frac{F_k}{D}
$$
The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$
IRF_k = \log \frac{M}{M_k}
$$
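A small Python sketch may help to see how the two factors interact. It assumes the natural logarithm (the base is not stated here), measures the period D in days, and uses a made-up recipe collection; the function and variable names are illustrative and do not come from [3].

```python
from math import log


def favourite_scores(cooked_counts, period_days, recipes):
    """Score each cooked ingredient k as FF_k x IRF_k (Eq. 3.1).

    cooked_counts: ingredient -> number of times it was used (F_k)
    period_days:   length of the observation period (D)
    recipes:       list of ingredient sets, one per recipe in the collection
    """
    total = len(recipes)  # M
    scores = {}
    for ingredient, count in cooked_counts.items():
        containing = sum(1 for recipe in recipes if ingredient in recipe)  # M_k
        if containing == 0:
            continue  # ingredient absent from the collection, IRF undefined
        ff = count / period_days
        irf = log(total / containing)  # assumption: natural logarithm
        scores[ingredient] = ff * irf
    return scores


# Hypothetical data: rice is cooked more often but appears in most recipes.
recipe_collection = [
    {"rice", "egg"},
    {"rice", "tofu"},
    {"rice", "salt"},
    {"egg", "salt"},
]
scores = favourite_scores({"rice": 6, "tofu": 3}, 30, recipe_collection)
```

In this example tofu scores higher than rice although it was cooked half as often, because it occurs in only one of the four recipes.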



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$
F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20 the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^{2}} \tag{3.5}
$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^{+}$ and $I_k^{-}$, respectively):

$$
\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}
$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and those predicted by the content-based method otherwise:

$$
v_{u,i} =
\begin{cases}
r_{u,i} & \text{if user } u \text{ rated item } i \\
c_{u,i} & \text{otherwise}
\end{cases}
\tag{3.7}
$$
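Eq. 3.7 amounts to a simple fill-in step, sketched below in Python. The content-based predictions are taken as given (hypothetical values), since the classifier that produces them is outside the scope of this sketch, and the function name is illustrative.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Build v_u (Eq. 3.7): actual ratings where available, content-based otherwise."""
    return [
        user_ratings[item] if item in user_ratings else content_predictions[item]
        for item in items
    ]


# Hypothetical movie identifiers and values.
items = ["m1", "m2", "m3", "m4"]
actual_ratings = {"m1": 5, "m3": 2}  # r_{u,i}
content_predictions = {"m1": 4, "m2": 3, "m3": 3, "m4": 1}  # c_{u,i}

v_u = pseudo_ratings_vector(actual_ratings, content_predictions, items)
```

Stacking such vectors for all users yields a matrix without missing entries, on which the neighbourhood-based method can then operate.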

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
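The breakdown and reconstruction step of the content-based strategy can be sketched in Python as follows. Ingredient scores are the average of the user's ratings of the recipes containing them, as described above; predicting a recipe as the plain average of its known ingredient scores is an assumption of this sketch, and the recipe data is made up.

```python
from collections import defaultdict


def ingredient_scores(user_ratings, recipes):
    """Average the user's recipe ratings over each ingredient they contain."""
    buckets = defaultdict(list)
    for recipe, rating in user_ratings.items():
        for ingredient in recipes[recipe]:
            buckets[ingredient].append(rating)
    return {ing: sum(vals) / len(vals) for ing, vals in buckets.items()}


def predict_recipe(ingredients, scores):
    """Assumed reconstruction: mean score of the ingredients the user has a score for."""
    known = [scores[ing] for ing in ingredients if ing in scores]
    return sum(known) / len(known) if known else None


# Hypothetical recipes and ratings.
recipes = {
    "omelette": ["egg", "cheese"],
    "salad": ["lettuce", "cheese"],
    "cheese toast": ["cheese", "bread"],
}
user_ratings = {"omelette": 5, "salad": 2}

scores = ingredient_scores(user_ratings, recipes)
prediction = predict_recipe(recipes["cheese toast"], scores)
```

Cheese inherits the mean of the two rated recipes (3.5), and since bread has no score yet, the prediction for the unseen recipe rests on cheese alone.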

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are presented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
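The short-term model described above can be sketched in a few lines of Python. The threshold values (0.3 and 0.9), the sparse dictionary representation and the return conventions are assumptions made for illustration; they are not the settings used in [23].

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return a predicted score, "known", or None when no story votes.

    rated_stories: list of (vector, score) pairs previously rated by the user.
    t_min, t_max:  hypothetical minimum and maximum similarity thresholds.
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(new_story, vector)
        if similarity >= t_max:
            return "known"  # too close to a story the user has already seen
        if similarity >= t_min:
            voters.append((similarity, score))
    if not voters:
        return None  # would be handed over to the long-term model
    return sum(s * score for s, score in voters) / sum(s for s, _ in voters)


# Hypothetical stories with made-up term weights and user scores.
history = [({"election": 1.0}, 0.8), ({"budget": 1.0}, 0.4)]
predicted = short_term_predict({"election": 1.0, "budget": 1.0}, history)
```

In a food setting the same two thresholds could separate dishes that are similar enough to be informative from dishes that are nearly identical to something recently eaten.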

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$
\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^{2}}\; \sqrt{\sum_{p \in P} (r_{bp} - \bar{r}_b)^{2}}} \tag{4.1}
$$

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


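After the similarity is computed, the rating prediction is calculated using the following equation, where $r_{bu}$ is the rating given by user u to item b:

$$
\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{bu} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)}
$$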




In the formula, pred(u, a) is the predicted rating of user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities to obtain the predicted rating.
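To make the item-to-item procedure concrete, the Python sketch below reuses the ratings of Table 2.1 and predicts Alice's rating for Item5. It is a sketch under assumptions, not the YoLP implementation: item averages are taken over the users who rated both items, only positively correlated items contribute, and the prediction is the plain similarity-weighted average of the user's own ratings described in the paragraph above (without mean-centering).

```python
from math import sqrt

# Ratings from Table 2.1 (user -> item -> rating); Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def item_pearson(a, b):
    """Pearson correlation between items a and b (Eq. 4.1) over shared raters."""
    users = [u for u, r in ratings.items() if a in r and b in r]  # the set P
    if len(users) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in users) / len(users)
    mean_b = sum(ratings[u][b] for u in users) / len(users)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in users)
    den = sqrt(sum((ratings[u][a] - mean_a) ** 2 for u in users)) * sqrt(
        sum((ratings[u][b] - mean_b) ** 2 for u in users)
    )
    return num / den if den else 0.0


def predict_item_based(user, target):
    """Similarity-weighted average of the user's ratings of similar items."""
    weighted = 0.0
    total = 0.0
    for item, rating in ratings[user].items():
        similarity = item_pearson(target, item)
        if similarity > 0:  # assumption: ignore negatively correlated items
            weighted += similarity * rating
            total += similarity
    return weighted / total if total else None


prediction = predict_item_based("Alice", "Item5")  # 4.625 under these assumptions
```

Only Item1 and Item4 are positively correlated with Item5 here, so the prediction is a weighted mix of Alice's ratings for those two items.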

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the sparse recipe and user profile vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$
\text{Rating} =
\begin{cases}
\text{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\
\text{avgTotal} & \text{otherwise}
\end{cases}
\tag{4.3}
$$


where avgTotal represents the combined user and item average for each recommendation. It is important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
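As a minimal sketch of the conversion in Eq. 4.3, assuming that the combined average is the mean of the user and item averages (as defined for the baselines in Chapter 5), the mapping could look as follows. The function and parameter names are illustrative only.

```python
def yolp_rating(similarity, user_avg, item_avg, threshold=0.8, bonus=0.5):
    """Similarity-to-rating conversion in the style of Eq. 4.3.

    avg_total is assumed to be the mean of the user and item averages.
    """
    avg_total = (user_avg + item_avg) / 2.0
    if similarity > threshold:
        return avg_total + bonus
    return avg_total
```

Because only two output values are possible per user and item pair, the predicted ratings are coarse, which is one source of the approximation error mentioned above.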

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe like words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


$$
\mathrm{IRF}_k = \log \frac{M}{M_k}
\tag{4.4}
$$


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
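The IRF weights can be computed in a single pass over the recipes. The sketch below assumes each recipe is given as a set of feature labels and uses the natural logarithm; the base of the logarithm is not stated in the text, so that choice is an assumption, as is the function name.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for every feature seen in the recipes.

    recipes: list of feature collections, one per recipe.
    The natural logarithm is an assumption; the base only rescales the weights.
    """
    m = len(recipes)
    # M_k: number of recipes containing feature k (each recipe counted once).
    counts = Counter(f for feats in recipes for f in set(feats))
    return {f: math.log(m / mk) for f, mk in counts.items()}
```

A feature that appears in every recipe gets a weight of zero, so very common ingredients contribute nothing to the prototype vectors.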

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
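The two labelling rules and the add/subtract update can be sketched as follows. This is an illustration under assumptions: vectors are dictionaries, positive and negative observations both have weight 1 as in the experiments, and all function names are hypothetical.

```python
def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule on a 1..scale_max scale.

    Returns True (positive), False (negative) or None (neutral, ignored).
    On a 1-4 scale the midpoint is 2.5, so no rating is neutral;
    on a 1-5 scale a rating of 3 is neutral.
    """
    midpoint = (1 + scale_max) / 2.0
    if rating == midpoint:
        return None
    return rating > midpoint

def is_positive_user_avg(rating, user_avg):
    """User-average rule: ratings equal to or above the average are positive."""
    return rating >= user_avg

def build_prototype(rated_recipes, irf, label):
    """Add IRF weights for positive observations, subtract them for negative ones."""
    prototype = {}
    for features, rating in rated_recipes:
        positive = label(rating)
        if positive is None:
            continue  # neutral observation, ignored
        sign = 1.0 if positive else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf.get(f, 0.0)
    return prototype
```

For example, `build_prototype(rated, irf, lambda r: is_positive_fixed(r, 5))` applies the fixed-threshold rule on a 1 to 5 scale, while `lambda r: is_positive_user_avg(r, avg)` applies the user-average rule.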

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C
\tag{4.5}
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
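A per-user Min-Max mapping along the lines of Eq. 4.5 can be sketched as below. The function name and argument layout are assumptions made for illustration; the fallback to a default value mirrors the case where the similarity interval cannot be computed.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, default):
    """Map a similarity from [sim_min, sim_max] into [rating_min, rating_max].

    sim_min/sim_max: the user's similarity range (validation set).
    rating_min/rating_max: the user's rating range (training set).
    default: value returned when the similarity interval is empty,
             e.g. the user's average rating.
    """
    if sim_max == sim_min:
        return default
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```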

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

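$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
$$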


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization of the datasets used in the experiments

                                        Food.com   Epicurious
Number of users                            24741         8117
Number of food items                      226025        14976
Number of rating events                   956826        86574
Number of ratings above avg               726467        46588
Number of groups                             108           68
Number of ingredients                       5074          338
Number of categories                          28           14
Sparsity on the ratings matrix              0.02         0.07
Avg rating values                           4.68         3.34
Avg number of ratings per user             38.67        10.67
Avg number of ratings per item              4.23         5.78
Avg number of ingredients per item          8.57         3.71
Avg number of categories per item           2.33         0.60
Avg number of food groups per item          0.87         0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating for the recipe more accurately. Later, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
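The three-case rule with thresholds U and L can be sketched as below, using the initial values 0.75 and 0.25 as defaults. The `avg` and `std` arguments can be the user's, the recipe's, or the combined values, depending on which of the three variants is tested; the function name is illustrative.

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case similarity-to-rating rule with thresholds U (upper) and L (lower)."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg
```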


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics for the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds, one fold is held out as the validation set, and the remaining k - 1 folds form the training set. To reduce variability, this process is repeated k times, each time using a different fold as the validation set, and the validation results are averaged over the repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
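A 5-fold split of the rating events can be sketched as follows. The shuffling, the fixed seed and the function name are assumptions made for illustration; the text does not say how events were assigned to folds.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    events: list of rating events, e.g. (userID, itemID, rating) tuples.
    Each event appears in exactly one validation set.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds about 20% of the events and each training set the remaining 80%.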

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Given the userID and itemID as inputs, the algorithms generate a predicted rating for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
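For reference, the two error measures can be computed from paired lists of predicted and actual ratings as in the sketch below, which uses the standard definitions of MAE and RMSE; the function names are illustrative.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the average squared deviation."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```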

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
YoLP Content-based component     0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678     0.3761   0.6834
User Average                     0.6315   0.8338     0.4077   0.6207
Item Average                     0.7701   1.0930     0.4385   0.7043
Combined Average                 0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                     Epicurious                           Food.com
                                                     Observation      Observation         Observation      Observation
                                                     User Average     Fixed Threshold     User Average     Fixed Threshold
                                                     MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                   0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                   0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation     0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                                              0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations gives the lowest RMSE on both datasets, with MAE values close to the best, and therefore the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious          Food.com
                                     MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the corresponding row of Table 5.3.
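One way to picture the three stored vectors and their merging is the sketch below. The storage layout (a dictionary of per-group sparse vectors) and the function name are assumptions made for illustration, not a description of the database schema used in this work.

```python
def merge_groups(stored, groups):
    """Merge the selected per-group prototype vectors into one sparse vector.

    stored: {group_name: {feature: weight}}, one vector per feature group.
    groups: the feature groups to include in this test.
    Keys are (group, feature) pairs so that equal labels in different
    groups cannot collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            merged[(group, feature)] = weight
    return merged
```

Testing a feature combination then only requires choosing which groups to merge for the prototype (and, in the same way, for the recipe vector) before computing the cosine similarity.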

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


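$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
$$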

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:


$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
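A small worked example, using made-up error values rather than values from the experiments, shows how MAE can fall while RMSE rises. Suppose four predictions have absolute errors (0.5, 0.5, 0.5, 0.5) under one threshold setting, and (0, 0, 0, 1.5) under another setting that produces more exact hits but one larger miss:

```latex
\begin{align*}
\text{MAE}_1 &= \tfrac{1}{4}(0.5 + 0.5 + 0.5 + 0.5) = 0.5, &
\text{RMSE}_1 &= \sqrt{\tfrac{1}{4}(4 \cdot 0.5^2)} = 0.5 \\
\text{MAE}_2 &= \tfrac{1}{4}(0 + 0 + 0 + 1.5) = 0.375, &
\text{RMSE}_2 &= \sqrt{\tfrac{1}{4}(1.5^2)} = \sqrt{0.5625} = 0.75
\end{align*}
```

The second setting has the lower MAE but the higher RMSE, because squaring the errors penalizes the single large deviation more heavily.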

Compared directly with the YoLP content-based component, which obtained the lowest error rates overall among the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
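The per-user points plotted in such graphs could be derived as in the sketch below. It assumes that each user's error is the mean absolute error over that user's predictions and that the population standard deviation of the user's actual ratings is used; the text does not specify either detail, and the function name is hypothetical.

```python
import statistics

def per_user_points(records):
    """Return {user: (rating_std, mean_absolute_error)}.

    records: iterable of (user, actual_rating, predicted_rating) tuples.
    Population standard deviation is an assumption made for this sketch.
    """
    by_user = {}
    for user, actual, predicted in records:
        by_user.setdefault(user, []).append((actual, predicted))
    points = {}
    for user, pairs in by_user.items():
        std = statistics.pstdev([a for a, _ in pairs])
        err = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user] = (std, err)
    return points
```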

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still left a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
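The round-by-round procedure can be sketched as a small harness that moves one review at a time from the validation set to the training set. The harness and the `mean_predictor_mae` stand-in evaluator are hypothetical; in the experiments the evaluator would rebuild the prototype vector and score the Rocchio-based predictions instead.

```python
def learning_curve(user_reviews, max_train, evaluate):
    """Error after training on the first n reviews, for n = 1..max_train.

    user_reviews: one user's reviews in the order they are revealed.
    evaluate: function (training, validation) -> error value.
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break  # nothing left to validate on
        errors.append(evaluate(training, validation))
    return errors

def mean_predictor_mae(training, validation):
    """Stand-in evaluator: predict the training mean, report the MAE."""
    avg = sum(training) / len(training)
    return sum(abs(r - avg) for r in validation) / len(validation)
```

Averaging the resulting curves over all selected users gives one error value per number of rated recipes.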



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example, the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
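The idea above can be sketched as follows. This is only an illustration under stated assumptions: the function names (`build_rating_prototypes`, `predict_rating`) are hypothetical, each class vector is taken to be the plain average of the feature vectors of the recipes that received that rating, and cosine similarity is used for the comparison. None of these choices is prescribed by the text.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_rating_prototypes(rated_recipes):
    """Build one class vector per rating value (assumption: simple average)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for features, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in features.items():
            sums[rating][feature] += weight
    return {
        rating: {f: w / counts[rating] for f, w in vector.items()}
        for rating, vector in sums.items()
    }

def predict_rating(recipe, prototypes):
    """The rating whose class vector is most similar to the recipe."""
    return max(prototypes, key=lambda rating: cosine(recipe, prototypes[rating]))

# Hypothetical rating history of one user: (feature vector, rating).
history = [
    ({"tomato": 1.0, "basil": 0.5}, 5),
    ({"tomato": 0.8, "pasta": 0.6}, 5),
    ({"bacon": 1.0, "cream": 0.7}, 2),
]
prototypes = build_rating_prototypes(history)
new_recipe = {"tomato": 0.9, "pasta": 0.3}
predicted = predict_rating(new_recipe, prototypes)
```

With this toy history, the new recipe shares features only with the recipes rated 5, so that class is selected directly as the predicted rating.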

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007


[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload


[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868. doi: 10.1109/DICTAP.2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10 Fold Cross-Validation example
5.2 Lower similarity threshold variation test using Epicurious dataset
5.3 Lower similarity threshold variation test using Food.com dataset
5.4 Upper similarity threshold variation test using Epicurious dataset
5.5 Upper similarity threshold variation test using Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes




Acronyms

YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction

Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.

In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?

Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$
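The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function name `tf_idf`, the toy documents, and the choice of the natural logarithm are assumptions (the text does not fix the base of the logarithm), and every document is assumed to be a non-empty list of keywords.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document (Eqs. 2.1 to 2.3)."""
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)                # f_ij
        max_count = max(counts.values())     # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy "documents" (assumption): recipes seen as bags of ingredient keywords.
docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["pasta", "cream", "bacon"],
    ["tomato", "onion"],
]
weights = tf_idf(docs)
```

In the first document, "basil" ends up with a higher weight than "tomato" even though it appears less often, because it occurs in only one of the three documents.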

It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
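As a small illustration of Eqs. (2.4) and (2.5), the sketch below normalizes sparse weight vectors and compares them. The function names and the example vectors are hypothetical; vectors are stored as dictionaries so that keywords missing from a vector implicitly have weight zero.

```python
import math

def normalize(vector):
    """Cosine-normalize a {keyword: weight} vector, as in Eq. (2.4)."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)
    return {k: w / norm for k, w in vector.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors, as in Eq. (2.5)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical item and user-profile vectors.
recipe = {"tomato": 0.4, "basil": 0.5, "pasta": 0.2}
profile = {"tomato": 0.8, "pasta": 0.4, "bacon": 0.1}
similarity = cosine_similarity(recipe, profile)
```

Because the cosine only depends on the angle between the vectors, normalizing them beforehand does not change the similarity value; it only makes the dot product alone sufficient.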



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$
In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
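A minimal sketch of Rocchio's method as a classifier is given below. It follows the prototype formula and the assignment rule described above, but the function names, the toy vectors, and the parameter values (`beta=1.0`, `gamma=0.5`) are illustrative assumptions rather than the settings used in this work.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Prototype vector of one class from its positive and negative examples."""
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)

def classify(document, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], document))

# Hypothetical TF-IDF-like vectors of previously rated items.
liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8, "pasta": 0.6}]
disliked = [{"bacon": 1.0, "cream": 0.7}]
prototypes = {
    "like": rocchio_prototype(liked, disliked),
    "dislike": rocchio_prototype(disliked, liked),
}
label = classify({"tomato": 0.9, "pasta": 0.3}, prototypes)
```

Note that negative examples push the corresponding term weights below zero, so a document sharing terms only with the negative examples of a class obtains a negative similarity to that class.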

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$
• $P(d|c)$: probability of observing the document $d$ given a class $c$
• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary $V$, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
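The multinomial model can be sketched as below. Two choices in this sketch are assumptions that do not come from the text: the term probabilities $P(t_k|c_j)$ are estimated with add-one (Laplace) smoothing, and scores are computed as sums of logarithms instead of products, to avoid numerical underflow. The function names and the toy training data are also hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Collect class counts, per-class term counts, and the vocabulary."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary

def log_score(terms, label, model):
    """log P(c_j) + sum over t_k of N(d_i, t_k) * log P(t_k | c_j)."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    total_terms = sum(term_counts[label].values())
    score = math.log(class_counts[label] / total_docs)
    for term, n in Counter(terms).items():
        if term not in vocabulary:
            continue  # terms never seen in training are ignored
        # Assumption: add-one smoothing, so unseen (term, class) pairs are not zero.
        p = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
        score += n * math.log(p)
    return score

def classify(terms, model):
    """Class with the highest (log) score."""
    return max(model[0], key=lambda label: log_score(terms, label, model))

training = [
    (["tomato", "basil", "pasta"], "relevant"),
    (["tomato", "mozzarella"], "relevant"),
    (["bacon", "cream", "pasta"], "irrelevant"),
]
model = train(training)
prediction = classify(["tomato", "pasta", "tomato"], model)
```

Repeated terms contribute once per occurrence, which is what distinguishes the multinomial event model from the Bernoulli one.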

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled with terms, and the branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled with categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
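To make the selection criterion concrete, the sketch below computes the information gain of splitting a set of labelled documents on the presence or absence of a single word. The function names and the toy data are hypothetical; a decision tree learner would evaluate this quantity for every candidate word and split on the one with the highest gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting on presence/absence of `term`."""
    with_term = [label for doc, label in zip(docs, labels) if term in doc]
    without_term = [label for doc, label in zip(docs, labels) if term not in doc]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder

# Toy documents (assumption): sets of words with a like/dislike label.
docs = [
    {"tomato", "basil"},
    {"tomato", "pasta"},
    {"bacon", "cream"},
    {"bacon", "pasta"},
]
labels = ["like", "like", "dislike", "dislike"]
gain_tomato = information_gain(docs, labels, "tomato")
gain_pasta = information_gain(docs, labels, "pasta")
```

Here "tomato" separates the two classes perfectly, while "pasta" appears equally often in both and therefore carries no information.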

Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when classifying.
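A short sketch of a k-nearest-neighbor classifier over VSM vectors follows. The majority vote among the neighbors is one common way to derive the label and is an assumption here, as are the function names and the toy data.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """Label of `item` by majority vote among its k most similar stored items."""
    # No training phase: every stored item is compared at classification time.
    neighbours = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [
    ({"tomato": 1.0, "basil": 0.5}, "like"),
    ({"tomato": 0.8, "pasta": 0.6}, "like"),
    ({"bacon": 1.0, "cream": 0.7}, "dislike"),
    ({"bacon": 0.9, "pasta": 0.2}, "dislike"),
]
label = knn_classify({"tomato": 0.9, "pasta": 0.3}, training, k=3)
```

The full scan over the stored items in `knn_classify` is exactly the classification-time cost mentioned above.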

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and of the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \, \sqrt{\sum_{s \in S} r_{bs}^2}} \]

In the formula, the sums run over the items s rated by both users, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
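As an illustration of the cosine measure, the following Python sketch computes the similarity between Alice and User1 over the four items both have rated in Table 2.1. The function name `cosine_similarity` and the dictionary layout are illustrative choices made here, not part of the original work.

```python
import math

# Ratings taken from Table 2.1 (Alice has not rated Item5).
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
user1 = {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3}


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two rating vectors, over co-rated items only."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    norm_a = math.sqrt(sum(ratings_a[s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings_b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b)


sim_alice_user1 = cosine_similarity(alice, user1)
print(round(sim_alice_user1, 3))
```

The two users come out as almost identical under this measure, even though User1 rates every common item lower than Alice does.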

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items common to both are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

\[ sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \, \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11} \]

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) * (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function checks whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to (or subtracted from) Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
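The whole procedure can be sketched in Python using the ratings of Table 2.1. This is a minimal sketch under two assumptions that the text does not fix: each user's average is taken over all items that user has rated, and the neighborhood N is restricted to the k = 2 most similar users with a positive correlation. The function names are illustrative.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean_rating(user):
    """Average over every item the user has rated (an assumption of this sketch)."""
    values = ratings[user].values()
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between users a and b over their co-rated items."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a, mean_b = mean_rating(a), mean_rating(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(
        sum((ratings[a][s] - mean_a) ** 2 for s in common)
        * sum((ratings[b][s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


def predict(a, item, k=2):
    """Mean-centred weighted prediction using the k most similar positive neighbours."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    norm = sum(sim for sim, _ in neighbours)
    if norm == 0:
        return mean_rating(a)  # no usable neighbour: fall back to the user's mean
    offset = sum(sim * (ratings[b][item] - mean_rating(b)) for sim, b in neighbours)
    return mean_rating(a) + offset / norm


prediction = predict("Alice", "Item5")
print(round(prediction, 2))
```

With these assumptions User1 and User2 are selected as neighbours, and both rated Item5 above their own averages, so the prediction lies above Alice's average rating of 4.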

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by the user) can be associated with content features (e.g. comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, otherwise use collaborative methods).
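A parallelized design of this kind can be sketched as a small combination function. The sketch below is only illustrative: the function name, the threshold `min_ratings` and the weight `cf_weight` are hypothetical values chosen here, and the two input scores are assumed to have been produced independently by a collaborative component and a popularity-based component.

```python
def parallel_hybrid(cf_score, popularity_score, n_user_ratings,
                    min_ratings=5, cf_weight=0.7):
    """Combine two independently computed scores for the same item.

    With few user ratings the collaborative score is unreliable, so the
    popularity score is used alone; otherwise both are mixed with
    manually assigned weights (hypothetical values).
    """
    if n_user_ratings < min_ratings:
        return popularity_score
    return cf_weight * cf_score + (1.0 - cf_weight) * popularity_score


# A new user (2 ratings) receives the popularity score unchanged.
cold_start_score = parallel_hybrid(4.0, 3.0, n_user_ratings=2)
# An established user (10 ratings) receives the weighted combination.
regular_score = parallel_hybrid(4.0, 3.0, n_user_ratings=10)
```

In a system where the weights are learned dynamically, `cf_weight` would be estimated from data instead of being fixed by hand.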

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

\[ Precision = \frac{tp}{tp + fp} \tag{2.13} \]

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ Recall = \frac{tp}{tp + fn} \tag{2.14} \]
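Both measures can be computed directly from the set of recommended items and the set of items the user actually likes. The following Python sketch uses hypothetical recipe identifiers; the function name is an illustrative choice.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Four recipes are recommended; the user likes three recipes in total.
precision, recall = precision_recall(["r1", "r2", "r3", "r4"], ["r2", "r4", "r7"])
```

Here two of the four recommendations are relevant (precision 0.5), and two of the three relevant recipes were found (recall of about 0.67).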

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:

\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
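The difference between the two error measures is easiest to see on a small example. The sketch below implements both formulas in Python; the predicted and actual ratings are made-up values used only for illustration.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating lists."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4.0, 3.0, 5.0, 2.0]   # hypothetical predictions
actual = [5, 3, 4, 4]              # hypothetical true ratings

mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

The single error of two rating points raises the RMSE (about 1.22) above the MAE (1.0), which is the extra emphasis on larger deviations mentioned above.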


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:

\[ I_k^+ = FF_k \times IRF_k \tag{3.1} \]

FF_k is the frequency of use (F_k) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \]

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

\[ IRF_k = \log \frac{M}{M_k} \]



The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.
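The favourite-ingredient score can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation of [3]: the function name, the use of the natural logarithm, counting an ingredient at most once per recipe, and the toy recipes are all assumptions made here.

```python
import math
from collections import Counter


def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k as I_k^+ = FF_k * IRF_k."""
    total_recipes = len(all_recipes)                                          # M
    containing = Counter(k for recipe in all_recipes for k in set(recipe))    # M_k
    used = Counter(k for recipe in cooked_recipes for k in set(recipe))       # F_k
    return {
        k: (f_k / period_days) * math.log(total_recipes / containing[k])
        for k, f_k in used.items()
        if containing[k] > 0  # skip ingredients that are absent from the catalogue
    }


all_recipes = [
    ["rice", "egg"],
    ["rice", "tofu"],
    ["rice", "egg", "leek"],
    ["tofu", "leek"],
]
# The user cooked the first and third recipes during one week (D = 7 days).
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
best_ingredient = max(scores, key=scores.get)
```

Rice is used as often as egg, but because it appears in most recipes its IRF is low, so egg receives the highest score.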

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by I_k^+ were computed. The F-measure is computed as follows:

\[ F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e. the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e. I_k^+ and I_k^-, respectively):

\[ Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
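The two quantities involved in this scoring method can be sketched as follows. The rule that maps the deviation of a quantity to the weight W_k is not reproduced in this section, so the weights below are simply given as hypothetical inputs; the preference values and quantities are also made up for illustration.

```python
import math


def quantity_std(quantities):
    """Standard deviation of an ingredient's quantity over the n recipes that contain it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(preferences, weights):
    """Score(R): sum of I_k * W_k over the ingredients k of the recipe.

    `preferences` maps ingredients to I_k (positive if liked, negative if
    disliked); `weights` maps the recipe's ingredients to W_k. Ingredients
    without a known preference contribute nothing.
    """
    return sum(preferences.get(k, 0.0) * w_k for k, w_k in weights.items())


# Pepper quantities (in grams) observed in three hypothetical recipes.
pepper_std = quantity_std([1.0, 2.0, 3.0])

# Hypothetical user preferences and per-ingredient weights for one recipe.
preferences = {"potato": 0.8, "pepper": -0.5}
weights = {"potato": 1.0, "pepper": 2.0}
score = recipe_score(preferences, weights)
```

In this example the disliked pepper carries twice the weight of the liked potato, so the recipe ends up with a negative score.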

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and those predicted by the content-based method otherwise:

\[ v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7} \]
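Building the pseudo user-ratings vector amounts to filling the gaps in a user's ratings with content-based predictions. A minimal Python sketch follows; the function name, the movie identifiers and the rating values are hypothetical, and the content-based predictions c_ui are assumed to be available for every item.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Return v_u: the actual rating r_ui where it exists, otherwise c_ui."""
    return [
        user_ratings[i] if i in user_ratings else content_predictions[i]
        for i in items
    ]


items = ["m1", "m2", "m3"]
user_ratings = {"m1": 4}                                   # the user rated only m1
content_predictions = {"m1": 3.5, "m2": 2.0, "m3": 5.0}    # content-based predictions

v_u = pseudo_ratings_vector(user_ratings, content_predictions, items)
```

The actual rating for m1 takes precedence over its content-based prediction, and the resulting vector has no missing entries, which is what makes the matrix of all such vectors dense.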

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations

of collaborative filtering and content-based methods since it has been shown to perform consistently

better than pure collaborative filtering


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
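The breakdown and reconstruction steps of the content-based strategy can be sketched in Python. Ingredient scores are the average of the ratings of the recipes in which they occur, as described above; predicting an unseen recipe as the average of its known ingredient scores is an assumption of this sketch, and the recipes and ratings are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Average, per ingredient, the user's ratings of the recipes containing it."""
    collected = defaultdict(list)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {k: sum(v) / len(v) for k, v in collected.items()}


def predict_recipe(ingredients, scores):
    """Average the scores of the ingredients the user has an opinion about."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None


recipe_ingredients = {
    "omelette": ["egg", "cheese"],
    "fried rice": ["egg", "rice"],
}
recipe_ratings = {"omelette": 5, "fried rice": 3}   # one user's ratings

scores = ingredient_scores(recipe_ratings, recipe_ingredients)
prediction = predict_recipe(["cheese", "rice", "butter"], scores)
```

Egg appears in both rated recipes and receives the score 4.0, while butter has never been seen and is ignored when the new recipe is scored.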

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
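The short-term model described above can be sketched as follows. This is not the Daily-Learner implementation: the threshold values `t_min` and `t_max`, the return format and the toy term weights are hypothetical choices made for illustration, and stories are represented as dictionaries mapping terms to TF-IDF weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_prediction(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with a nearest-neighbour vote over previously rated stories.

    Returns ("known", None) if some rated story is nearly identical,
    ("unclassified", None) if there are no voters, and ("scored", value)
    otherwise, where value is the similarity-weighted average score.
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return ("known", None)
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return ("unclassified", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


rated = [
    ({"election": 1.0, "vote": 1.0}, 0.9),   # a story the user liked
    ({"football": 1.0}, 0.2),                # a story the user disliked
]
follow_up = short_term_prediction({"election": 1.0, "poll": 1.0}, rated)
duplicate = short_term_prediction({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_prediction({"weather": 1.0}, rated)
```

The follow-up story is scored from its single voter, the duplicate is labeled as known, and the unrelated story would be handed over to the long-term model.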

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \, \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, r_{ap} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) * (r_{bu} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \]

In the formula, pred(u, a) is the prediction value for user u and item a, N is the set of items rated by user u, and r_{bu} is the rating from user u to recipe b. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.
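An item-to-item prediction along these lines can be sketched in Python. The sketch reuses the ratings of Table 2.1 as if the items were recipes and makes three assumptions that are not stated in the text: item averages are taken over all users who rated the item, only positively correlated items are used as neighbours, and the target item's average is added back to the weighted deviations so that the result lies on the rating scale.

```python
import math

# Ratings from Table 2.1, keyed by user; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def item_ratings(item):
    """All ratings given to an item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def item_mean(item):
    values = item_ratings(item).values()
    return sum(values) / len(values)


def item_similarity(a, b):
    """Pearson correlation between items a and b over the users who rated both."""
    r_a, r_b = item_ratings(a), item_ratings(b)
    common = set(r_a) & set(r_b)
    mean_a, mean_b = item_mean(a), item_mean(b)
    num = sum((r_a[p] - mean_a) * (r_b[p] - mean_b) for p in common)
    den = math.sqrt(
        sum((r_a[p] - mean_a) ** 2 for p in common)
        * sum((r_b[p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict(user, target):
    """Mean-centred prediction from the user's own ratings of similar items."""
    sims = {b: item_similarity(target, b) for b in ratings[user] if b != target}
    sims = {b: s for b, s in sims.items() if s > 0}
    norm = sum(sims.values())
    if norm == 0:
        return item_mean(target)
    offset = sum(s * (ratings[user][b] - item_mean(b)) for b, s in sims.items())
    return item_mean(target) + offset / norm  # adding the mean back is an assumption


prediction = predict("Alice", "Item5")
```

Only the items Alice has already rated need to be compared with the target item, which is the property that makes this approach convenient when recommendations are restricted to a fixed list of recipes.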

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user rated positively, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]


where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
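The content-based pipeline described in this section (binary profile, cosine similarity, conversion to a rating) can be sketched as follows. This is not the YoLP code: the feature strings are hypothetical, binary vectors are represented as Python sets, and the combined average is assumed here to be the mean of the user average and the item average, which the text does not specify.

```python
import math


def build_profile(rated_recipes, threshold=4):
    """Binary user profile: every feature of every recipe rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile |= set(features)
    return profile


def binary_cosine(features_a, features_b):
    """Cosine similarity between two binary vectors stored as feature sets."""
    a, b = set(features_a), set(features_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_to_rating(similarity, user_avg, item_avg):
    """Turn a similarity into a rating: avgTotal + 0.5 above 0.8, avgTotal otherwise."""
    avg_total = (user_avg + item_avg) / 2  # assumed way of combining the averages
    return avg_total + 0.5 if similarity > 0.8 else avg_total


rated = [
    ({"category:fish", "region:lisbon", "ingredient:cod"}, 5),
    ({"category:meat", "ingredient:beef"}, 2),   # rated low: not added to the profile
]
profile = build_profile(rated)

close_match = binary_cosine(profile, {"category:fish", "region:lisbon", "ingredient:cod"})
weak_match = binary_cosine(profile, {"category:fish", "ingredient:salmon"})

high_rating = similarity_to_rating(close_match, user_avg=3.5, item_avg=4.0)
base_rating = similarity_to_rating(weak_match, user_avg=3.5, item_avg=4.0)
```

Because the conversion has only two possible outputs per user and item pair, it can only approximate a real predicted rating.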

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


$$
\mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4}
$$


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
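The IRF weighting can be sketched in a few lines of Python. This is only an illustration: the function name `irf_weights`, the toy recipes and the use of the natural logarithm are assumptions, since the base of the logarithm is not stated here.

```python
import math
from collections import Counter


def irf_weights(recipes):
    """Map each feature k to log(M / M_k), given {recipe_id: iterable of features}."""
    total = len(recipes)                  # M: total number of recipes
    containing = Counter()                # M_k: number of recipes that contain feature k
    for features in recipes.values():
        containing.update(set(features))  # count each feature once per recipe
    return {k: math.log(total / m_k) for k, m_k in containing.items()}


# Hypothetical toy data, not taken from either dataset.
recipes = {
    "r1": ["garlic", "tomato", "basil"],
    "r2": ["garlic", "chicken"],
    "r3": ["garlic", "tomato"],
    "r4": ["garlic", "saffron"],
}
weights = irf_weights(recipes)
```

In this toy example a feature present in every recipe (garlic) receives weight zero, while a feature present in a single recipe (saffron) receives the largest weight, which is the behaviour expected from an inverse frequency.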

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
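A minimal sketch of this construction is given below, covering both ways of labelling an observation. The function names, the sparse dictionary representation and the toy weights are assumptions made for illustration and are not taken from the implementation used in this work.

```python
from collections import defaultdict


def label_fixed(rating, low_max, high_min):
    """Fixed threshold: +1 (positive), -1 (negative) or 0 (neutral, ignored)."""
    if rating >= high_min:
        return 1
    if rating <= low_max:
        return -1
    return 0


def label_user_average(rating, user_avg):
    """User average: below the average is negative, equal or higher is positive."""
    return -1 if rating < user_avg else 1


def build_prototype(rated, recipe_weights, label):
    """Add the feature weights of positive recipes and subtract those of negative ones."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = label(rating)              # every observation has weight 1
        if sign == 0:
            continue                      # neutral observations are skipped
        for feature, weight in recipe_weights[recipe_id].items():
            prototype[feature] += sign * weight
    return dict(prototype)


# Hypothetical IRF-weighted recipes and ratings on a 1-5 scale.
recipe_weights = {
    "r1": {"garlic": 0.5, "tomato": 1.0},
    "r2": {"garlic": 0.5, "beef": 2.0},
    "r3": {"tomato": 1.0},
}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]

fixed = build_prototype(rated, recipe_weights, lambda r: label_fixed(r, 2, 4))
by_average = build_prototype(rated, recipe_weights, lambda r: label_user_average(r, 3.0))
```

With the fixed threshold the rating of 3 is ignored, whereas with the user average (3.0 for these three ratings) it counts as a positive observation, so the weight of tomato is added twice.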

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$
B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set and each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
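The per-user mapping can be sketched as follows. The function names and the toy values are hypothetical, and the fallback condition (a similarity interval of zero) is one possible reading of "not enough user ratings to compute the similarity interval".

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from [a_min, a_max] onto [c, d] (Min-Max normalization)."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_rating(similarity, user_similarities, user_ratings):
    """Map a similarity onto the user's own rating range; fall back to the user average."""
    user_avg = sum(user_ratings) / len(user_ratings)
    s_min, s_max = min(user_similarities), max(user_similarities)
    if s_max == s_min:                    # similarity interval is zero: use the default
        return user_avg
    return min_max(similarity, s_min, s_max, min(user_ratings), max(user_ratings))


# Hypothetical user: similarities seen in validation, ratings given in training.
mapped = predict_rating(0.5, [0.1, 0.5, 0.9], [2, 4, 5])
fallback = predict_rating(0.3, [0.3], [4, 5])
```

In the first call the similarity 0.5 lies halfway through the user's similarity range [0.1, 0.9], so it is mapped halfway through the rating range [2, 5]. In the second call only one similarity is known, so the user average is returned.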

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

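$$
\mathrm{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \mathit{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L
\end{cases}
\tag{4.6}
$$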


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
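The three-case rule above can be sketched as a small function. The default thresholds are the initial values used in this work (0.75 for U and 0.25 for L). The helper `combined_stats` is an assumption: it takes the mean of the user and recipe averages and the mean of their standard deviations, which is only one plausible way of combining them.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: average + std, average, or average - std."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: the mean of the two averages and of the two deviations."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


# Hypothetical statistics for one user and one recipe.
avg, std = combined_stats(user_avg=4.0, user_std=1.0, item_avg=3.0, item_std=0.5)
high = similarity_to_rating(0.80, avg, std)
middle = similarity_to_rating(0.50, avg, std)
low = similarity_to_rating(0.10, avg, std)
```

The same function covers the user-only and recipe-only variants by passing the corresponding average and standard deviation directly.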

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02% | 0.07% |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model learned during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds, one fold is used as the validation set, and the remaining folds form the training set. To reduce variability, this process is repeated using a different fold as the validation set each time, until every fold has been used once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
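The 80%/20% partitioning can be sketched as below. The shuffling, the fixed seed and the function name `k_fold_splits` are assumptions; the way the folds were actually drawn in the experiments is not described here.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists; every event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)         # assumed: random assignment to folds
    folds = [shuffled[i::k] for i in range(k)]    # k folds of (almost) equal size
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]


# Hypothetical rating events: (userID, itemID, rating).
events = [("u%d" % (i % 10), "r%d" % i, 1 + i % 5) for i in range(100)]
splits = list(k_fold_splits(events, k=5))
```

The error measures computed on each of the 5 validation sets would then be averaged to obtain the reported value.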

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
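For reference, the two measures can be computed as in the following sketch, which uses their standard definitions; the function names and the toy ratings are illustrative only.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in the MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predictions and validation ratings.
predicted = [4, 4, 3, 5]
actual = [5, 4, 1, 5]
example_mae = mae(predicted, actual)
example_rmse = rmse(predicted, actual)
```

Here the absolute errors are 1, 0, 2 and 0, so the single deviation of 2 contributes four of the five squared-error units and pushes the RMSE above the MAE.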

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average: MAE | RMSE | Epicurious, Observation Fixed Threshold: MAE | RMSE | Food.com, Observation User Average: MAE | RMSE | Food.com, Observation Fixed Threshold: MAE | RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the user's prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
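The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the `(group, feature)` keys and the function names are assumptions for illustration, not the stored representation actually used in the database.

```python
import math


def merge(stored, groups):
    """Merge the selected per-group vectors; keys are (group, feature) pairs."""
    return {(g, f): w for g in groups for f, w in stored[g].items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(user_vectors, recipe_vectors, groups):
    """Similarity using only the chosen feature groups, without rebuilding anything."""
    return cosine(merge(user_vectors, groups), merge(recipe_vectors, groups))


# Hypothetical stored vectors for one user and one recipe.
user_vectors = {
    "ingredients": {"garlic": 1.0, "beef": -1.0},
    "cuisines": {"italian": 2.0},
    "dietaries": {},
}
recipe_vectors = {
    "ingredients": {"garlic": 1.0},
    "cuisines": {"italian": 2.0},
    "dietaries": {"vegan": 1.0},
}
all_features = similarity(user_vectors, recipe_vectors, ["ingredients", "cuisines", "dietaries"])
ingredients_only = similarity(user_vectors, recipe_vectors, ["ingredients"])
```

Each row of a feature test then corresponds to a different `groups` argument, while the three stored vectors per user are computed only once per fold.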

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$
\mathrm{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \mathit{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L
\end{cases}
\tag{4.6}
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy and that subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to:

$$
\mathrm{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\
\text{average rating} & \text{if } \mathit{similarity} < U
\end{cases}
\tag{5.1}
$$

Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way as the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
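The opposite movement of MAE and RMSE can be illustrated with a small worked example. The numbers are hypothetical and chosen only for the arithmetic: a user with average rating 4 and standard deviation 1 has seven validation ratings, five equal to 5 and two equal to 3. With a high threshold U, no similarity reaches the upper case and every prediction is the average, 4. With a low threshold, every similarity reaches it and every prediction is 4 + 1 = 5.

$$
\begin{aligned}
\text{High } U:\quad \mathrm{MAE} &= \tfrac{1}{7}(7 \cdot 1) = 1, &\quad \mathrm{RMSE} &= \sqrt{\tfrac{1}{7}(7 \cdot 1^2)} = 1 \\
\text{Low } U:\quad \mathrm{MAE} &= \tfrac{1}{7}(5 \cdot 0 + 2 \cdot 2) = \tfrac{4}{7} \approx 0.571, &\quad \mathrm{RMSE} &= \sqrt{\tfrac{1}{7}(5 \cdot 0^2 + 2 \cdot 2^2)} = \sqrt{\tfrac{8}{7}} \approx 1.069
\end{aligned}
$$

Lowering the threshold makes five predictions exact, so the MAE falls, but the two misses are now off by 2 instead of 1, and the squared term makes the RMSE rise.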

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
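The round-by-round procedure can be sketched for a single user as follows. The function `learning_curve`, the callable `train_and_predict` and the stand-in `average_predictor` are hypothetical: the stand-in simply predicts the training average and takes the place of the Rocchio-based component, and averaging the curves over all selected users is omitted.

```python
def learning_curve(reviews, train_and_predict, max_train):
    """Return the MAE on the held-out reviews after training on the first n, n = 1..max_train."""
    errors = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]   # one more review moves to training
        predict = train_and_predict(training)
        abs_errors = [abs(predict(item) - rating) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def average_predictor(training):
    """Stand-in model (assumption): always predict the average training rating."""
    avg = sum(rating for _, rating in training) / len(training)
    return lambda item: avg


# Hypothetical reviews of one user, in the order they were made: (itemID, rating).
reviews = [("r1", 5), ("r2", 3), ("r3", 4), ("r4", 4)]
curve = learning_curve(reviews, average_predictor, max_train=3)
```

`max_train` must stay below the number of reviews so that the validation set is never empty; this is why only users with more rated recipes than the chosen threshold can take part in the test.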



Chapter 6


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography



YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking and product sales in general, and new ones
are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can be applied to food recommendation, is a topic of great
interest.


In this work, the applicability of content-based methods in personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy level of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. That work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on their consumer
behaviour, recommendations specifically adjusted to their personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation
systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, and interesting features in the context of personalized food recommendation
are highlighted. In Chapter 4, the modules that compose the architecture of the developed
system are described. The recommendation methods are explained in detail, and the datasets are
introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the following chapter on related work. These
concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches instead relate more to classical Information Retrieval
methods and focus on keywords as content descriptors to generate recommendations. Because of this,
content-based methods are very popular when recommending documents, news articles or web pages,
for example.

Knowledge-based systems suggest products based on inferences about the user's needs and
preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of
collaborative and content-based systems, such as the well-known cold-start problem that is explained
later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

In the rest of this section, some of the most popular approaches for content-based and collaborative
methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching up the attributes of an object
with a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles,
  and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are
  usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As
mentioned previously, content needs to be analysed and the information in it needs to be translated
into quantitative values so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its
relevance to the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency
measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$.
This value is divided by $\max_z f_{zj}$, which corresponds to the maximum frequency observed
over all keywords $z$ in the document $j$.

Keywords that are present in various documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$




In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in
which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \tag{2.3}$$

It is important to notice that TF-IDF does not identify the context where the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document. Two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document and their occurrence in other documents are taken into consideration when giving a
weight to a term.
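As an illustration of Eqs. (2.1) to (2.3), the following minimal sketch computes TF-IDF weights for a toy corpus. The function name, the corpus and the use of the natural logarithm are assumptions made only for this example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1)-(2.3)."""
    n_docs = len(documents)                                   # N
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)                                 # f_ij
        max_count = max(counts.values())                      # max_z f_zj
        weights.append({
            term: (freq / max_count) * math.log(n_docs / doc_freq[term])
            for term, freq in counts.items()
        })
    return weights

# Hypothetical corpus: each (non-empty) document is a list of keywords.
docs = [
    ["rice", "chicken", "rice", "garlic"],
    ["chicken", "pasta", "garlic"],
    ["rice", "beans", "garlic"],
]
for vector in tf_idf(docs):
    print(vector)
```

A keyword that occurs in every document, such as "garlic" in this corpus, receives an IDF of log(1) = 0 and therefore a weight of zero, which matches the intuition that it does not help to distinguish between documents.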

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents
from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:

$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
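The cosine normalization and the cosine similarity can be sketched as follows, representing each vector as a sparse dictionary of keyword weights. The function names and the example vectors are hypothetical.

```python
import math

def normalize(vector):
    """Cosine-normalize a {keyword: weight} vector so that it has unit length."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0:
        return dict(vector)
    return {k: w / norm for k, w in vector.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors, as in Eq. (2.5)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical user profile and two candidate items.
profile = {"rice": 0.8, "chicken": 0.6}
recipe_a = {"rice": 0.4, "chicken": 0.3}   # same direction as the profile
recipe_b = {"pasta": 1.0}                  # no keyword in common with the profile
print(cosine_similarity(profile, recipe_a))
print(cosine_similarity(profile, recipe_b))
```

Because the measure depends only on the angle between the vectors, recipe_a is maximally similar to the profile even though its weights are half as large, while recipe_b, which shares no keyword with the profile, has a similarity of zero.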



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system
according to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors
of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then classified
according to the similarity between the prototype vector of each class and the corresponding
document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each
class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight
for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$




In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for
class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the
influence of the positive and negative examples. A document $d_j$ is assigned to the class $c_i$ with
the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
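The prototype computation of Eq. (2.6) and the subsequent classification step can be sketched as follows. The values chosen for β and γ, the function names and the toy training documents are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_prototype(positive, negative, beta=1.0, gamma=0.5):
    """Prototype vector of one class from its positive and negative examples."""
    proto = defaultdict(float)
    for doc in positive:
        for term, weight in doc.items():
            proto[term] += beta * weight / len(positive)
    for doc in negative:
        for term, weight in doc.items():
            proto[term] -= gamma * weight / len(negative)
    return dict(proto)

def classify(doc, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))

# Hypothetical training documents, already expressed as term-weight vectors.
liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8, "garlic": 0.4}]
disliked = [{"liver": 1.0, "onion": 0.6}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
print(classify({"tomato": 0.9, "onion": 0.1}, prototypes))
```

In a two-class setting such as this one, the negative examples of a class are simply the positive examples of the other class.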

Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learning,
a family of online algorithms known as passive-aggressive classifiers, closely related to the
well-known perceptron, shares many similarities with Rocchio's method and has been studied
extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine
learning methods are other examples of techniques also used to perform content-based
recommendation. These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes Classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d
belonging to a class c, using a set of probabilities previously calculated using the observed data, or
training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c
• P(d|c): probability of observing the document d given a class c
• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem:

$$P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest
probability:

$$c = \arg\max_{c_j} \frac{P(c_j)\,P(d \mid c_j)}{P(d)} \tag{2.8}$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d \mid c_j)$ is
determined based on individual word occurrences rather than on the document as a whole. This
simplification is needed because it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although the
conditional independence assumption behind this simplification clearly does not hold in practice,
since terms in a document are not independent from each other, experiments show that the Naive
Bayes classifier has very good results when classifying text documents. Two different models are
commonly used when working with the Naive Bayes classifier. The first, typically referred to as the
multivariate Bernoulli event model, encodes each word as a binary attribute. This encoding relates
to the appearance of words in a document. The second, typically referred to as the multinomial event
model, identifies the number of times the words appear in the document. These models see the
document as a vector of values over a vocabulary $V$, and they both lose the information about word
order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate
Bernoulli model, especially for large vocabularies [9]. This model is represented by the following
equation:

$$P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$.
Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
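A minimal sketch of the multinomial model of Eq. (2.9) is given below. Two choices are assumptions of this example and not part of the description above: the conditional probabilities are estimated with Laplace (add-one) smoothing, and the product is evaluated as a sum of logarithms to avoid numerical underflow. The training data is hypothetical.

```python
import math
from collections import Counter

def train(labelled_docs):
    """Estimate P(c) and smoothed P(t|c) from (label, tokens) pairs."""
    class_counts = Counter(label for label, _ in labelled_docs)
    term_counts = {c: Counter() for c in class_counts}
    for label, tokens in labelled_docs:
        term_counts[label].update(tokens)
    vocab = {t for _, tokens in labelled_docs for t in tokens}
    priors = {c: n / len(labelled_docs) for c, n in class_counts.items()}
    cond = {}
    for c, counts in term_counts.items():
        total = sum(counts.values())
        # Laplace smoothing (assumption): no term gets zero probability
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond

def classify(tokens, priors, cond):
    """Return the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k | c)."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for term, n in Counter(tokens).items():
            if term in cond[c]:          # terms outside the vocabulary are ignored
                score += n * math.log(cond[c][term])
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical training data with two classes.
training = [
    ("relevant", ["pasta", "tomato", "basil"]),
    ("relevant", ["pasta", "garlic"]),
    ("irrelevant", ["liver", "onion"]),
    ("irrelevant", ["liver", "garlic", "onion"]),
]
priors, cond = train(training)
print(classify(["pasta", "pasta", "garlic"], priors, cond))
```

The exponent $N(d_i, t_k)$ of Eq. (2.9) appears here as the factor n multiplying the logarithm of each conditional probability.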

Decision trees and nearest neighbor methods are other examples of important learning algorithms
used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes are labelled by terms.
Branches originating from them are labelled according to tests done on the weight that the term
has in the document. Leaves are then labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10].
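To make the selection criterion concrete, the sketch below computes the information gain obtained by splitting a set of labelled documents on the presence or absence of a single term. The function names and the toy documents are hypothetical, and the sketch covers only the binary presence test mentioned above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in entropy when splitting on presence/absence of `term`."""
    with_term = [label for doc, label in zip(docs, labels) if term in doc]
    without_term = [label for doc, label in zip(docs, labels) if term not in doc]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term) if part
    )
    return entropy(labels) - remainder

# Hypothetical documents (sets of terms) and their classes.
docs = [{"tomato", "basil"}, {"tomato", "garlic"}, {"liver", "onion"}, {"liver", "garlic"}]
labels = ["like", "like", "dislike", "dislike"]
for term in ("tomato", "garlic"):
    print(term, information_gain(docs, labels, term))
```

In this example "tomato" separates the two classes perfectly and would be chosen as the first test, whereas "garlic" occurs equally often in both classes and provides no gain.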

Nearest neighbor algorithms simply store all training data in memory. When classifying a new
unlabeled item, the algorithm compares it to all stored items using a similarity function and then
determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working
with structured data. For items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback
is their inefficiency at classification time, due to the fact that they do not have a training phase and all
the computation is made during the classification time.
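A k-nearest-neighbor classifier over VSM vectors can be sketched as follows. The majority vote used to derive the label, the value of k and the toy data are assumptions of this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored examples."""
    # All the work happens here, at classification time: there is no training phase.
    ranked = sorted(training, key=lambda example: cosine(item, example[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored examples: (feature vector, class label).
training = [
    ({"tomato": 1.0, "basil": 1.0}, "like"),
    ({"tomato": 1.0, "garlic": 1.0}, "like"),
    ({"liver": 1.0, "onion": 1.0}, "dislike"),
    ({"liver": 1.0, "garlic": 1.0}, "dislike"),
]
print(knn_classify({"tomato": 1.0, "onion": 0.2}, training))
```

Every call compares the new item against all stored examples, which illustrates why the cost of these methods is concentrated at classification time.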

These algorithms represent some of the most important methods used in content-based
recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed
automatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user
profiles, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a
particular user based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various
algorithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the
recommendation. These methods can be grouped into two general classes [11], namely memory-based
approaches (or heuristic-based) and model-based methods. Memory-based algorithms are essentially
heuristics that make rating predictions based on the entire collection of previously rated items
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. Item5 is unknown to Alice, and the recommendation system needs to generate
a prediction. The set of ratings S previously mentioned represents the ratings given by User1,
User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed using the average of the values
contained in set S. However, the most common approach is to use the weighted sum, where the level
of similarity between users defines the weight value to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an
identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items in common. The similarity measure results
from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$



In the formula, $r_{as}$ is the rating that user $a$ gave to item $s$, $r_{bs}$ is the rating that user $b$ gave
to the same item, and $S$ here denotes the set of items rated by both users. However, this measure
does not take into consideration an important factor, namely the differences in rating behaviour
between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a
similar way. The difference in rating values between the four items is practically consistent. With
the cosine similarity measure these users are considered highly similar, which may not always be
the case, since only the items common to both are considered. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based
collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

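$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.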

With the similarity values between Alice and the other users obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, $\mathrm{pred}(a, p)$ is the predicted rating of user $a$ for item $p$, and $N$ is the set of users
most similar to user $a$ that rated item $p$. This function checks whether the neighbors' ratings for Alice's
unseen Item5 are higher or lower than their averages. The rating differences are combined using the
similarity scores as weights, and the resulting value is added to or subtracted from Alice's average
rating. The value obtained through this procedure corresponds to the predicted rating.
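The whole procedure can be traced on the ratings of Table 2.1 with the short sketch below. It is only an illustration: the function names are hypothetical, the Pearson averages are computed over the items rated by both users, the averages in the prediction use all the ratings of each user, and the neighborhood is limited to the two most similar users. All of these are assumptions of the example.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = [i for i in ratings[a] if i in ratings[b]]
    mean_a = mean(ratings[a][i] for i in common)
    mean_b = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = (math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common))
           * math.sqrt(sum((ratings[b][i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(a, item, n_neighbors=2):
    """Prediction function (Eq. 2.12) using the most similar users who rated `item`."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, reverse=True)[:n_neighbors]
    mean_a = mean(ratings[a].values())
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return mean_a + num / den if den else mean_a

for user in ("User1", "User2", "User3", "User4"):
    print(user, round(pearson("Alice", user), 2))
print("Predicted rating for Item5:", round(predict("Alice", "Item5"), 2))
```

Under these assumptions, User1 and User2 are the two users most similar to Alice, and the predicted rating for Item5 is approximately 4.87, which is above Alice's average rating of 4.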

Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one common
strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empirical
evidence has been presented suggesting that item-based algorithms can provide better
computational performance, together with comparable or better quality results, than the best
available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, collaborative methods should be used).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \qquad (2.13)$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \qquad (2.14)$$
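As an illustration of Eqs. 2.13 and 2.14, the following minimal sketch computes both measures from a list of recommended items and a list of items the user actually likes. The function name and the set-based representation are assumptions made for this example.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a recommendation list.

    recommended: items returned by the system.
    relevant: items the user actually likes.
    """
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(precision, recall)
```

In this example two of the four recommended items are relevant (precision 0.5), and two of the three relevant items were recommended (recall 2/3).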

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)$$
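Both error measures can be sketched in a few lines of Python. The function names and the use of parallel lists for predicted and actual ratings are assumptions of this example.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating lists (Eq. 2.16)."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4, 3, 5]
actual = [5, 3, 2]
print(mae(predicted, actual), rmse(predicted, actual))
```

The errors here are 1, 0, and 3, so the MAE is 4/3 while the RMSE is the square root of 10/3; the single large deviation weighs more heavily on the RMSE.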


The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$

The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
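The favourite-ingredient score can be sketched as follows. This is only an illustration under stated assumptions: recipes are represented as sets of ingredient names, $F_k$ is counted as the number of cooked recipes containing ingredient k, the natural logarithm is used, and the function name `ff_irf` is hypothetical.

```python
from collections import Counter
from math import log


def ff_irf(cooked_recipes, all_recipes, period):
    """Score each ingredient the user cooked with as FF_k * IRF_k.

    cooked_recipes: recipes (sets of ingredients) cooked during the period.
    all_recipes: every recipe in the collection (sets of ingredients).
    period: length D of the observation period (e.g., in days).
    """
    m = len(all_recipes)
    # M_k: number of recipes in the collection that contain ingredient k.
    recipe_count = Counter(k for recipe in all_recipes for k in set(recipe))
    # F_k: number of cooked recipes that contain ingredient k (assumption).
    use_count = Counter(k for recipe in cooked_recipes for k in set(recipe))
    return {
        k: (f / period) * log(m / recipe_count[k])
        for k, f in use_count.items()
        if recipe_count[k]  # skip ingredients unknown to the collection
    }


all_recipes = [
    {"rice", "egg"},
    {"rice", "tofu"},
    {"rice", "egg", "leek"},
    {"beef"},
]
cooked = [all_recipes[0], all_recipes[2]]
scores = ff_irf(cooked, all_recipes, period=10)
print(sorted(scores, key=scores.get, reverse=True))
```

Rice is used as often as egg in this toy history, but it appears in three of the four recipes, so its inverse recipe frequency, and therefore its score, is lower.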

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (3.4)$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
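As a quick consistency check of the figures reported for N = 20, reading them as a precision of 60.7% and a recall of 61%, the F-measure formula gives:

$$F\text{-}measure = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61} = \frac{0.74054}{1.217} \approx 0.608$$

which agrees with the reported F-measure of 60.8%.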

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u ($r_{u,i}$), where available, and those predicted by the content-based method ($c_{u,i}$) otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$
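The construction of a pseudo user-ratings vector (Eq. 3.7) can be sketched as below. The callable `content_predict` is a hypothetical stand-in for the content-based predictor (the naive Bayesian classifier in [20]); its name and signature are assumptions of this example.

```python
def pseudo_ratings(user_ratings, items, content_predict):
    """Build a dense pseudo user-ratings vector for one user.

    user_ratings: dict mapping item id to the user's actual rating.
    items: all item ids in the database.
    content_predict: hypothetical callable returning the content-based
        prediction for an item the user has not rated.
    """
    return {
        i: user_ratings[i] if i in user_ratings else content_predict(i)
        for i in items
    }


# Toy example: a constant predictor stands in for the learned classifier.
vector = pseudo_ratings({"m1": 5, "m3": 2}, ["m1", "m2", "m3"], lambda item: 3.5)
print(vector)
```

Stacking these vectors for all users yields a dense matrix, on which the usual neighborhood-based collaborative filtering can then operate.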

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, on the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
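The short-term model described above can be sketched as follows. This is an illustrative reading of the description, not the Daily-Learner implementation: the threshold values, the sparse-dictionary vector representation, and the function names are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a new story against the user's recently rated stories.

    rated_stories: list of (tfidf_vector, score) pairs.
    t_min, t_max: hypothetical minimum and maximum similarity thresholds.
    Returns ("known", None), ("no_voters", None) or ("scored", score).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:
            return "known", None       # too similar: user already knows it
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return "no_voters", None       # would be handed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "scored", sum(sim * score for sim, score in voters) / total


new_story = {"election": 1.0, "budget": 1.0}
history = [({"election": 1.0}, 0.8), ({"football": 1.0}, 0.2)]
print(short_term_predict(new_story, history))
```

In the example, only the first stored story is similar enough to vote, so the prediction equals its score; a stored story with an identical vector would instead cause the new story to be labeled as known.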

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b, respectively.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b (taken as a deviation from that item's average rating $\bar{r}_b$) is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.

The item-based approach was chosen in the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
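The profile construction and ranking described above can be sketched as follows. Sparse binary vectors are represented here as Python sets of feature names, for which the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The feature naming scheme (`"cat:soup"`, `"ing:leek"`) and the function names are assumptions of this example, not the YoLP implementation.

```python
from math import sqrt


def build_profile(rated_recipes, min_rating=4):
    """Collect the features of all positively rated recipes (rating 4 or 5)."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_rating:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def rank_recipes(profile, recipes):
    """Order recipe ids from most to least similar to the user profile."""
    return sorted(recipes, key=lambda rid: cosine_binary(profile, recipes[rid]),
                  reverse=True)


rated = [
    ({"cat:soup", "ing:leek", "region:pt"}, 5),
    ({"cat:dessert", "ing:sugar"}, 2),
]
menu = {
    "r1": {"cat:soup", "ing:leek", "ing:potato"},
    "r2": {"cat:dessert", "ing:sugar"},
}
profile = build_profile(rated)
print(rank_recipes(profile, menu))
```

Only the soup, rated 5, contributes to the profile, so the soup-like recipe `r1` is ranked above the dessert `r2`.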

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
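A minimal sketch of this similarity-to-rating rule is given below, reading the constants of Eq. 4.3 as 0.5 and 0.8. How the user and item averages are combined is not detailed in the text; the arithmetic mean of the two is an assumption of this example, as is the function name.

```python
def similarity_to_rating(similarity, user_avg, item_avg,
                         threshold=0.8, bonus=0.5):
    """Turn a cosine similarity into a rating estimate (Eq. 4.3).

    avg_total is taken as the mean of the user's and the item's average
    ratings, which is an assumption about how the two are combined.
    """
    avg_total = (user_avg + item_avg) / 2
    return avg_total + bonus if similarity > threshold else avg_total


print(similarity_to_rating(0.9, user_avg=4.0, item_avg=3.0))
print(similarity_to_rating(0.5, user_avg=4.0, item_avg=3.0))
```

Because the rule has only two output levels per user and item pair, it can only approximate a real rating, which is consistent with the caveat above about the small error it introduces.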

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$
where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
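The construction of a prototype vector from IRF weights can be sketched as follows. This is an illustration under stated assumptions: recipes are sets of feature names, the natural logarithm is used for the IRF, positive and negative observations both have weight 1 (as in the experiments described above), and the function names are hypothetical. By default the sketch follows the second approach (comparison with the user's average rating); passing a fixed `threshold` approximates the first approach, without the neutral-rating rule.

```python
from collections import Counter, defaultdict
from math import log


def irf_weights(recipes):
    """IRF_k = log(M / M_k) for every feature k in the recipe collection."""
    m = len(recipes)
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {f: log(m / m_k) for f, m_k in counts.items()}


def build_prototype(user_ratings, recipes, weights, threshold=None):
    """Add IRF weights for positive observations, subtract them for negative ones.

    user_ratings: dict mapping recipe id to the user's rating.
    threshold: ratings >= threshold are positive; defaults to the user's
        average rating over the given (training) ratings.
    """
    if threshold is None:
        threshold = sum(user_ratings.values()) / len(user_ratings)
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = 1.0 if rating >= threshold else -1.0
        for feature in recipes[recipe_id]:
            prototype[feature] += sign * weights[feature]
    return dict(prototype)


recipes = {
    "r1": {"garlic", "rice"},
    "r2": {"rice", "beef"},
    "r3": {"garlic", "leek"},
    "r4": {"tofu"},
}
weights = irf_weights(recipes)
prototype = build_prototype({"r1": 5, "r2": 1}, recipes, weights)
print(prototype)
```

In this toy example, rice appears in one liked and one disliked recipe, so its contributions cancel out, while garlic ends up with a positive weight and beef with a negative one.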

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes and with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is important to explore, since the translation can introduce considerable errors in the validation results. Two approaches to translate the similarity value into a rating are presented next.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula, where $\min(A)$ and $\max(A)$ are the minimum and maximum values of $A$:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale of each user is mapped onto that user's rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default prediction.
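The per-user mapping can be sketched as below. The function names are hypothetical, and the fallback condition (fewer than two distinct similarity values) is one reading of "not enough user ratings to compute the similarity interval".

```python
def min_max(a, a_min, a_max, c, d):
    """Eq. 4.5: map a from [a_min, a_max] onto the target range [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_rating(similarity, user_sims, user_ratings):
    """Map one similarity onto the user's own rating range.

    user_sims: similarity values observed for this user.
    user_ratings: the user's ratings in the training set.
    """
    user_avg = sum(user_ratings) / len(user_ratings)
    if len(user_sims) < 2 or max(user_sims) == min(user_sims):
        return user_avg  # similarity interval undefined: fall back to the user average
    return min_max(similarity, min(user_sims), max(user_sims),
                   min(user_ratings), max(user_ratings))


# A user whose similarities span [0, 1] and whose ratings span [2, 4].
example = predict_rating(0.5, [0.0, 1.0], [2, 4])
# A user with a single similarity value falls back to the average rating.
fallback = predict_rating(0.3, [0.3], [2, 4, 3])
```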

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \qquad (4.6)$$


Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
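The threshold rule and its three variants can be sketched as follows. Several details are assumptions made for illustration: the population standard deviation is used (the text does not say which estimator was applied), the combined variant is read as the plain mean of the user and recipe statistics, and the default thresholds are the initial values reported for U and L.

```python
from statistics import mean, pstdev


def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case rule: add, keep or subtract the standard deviation."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg


def rating_stats(ratings):
    """Average and (population) standard deviation of a list of ratings."""
    return mean(ratings), pstdev(ratings)


def combined_stats(user_ratings, item_ratings):
    """Assumed reading of the combined variant: mean of user and item statistics."""
    u_avg, u_std = rating_stats(user_ratings)
    i_avg, i_std = rating_stats(item_ratings)
    return (u_avg + i_avg) / 2, (u_std + i_std) / 2


user_avg, user_std = rating_stats([4, 4, 2, 2])    # 3 and 1.0
both_avg, both_std = combined_stats([4, 4, 2, 2], [5, 3])
high = threshold_rating(0.8, user_avg, user_std)
middle = threshold_rating(0.5, user_avg, user_std)
low = threshold_rating(0.1, user_avg, user_std)
```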

This approach is intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization of the datasets used in the experiments

Statistic                               Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02%       0.07%
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k folds, one fold serves as the validation set and the remaining folds form the training set. To reduce variability, this process is repeated with a different fold as the validation set until every fold has been used once, and the validation results are averaged over the repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5 (5-fold cross-validation), so the process is repeated 5 times. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
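A 5-fold split of the rating events can be sketched as below. This is only an illustration of the procedure; the function name, the shuffling and the fixed seed are assumptions, not details taken from the experiments.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists; each event is validated exactly once."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)        # fixed seed keeps the folds reproducible
    folds = [shuffled[i::k] for i in range(k)]   # k folds of (almost) equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


events = list(range(100))                        # stand-ins for rating events
splits = list(k_fold_splits(events, k=5))
```

With 100 events, each round validates on 20 events and trains on the remaining 80; the reported error is the average over the five rounds.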

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Figure 5.1: 10-fold cross-validation example

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
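For reference, the two measures can be computed as in the sketch below, which uses the standard definitions of MAE (mean of the absolute deviations) and RMSE (square root of the mean squared deviation). The function names and the toy values are illustrative.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: squaring penalises large deviations more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


predicted = [3, 4, 5]
actual = [3, 5, 2]
example_mae = mae(predicted, actual)     # deviations 0, 1, 3
example_rmse = rmse(predicted, actual)
```

In this toy case the single large miss (3 rating points) weighs more heavily on the RMSE than on the MAE, which is the property used later when interpreting the threshold tests.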

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation User Average    Observation Fixed Threshold
Method                                              MAE        RMSE             MAE        RMSE
User Avg + User Standard Deviation                  0.8217     1.0606           0.7759     1.0283
Item Avg + Item Standard Deviation                  0.8914     1.1550           0.8388     1.1106
User/Item Avg + User and Item Standard Deviation    0.8304     1.0296           0.7824     0.9927
Min-Max                                             0.8539     1.1533           0.7721     1.0705

Food.com
                                                    Observation User Average    Observation Fixed Threshold
Method                                              MAE        RMSE             MAE        RMSE
User Avg + User Standard Deviation                  0.4448     0.6812           0.4287     0.6624
Item Avg + Item Standard Deviation                  0.4561     0.7251           0.4507     0.7207
User/Item Avg + User and Item Standard Deviation    0.4390     0.6506           0.4324     0.6449
Min-Max                                             0.6648     0.9847           0.6303     0.9384
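The three average-based baselines can be sketched as follows. The function names are hypothetical, and the fallback to the global average for users or recipes that do not appear in the training set is an assumption added here; the text does not describe how such cases were handled.

```python
from collections import defaultdict


def average_by(events, index):
    """Average rating grouped by the field at `index` (0 = user, 1 = recipe)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for event in events:
        sums[event[index]] += event[2]
        counts[event[index]] += 1
    return {key: sums[key] / counts[key] for key in sums}


def baseline_predictors(training):
    """training: [(user_id, recipe_id, rating)] -> three predict(user, recipe) functions."""
    user_avg = average_by(training, 0)
    item_avg = average_by(training, 1)
    global_avg = sum(r for _, _, r in training) / len(training)  # assumed fallback

    def by_user(user, recipe):
        return user_avg.get(user, global_avg)

    def by_item(user, recipe):
        return item_avg.get(recipe, global_avg)

    def combined(user, recipe):
        return (by_user(user, recipe) + by_item(user, recipe)) / 2

    return by_user, by_item, combined


training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
by_user, by_item, combined = baseline_predictors(training)
```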

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
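The idea of storing one vector per feature group and merging them on demand can be sketched as below. The group names, the key-prefixing scheme and the toy weights are assumptions made for illustration; the cosine similarity itself is the standard definition.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge(groups, selected):
    """Merge the stored per-group vectors; keys are prefixed so groups cannot collide."""
    return {f"{g}:{f}": w for g in selected for f, w in groups.get(g, {}).items()}


ALL = ["ingredients", "cuisine", "dietary"]
user = {
    "ingredients": {"tomato": 1.0, "beef": -0.5},
    "cuisine": {"italian": 2.0},
    "dietary": {"vegan": -1.0},
}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}

sim_all = cosine(merge(user, ALL), merge(recipe, ALL))
sim_ingredients = cosine(merge(user, ["ingredients"]), merge(recipe, ["ingredients"]))
```

Testing a different feature combination then only changes the `selected` list, without rebuilding any prototype vector.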

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \qquad (5.1)$$

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset


Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
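The sweep over the upper threshold can be sketched as below, applying the two-case rule of Eq. 5.1 to precomputed similarities. The function names and the three toy samples are hypothetical; in the experiments each point would come from a full cross-validation run.

```python
def rating_upper_only(similarity, avg, std, upper):
    """Eq. 5.1: only the upper case adds the standard deviation."""
    return avg + std if similarity >= upper else avg


def sweep_upper(samples, thresholds):
    """samples: [(similarity, avg, std, actual_rating)] -> [(U, MAE, RMSE)]."""
    results = []
    for upper in thresholds:
        errors = [rating_upper_only(s, avg, std, upper) - actual
                  for s, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results.append((upper, mae, rmse))
    return results


# Toy samples: (similarity, average rating, standard deviation, actual rating).
samples = [(0.8, 3.0, 1.0, 4.0), (0.4, 3.0, 1.0, 3.0), (0.6, 3.0, 1.0, 1.0)]
results = sweep_upper(samples, [0.75, 0.5])
```

Plotting the MAE and RMSE columns of `results` against U gives curves of the kind shown in the upper-threshold figures.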

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
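The per-user points of such a plot can be computed as in the sketch below. It assumes that a user's standard deviation is taken over that user's actual ratings (population form) and that the absolute error is averaged per user; neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_user_points(results):
    """results: [(user_id, predicted, actual)] ->
    {user_id: (std of the user's actual ratings, mean absolute error)}."""
    by_user = defaultdict(list)
    for user, predicted, actual in results:
        by_user[user].append((predicted, actual))
    return {
        user: (pstdev([a for _, a in pairs]), mean([abs(p - a) for p, a in pairs]))
        for user, pairs in by_user.items()
    }


results = [("u1", 4, 4), ("u1", 4, 4), ("u2", 3, 5), ("u2", 3, 1)]
points = per_user_points(results)
```

Here the user who always gives the same rating (u1) sits at the origin, while the user with widely spread ratings (u2) has both a high standard deviation and a high absolute error.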

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, implying that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still maintained a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
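The round-by-round simulation can be sketched as below. The `fit` and `predict` callables are placeholders for whatever recommender is being studied; the stand-in used in the example simply predicts the average of the ratings seen so far and is not the Rocchio component. All names are hypothetical.

```python
def learning_curve(user_events, fit, predict, max_rounds):
    """user_events: {user_id: [(recipe_id, rating), ...]} in review order.

    fit(train) -> model; predict(model, recipe_id) -> rating.
    Returns [(n, MAE averaged over users)] for n = 1..max_rounds.
    """
    curve = []
    for n in range(1, max_rounds + 1):
        user_errors = []
        for events in user_events.values():
            train, validation = events[:n], events[n:]  # one more review per round
            if not validation:
                continue                                # nothing left to validate on
            model = fit(train)
            errors = [abs(predict(model, rid) - rating) for rid, rating in validation]
            user_errors.append(sum(errors) / len(errors))
        if user_errors:
            curve.append((n, sum(user_errors) / len(user_errors)))
    return curve


def fit_average(train):
    """Stand-in model: the average of the ratings seen so far."""
    return sum(r for _, r in train) / len(train)


def predict_average(model, recipe_id):
    return model


user_events = {"u1": [("a", 4), ("b", 2), ("c", 3)]}
curve = learning_curve(user_events, fit_average, predict_average, max_rounds=3)
```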



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contains its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
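One possible reading of this proposal is sketched below: one vector per rating value, built by summing the feature weights of the recipes that received that rating, with the predicted class chosen by highest cosine similarity. This is only an illustration of the idea, not an evaluated method; all names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_class_vectors(rated, recipe_weights):
    """One vector per rating value: summed weights of the recipes given that rating."""
    classes = defaultdict(lambda: defaultdict(float))
    for recipe_id, rating in rated:
        for feature, w in recipe_weights[recipe_id].items():
            classes[rating][feature] += w
    return {rating: dict(vector) for rating, vector in classes.items()}


def predict_by_class(recipe_vector, classes):
    """Return the rating value whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(recipe_vector, classes[rating]))


weights = {"r1": {"tomato": 1.0, "basil": 1.0}, "r2": {"beef": 1.0}}
classes = build_class_vectors([("r1", 4), ("r2", 1)], weights)
liked = predict_by_class({"tomato": 1.0}, classes)
disliked = predict_by_class({"beef": 1.0}, classes)
```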

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127-134, 2002. ISBN 1581134592. doi 10.1145/502716.502737

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 10414347. doi 10.1109/TKDE.2004.1264822

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2002. ISSN 09241868. doi 101109DICTAP2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290. doi 1011164936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381-386, 2010. ISBN 3642134696. doi 10.1007/978-3-642-13470-8_36

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 09241868. doi 101023A


[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi 101145963770


[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444, 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995. ISSN 10450823. doi 101067mod2000109031


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography


Acronyms

YoLP - Your Lunch Pal

IF - Information Filtering

IR - Information Retrieval

VSM - Vector Space Model

TF - Term Frequency

IDF - Inverse Document Frequency

IRF - Inverse Recipe Frequency

MAE - Mean Absolute Error

RMSE - Root Mean Squared Error

CBCF - Content-Boosted Collaborative Filtering



Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, they choose only the items that best fit the user's preferences. The process of filtering large amounts of data in a (semi)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods, adapted to food recommendation, is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

  • Knowledge-based recommendation systems
  • Content-based recommendation systems
  • Collaborative recommendation systems
  • Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile and recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

  • Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
  • Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight expresses how relevant it is to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{ij} = \frac{f_{ij}}{\max_{z} f_{zj}} \tag{2.1} \]


where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed from all keywords z in the document j.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_{i} = \log \frac{N}{n_{i}} \tag{2.2} \]




In the formula, N is the total number of documents and n_{i} represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

\[ w_{ij} = TF_{ij} \times IDF_{i} \tag{2.3} \]

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:


\[ w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^{2}}} \tag{2.4} \]
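As an illustration of Eqs. (2.1) to (2.4), the following is a minimal Python sketch, not the implementation used in this work. The function name `tf_idf_vectors`, the input format (one list of keywords per document) and the toy recipes are assumptions made for the example. The natural logarithm is used since the text does not fix a base; after cosine normalization the base no longer affects the weights.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one cosine-normalized TF-IDF weight dict per document.

    `documents` is a list of keyword lists (a hypothetical input format).
    """
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        max_freq = max(counts.values()) if counts else 1
        # Eqs. (2.1)-(2.3): w_ij = (f_ij / max_z f_zj) * log(N / n_i)
        weights = {
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in counts.items()
        }
        # Eq. (2.4): divide by the Euclidean norm of the document vector
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


# Toy data: ingredients play the role of keywords (illustrative only)
recipes = [
    ["tomato", "basil", "pasta", "tomato"],
    ["tomato", "rice", "chicken"],
    ["basil", "pine nuts", "pasta"],
]
vectors = tf_idf_vectors(recipes)
```

In this toy example, "rice" receives a higher weight than "tomato" in the second recipe, because "tomato" also occurs in another recipe and is therefore less discriminative.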

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ \mathrm{Similarity}(a, b) = \frac{\sum_{k} w_{ka} w_{kb}}{\sqrt{\sum_{k} w_{ka}^{2}} \, \sqrt{\sum_{k} w_{kb}^{2}}} \tag{2.5} \]
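The cosine similarity of Eq. (2.5) can be sketched in a few lines of Python. This is only an illustration: vectors are stored as sparse dictionaries, the names `cosine_similarity`, `profile`, `item_a` and `item_b` are hypothetical, and returning 0 for an empty vector is a convention chosen here, not something stated in the text.

```python
import math


def cosine_similarity(a, b):
    """Eq. (2.5) for two sparse weight vectors stored as dicts."""
    # Only keywords present in both vectors contribute to the numerator
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


profile = {"tomato": 0.8, "basil": 0.6}
item_a = {"tomato": 0.4, "basil": 0.3}  # same direction as the profile
item_b = {"chicken": 1.0}               # no keywords in common
```

Since `item_a` points in the same direction as `profile`, their similarity is maximal, while `item_b` shares no keyword with the profile and scores zero.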



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector \vec{c_i} = (w_{1i}, ..., w_{|T|i}) for each class c_i, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

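\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6} \]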




In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters β and γ control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c_i} and the document vector \vec{d_j}.
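A minimal sketch of Rocchio's method used as a classifier, following Eq. (2.6), is given next. It is an illustration under stated assumptions rather than the component developed in this work: the values β = 1.0 and γ = 0.5 are arbitrary, the documents of all other classes are treated as the negative examples of a class, and the names `rocchio_prototype`, `classify` and `training` are hypothetical.

```python
import math
from collections import defaultdict


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Eq. (2.6): beta * mean of positives minus gamma * mean of negatives."""
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)


def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(doc, training_sets):
    """Assign `doc` to the class whose prototype is most similar to it.

    `training_sets` maps a class label to the list of document vectors of
    that class; documents of the other classes act as negatives (assumption).
    """
    best_label, best_score = None, -math.inf
    for label, positives in training_sets.items():
        negatives = [d for other, docs in training_sets.items()
                     if other != label for d in docs]
        score = cosine(rocchio_prototype(positives, negatives), doc)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy TF-IDF-like vectors (illustrative values only)
training = {
    "liked": [{"tomato": 0.9, "basil": 0.4}, {"tomato": 0.7, "pasta": 0.7}],
    "disliked": [{"liver": 1.0}, {"liver": 0.8, "onion": 0.6}],
}
label = classify({"tomato": 0.6, "pasta": 0.8}, training)
```

Here the new recipe shares its terms with the "liked" examples only, so its similarity to the "liked" prototype is positive while its similarity to the "disliked" prototype is negative.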

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

  • P(c): probability of observing a document in class c
  • P(d|c): probability of observing the document d given a class c
  • P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[ P(c \mid d) = \frac{P(c) \, P(d \mid c)}{P(d)} \tag{2.7} \]


When performing classification, each document d is assigned to the class c_j with the highest probability:

\[ c = \arg\max_{c_j} \frac{P(c_j) \, P(d \mid c_j)}{P(d)} \tag{2.8} \]





The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although this conditional independence assumption is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \tag{2.9} \]

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k ∈ V_{d_i}, are used.
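The multinomial model of Eq. (2.9) can be sketched as follows. Two choices in this sketch are assumptions that go beyond the text: the product is evaluated as a sum of logarithms to avoid numerical underflow, and Laplace (add-one) smoothing is used to estimate P(t_k|c_j) so that unseen terms do not produce zero probabilities. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (list_of_terms, class_label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)  # class label -> term -> count
    vocabulary = set()
    for terms, label in labelled_docs:
        term_counts[label].update(terms)
        vocabulary.update(terms)
    total = len(labelled_docs)
    priors = {c: n / total for c, n in class_counts.items()}  # P(c_j)
    return priors, term_counts, vocabulary


def classify_nb(terms, priors, term_counts, vocabulary):
    """Return the class maximizing Eq. (2.9), computed in log space."""
    counts = Counter(t for t in terms if t in vocabulary)  # N(d_i, t_k)
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(prior)
        for term, n in counts.items():
            # Laplace-smoothed estimate of P(t_k | c_j) (assumption)
            p = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (illustrative only)
docs = [
    (["tomato", "basil", "pasta"], "relevant"),
    (["tomato", "pasta", "cheese"], "relevant"),
    (["liver", "onion"], "irrelevant"),
    (["liver", "kidney", "onion"], "irrelevant"),
]
model = train_multinomial_nb(docs)
prediction = classify_nb(["pasta", "tomato", "tomato"], *model)
```

The repeated term "tomato" contributes twice to the score, which is exactly the role of the exponent N(d_i, t_k) in Eq. (2.9).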

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
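To make the selection criterion more concrete, the sketch below computes the expected information gain of splitting a set of labelled documents on the presence or absence of a single term. It is a simplified illustration, assuming entropy measured in bits and documents represented as sets of terms; the function names and the data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, term):
    """Gain obtained by splitting `docs` on the presence of `term`.

    `docs` is a list of (set_of_terms, class_label) pairs.
    """
    labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]
    # Entropy remaining after the split, weighted by partition size
    remainder = sum(
        len(part) / len(docs) * entropy(part)
        for part in (with_term, without_term) if part
    )
    return entropy(labels) - remainder


docs = [
    ({"tomato", "basil"}, "relevant"),
    ({"tomato", "onion"}, "relevant"),
    ({"liver", "onion"}, "irrelevant"),
    ({"liver", "basil"}, "irrelevant"),
]
gain_tomato = information_gain(docs, "tomato")
gain_onion = information_gain(docs, "onion")
```

In this toy set, "tomato" separates the two classes perfectly and would be chosen as the root test, whereas "onion" appears equally often in both classes and provides no gain.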

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is made when an item is classified.
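A k-nearest-neighbor classifier over VSM vectors can be sketched as below. The majority vote used to derive the class label is one common choice and is an assumption here, as are the names `knn_classify` and `training` and the toy vectors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(item, training, k=3):
    """Label `item` by majority vote among its k most similar stored items.

    `training` is a list of (vector, label) pairs kept in memory; all the
    work happens here, at classification time.
    """
    neighbours = sorted(training,
                        key=lambda pair: cosine(item, pair[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


training = [
    ({"tomato": 1.0, "basil": 0.5}, "liked"),
    ({"tomato": 0.8, "pasta": 0.6}, "liked"),
    ({"pasta": 1.0}, "liked"),
    ({"liver": 1.0}, "disliked"),
    ({"liver": 0.7, "onion": 0.7}, "disliked"),
]
label = knn_classify({"tomato": 0.5, "pasta": 0.5}, training, k=3)
```

Every call compares the new item against all stored items, which illustrates the classification-time cost mentioned above.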

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation.

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ \mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}} \, \sqrt{\sum_{s \in S} r_{bs}^{2}}} \tag{2.10} \]



In the formula, r_{as} is the rating that user a gave to item s, r_{bs} is the rating that user b gave to the same item s, and the sums run over the set S of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common to both are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analysed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2].

\[ \mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2}} \, \sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \tag{2.11} \]

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ \mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \, (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
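The steps described above can be sketched in Python using the ratings of Table 2.1. This is an illustrative sketch under two assumptions that the text leaves open: in Eq. (2.11) the user averages are computed over the items rated by both users, and in Eq. (2.12) they are computed over all items rated by each user. The neighbourhood size of two and the function names are also hypothetical choices.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5)
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Eq. (2.11), with averages taken over the co-rated items (assumption)."""
    common = [i for i in ratings[a] if i in ratings[b]]
    if not common:
        return 0.0
    mean_a = mean(ratings[a][i] for i in common)
    mean_b = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b)
              for i in common)
    den = math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)
                    * sum((ratings[b][i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(a, item, n_neighbours=2):
    """Eq. (2.12) using the n most similar users who rated `item`."""
    candidates = [(pearson(a, b), b) for b in ratings
                  if b != a and item in ratings[b]]
    neighbours = sorted(candidates, reverse=True)[:n_neighbours]
    mean_a = mean(ratings[a].values())
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values()))
              for sim, b in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return mean_a + num / den if den else mean_a


prediction = predict("Alice", "Item5")
```

Under these assumptions, User1 and User2 are the two users most similar to Alice, and the predicted rating for Item5 is approximately 4.87. Note that Eq. (2.12) divides by the plain sum of similarities; when neighbours with negative similarity are admitted, many implementations divide by the sum of absolute values instead.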

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2].

Figure 2.4: Parallelized hybridization design [2].

Figure 2.5: Pipelined hybridization designs [2].

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components have the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually but
complement each other in different situations (e.g., when few ratings exist one should recommend
popular items, otherwise use collaborative methods).
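The weighting scheme can be illustrated with a minimal Python sketch. The function name `weighted_hybrid`, the component names, and the weight values are hypothetical and chosen only for illustration; the source does not prescribe a specific combination rule.

```python
def weighted_hybrid(scores_by_component, weights):
    """Rank items by a weighted sum of the scores given by each recommender.

    scores_by_component: component name -> {item: score}
    weights: component name -> weight (assigned manually in this sketch)
    """
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


scores = {
    "collaborative": {"a": 0.9, "b": 0.2},
    "popularity": {"a": 0.1, "b": 0.8},
}
# With many ratings available, trust the collaborative component more.
ranking_rich = weighted_hybrid(scores, {"collaborative": 0.7, "popularity": 0.3})
# With few ratings available, favour popular items.
ranking_sparse = weighted_hybrid(scores, {"collaborative": 0.2, "popularity": 0.8})
```

Changing only the weights switches the ranking between the two components, which is why this design leaves the individual recommenders untouched.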

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is exploited by the principal component
to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base
algorithms can be improved by being hybridized with other techniques. It is important that the
recommendation techniques used in the hybrid system complement each other's limitations. For instance,
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem,
since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can be and have been studied: increases in the number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):

\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):

\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
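As an illustration, precision and recall can be computed from a list of recommended items and the set of items the user actually likes. This is a minimal sketch; the function name `precision_recall` and the sample item identifiers are assumptions made for the example.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Four items recommended, three items actually liked, two in common.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the four recommended items are relevant (precision 0.5), and two of the three relevant items were recommended (recall 2/3).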

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and
actual ratings:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating
for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
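Both error measures are straightforward to compute. The following sketch uses hypothetical function names (`mae`, `rmse`) and made-up ratings, and only illustrates how a single large error affects RMSE more than MAE.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4, 3, 5]
actual = [5, 3, 2]      # absolute errors: 1, 0, 3
error_mae = mae(predicted, actual)
error_rmse = rmse(predicted, actual)
```

With errors of 1, 0, and 3, the MAE is 4/3 while the RMSE is the square root of 10/3, which is larger because the error of 3 is squared before averaging.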


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

\[ I_k^{+} = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]
The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe
Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain
ingredient k ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]
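A minimal sketch of how the favourite-ingredient score could be computed from these definitions is given below. The function name `ff_irf`, the data layout (recipes as sets of ingredient names), and the sample values are assumptions made for illustration, not details taken from [3].

```python
import math


def ff_irf(use_counts, period_days, recipes):
    """Score ingredients by frequency of use times inverse recipe frequency.

    use_counts: ingredient -> number of times the user cooked with it (F_k)
    period_days: length of the observation period (D)
    recipes: list of sets, each holding the ingredients of one recipe
    """
    total = len(recipes)                                   # M
    scores = {}
    for ingredient, f_k in use_counts.items():
        m_k = sum(1 for r in recipes if ingredient in r)   # M_k
        if m_k == 0:
            continue  # ingredient never appears in the collection
        scores[ingredient] = (f_k / period_days) * math.log(total / m_k)
    return scores


recipes = [{"rice", "egg"}, {"rice", "fish"}, {"rice", "tofu"}, {"egg", "tofu"}]
scores = ff_irf({"rice": 3, "tofu": 3}, period_days=30, recipes=recipes)
```

Both ingredients were used equally often, but tofu appears in fewer recipes than rice, so it receives the higher score. This mirrors the way IDF favours rare terms in text retrieval.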



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing
history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted
by $I_k^{+}$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of favourite ingredients per user is 19.2. Also, with N = 20 the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail, because the accuracy values
obtained from the evaluation were not satisfactory.
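As a quick consistency check, the F-measure can be recomputed from the precision and recall values reported for N = 20 and N = 1. The helper name `f_measure` is hypothetical; the inputs are the reported percentages expressed as fractions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f_top20 = f_measure(0.607, 0.61)    # N = 20
f_top1 = f_measure(0.833, 0.045)    # N = 1
```

For N = 20 the result is approximately 0.608, matching the reported F-measure, while for N = 1 the very low recall pulls the F-measure below 0.1 despite the high precision.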

In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantities of a target
recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the
quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed
average quantity of ingredient k in all the recipes in the database). According to the deviation
score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering
the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^{+}$ and $I_k^{-}$, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization
problem. The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbor's mean. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

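CBCF essentially consists of performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]

Here $r_{u,i}$ is the rating given by user u to item i, and $c_{u,i}$ is the rating predicted for that pair by the
content-based method.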
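The construction of a pseudo user-ratings vector can be sketched as follows. The function name `pseudo_ratings_vector` and the stand-in content-based predictor are hypothetical; in [20] the content-based predictions come from a naive Bayesian classifier, which is not reproduced here.

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Build a dense ratings vector for one user.

    user_ratings: item -> actual rating given by the user
    content_predict: callable returning a content-based prediction for an item
    items: ordered list of all items in the database
    """
    return [
        user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    ]


items = ["m1", "m2", "m3"]
actual = {"m1": 5, "m3": 2}
# Stand-in predictor (assumption): always predicts a rating of 3.
vector = pseudo_ratings_vector(actual, lambda item: 3, items)
```

Stacking these vectors for all users yields a matrix with no missing entries, on which a standard neighborhood-based collaborative method can then operate.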

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has only a
few rated items. Lastly, the prediction is computed using a hybrid correlation
weight that allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.


Figure 3.1: Recipe-ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method
identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered. It is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based recommendation system for news articles. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information, and will not help him to gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it, and
does not need to recommend a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model, and is passed to the long-term model, explained in
more detail in [23].
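The short-term model described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the implementation from [23]: the function names, the sparse-dictionary vector representation, and the threshold values 0.3 and 0.9 are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a new story with the nearest-neighbor short-term model.

    rated_stories: list of (vector, score) pairs already rated by the user.
    Returns ("unclassified", None), ("known", None) or ("scored", value).
    """
    voters = [(cosine(story, vec), score) for vec, score in rated_stories]
    voters = [(sim, score) for sim, score in voters if sim >= t_min]
    if not voters:
        return ("unclassified", None)   # handed to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return ("known", None)          # too similar to a story already seen
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


rated = [({"election": 1.0, "vote": 1.0}, 0.8), ({"football": 1.0}, 0.2)]
follow_up = short_term_predict({"election": 1.0, "poll": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_predict({"weather": 1.0}, rated)
```

The follow-up story shares one term with a rated story and inherits its score, the duplicate is flagged as known, and the unrelated story finds no voters.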

This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]

In the formula, pred(u, a) is the prediction value to user u for item a, N is the set of items
rated by user u, and $r_{b,u}$ is the rating given by user u to item b. Using the set of the user's rated items,
the user rating for each item b is weighted according to the similarity between b and the target item a.
The weighted sum is then normalized by the sum of similarities.
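The item-to-item procedure can be sketched as follows. This is a minimal illustration rather than the YoLP implementation: the function names and the sample ratings are hypothetical, and two details are assumptions of the sketch that are not stated in the text, namely that only positively correlated items are used as neighbors and that the target recipe's average rating is added to the weighted deviations to obtain a value on the rating scale.

```python
import math


def item_mean(ratings, item):
    """Average rating of an item over all users who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)


def pearson_items(ratings, a, b):
    """Pearson correlation between items a and b over users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in common)
    den = math.sqrt(
        sum((ratings[u][a] - mean_a) ** 2 for u in common)
        * sum((ratings[u][b] - mean_b) ** 2 for u in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, target):
    """Predict the user's rating for target from the items the user rated."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = pearson_items(ratings, target, item)
        if item == target or sim <= 0:
            continue  # assumption: keep only positively correlated neighbors
        num += sim * (rating - item_mean(ratings, item))
        den += sim
    base = item_mean(ratings, target)  # assumption: re-centre on the item mean
    return base + num / den if den else base


ratings = {
    "u1": {"soup": 5, "stew": 4, "cake": 1},
    "u2": {"soup": 4, "stew": 5, "cake": 2},
    "u3": {"soup": 1, "stew": 2, "cake": 5},
    "u4": {"soup": 5, "cake": 1},
}
prediction = predict(ratings, "u4", "stew")
```

In this toy data, stew is rated like soup and unlike cake, so the high soup rating of user u4 pushes the stew prediction above the average stew rating.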

The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant recipes, and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the
restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
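A minimal sketch of this profile construction and comparison is shown below. Binary sparse vectors are represented as Python sets of feature labels; the function names and the `kind:value` feature labels are hypothetical, and context features are left out.

```python
import math


def build_profile(rated_recipes, like_threshold=4):
    """Collect the features of all recipes the user rated 4 or 5.

    rated_recipes: list of (feature_set, rating) pairs.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= like_threshold:
            profile |= features
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


history = [
    ({"category:soup", "ingredient:tomato"}, 5),
    ({"category:dessert", "ingredient:sugar"}, 2),
]
profile = build_profile(history)
tomato_basil_soup = {"category:soup", "ingredient:tomato", "ingredient:basil"}
sugar_dessert = {"category:dessert", "ingredient:sugar"}
menu = sorted(
    [tomato_basil_soup, sugar_dessert],
    key=lambda recipe: cosine_binary(recipe, profile),
    reverse=True,
)
```

Only the positively rated soup contributes to the profile, so the tomato and basil soup is ranked above the dessert.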

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]


where avgTotal represents the combined user and item average for each recommendation. It
is important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors,
and lastly the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to
be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it
impossible to determine the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:

\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the
complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are considered positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's
preferences. These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype
vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector. In negative observations, the feature weights are subtracted.
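The construction of a prototype vector from IRF weights can be sketched as follows. This is an illustrative reading of the description, with hypothetical function names and toy data; it uses the second approach (user average as the cutoff) by default, and a fixed cutoff can be passed through the `threshold` parameter. The neutral rating of the first approach on Food.com is not modelled.

```python
import math
from collections import defaultdict


def irf_weights(recipes):
    """Inverse Recipe Frequency of every feature in the dataset.

    recipes: recipe id -> set of features.
    """
    total = len(recipes)
    counts = defaultdict(int)
    for features in recipes.values():
        for feature in features:
            counts[feature] += 1
    return {feature: math.log(total / count) for feature, count in counts.items()}


def prototype_vector(user_ratings, recipes, weights, threshold=None):
    """Add feature weights for positive observations, subtract for negative ones.

    user_ratings: recipe id -> rating, taken from the training set.
    threshold: cutoff for a positive observation; defaults to the user's average.
    """
    if threshold is None:
        threshold = sum(user_ratings.values()) / len(user_ratings)
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = 1.0 if rating >= threshold else -1.0
        for feature in recipes[recipe_id]:
            prototype[feature] += sign * weights[feature]
    return dict(prototype)


recipes = {
    "r1": {"tomato", "basil"},
    "r2": {"tomato", "beef"},
    "r3": {"beef", "onion"},
}
weights = irf_weights(recipes)
prototype = prototype_vector({"r1": 5, "r3": 1}, recipes, weights)
```

The liked recipe r1 contributes positive weights for tomato and basil, while the disliked recipe r3 contributes negative weights for beef and onion. Rare features such as basil carry more weight than common ones such as tomato.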

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, and they contain rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \ast (D - C) + C \tag{4.5} \]

In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So the following steps were applied: compute each user's similarity range from
the validation set, and compute each user's rating range from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (maximum value of A
minus minimum value of A), the user average was used as the default for the recommendation.
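A per-user Min-Max mapping can be sketched as follows. The function name, argument layout, and sample ranges are hypothetical; the fallback to a default value when the similarity interval is empty follows the description above.

```python
def min_max_rating(similarity, sim_range, rating_range, default):
    """Map a similarity value into a user's rating range.

    sim_range: (min, max) similarity observed for the user
    rating_range: (min, max) rating given by the user in the training set
    default: value returned when the similarity interval is empty,
             e.g. the user's average rating
    """
    sim_min, sim_max = sim_range
    low, high = rating_range
    if sim_max == sim_min:
        return default
    return (similarity - sim_min) / (sim_max - sim_min) * (high - low) + low


# A user whose similarities span [0.2, 0.8] and whose ratings span [2, 5].
highest = min_max_rating(0.8, (0.2, 0.8), (2, 5), default=3.5)
lowest = min_max_rating(0.2, (0.2, 0.8), (2, 5), default=3.5)
middle = min_max_rating(0.5, (0.2, 0.8), (2, 5), default=3.5)
fallback = min_max_rating(0.4, (0.4, 0.4), (2, 5), default=3.5)
```

The most similar recipe receives the user's highest rating, the least similar the lowest, and values in between are interpolated linearly.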

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} \]


Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the
combined average of the user and recipe averages and standard deviations.
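The threshold rule can be sketched as follows for the variant based on a single list of training ratings (either a user's or a recipe's). The function name is hypothetical, the thresholds are passed explicitly rather than assumed, and the use of the population standard deviation (`statistics.pstdev`) is an assumption of this sketch, since the text does not say which estimator was used.

```python
import statistics


def rating_from_similarity(similarity, training_ratings, upper, lower):
    """Translate a similarity value into a rating using mean and deviation.

    training_ratings: ratings of the user (or of the recipe) in the training set
    upper, lower: similarity thresholds U and L
    """
    average = statistics.mean(training_ratings)
    deviation = statistics.pstdev(training_ratings)  # assumption: population std
    if similarity >= upper:
        return average + deviation
    if similarity >= lower:
        return average
    return average - deviation


user_ratings = [3, 4, 5]
# Threshold values below are illustrative only.
high = rating_from_similarity(0.9, user_ratings, upper=0.75, lower=0.3)
mid = rating_from_similarity(0.5, user_ratings, upper=0.75, lower=0.3)
low = rating_from_similarity(0.1, user_ratings, upper=0.75, lower=0.3)
```

The prediction can take only three values per user or recipe, which keeps the mapping simple at the cost of a coarse rating scale.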

This approach is very intuitive when the similarity value between the recipersquos features and the



Table 41 Statistical characterization for the datasets used in the experiments

Foodcom Epicurious Foodcom EpicuriousNumber of users 24741 8117 Sparsity on the ratings matrix 002 007Number of food items 226025 14976 Avg rating values 468 334Number of rating events 956826 86574 Avg number of ratings per user 3867 1067Number of ratings above avg 726467 46588 Avg number of ratings per item 423 578Number of groups 108 68 Avg number of ingredients per item 857 371Number of ingredients 5074 338 Avg number of categories per item 233 060Number of categories 28 14 Avg number of food groups per item 087 061

user profile is high then the recipersquos features are similar to the userrsquos preferences which should

yield a higher rating value to the recipe Since the notion of a high rating value varies between

users and recipes their averages and standard deviation can help determine with more accuracy

the final rating recommended for the recipe Later on in Chapter 5 the upper and lower similarity

thresholds values used in this method U and L respectively will be optimized to obtain the best

recommendation performances but initially the upper threshold U is 075 and the lower threshold L
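A minimal sketch of the threshold rule of Eq. 4.6 is given below. The function names are hypothetical, the default thresholds are the initial values reported in this work (0.75 and 0.25), and the "combined" variant is assumed to be the plain mean of the user and recipe statistics, which the text does not spell out.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: add, keep, or subtract one standard deviation."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: arithmetic mean of user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

With an average of 4.0 and a standard deviation of 0.5, a similarity of 0.8 yields 4.5, a similarity of 0.5 yields 4.0, and a similarity of 0.1 yields 3.5.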


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
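The sparsity filter described above could look like the sketch below. The order of the two steps (recipes first, then users) and the single pass are assumptions, since the text only states the two removal criteria; the function name and tuple layout are hypothetical.

```python
from collections import Counter


def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """ratings: list of (user_id, recipe_id, rating) tuples.

    Defaults keep recipes rated more than 3 times and users with more
    than 5 ratings. Single pass, recipes first (an assumption).
    """
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```

Note that removing users after recipes can push some recipes back under the recipe threshold; whether the original filter was iterated until stable is not stated.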

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation

This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the data was split into 5 parts, so the process is repeated 5 times; this is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
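The 5-fold procedure can be sketched as below. This is an illustrative split over all rating events with a fixed random seed; whether the original split was global or per user is not stated, so the global shuffle is an assumption, as are the function and parameter names.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each fold is used once for validation."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds about 20% of the events and each training set the remaining 80%; the error measures are then averaged over the 5 rounds.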

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
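For reference, the two error measures can be computed as in the sketch below, which uses the standard definitions of MAE and RMSE; the function names are illustrative.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

As a small illustration, four predictions that each miss by 1 and four predictions with three exact hits and one miss by 4 share the same MAE (1.0), but the RMSE is 1.0 in the first case and 2.0 in the second.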

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated.

Table 5.1: Baselines

                                     Epicurious          Food.com
Method                               MAE      RMSE       MAE      RMSE
YoLP Content-based component         0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component         0.6454   0.8678     0.3761   0.6834
User Average                         0.6315   0.8338     0.4077   0.6207
Item Average                         0.7701   1.0930     0.4385   0.7043
Combined Average                     0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                            Food.com
                                                   Observation        Observation        Observation        Observation
                                                   User Average       Fixed Threshold    User Average       Fixed Threshold
Method                                             MAE      RMSE      MAE      RMSE      MAE      RMSE      MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606    0.7759   1.0283    0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550    0.8388   1.1106    0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296    0.7824   0.9927    0.4390   0.6506    0.4324   0.6449
Min-Max                                            0.8539   1.1533    0.7721   1.0705    0.6648   0.9847    0.6303   0.9384

Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
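The three average-based baselines can be sketched as follows. The tuple layout and function names are hypothetical, and the handling of users or recipes that are absent from the training set is left out because the text does not describe it.

```python
from collections import defaultdict


def mean_by_key(training, key_index):
    """Average rating grouped by user (key_index=0) or recipe (key_index=1).

    training: iterable of (user_id, recipe_id, rating) tuples.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for event in training:
        totals[event[key_index]] += event[2]
        counts[event[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def combined_average(user_avg, item_avg, user_id, recipe_id):
    """(UserAvg + ItemAvg) / 2, as defined in the text."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```

Each baseline ignores the recipe content entirely, which is what makes it a useful reference point for the content-based component.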

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering as positive observations the highest rating values and as negative the lowest. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious          Food.com
Features                             MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
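The first of the two choices above, building a prototype vector with a fixed rating threshold, can be sketched with the textbook Rocchio update. This is not necessarily the exact formulation of Section 4.3: the weights `beta` and `gamma` are placeholder values, and treating a rating equal to the threshold as a positive observation is an assumption.

```python
from collections import defaultdict


def build_prototype(rated_recipes, threshold=3, beta=1.0, gamma=1.0):
    """rated_recipes: list of (feature_weights: dict, rating) pairs.

    Returns beta * mean(positive vectors) - gamma * mean(negative vectors).
    """
    positives = [f for f, rating in rated_recipes if rating >= threshold]
    negatives = [f for f, rating in rated_recipes if rating < threshold]
    prototype = defaultdict(float)
    for group, coefficient in ((positives, beta), (negatives, -gamma)):
        for features in group:
            for name, weight in features.items():
                prototype[name] += coefficient * weight / len(group)
    return dict(prototype)
```

Features that appear mostly in highly rated recipes end up with positive weights, and features from poorly rated recipes with negative ones.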

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
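The idea of storing one vector per feature group and merging them on demand can be illustrated as below. The dictionary layout, group names and function names are assumptions made for the example, not the stored representation used in this work.

```python
import math


def merge_vectors(stored, selected_groups):
    """Concatenate the chosen per-group vectors into a single sparse vector."""
    merged = {}
    for group in selected_groups:
        for name, weight in stored.get(group, {}).items():
            merged[(group, name)] = weight  # the group prefix avoids name clashes
    return merged


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Testing a feature combination then only requires changing `selected_groups` for both the user and the recipe, without recomputing any stored vector.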

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
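The threshold sweep can be sketched as a loop over candidate values of U applying the two-case rule of Eq. 5.1. The sample layout and function name are hypothetical, and a real run would repeat this over every cross-validation fold and average the results.

```python
def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, average, std_dev, actual_rating) tuples.

    Returns one (threshold, MAE, RMSE) tuple per candidate threshold.
    """
    results = []
    for upper in thresholds:
        errors = [
            (avg + std if sim >= upper else avg) - actual
            for sim, avg, std, actual in samples
        ]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results.append((upper, mae, rmse))
    return results
```

Plotting the MAE and RMSE columns against the threshold gives curves of the kind shown in Figures 5.4 and 5.5.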

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates from all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
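The points of such a graph can be derived per user as in the sketch below. Using the population standard deviation of each user's actual ratings is an assumption (the text does not say which estimator or which set of ratings was used), and the names are illustrative.

```python
import statistics
from collections import defaultdict


def user_error_points(predictions):
    """predictions: iterable of (user_id, predicted, actual) tuples.

    Returns {user_id: (rating standard deviation, mean absolute error)}.
    """
    actuals, abs_errors = defaultdict(list), defaultdict(list)
    for user, predicted, actual in predictions:
        actuals[user].append(actual)
        abs_errors[user].append(abs(predicted - actual))
    return {
        user: (statistics.pstdev(actuals[user]), statistics.fmean(abs_errors[user]))
        for user in actuals
    }
```

A user who always gives the same rating lands at a standard deviation of 0, where the error is expected to be lowest.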

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks a threshold where the recommendation error stagnates.
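The round-by-round simulation can be sketched for a single user as below. The `fit` and `predict` callables are hypothetical stand-ins for building the prototype vector and generating a rating; in the experiments the resulting errors would additionally be averaged over all selected users.

```python
def learning_curve(reviews, fit, predict, max_training=None):
    """reviews: list of (item, rating) for one user, in the order they are added.

    fit(training) -> model; predict(model, item) -> predicted rating.
    Returns a list of (number of training reviews, mean absolute error).
    """
    limit = max_training or len(reviews) - 1
    curve = []
    for n in range(1, limit + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to evaluate on
        model = fit(training)
        errors = [abs(predict(model, item) - rating) for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve
```

A flattening curve indicates that additional reviews no longer change the error noticeably.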



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.
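As a reminder of how such feature weights can be obtained, the sketch below computes an Inverse Recipe Frequency by analogy with the usual inverse document frequency, i.e. log(N / n_f), where N is the number of recipes and n_f the number of recipes containing feature f. This form is an assumption: Eq. 4.4 may differ in its exact definition.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """recipes: list of feature collections, one per recipe.

    Returns {feature: log(N / n_f)} (assumed IDF-style definition).
    """
    total = len(recipes)
    counts = Counter(f for features in recipes for f in set(features))
    return {feature: math.log(total / count) for feature, count in counts.items()}
```

Under this definition, a feature present in every recipe receives weight 0, while rare features receive the largest weights.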

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that better describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually or other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
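This proposal can be sketched as picking the rating class whose vector is most similar to the recipe. The use of cosine similarity, the dictionary layout and the function names are assumptions; ties are resolved by taking the first class encountered.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def predict_rating_class(recipe_vector, class_vectors):
    """class_vectors: {rating value: feature-weight vector for that rating}."""
    return max(class_vectors, key=lambda rating: cosine(recipe_vector, class_vectors[rating]))
```

The returned key is itself a rating value, so no separate similarity-to-rating conversion is involved in this variant.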

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868.

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.




Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general. However, new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements, and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching up the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data, in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed from all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\frac{N}{n_i} \tag{2.2}$$

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$
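The weighting and normalization steps of Eqs. (2.1) to (2.4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `tf_idf_vectors`, the toy ingredient lists, and the use of the natural logarithm are assumptions made here, not part of the systems described in this work.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one cosine-normalized {keyword: weight} dict per document.

    `documents` is a list of keyword lists (hypothetical input format).
    """
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        # Eq. (2.1) times Eq. (2.2), i.e. Eq. (2.3)
        weights = {
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Eq. (2.4): cosine normalization
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors


# Toy example: three "documents" described by ingredient keywords.
docs = [
    ["tomato", "basil", "pasta", "tomato"],
    ["tomato", "rice", "chicken"],
    ["basil", "chicken", "lemon"],
]
vectors = tf_idf_vectors(docs)
```

In this toy collection, "pasta" occurs in a single document, so it receives a higher weight in the first vector than "tomato", even though "tomato" appears twice there.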

With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
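As an illustration of Eq. (2.5), the following sketch computes the cosine similarity between two sparse weight vectors stored as dictionaries. The dictionary representation and the function name are assumptions chosen for brevity; keywords missing from a vector are treated as having weight zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity (Eq. 2.5) between two {keyword: weight} dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_a * norm_b)


profile = {"garlic": 1.0, "basil": 1.0}
recipe = {"garlic": 1.0}
similarity = cosine_similarity(profile, recipe)
```

Vectors pointing in the same direction obtain a similarity of 1, regardless of their length, while vectors with no keyword in common obtain 0.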



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, $T$ being the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
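A minimal sketch of Eq. (2.6) and of the subsequent classification step is given below. The values $\beta = 1.0$ and $\gamma = 0.5$, the function names, and the toy "liked" and "disliked" classes are illustrative assumptions only, not the parameters used in the experiments of this work.

```python
import math


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Eq. (2.6): build a class prototype from {term: weight} document vectors."""
    terms = set().union(*positives, *negatives)
    prototype = {}
    for term in terms:
        pos_avg = sum(d.get(term, 0.0) for d in positives) / len(positives) if positives else 0.0
        neg_avg = sum(d.get(term, 0.0) for d in negatives) / len(negatives) if negatives else 0.0
        prototype[term] = beta * pos_avg - gamma * neg_avg
    return prototype


def cosine(a, b):
    """Cosine similarity between two sparse vectors (Eq. 2.5)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify(document, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(document, prototypes[c]))


# Hypothetical training data: weights stand in for TF-IDF values.
liked = [{"garlic": 1.0, "beef": 0.5}, {"garlic": 0.5}]
disliked = [{"beef": 1.0}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
```

With these toy vectors, a new document containing only "garlic" is assigned to the "liked" class, because the negative examples push the weight of "beef" below zero in that prototype.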

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute. This encoding relates to the appearance of words in a document. The second, typically referred to as the multinomial event model, identifies the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary $V$, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
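The multinomial model of Eq. (2.9) can be sketched as follows. Two choices in this sketch are assumptions that are not stated in the text above: the term probabilities $P(t_k|c_j)$ are estimated with add-one (Laplace) smoothing, and the product is evaluated as a sum of logarithms to avoid numerical underflow. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labeled_docs):
    """Collect class priors and per-class term counts from (terms, label) pairs."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(terms)
        vocabulary.update(terms)
    total_docs = sum(class_docs.values())
    priors = {c: n / total_docs for c, n in class_docs.items()}
    return priors, term_counts, vocabulary


def classify(doc, priors, term_counts, vocabulary):
    """Return the class maximizing Eq. (2.9), computed in log space."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total_terms = sum(term_counts[c].values())
        score = math.log(prior)
        for term, n in Counter(doc).items():
            if term not in vocabulary:
                continue  # unseen terms carry no information
            # Add-one smoothing (an assumption of this sketch)
            p = (term_counts[c][term] + 1) / (total_terms + len(vocabulary))
            score += n * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


training = [
    (["chocolate", "sugar", "cream"], "relevant"),
    (["sugar", "butter", "chocolate"], "relevant"),
    (["liver", "onion"], "irrelevant"),
    (["onion", "anchovy", "liver"], "irrelevant"),
]
model = train_multinomial_nb(training)
label = classify(["chocolate", "cream", "onion"], *model)
```

Without smoothing, a single term never observed in a class would force the whole product in Eq. (2.9) to zero, which is why some form of smoothing is commonly applied in practice.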

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made during classification.
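The nearest neighbor procedure can be illustrated with a short sketch for structured data, using the Euclidean distance and a majority vote among the k nearest neighbors. The function name, the two-dimensional feature vectors, and the value k = 3 are assumptions made for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, training_items, k=3):
    """Label `query` by majority vote among its k nearest training items.

    `training_items` is a list of (feature_vector, label) pairs.
    """
    # All work happens here, at classification time: there is no training phase.
    neighbors = sorted(training_items, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical structured items, e.g. (sweetness, spiciness) scores.
training = [
    ((1.0, 1.0), "liked"),
    ((1.2, 0.9), "liked"),
    ((0.8, 1.1), "liked"),
    ((5.0, 5.0), "disliked"),
    ((5.5, 4.5), "disliked"),
]
label = knn_classify((1.0, 1.2), training)
```

Since every query is compared against the whole training set, the cost of one classification grows linearly with the number of stored items, which is the inefficiency mentioned above.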

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based approaches (or heuristic-based) and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item $p$ for user $c$, a set of ratings $S$ is used. This set contains ratings for item $p$ obtained from other users who have already rated that item, usually the $N$ most similar to user $c$. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


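Table 2.1: Ratings database for collaborative recommendation.

| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| Alice | 5 | 3 | 4 | 4 | ? |
| User1 | 3 | 1 | 2 | 3 | 3 |
| User2 | 4 | 3 | 4 | 3 | 5 |
| User3 | 3 | 3 | 1 | 5 | 4 |
| User4 | 1 | 5 | 5 | 2 | 1 |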

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings $S$ previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set $S$. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in $m$-dimensional space, where $m$ represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\,\sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10}$$

In the formula, the sums run over the items $s$ rated by both users, $r_{a,s}$ is the rating that user $a$ gave to item $s$, and $r_{b,s}$ is the rating that user $b$ gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.

Figure 2.2: Comparing user ratings [2].

$$sim(a, b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^2 \sum_{s \in S}(r_{b,s} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula, $pred(a, p)$ is the predicted rating of user $a$ for item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
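The whole procedure can be traced on the data of Table 2.1 with the sketch below, which combines Eq. (2.11) and Eq. (2.12). Several details are assumptions of this sketch rather than statements from the text: inside the Pearson coefficient the user averages are taken over the co-rated items only, the prediction uses each user's average over all of his ratings, and the neighborhood $N$ is restricted to the two most similar users with positive similarity.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Eq. (2.11), with averages taken over the co-rated items (assumption)."""
    common = [i for i in ratings[a] if i in ratings[b]]
    if not common:
        return 0.0
    mean_a = mean(ratings[a][i] for i in common)
    mean_b = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((ratings[b][i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(a, item, n_neighbors=2):
    """Eq. (2.12), using the most similar users that rated `item`."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:n_neighbors]
    mean_a = mean(ratings[a].values())
    if not neighbors:
        return mean_a  # fall back to the user's own average
    num = sum(sim * (ratings[b][item] - mean(ratings[b].values())) for sim, b in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return mean_a + num / den


prediction = predict("Alice", "Item5")
```

Under these assumptions, User1 and User2 are selected as Alice's neighbors, and since both rated Item5 above their own averages, the prediction for Alice lies above her average rating of 4.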

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities $sim(a, b)$ in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user $c$ giving a particular rating to item $s$, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user, or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2].

Figure 2.4: Parallelized hybridization design [2].

Figure 2.5: Pipelined hybridization designs [2].

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied for two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, else use collaborative methods).
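A weighting scheme for a parallelized hybrid could look like the sketch below, where each component returns a score per item and the scores are merged with manually assigned weights. The function name, the score dictionaries, and the weights are hypothetical, and items that a component did not score are assumed to contribute zero.

```python
def weighted_hybrid(component_scores, weights):
    """Rank items by the weighted average of the scores of each component.

    `component_scores` is a list of {item: score} dicts, one per recommender.
    """
    total_weight = sum(weights)
    combined = {}
    for scores, weight in zip(component_scores, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score / total_weight
    # Highest combined score first
    return sorted(combined, key=combined.get, reverse=True)


content_based = {"soup": 0.9, "salad": 0.2}
collaborative = {"soup": 0.1, "salad": 0.8}
ranking = weighted_hybrid([content_based, collaborative], weights=[0.75, 0.25])
```

Changing the weights shifts the ranking towards one component or the other, which is how such a design can favour, for instance, the collaborative component once enough ratings exist.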

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4].

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2].

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \tag{2.14}$$
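Eqs. (2.13) and (2.14) translate directly into set operations, as the following sketch shows. The function name and the example item identifiers are hypothetical, and returning 0 when a denominator is empty is a convention chosen here.

```python
def precision_recall(recommended, relevant):
    """Compute Precision (Eq. 2.13) and Recall (Eq. 2.14) from two item collections."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # recommended and liked
    fp = len(recommended - relevant)  # recommended but not liked
    fn = len(relevant - recommended)  # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four recommended recipes, three recipes the user actually likes.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

In this example, two of the four recommended items are relevant, and two of the three relevant items were recommended.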

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
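The difference between the two error measures can be seen in a small sketch. The rating lists below are made-up values, chosen so that one prediction is far off.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    n = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / n


def rmse(predicted, actual):
    """Root Mean Square Error; squaring penalizes large deviations more."""
    n = len(predicted)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / n)


# Hypothetical ratings: three good predictions and one that misses by 3.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 5.0, 5.0]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

With these values the absolute errors are 0, 1, 0 and 3, so the MAE is 1.0, whereas the RMSE is the square root of 2.5 and is therefore pulled upwards by the single large error.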


The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$

The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
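The favourite-ingredient score can be sketched in a few lines. This is only an illustration of the FF-IRF idea, not the implementation from [3]: the function name, the use of days as the unit of the period D, the natural logarithm, and the toy recipes are all assumptions.

```python
from math import log


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k with FF_k * IRF_k.

    cooked_recipes: list of ingredient sets cooked by the user in the period.
    all_recipes: list of ingredient sets for every recipe in the collection.
    period_days: length of the period D (unit assumed to be days).
    """
    m = len(all_recipes)
    scores = {}
    for k in {ing for recipe in cooked_recipes for ing in recipe}:
        f_k = sum(1 for recipe in cooked_recipes if k in recipe)
        m_k = sum(1 for recipe in all_recipes if k in recipe)
        if m_k == 0:
            continue  # ingredient unknown to the collection; skipped here
        scores[k] = (f_k / period_days) * log(m / m_k)
    return scores


# Hypothetical collection of four recipes, two of which the user cooked.
all_recipes = [{"rice", "egg"}, {"rice", "tuna"}, {"rice", "beef"}, {"egg", "basil"}]
cooked = [{"rice", "egg"}, {"egg", "basil"}]
scores = ff_irf(cooked, all_recipes, period_days=30)
```

In this toy example rice receives the lowest score: it was cooked once and appears in three of the four recipes, so its inverse recipe frequency is small.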

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
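As a quick consistency check, the F-measure can be recomputed from the reported precision and recall. The sketch below reads the reported figures as 60.7% precision and 61% recall for N = 20, and 83.3% precision and 4.5% recall for N = 1; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


top20 = f_measure(0.607, 0.61)   # values reported for N = 20
top1 = f_measure(0.833, 0.045)   # values reported for N = 1
```

Under this reading, the N = 20 result comes out close to the reported 60.8%, while the N = 1 result stays below 10% because the harmonic mean is dominated by the very low recall.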

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_{k(i)} - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_{k(i)}$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_{k(i)}$ (i.e. the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^{+}$ and $I_k^{-}$, respectively):

$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
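The construction of a pseudo user-ratings vector can be sketched as follows. The content-based predictor is represented by a placeholder callable, since the naive Bayesian classifier itself is outside the scope of this illustration; all names and values are hypothetical.

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Build v_u: actual ratings where available, content-based predictions otherwise.

    user_ratings: dict mapping item -> actual rating r_ui given by the user.
    content_predict: callable mapping item -> content-based prediction c_ui
                     (a stand-in for the learned content-based predictor).
    items: ordered list of all items in the database.
    """
    return [user_ratings[i] if i in user_ratings else content_predict(i)
            for i in items]


# Hypothetical example: one real rating, the rest filled in by a dummy predictor.
items = ["movie_a", "movie_b", "movie_c"]
vector = pseudo_ratings_vector({"movie_a": 5}, lambda item: 3.0, items)
```

Stacking one such vector per user yields the dense matrix V on which the user-to-user Pearson correlation is then computed.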

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
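The short-term model described above can be illustrated with a small sketch. It is not the Daily-Learner implementation: the threshold values, the function names and the toy TF-IDF vectors are assumptions made for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with the nearest-neighbour short-term model.

    rated_stories: list of (tfidf_vector, score) pairs already rated by the user.
    t_min, t_max: hypothetical minimum and maximum similarity thresholds.
    Returns ("known", None), ("no_voters", None) or ("scored", value).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return "known", None       # too similar: the user already knows it
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return "no_voters", None       # would be passed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "scored", sum(sim * score for sim, score in voters) / total


# Hypothetical rated stories with toy TF-IDF weights.
rated = [({"election": 1.0, "vote": 1.0}, 1.0), ({"football": 1.0}, 0.0)]
follow_up = short_term_predict({"election": 1.0, "poll": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_predict({"weather": 1.0}, rated)
```

The follow-up story shares one term with a liked story and is scored by its single voter, the duplicate is flagged as known, and the unrelated story has no voters.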

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{ap}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
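The item-to-item computation can be sketched as below. This is an illustration under stated assumptions rather than the YoLP implementation: the data layout and function names are hypothetical, neighbours with non-positive similarity are skipped, and the target item's average rating is added back to the weighted deviations so that the result lies on the rating scale (neither choice is stated in the text above).

```python
from math import sqrt


def item_mean(ratings, item):
    """Average rating of an item over all users who rated it."""
    return sum(ratings[item].values()) / len(ratings[item])


def pearson_item_similarity(ratings, a, b):
    """Pearson correlation between items a and b over their common raters.

    ratings: dict mapping item -> dict mapping user -> rating.
    """
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)
               * sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / den if den else 0.0


def predict(ratings, user, target):
    """Similarity-weighted prediction for a user and a target item."""
    num = den = 0.0
    for b, by_user in ratings.items():
        if b == target or user not in by_user:
            continue
        sim = pearson_item_similarity(ratings, target, b)
        if sim <= 0:
            continue  # assumption: only positively correlated neighbours vote
        num += sim * (by_user[user] - item_mean(ratings, b))
        den += sim
    baseline = item_mean(ratings, target)  # assumption: add the item mean back
    return baseline + num / den if den else baseline


# Hypothetical ratings for three recipes.
ratings = {
    "soup": {"ana": 5, "rui": 3, "eva": 4},
    "stew": {"ana": 4, "rui": 2, "eva": 3, "joe": 5},
    "cake": {"ana": 2, "rui": 5},
}
sim_soup_stew = pearson_item_similarity(ratings, "soup", "stew")
joe_soup = predict(ratings, "joe", "soup")
```

Because no clipping is applied, a prediction can fall outside the rating scale, as happens for the last call in this toy example; a real system would typically clamp the value to the valid range.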

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
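The content-based pipeline can be sketched as follows, reading the threshold and increment in the rating formula as 0.8 and 0.5. Since the vectors are binary, the cosine similarity reduces to a set computation. The function names and feature labels are hypothetical, and avgTotal is assumed here to be the plain mean of the user average and the item average, which the text does not specify.

```python
from math import sqrt


def binary_cosine(recipe_features, profile_features):
    """Cosine similarity between two binary feature vectors given as sets."""
    recipe_features, profile_features = set(recipe_features), set(profile_features)
    if not recipe_features or not profile_features:
        return 0.0
    shared = len(recipe_features & profile_features)
    return shared / sqrt(len(recipe_features) * len(profile_features))


def similarity_to_rating(similarity, user_avg, item_avg):
    """Map a similarity to a rating: avgTotal, plus 0.5 above the 0.8 threshold."""
    avg_total = (user_avg + item_avg) / 2  # assumption: mean of the two averages
    return avg_total + 0.5 if similarity > 0.8 else avg_total


# Hypothetical profile built from positively rated recipes.
profile = {"italian", "tomato", "basil", "pasta"}
close = binary_cosine({"italian", "tomato", "basil", "pasta", "garlic"}, profile)
far = binary_cosine({"asian", "rice"}, profile)
rating_close = similarity_to_rating(close, user_avg=4.0, item_avg=3.0)
rating_far = similarity_to_rating(far, user_avg=4.0, item_avg=3.0)
```

Only two rating values are possible for a given user and recipe pair under this mapping, which is one reason the resulting error measures should be read as approximations.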

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As was explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
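A minimal sketch of this construction is given below, using the second approach (the user's average rating as the boundary between positive and negative observations). The function names, the data layout, the natural logarithm and the toy recipes are assumptions; the neutral-rating rule used for Food.com in the first approach is not covered.

```python
from collections import defaultdict
from math import log


def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) for every feature k.

    recipes: dict mapping recipe id -> set of features.
    """
    m = len(recipes)
    counts = defaultdict(int)
    for features in recipes.values():
        for k in features:
            counts[k] += 1
    return {k: log(m / m_k) for k, m_k in counts.items()}


def prototype_vector(user_ratings, recipes, weights, threshold):
    """Add feature weights for positive observations, subtract for negative ones.

    Ratings greater than or equal to the threshold count as positive.
    Both observation types carry an equal weight of 1.
    """
    proto = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = 1.0 if rating >= threshold else -1.0
        for k in recipes[recipe_id]:
            proto[k] += sign * weights[k]
    return dict(proto)


# Hypothetical data: four recipes and one user's training ratings.
recipes = {
    "r1": {"tomato", "basil"},
    "r2": {"tomato", "beef"},
    "r3": {"rice", "beef"},
    "r4": {"rice", "egg"},
}
user_ratings = {"r1": 4, "r2": 3, "r3": 1}
user_avg = sum(user_ratings.values()) / len(user_ratings)
proto = prototype_vector(user_ratings, recipes, irf_weights(recipes), user_avg)
```

In this example beef appears once in a positive and once in a negative observation, so its weight cancels out, while rice ends up with a negative weight and tomato with a positive one.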

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A to B, which fits in the range [C, D], as shown in the following formula³:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \times (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as default for the recommendation.
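The per-user mapping can be sketched as below. The function names and example values are hypothetical, and the fallback condition (a zero-width similarity interval) is one possible reading of "not enough user ratings".

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the interval [a_min, a_max] to the interval [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def min_max_rating(similarity, user_similarities, user_ratings):
    """Translate a similarity into a rating on the user's own rating scale.

    user_similarities: similarity values observed for this user.
    user_ratings: ratings given by this user in the training set.
    """
    a_min, a_max = min(user_similarities), max(user_similarities)
    if a_max == a_min:
        # Similarity interval cannot be computed: fall back to the user average.
        return sum(user_ratings) / len(user_ratings)
    return min_max(similarity, a_min, a_max, min(user_ratings), max(user_ratings))


# Hypothetical user with similarities in [0.1, 0.9] and ratings in [2, 5].
mid_rating = min_max_rating(0.5, [0.1, 0.5, 0.9], [2, 4, 5])
fallback = min_max_rating(0.3, [0.3, 0.3], [2, 4, 5])
```

A similarity halfway through the user's similarity interval is mapped halfway through the user's rating interval, here 3.5.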

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
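The three-level mapping can be sketched as follows. The function name is hypothetical, the thresholds are passed in explicitly because their values are tuned experimentally, and the population standard deviation is used as an assumption (the text does not say which estimator was applied).

```python
from statistics import mean, pstdev


def rating_from_similarity(similarity, ratings, upper, lower):
    """Return average + std, average, or average - std depending on the similarity.

    ratings: training-set ratings of the user, of the recipe, or any combined
             list, depending on which of the three variants is being tested.
    upper, lower: similarity thresholds U and L.
    """
    avg, std = mean(ratings), pstdev(ratings)
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg


# Hypothetical user ratings and thresholds.
user_ratings = [3, 4, 5]
high = rating_from_similarity(0.9, user_ratings, upper=0.75, lower=0.25)
middle = rating_from_similarity(0.5, user_ratings, upper=0.75, lower=0.25)
low = rating_from_similarity(0.1, user_ratings, upper=0.75, lower=0.25)
```

Passing the recipe's ratings instead of the user's ratings gives the second variant; how the user and recipe statistics are combined for the third variant is left open here.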

This approach is very intuitive: when the similarity value between the recipe's features and the



Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                           24,741         8,117
Number of food items                     226,025        14,976
Number of rating events                  956,826        86,574
Number of ratings above avg              726,467        46,588
Number of groups                             108            68
Number of ingredients                      5,074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61

user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset has been filtered. All recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization for the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating values for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, Sections 5.5 and 5.6 analyse the impact of the user standard deviation on the recommendation error and the learning curve of the algorithm, using the configuration that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, a k-fold variant was used. Unlike exhaustive leave-p-out cross-validation, which would test every possible combination of p validation observations, k-fold cross-validation partitions the data into k segments (folds) of equal size. In each round, one fold serves as the validation set and the remaining folds form the training set. To reduce variability, the process is repeated k times so that every fold is used exactly once for validation, and the validation results are averaged over the k rounds (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, i.e., 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
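The fold construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `k_fold_splits`, the fixed shuffling seed and the synthetic rating events are hypothetical and not taken from the implementation used in this work.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists for k-fold cross-validation."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th event starting at offset i,
    # so fold sizes differ by at most one event.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical rating events in the form (userID, itemID, rating).
events = [(u, i, 1 + (u * i) % 5) for u in range(10) for i in range(10)]

for training, validation in k_fold_splits(events, k=5):
    # With 100 events and k = 5, each round trains on 80% and validates on 20%.
    print(len(training), len(validation))
```

Every rating event appears in exactly one validation fold, so averaging the error over the five rounds uses each known rating once.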

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
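As a minimal sketch of how the two measures can be computed from paired predictions and validation ratings, assuming the standard definitions of MAE and RMSE (the function names and sample values below are illustrative only):

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: squaring penalizes large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical predictions and validation ratings for four rating events.
predicted = [4, 3, 5, 2]
actual = [4, 4, 3, 2]

print(mae(predicted, actual), rmse(predicted, actual))
```

In this example the deviations are 0, 1, 2 and 0, so the single deviation of 2 weighs more heavily in the RMSE than in the MAE.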

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as


Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                    Epicurious                           Food.com
                                                    Observation       Observation        Observation       Observation
                                                    User Average      Fixed Threshold    User Average      Fixed Threshold
Method                                              MAE      RMSE     MAE      RMSE      MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606   0.7759   1.0283    0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                  0.8914   1.1550   0.8388   1.1106    0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296   0.7824   0.9927    0.4390   0.6506   0.4324   0.6449
Min-Max                                             0.8539   1.1533   0.7721   1.0705    0.6648   0.9847   0.6303   0.9384

inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
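The three average-based baselines can be sketched as below. The function names are hypothetical, and falling back to the global average for a user or recipe that does not appear in the training set is an assumption of this sketch, since the text does not state how such cases were handled.

```python
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def build_baselines(training):
    """training: list of (userID, recipeID, rating). Returns three predictors."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)
    # Assumption: unseen users or recipes fall back to the global average.
    global_avg = mean([rating for _, _, rating in training])
    user_avg = {u: mean(rs) for u, rs in by_user.items()}
    item_avg = {i: mean(rs) for i, rs in by_item.items()}

    def predict_user(user_id, recipe_id):
        return user_avg.get(user_id, global_avg)

    def predict_item(user_id, recipe_id):
        return item_avg.get(recipe_id, global_avg)

    def predict_combined(user_id, recipe_id):
        # (UserAvg + ItemAvg) / 2
        return (predict_user(user_id, recipe_id) + predict_item(user_id, recipe_id)) / 2

    return predict_user, predict_item, predict_combined

# Hypothetical training events.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
predict_user, predict_item, predict_combined = build_baselines(training)
print(predict_user("u1", "r1"), predict_item("u1", "r1"), predict_combined("u1", "r1"))
```

None of these baselines looks at recipe content, which makes them a useful reference point for the content-based variations evaluated in the remainder of this section.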

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
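The two ways of separating positive and negative observations can be illustrated with the sketch below. It assumes a common Rocchio formulation in which the prototype is the weighted mean of the positive recipe vectors minus the weighted mean of the negative ones. The weights `beta` and `gamma`, the choice of treating a rating equal to the threshold as positive, and the toy recipes are assumptions of this sketch, not details confirmed by Section 4.3.

```python
from collections import defaultdict

def build_prototype(rated, threshold, beta=1.0, gamma=1.0):
    """rated: list of (feature_weights, rating). Returns the prototype vector."""
    # Assumption: ratings equal to the threshold count as positive observations.
    positive = [vec for vec, rating in rated if rating >= threshold]
    negative = [vec for vec, rating in rated if rating < threshold]
    prototype = defaultdict(float)
    for group, signed_weight in ((positive, beta), (negative, -gamma)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] += signed_weight * weight / len(group)
    return dict(prototype)

# Hypothetical rated recipes for one user.
rated = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "bacon": 1.0}, 3),
    ({"basil": 1.0}, 4),
]

# Observation Fixed Threshold: middle of the rating range.
fixed = build_prototype(rated, threshold=3)

# Observation User Average: the user's own mean rating (4.0 here).
user_average = sum(rating for _, rating in rated) / len(rated)
by_average = build_prototype(rated, threshold=user_average)

print(fixed)
print(by_average)
```

In this toy case the recipe rated 3 is a positive observation under the fixed threshold but a negative one under the user average, which changes the sign of the weight learned for "bacon".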

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved


Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

and adjusted to return the best recommendations.

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
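The idea of storing one vector per feature group and merging only the selected groups at recommendation time can be sketched as follows. The dictionary layout, the function names and the toy weights are hypothetical; only the use of cosine similarity and the three feature groups come from the text.

```python
import math

def merge(parts, selected):
    """Merge the stored per-group vectors for the selected feature groups."""
    merged = {}
    for group in selected:
        for feature, weight in parts[group].items():
            # Keying by (group, feature) keeps the feature namespaces separate.
            merged[(group, feature)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical stored vectors: three per user and three per recipe.
user_parts = {
    "ingredients": {"tomato": 1.0, "basil": 1.0},
    "cuisine": {"italian": 1.0},
    "dietary": {"vegan": 1.0},
}
recipe_parts = {
    "ingredients": {"tomato": 1.0},
    "cuisine": {"italian": 1.0},
    "dietary": {},
}

for selected in (["ingredients", "cuisine", "dietary"], ["cuisine"], ["dietary"]):
    similarity = cosine(merge(user_parts, selected), merge(recipe_parts, selected))
    print(selected, similarity)
```

Because the merge happens at query time, each row of a feature test only changes the `selected` list, and the expensive prototype construction is done once per fold.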

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


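\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]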

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.
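Eq. 4.6 translates directly into a small function, sketched below. The function and parameter names are hypothetical. Which average and standard deviation are passed in (user, item, or their combination) depends on the conversion method being tested, so they are left as plain arguments here.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a Rocchio similarity value to a predicted rating, following Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Hypothetical user with average rating 3.5 and standard deviation 0.8.
for similarity in (0.8, 0.5, 0.1):
    print(similarity, rating_from_similarity(similarity, average=3.5, std_dev=0.8))
```

Varying the thresholds in the tests that follow corresponds to calling this function with different `upper` and `lower` values while keeping everything else fixed.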

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:


Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

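\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]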


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
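The opposite movement of the two metrics can be seen in a small worked example. The numbers are hypothetical and chosen only for illustration: a user with average rating 3 and standard deviation 1, and five validation ratings (4, 4, 4, 2, 2). With a high threshold U, no similarity reaches it and every prediction is the average, 3. With a low threshold U, every similarity reaches it and every prediction is the average plus the standard deviation, 4.

\[
\begin{aligned}
\text{High } U:\quad & \text{MAE} = \frac{1+1+1+1+1}{5} = 1, &
\text{RMSE} &= \sqrt{\frac{1+1+1+1+1}{5}} = 1 \\
\text{Low } U:\quad & \text{MAE} = \frac{0+0+0+2+2}{5} = 0.8, &
\text{RMSE} &= \sqrt{\frac{0+0+0+4+4}{5}} \approx 1.265
\end{aligned}
\]

Lowering U produces three exact predictions, so the MAE drops, but the two misses are now off by 2 instead of 1 and the squared terms push the RMSE up.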

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
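The per-user points plotted in these figures can be obtained with a computation along the following lines. This is a sketch under assumptions: the function name is hypothetical, the population standard deviation is used, and the standard deviation is computed over the ratings present in the evaluated results, none of which is confirmed by the text.

```python
from collections import defaultdict
from statistics import pstdev

def per_user_points(results):
    """results: list of (userID, actual_rating, predicted_rating).

    Returns {userID: (standard deviation of actual ratings, mean absolute error)}.
    """
    actuals, errors = defaultdict(list), defaultdict(list)
    for user_id, actual, predicted in results:
        actuals[user_id].append(actual)
        errors[user_id].append(abs(predicted - actual))
    return {
        user_id: (pstdev(actuals[user_id]), sum(errors[user_id]) / len(errors[user_id]))
        for user_id in actuals
    }

# Hypothetical results: u1 always rates 4, u2 alternates between 1 and 5.
results = [("u1", 4, 4.0), ("u1", 4, 3.5), ("u2", 1, 3.0), ("u2", 5, 3.0)]
points = per_user_points(results)
print(points)
```

Each value in `points` gives the coordinates of one user in the scatter plot: the constant rater sits at standard deviation zero with a small error, while the erratic rater sits far to the right with a larger error.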

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
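The round-by-round procedure can be sketched generically as below. The names `learning_curve`, `fit` and `predict` are hypothetical, and the stand-in model in the usage example simply predicts the mean of the training ratings. It is not Rocchio's algorithm and is only there to make the sketch runnable.

```python
def learning_curve(reviews, fit, predict, max_training):
    """Simulate incremental learning for one user.

    reviews: list of (item, rating); fit(training) builds a model;
    predict(model, item) returns a predicted rating.
    Returns a list of (number of training reviews, mean absolute error).
    """
    curve = []
    for n in range(1, max_training + 1):
        # Each round moves one more review from validation to training.
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        model = fit(training)
        errors = [abs(predict(model, item) - rating) for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve

# Stand-in model (an assumption of this sketch): predict the training mean.
def fit_mean(training):
    return sum(rating for _, rating in training) / len(training)

def predict_mean(model, item):
    return model

# Hypothetical reviews from one user.
reviews = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
curve = learning_curve(reviews, fit_mean, predict_mean, max_training=3)
print(curve)
```

Averaging such curves over all selected users gives one error value per number of rated recipes, which is the quantity the learning curve figures plot.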



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
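This proposal can be sketched as follows. Everything in the sketch is an assumption made for illustration: the function names, the choice of summing the feature weights of the recipes that received the same rating, and the toy data. The idea has not been implemented or evaluated in this work.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_class_vectors(rated):
    """rated: list of (feature_weights, rating). One vector per rating value."""
    classes = defaultdict(lambda: defaultdict(float))
    for vec, rating in rated:
        for feature, weight in vec.items():
            # Assumption: weights of recipes with the same rating are summed.
            classes[rating][feature] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}

def predict_rating(class_vectors, recipe):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe))

# Hypothetical rated recipes for one user.
rated = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"bacon": 1.0}, 2),
    ({"basil": 1.0, "garlic": 1.0}, 5),
]
class_vectors = build_class_vectors(rated)
print(predict_rating(class_vectors, {"basil": 1.0, "tomato": 1.0}))
print(predict_rating(class_vectors, {"bacon": 1.0}))
```

Since cosine similarity ignores vector length, summing or averaging the weights within a class gives the same prediction. Ties between classes are broken arbitrarily here, which a real implementation would need to handle explicitly.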

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link


[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

Chapter 1

Introduction


Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose but, instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general. However, new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and how they can apply to food recommendation, is a topic of great interest.


In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?


Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval based methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching up the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects how relevant it is to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$


where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed from all keywords $z$ in the document $j$.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$




In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \tag{2.3}$$
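The weighting in Eqs. (2.1) to (2.3) can be sketched in a few lines of Python. This is only an illustration: the function name `tf_idf`, the toy documents, and the use of the natural logarithm (the text does not fix a base) are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1)-(2.3)."""
    n_docs = len(documents)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        if not counts:  # an empty document has no keywords to weight
            weights.append({})
            continue
        max_count = max(counts.values())  # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy collection: each document is a list of keywords.
docs = [
    ["rice", "chicken", "rice", "garlic"],
    ["rice", "beans"],
    ["chicken", "garlic", "lemon"],
]
weights = tf_idf(docs)
```

In this toy collection, "beans" and "lemon" each occur in a single document and therefore get the largest IDF, while a keyword present in every document would get a weight of zero.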

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:


$$w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} \left(\text{TF-IDF}_{zj}\right)^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_k w_{ka}\, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
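As a rough illustration of cosine normalization (Eq. 2.4) and cosine similarity (Eq. 2.5), the sketch below stores each vector as a sparse dictionary. The function names and the example weights are hypothetical, and returning 0 for an all-zero vector is a convention chosen here, not something the text prescribes.

```python
import math

def cosine_normalize(weights):
    """Scale a {keyword: weight} vector to unit Euclidean length (Eq. 2.4)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)  # nothing to scale in an all-zero vector
    return {term: w / norm for term, w in weights.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b[term] for term, w in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty vectors
    return dot / (norm_a * norm_b)

# Hypothetical TF-IDF weights for a recipe and a user profile.
recipe = {"rice": 0.4, "chicken": 0.2, "garlic": 0.2}
profile = {"rice": 0.4, "beans": 1.1}
similarity = cosine_similarity(recipe, profile)
```

Because the cosine only depends on the direction of the vectors, it gives the same value whether or not the inputs were normalized beforehand.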



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

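$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$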




In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
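A minimal sketch of Rocchio's method as a classifier is given below, assuming that documents are already available as sparse TF-IDF vectors. The names `rocchio_prototype` and `classify`, the values of `beta` and `gamma`, and the example vectors are illustrative assumptions, not settings taken from this work.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors (Eq. 2.5)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.25):
    """Prototype vector of one class, following Eq. (2.6).

    beta and gamma are illustrative values only.
    """
    prototype = {}
    for docs, factor in ((positives, beta), (negatives, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # Each example contributes its weight divided by the set size.
                prototype[term] = prototype.get(term, 0.0) + factor * w / len(docs)
    return prototype

def classify(doc, prototypes):
    """Assign doc to the class whose prototype vector is most similar."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

# Hypothetical training vectors for two classes.
liked = [{"chicken": 0.8, "garlic": 0.6}, {"chicken": 0.6, "lemon": 0.8}]
disliked = [{"beans": 1.0}, {"beans": 0.6, "rice": 0.8}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
label = classify({"chicken": 0.7, "rice": 0.7}, prototypes)
```

Terms that appear only in negative examples end up with negative weights in the prototype, which lowers the similarity of documents containing them.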

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes Classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated using the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying the Bayes theorem:

$$P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)} \tag{2.7}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\operatorname*{argmax}_{c_j} \frac{P(c_j)\,P(d \mid c_j)}{P(d)} \tag{2.8}$$





The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute. This encoding relates to the appearance of words in a document. The second, typically referred to as the multinomial event model, identifies the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary $V$, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
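The multinomial model of Eq. (2.9) can be sketched as follows. Two choices in this example are assumptions that go beyond the text: the term probabilities are estimated with add-one (Laplace) smoothing through a hypothetical `alpha` parameter, and the product is evaluated as a sum of logarithms to avoid numerical underflow. Function names and training data are also illustrative.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(documents, labels, alpha=1.0):
    """Estimate log P(c) and log P(t|c); alpha is an assumed smoothing constant."""
    vocabulary = {t for doc in documents for t in doc}
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        term_counts[label].update(doc)
    log_prior = {c: math.log(n / len(documents)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + alpha) / total) for t in vocabulary
        }
    return log_prior, log_likelihood

def classify(doc, log_prior, log_likelihood):
    """Pick the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    counts = Counter(doc)  # N(d, t_k)
    def score(c):
        return log_prior[c] + sum(
            n * log_likelihood[c][t]
            for t, n in counts.items()
            if t in log_likelihood[c]  # terms outside the vocabulary are ignored
        )
    return max(log_prior, key=score)

# Hypothetical training data with two classes.
train_docs = [
    ["chicken", "garlic", "lemon"],
    ["chicken", "rice"],
    ["beans", "rice"],
    ["beans", "beans", "onion"],
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
log_prior, log_likelihood = train_multinomial_nb(train_docs, train_labels)
label = classify(["chicken", "lemon"], log_prior, log_likelihood)
```

Working in log space leaves the selected class unchanged, since the logarithm is monotonic.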

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, since they have no training phase and all the computation is done when classifying.
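The nearest neighbor procedure can be sketched as below for items in the VSM, using cosine similarity. Deriving the label by majority vote among the k neighbors is one common option and is an assumption here, as are the function names and the example data.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar stored examples.

    training is a list of (vector, label) pairs kept in memory; all the work
    happens here, at classification time.
    """
    neighbours = sorted(training, key=lambda ex: cosine(item, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical stored examples.
training = [
    ({"chicken": 1.0, "garlic": 1.0}, "liked"),
    ({"chicken": 1.0, "lemon": 1.0}, "liked"),
    ({"beans": 1.0, "rice": 1.0}, "disliked"),
    ({"beans": 1.0, "onion": 1.0}, "disliked"),
]
label = knn_classify({"chicken": 1.0, "garlic": 0.5, "rice": 0.2}, training, k=3)
```

Every call scans the whole training set, which is the classification-time cost mentioned above.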

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since the change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based approaches (or heuristic-based) and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of previously rated items by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item $p$ for user $c$, a set of ratings $S$ is used. This set contains ratings for item $p$ obtained from other users who have already rated that item, usually the $N$ most similar to user $c$. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


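Table 2.1: Ratings database for collaborative recommendation.

|       | Item1 | Item2 | Item3 | Item4 | Item5 |
|-------|-------|-------|-------|-------|-------|
| Alice | 5     | 3     | 4     | 4     | ?     |
| User1 | 3     | 1     | 2     | 3     | 3     |
| User2 | 4     | 3     | 4     | 3     | 5     |
| User3 | 3     | 3     | 1     | 5     | 4     |
| User4 | 1     | 5     | 5     | 2     | 1     |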

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings $S$ previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set $S$. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$



In the formula, $S$ denotes the set of items rated by both users, $r_{as}$ is the rating that user $a$ gave to item $s$, and $r_{bs}$ is the rating that user $b$ gave to the same item $s$. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
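To make Eq. (2.10) concrete, the sketch below applies it to the ratings of Table 2.1, restricting each pair of users to the items they have both rated. The dictionary layout and the function name `cosine_user_similarity` are assumptions made for the example.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def cosine_user_similarity(a, b):
    """Cosine similarity over the items rated by both users (Eq. 2.10)."""
    common = ratings[a].keys() & ratings[b].keys()
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

similarities = {
    user: cosine_user_similarity("Alice", user)
    for user in ratings
    if user != "Alice"
}
```

Since all ratings are positive, every pair of users obtains a similarity between 0 and 1. Even User4, whose ratings run roughly opposite to Alice's, still receives a fairly high value, which illustrates why rating behaviour needs to be taken into account.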

Figure 2.2: Comparing user ratings [2].

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:

$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}$$

In the formula, $\mathrm{pred}(a, p)$ is the predicted rating of user $a$ for item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
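The prediction procedure can be sketched on the data of Table 2.1 as follows. Several choices here are assumptions made for illustration: each user's average is taken over all of that user's ratings, the sums in Eq. (2.11) run over the co-rated items, and the neighborhood is fixed to the two users most correlated with Alice.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean_rating(user):
    """Average over all ratings given by the user (an assumed convention)."""
    return sum(ratings[user].values()) / len(ratings[user])

def pearson(a, b):
    """Pearson correlation over the items rated by both users (Eq. 2.11)."""
    common = ratings[a].keys() & ratings[b].keys()
    mean_a, mean_b = mean_rating(a), mean_rating(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den_a = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
    den_b = math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(a, item, neighbours):
    """Similarity-weighted prediction of Eq. (2.12).

    Assumes the neighbours' similarities sum to a non-zero value.
    """
    num = sum(pearson(a, b) * (ratings[b][item] - mean_rating(b)) for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    return mean_rating(a) + num / den

others = [u for u in ratings if u != "Alice"]
neighbours = sorted(others, key=lambda u: pearson("Alice", u), reverse=True)[:2]
prediction = predict("Alice", "Item5", neighbours)
```

Under these assumptions User1 and User2 are selected as neighbors, and since both rated Item5 above their own average, the prediction lies above Alice's average rating of 4. Restricting the neighborhood to positively correlated users also keeps the denominator of Eq. (2.12) positive.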

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities $\mathrm{sim}(a, b)$ in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user $c$ giving a particular rating to item $s$, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user, or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2].

Figure 2.4: Parallelized hybridization design [2].

Figure 2.5: Pipelined hybridization designs [2].

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
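A weighting scheme for a parallelized hybrid could look like the sketch below, in which each component scores the items independently and the scores are merged with manually assigned weights. The function name, the component names, the scores, and the weights are all hypothetical.

```python
def weighted_hybrid(component_scores, weights):
    """Rank items by the weighted sum of the scores given by each component.

    component_scores maps a component name to its {item: score} output;
    weights maps the same component names to manually chosen weights.
    """
    combined = {}
    for name, scores in component_scores.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical outputs of two independent recommenders on a [0, 1] scale.
component_scores = {
    "content": {"soup": 0.9, "salad": 0.4, "steak": 0.2},
    "collaborative": {"soup": 0.3, "salad": 0.5, "steak": 0.9},
}
ranking = weighted_hybrid(component_scores, {"content": 0.6, "collaborative": 0.4})
```

The weights could also be made to depend on the situation, for example lowering the collaborative weight when few ratings exist for the user.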

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4].

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2].

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
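Precision and Recall, as defined in Eqs. (2.13) and (2.14), can be computed from the set of recommended items and the set of items the user actually likes. The function name and the example items below are illustrative, and returning 0 when a denominator is empty is an assumed convention.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a set of recommended items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Hypothetical example: four recommendations, three items the user likes.
precision, recall = precision_recall(
    recommended=["soup", "steak", "salad", "pasta"],
    relevant=["soup", "salad", "risotto"],
)
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended.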

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:


$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
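Both error measures follow directly from Eqs. (2.15) and (2.16). The sketch below uses hypothetical predicted and actual ratings, and the function names are chosen only for the example.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings (Eq. 2.16)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Hypothetical ratings for four items; the errors are 0, 1, 2 and 0.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 4.0, 3.0, 2.0]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

In this example the single error of 2 weighs more in RMSE than in MAE, so RMSE is the larger of the two values.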


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$


The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log\left(\frac{M}{M_k}\right) \tag{3.3}$$
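The favourite-ingredient score of Eqs. (3.1) to (3.3) can be sketched as follows. This is not the implementation from [3]: here $F_k$ is interpreted as the number of cooked recipes that contain ingredient $k$ during a period of $D$ days, the natural logarithm is used, and the function name and recipe data are hypothetical.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Return {ingredient: FF_k * IRF_k} for ingredients in the cooking history."""
    m = len(all_recipes)                                                   # M
    recipes_with = Counter(k for r in all_recipes for k in set(r))        # M_k
    use_count = Counter(k for r in cooked_recipes for k in set(r))        # F_k (assumed meaning)
    return {
        k: (f / period_days) * math.log(m / recipes_with[k])
        for k, f in use_count.items()
        if recipes_with[k]  # skip ingredients missing from the recipe collection
    }

# Hypothetical recipe collection and one week of cooking history.
all_recipes = [
    ["rice", "chicken"],
    ["rice", "beans"],
    ["rice", "lemon", "chicken"],
    ["salt", "lemon"],
]
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
favourite = max(scores, key=scores.get)
```

In this toy example rice and chicken are cooked equally often, but rice appears in more recipes of the collection, so its inverse recipe frequency is lower and chicken obtains the higher score.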



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top $N$ ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4}$$


When focusing on the top ingredient ($N = 1$), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for $N$ were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients ($N = 20$), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual user's favourite ingredients is 19.2. Also with $N = 20$, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \quad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^+$ and $I_k^-$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \quad (3.6)$$
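A minimal sketch of the two quantities above, under stated assumptions: the standard deviation is taken as the population form (dividing by n), and the way a weight $W_k$ is derived from the deviation score is not specified here, so the weights are simply passed in. All names and values are hypothetical.

```python
import math


def ingredient_std(quantities):
    """Population standard deviation of one ingredient's quantities over the recipes containing it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients k of a recipe.

    preference maps an ingredient to the user's (positive or negative) preference I_k;
    weight maps an ingredient to its weight W_k. Missing entries count as 0.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in recipe_ingredients)


# Hypothetical values: the user likes pepper and mildly dislikes potato.
score = recipe_score(
    ["pepper", "potato"],
    preference={"pepper": 1.0, "potato": -0.5},
    weight={"pepper": 2.0, "potato": 1.0},
)
```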

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \quad (3.7)$$
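The construction of a pseudo user-ratings vector can be sketched as follows. This is only an illustration of Eq. (3.7): the dictionaries and item names are hypothetical, and the content-based predictions are assumed to be already available for every item.

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Actual rating where the user rated the item, content-based prediction otherwise."""
    return [
        user_ratings[i] if i in user_ratings else content_predictions[i]
        for i in items
    ]


# Hypothetical data: the user rated only two of the four items.
items = ["m1", "m2", "m3", "m4"]
user_ratings = {"m1": 5, "m3": 2}
content_predictions = {"m1": 4, "m2": 3, "m3": 3, "m4": 1}
v_u = pseudo_ratings_vector(user_ratings, content_predictions, items)
```

Stacking one such vector per user gives a dense matrix, which is what allows the collaborative step to run without the usual sparsity problem.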

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe rating scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
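The short-term voting scheme described above can be sketched as follows. This is a simplified illustration and not the Daily-Learner implementation: vectors are plain dictionaries of term weights, and the threshold values `t_min` and `t_max` are hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return "known", a predicted score, or None when no stored story can vote.

    rated_stories is a list of (vector, score) pairs; t_min and t_max are assumed values.
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim > t_max:
            return "known"  # too similar: the user is assumed to know this story already
        if sim > t_min:
            voters.append((sim, score))
    if not voters:
        return None  # would be handed over to the long-term model
    return sum(sim * score for sim, score in voters) / sum(sim for sim, _ in voters)
```

In a food setting, the same upper threshold idea could be used to filter out dishes that are nearly identical to ones the user has eaten recently.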

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \quad (4.2)$$




In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
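A minimal sketch of item-to-item collaborative filtering is given next. It is not the YoLP implementation. The similarity follows the Pearson correlation over the users who rated both recipes; the prediction uses the plain similarity-weighted average of the user's own ratings, which is one common form and is an assumption here, as is the choice to ignore non-positive similarities. The data layout (`ratings[item][user]`) is hypothetical.

```python
import math


def pearson_item_similarity(ratings, a, b):
    """Pearson correlation between items a and b over the users who rated both.

    ratings maps item -> {user: rating}; item means are taken over all ratings of the item.
    """
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * math.sqrt(
        sum((ratings[b][p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on items similar to the target."""
    num = den = 0.0
    for item, by_user in ratings.items():
        if item == target or user not in by_user:
            continue
        sim = pearson_item_similarity(ratings, target, item)
        if sim <= 0:
            continue  # assumption: only positively correlated items vote
        num += sim * by_user[user]
        den += sim
    return num / den if den else None
```

Because the candidate recipes are limited to one restaurant's menu, only the similarities between those recipes and the user's rated recipes need to be computed.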

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
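The content-based pipeline of this section (binary profile, cosine similarity, similarity-to-rating conversion) can be sketched as follows. This is an illustrative sketch and not the YoLP code: feature positions are hypothetical integer indices, and the default `threshold` and `bonus` values are read from Eq. (4.3) and should be treated as assumptions.

```python
import math


def build_profile(rated_recipes, n_features, min_rating=4):
    """Binary profile: a feature is set to 1 if it occurs in any positively rated recipe."""
    profile = [0] * n_features
    for feature_indices, rating in rated_recipes:
        if rating >= min_rating:
            for idx in feature_indices:
                profile[idx] = 1
    return profile


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Combined user/item average, raised by a fixed bonus when the similarity is high."""
    return avg_total + bonus if similarity > threshold else avg_total


# Hypothetical usage: 4 feature positions, one liked recipe and one disliked recipe.
profile = build_profile([([0, 2], 5), ([1], 2)], n_features=4)
sim = cosine_similarity(profile, [1, 0, 1, 0])
rating = similarity_to_rating(sim, avg_total=3.5)
```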

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:


$$IRF_k = \log \frac{M}{M_k} \quad (4.4)$$


where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
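As an illustration of the weighting step, the IRF of every feature can be computed in one pass over the recipes. The sketch assumes each recipe is given as a set of feature names and uses the natural logarithm; the base of the logarithm is an assumption, since it only rescales the weights.

```python
import math


def inverse_recipe_frequency(recipes):
    """Map each feature k to log(M / M_k), with M recipes in total and M_k containing k."""
    recipes = list(recipes)
    total = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):  # count each feature at most once per recipe
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(total / m_k) for k, m_k in counts.items()}


# Hypothetical toy dataset of two recipes.
irf = inverse_recipe_frequency([{"egg", "rice"}, {"egg"}])
```

A feature that appears in every recipe gets a weight of 0, so very common ingredients contribute nothing to the prototype vector.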

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
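The construction of a prototype vector can be sketched as below, under the assumptions stated in the text (observation weight of 1, feature weights given by the IRF). The helper names are hypothetical; the two observation rules correspond to the fixed-threshold approach for a 1 to 5 scale and to the user-average approach.

```python
def fixed_threshold_1_to_5(rating):
    """+1 for ratings 4-5, -1 for ratings 1-2, 0 (ignored) for the neutral rating 3."""
    if rating >= 4:
        return 1
    if rating <= 2:
        return -1
    return 0


def user_average_rule(user_avg):
    """Build a rule: +1 if the rating is at or above the user's average, -1 otherwise."""
    return lambda rating: 1 if rating >= user_avg else -1


def build_prototype(rated_recipes, irf, observation):
    """Add (positive) or subtract (negative) the IRF weight of each feature of each rated recipe."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = observation(rating)
        if sign == 0:
            continue  # neutral observation
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype


# Hypothetical training data for one user.
irf = {"egg": 1.0, "rice": 2.0}
rated = [({"egg", "rice"}, 5), ({"egg"}, 1), ({"rice"}, 3)]
fixed = build_prototype(rated, irf, fixed_threshold_1_to_5)
by_avg = build_prototype(rated, irf, user_average_rule(3.0))
```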

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \quad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A – minimum value of A), the user average was used as the default for the recommendation.
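A minimal sketch of the per-user Min-Max mapping, including the fallback to the user average when the similarity interval cannot be computed. The function names are hypothetical, and treating a zero-width similarity interval as "not enough ratings" is an assumption made for the illustration.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the interval [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def min_max_rating(similarity, user_similarities, user_ratings, user_avg):
    """Map a similarity into the user's own rating range; fall back to the user average."""
    if len(user_similarities) < 2 or max(user_similarities) == min(user_similarities):
        return user_avg  # similarity interval cannot be computed
    return min_max(
        similarity,
        min(user_similarities),
        max(user_similarities),
        min(user_ratings),
        max(user_ratings),
    )


# Hypothetical user: similarities span [0.0, 1.0], ratings span [1, 5].
predicted = min_max_rating(0.5, [0.0, 0.5, 1.0], [1, 3, 5], user_avg=3.0)
```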

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \quad (4.6)$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
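The three-band rule above can be written as a small function. The thresholds `upper` and `lower` are deliberately left as required arguments, and the values in the usage line are hypothetical; the average and standard deviation can come from the user, the recipe, or a combination of both.

```python
def rating_from_similarity(similarity, average, std, upper, lower):
    """Shift the average rating up or down by one standard deviation depending on the similarity band."""
    if similarity >= upper:
        return average + std
    if similarity >= lower:
        return average
    return average - std


# Hypothetical thresholds and statistics for one user.
high = rating_from_similarity(0.8, average=3.5, std=0.5, upper=0.75, lower=0.25)
```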

Table 4.1: Statistical characterization for the datasets used in the experiments

                                       Food.com    Epicurious
Number of users                           24741          8117
Number of food items                     226025         14976
Number of rating events                  956826         86574
Number of ratings above avg              726467         46588
Number of groups                            108            68
Number of ingredients                      5074           338
Number of categories                         28            14
Sparsity on the ratings matrix             0.02          0.07
Avg rating values                          4.68          3.34
Avg number of ratings per user            38.67         10.67
Avg number of ratings per item             4.23          5.78
Avg number of ingredients per item         8.57          3.71
Avg number of categories per item          2.33          0.60
Avg number of food groups per item         0.87          0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating predicted for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered. All recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4, and 4.5, some graphical statistical data on the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
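The 5-fold split can be sketched with the standard library only. This is an illustrative splitter and not the evaluation module itself; the shuffling seed and the way events are dealt into folds are assumptions.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is used for validation exactly once."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


# Hypothetical usage with ten rating events identified by integers.
splits = list(k_fold_splits(range(10), k=5))
```

With k = 5, each validation fold holds roughly 20% of the rating events and the corresponding training set the remaining 80%.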

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
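For reference, the two error measures can be computed as in the following sketch, which assumes two equally long lists of predicted and actual ratings taken from a validation fold.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more heavily than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predictions for two validation events.
errors = (mae([3, 5], [4, 3]), rmse([3, 5], [4, 3]))
```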

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                  MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                    Epicurious                          Food.com
                                                    Observation      Observation        Observation      Observation
                                                    User Average     Fixed Threshold    User Average     Fixed Threshold
                                                    MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                  0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                  0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation    0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                                             0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
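The three average-based baselines can be sketched as below. This is an illustrative reading of the description above, not the code used in the experiments; the function names and toy data are assumptions, and users or recipes absent from the training set are not handled.

```python
from collections import defaultdict


def build_averages(training):
    """training: iterable of (user_id, recipe_id, rating) tuples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user_id, recipe_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2


# Hypothetical training ratings.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = build_averages(training)
print(user_avg["u1"], item_avg["r1"], combined_average(user_avg, item_avg, "u1", "r1"))
```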

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

Table 5.3: Testing features

                                      Epicurious          Food.com
                                      MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
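The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary-based sparse representation, the names `merge` and `cosine`, and the sample weights are assumptions made for illustration; the storage format actually used in this work is not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge(vectors_by_group, groups):
    """Merge the stored per-group vectors into a single vector."""
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group[group].items():
            merged[(group, feature)] = weight  # key by group to avoid name clashes
    return merged


# Hypothetical stored vectors: one per feature group, for a user and a recipe.
prototype = {
    "ingredients": {"tomato": 0.8, "basil": 0.4},
    "cuisine": {"italian": 0.9},
    "dietary": {"vegetarian": 0.5},
}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}

for groups in (["ingredients"], ["ingredients", "cuisine"],
               ["ingredients", "cuisine", "dietary"]):
    sim = cosine(merge(prototype, groups), merge(recipe, groups))
    print(groups, round(sim, 4))
```

Because the three vectors are built once, each feature combination in the test only costs a merge and a similarity computation.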

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\qquad (4.6)
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\qquad (5.1)
\]
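The two threshold rules can be expressed as one small function. This is a sketch under stated assumptions: the name `similarity_to_rating` is hypothetical, and the `average` and `std_dev` arguments stand for whichever average and standard deviation are in use (how the user and item values are combined is defined in Section 4.3.3 and is not reproduced here). Passing `lower=None` gives the two-case rule; passing a value for `lower` gives the original three-case rule.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=None):
    """Map a similarity value to a predicted rating using threshold cases."""
    if similarity >= upper:
        return average + std_dev
    if lower is not None and similarity < lower:
        return average - std_dev  # only in the original three-case rule
    return average


# Hypothetical values: average rating 3.5, standard deviation 0.5.
print(similarity_to_rating(0.80, 3.5, 0.5))              # high similarity
print(similarity_to_rating(0.10, 3.5, 0.5, lower=0.25))  # three-case rule, low similarity
print(similarity_to_rating(0.10, 3.5, 0.5))              # two-case rule, low similarity
```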


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest error values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.
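The opposite movement of MAE and RMSE discussed in this section can be illustrated with a small worked example. The error values below are invented for illustration and are not taken from the experiments: predictor A misses every one of four ratings by 0.5, while predictor B is exact three times and misses once by 1.6.

```latex
\[
\mathrm{MAE}_A = \frac{0.5 + 0.5 + 0.5 + 0.5}{4} = 0.5,
\qquad
\mathrm{RMSE}_A = \sqrt{\frac{4 \times 0.5^2}{4}} = 0.5
\]
\[
\mathrm{MAE}_B = \frac{0 + 0 + 0 + 1.6}{4} = 0.4,
\qquad
\mathrm{RMSE}_B = \sqrt{\frac{1.6^2}{4}} = \sqrt{0.64} = 0.8
\]
```

Predictor B has the lower MAE because it is exact more often, but the higher RMSE because its single miss is large, which is the same pattern observed when the upper threshold is lowered.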

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was observed towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
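The round-by-round procedure described above, where one review at a time moves from the validation set to the training set, can be sketched for a single user as follows. The function `learning_curve` and the `train_and_predict` callable are hypothetical names; in this example a simple mean-rating predictor stands in for the Rocchio-based component, purely so that the sketch is runnable.

```python
def learning_curve(reviews, train_and_predict, max_training_size):
    """Return the mean absolute error after each training-set size.

    reviews: list of (recipe_id, rating) for one user.
    train_and_predict(training, recipe_id): returns a predicted rating.
    """
    errors = []
    for n in range(1, max_training_size + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to validate against
        abs_errors = [abs(rating - train_and_predict(training, recipe_id))
                      for recipe_id, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def mean_predictor(training, recipe_id):
    """Stand-in predictor (assumption): the user's mean training rating."""
    return sum(rating for _, rating in training) / len(training)


# Hypothetical reviews for one user.
reviews = [("r1", 4), ("r2", 2), ("r3", 3), ("r4", 3)]
errors = learning_curve(reviews, mean_predictor, max_training_size=3)
print(errors)
```

Averaging the per-user curves over all selected users would give one point per training-set size, as plotted in the learning-curve figures.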



Chapter 6

Conclusions

In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and the other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.



[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-

proved recommendations In Proceedings of the Eighteenth National Conference on Artificial

Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936

[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by

Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings

of the International MultiConference of Engineers and Computer Scientists pages 519ndash523

2014 ISBN 9789881925251

[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-

ents In Proceedings of the 18th International Conference on User Modeling Adaptation

and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi

101007978-3-642-13470-8 36

[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling

and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A


[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-

ommendation volume 8444 2014

[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-

lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN

10450823 doi 101067mod2000109031



Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.

1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object with a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight relates to the relevance of that keyword to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[
TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)
\]

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords $z$ in the document $j$.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[
IDF_i = \log\frac{N}{n_i} \qquad (2.2)
\]

In the formula, $N$ is the total number of documents and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

\[
w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)
\]

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka}\, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
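The weighting and similarity steps of Eqs. (2.1) to (2.5) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy documents and the whitespace tokenisation are assumptions made for the example, and non-empty documents are assumed.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values())
        # Eqs. (2.1)-(2.3): max-normalised TF multiplied by log(N / n_i)
        weights = {t: (f / max_count) * math.log(n_docs / doc_freq[t])
                   for t, f in counts.items()}
        # Eq. (2.4): cosine normalisation (skipped for an all-zero vector)
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors

def cosine_similarity(a, b):
    """Eq. (2.5) for sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy collection of three short "documents"
docs = [
    "grilled chicken with rice".split(),
    "chicken curry with rice".split(),
    "chocolate cake with cream".split(),
]
vecs = tf_idf_vectors(docs)
```

In this toy collection the keyword "with" occurs in every document, so its IDF, and therefore its weight, is zero; the first two documents share informative keywords and end up more similar to each other than to the third.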



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
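A minimal sketch of a Rocchio-style classifier following Eq. (2.6) is given below. The function names, the default values of beta and gamma, and the toy vectors are assumptions chosen for illustration; document vectors are assumed to be already weighted (for instance with TF-IDF) and stored as dictionaries.

```python
import math

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_weight(docs, term):
    """Average weight of a term over a list of document vectors (0 if the list is empty)."""
    return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

def train_rocchio(labelled_docs, beta=1.0, gamma=0.5):
    """Build one prototype vector per class following Eq. (2.6).
    beta and gamma are illustrative defaults, not values taken from the text."""
    vocabulary = {t for vec, _ in labelled_docs for t in vec}
    prototypes = {}
    for c in {label for _, label in labelled_docs}:
        pos = [vec for vec, label in labelled_docs if label == c]   # POS_i
        neg = [vec for vec, label in labelled_docs if label != c]   # NEG_i
        prototypes[c] = {t: beta * mean_weight(pos, t) - gamma * mean_weight(neg, t)
                         for t in vocabulary}
    return prototypes

def classify(prototypes, doc):
    """Assign the document to the class whose prototype is most similar to it."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

# Hypothetical training data: (document vector, class label)
training = [
    ({"chicken": 0.8, "rice": 0.6}, "relevant"),
    ({"chicken": 0.6, "curry": 0.8}, "relevant"),
    ({"cake": 0.7, "cream": 0.7}, "irrelevant"),
]
prototypes = train_rocchio(training)
```

Here the negative examples of a class are simply taken to be all training documents of the other classes, which is one common reading of Eq. (2.6) rather than the only possible one.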

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes Classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated using the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$


When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. The simplification relies on a conditional independence assumption that is clearly violated in practice, since terms in a document are not independent from each other. Nevertheless, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the terms from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
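The multinomial model of Eq. (2.9) can be sketched as follows. The sketch works in log space to avoid numerical underflow and uses add-one (Laplace) smoothing for the term probabilities; both choices, as well as the function names and toy data, are assumptions of this example and are not prescribed by the text.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (token list, class label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    n_docs = len(labelled_docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}  # log P(c)
    log_likelihood = {}
    for c in class_counts:
        # Add-one smoothing (an assumption of this sketch) avoids zero probabilities
        total = sum(term_counts[c].values()) + len(vocabulary)
        log_likelihood[c] = {t: math.log((term_counts[c][t] + 1) / total)
                             for t in vocabulary}                           # log P(t|c)
    return log_prior, log_likelihood

def classify(model, tokens):
    """Return the class maximising the logarithm of Eq. (2.9)."""
    log_prior, log_likelihood = model
    counts = Counter(tokens)  # N(d, t_k)
    scores = {}
    for c in log_prior:
        # Terms never seen in training are ignored
        scores[c] = log_prior[c] + sum(n * log_likelihood[c][t]
                                       for t, n in counts.items()
                                       if t in log_likelihood[c])
    return max(scores, key=scores.get)

# Hypothetical training data
training = [
    ("chicken rice chicken".split(), "relevant"),
    ("chicken curry".split(), "relevant"),
    ("cake cream cake".split(), "irrelevant"),
]
model = train_multinomial_nb(training)
```

Because the logarithm is monotonic, choosing the class with the highest log score selects the same class as maximising the product in Eq. (2.9).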

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is made during classification.
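As an illustration of the nearest neighbor approach, the sketch below classifies an item by majority vote among its k most similar stored items, using cosine similarity over VSM-style vectors. The function names, the value of k and the toy data are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(training, item, k=3):
    """training: list of (vector, label) pairs kept in memory; there is no training phase.
    All the work happens here, at classification time."""
    ranked = sorted(training, key=lambda pair: cosine(pair[0], item), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored items: (term-weight vector, class label)
training = [
    ({"chicken": 1.0, "rice": 0.5}, "liked"),
    ({"chicken": 0.8, "curry": 0.6}, "liked"),
    ({"cake": 1.0, "cream": 0.4}, "disliked"),
    ({"cake": 0.5, "cream": 1.0}, "disliked"),
]
```

Every call to `knn_classify` scans the whole training set, which makes the classification-time cost discussed above directly visible.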

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations

and they have been widely used by large commercial websites With the existence of various algo-

rithms and variations these methods are very well understood and applicable in many domains

since the change in item characteristics does not affect the method used to perform the recom-

mendation These methods can be grouped into two general classes [11] namely memory-based

approaches (or heuristic-based) and model-based methods Memory-based algorithms are essen-

tially heuristics that make rating predictions based on the entire collection of previously rated items

by users In user-to-user collaborative filtering when predicting the rating of an unknown item p for

user c a set of ratings S is used This set contains ratings for item p obtained from other users

who have already rated that item usually the N most similar to user c A simple example on how to

generate a prediction and the steps required to do so will now be described


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$Similarity(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$

In the formula, $S$ now denotes the set of items rated by both users, $r_{as}$ is the rating that user $a$ gave to item $s$, and $r_{bs}$ is the rating that user $b$ gave to the same item $s$. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula, $pred(a, p)$ is the prediction value to user $a$ for item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
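The procedure can be sketched on the data of Table 2.1 as shown below. The sketch makes a few assumptions that the text leaves open: each user's average is taken over all the items that user has rated, the neighborhood N contains the two most similar users, and only neighbors with positive similarity are used. Function names are hypothetical.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5)
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def user_mean(user):
    """Average of all ratings given by the user (assumption of this sketch)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(a, b):
    """Eq. (2.11), summing over the items S rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a, mean_b = user_mean(a), user_mean(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, n_neighbors=2):
    """Eq. (2.12), using the n most similar users who rated the item."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    top = sorted(candidates, reverse=True)[:n_neighbors]
    neighbors = [(s, b) for s, b in top if s > 0]  # keep positively correlated users only
    if not neighbors:
        return user_mean(a)
    num = sum(s * (ratings[b][item] - user_mean(b)) for s, b in neighbors)
    return user_mean(a) + num / sum(s for s, _ in neighbors)

prediction = predict("Alice", "Item5")
```

Under these assumptions User1 and User2 are Alice's nearest neighbors, and the predicted rating for Item5 is approximately 4.85, above Alice's average of 4 because both neighbors rated Item5 above their own averages.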

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise collaborative methods should be used).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \tag{2.14}$$

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
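The four measures of Eqs. (2.13) to (2.16) are straightforward to compute. The sketch below uses hypothetical function names and toy inputs; item lists are treated as sets, and the predicted and actual ratings are assumed to be aligned lists of equal, non-zero length.

```python
import math

def precision_recall(recommended, relevant):
    """Eqs. (2.13) and (2.14) for a list of recommended items and a list of relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)          # relevant items that were recommended
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

def mae(predicted, actual):
    """Eq. (2.15): mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. (2.16): squaring the deviations penalises large errors more heavily."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Hypothetical toy data
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
predicted, actual = [4, 3, 5], [5, 3, 3]
```

With these toy ratings the deviations are 1, 0 and 2, so the MAE is 1.0 while the RMSE is larger (about 1.29), which illustrates the extra weight that RMSE gives to the largest error.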


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches The

works described in this chapter contain interesting features to further explore in the context of per-

sonalized food recommendation using content-based methods

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log\frac{M}{M_k} \tag{3.3}$$
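Equations (3.1) to (3.3) can be sketched as follows. This is not the implementation from [3]: the function name, the data layout (recipes as lists of ingredient names), the counting of an ingredient once per cooked recipe, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """cooked_recipes: ingredient lists of the recipes the user cooked during the period D.
    all_recipes: ingredient lists of every recipe in the collection.
    Returns {ingredient: I+_k} following Eqs. (3.1)-(3.3)."""
    usage = Counter(k for recipe in cooked_recipes for k in set(recipe))     # F_k
    recipe_freq = Counter(k for recipe in all_recipes for k in set(recipe))  # M_k
    m = len(all_recipes)                                                     # M
    return {k: (f / period_days) * math.log(m / recipe_freq[k])              # FF_k * IRF_k
            for k, f in usage.items() if recipe_freq[k] > 0}

# Hypothetical recipe collection and one week of cooking history
all_recipes = [
    ["rice", "chicken", "salt"],
    ["rice", "salmon", "salt"],
    ["pasta", "tomato", "salt"],
    ["pasta", "chicken", "salt"],
]
cooked = [all_recipes[0], all_recipes[3]]
scores = favourite_ingredient_scores(cooked, all_recipes, period_days=7)
```

In this toy example salt is used in every cooked recipe but also appears in every recipe of the collection, so its IRF is zero and it is not singled out as a favourite, whereas chicken receives the highest score.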



The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams from two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based

predictor and the pure collaborative method to generate predictions

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$
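Equation (3.7) amounts to filling the gaps of a sparse ratings vector with content-based predictions. A minimal sketch, with a hypothetical function name and toy data, could look like this:

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Eq. (3.7): keep the user's actual rating r_ui where it exists,
    otherwise fall back to the content-based prediction c_ui."""
    return [user_ratings[i] if i in user_ratings else content_predictions[i]
            for i in items]

# Hypothetical example: the user rated two of four movies
items = ["m1", "m2", "m3", "m4"]
user_ratings = {"m1": 5, "m3": 2}
content_predictions = {"m1": 4.2, "m2": 3.1, "m3": 2.5, "m4": 4.0}
v_u = pseudo_ratings_vector(user_ratings, content_predictions, items)
```

Stacking one such vector per user yields a dense matrix, on which the usual neighborhood-based collaborative filtering can then operate.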

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items have been rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations

of collaborative filtering and content-based methods since it has been shown to perform consistently

better than pure collaborative filtering


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content or ingredients of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm random recommender was implemented which assigns a randomly

generated prediction score to a recipe Five different recommendation strategies were developed for

personalized recipe recommendations

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
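The recipe breakdown and reconstruction of the content-based strategy can be sketched as follows. Ingredient scores are the average of the user's ratings for the recipes containing each ingredient, as described above; reconstructing the recipe prediction as the plain average of its known ingredient scores is an assumption of this sketch, and all names are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(user_ratings, recipe_ingredients):
    """Average the user's recipe ratings over every ingredient they contain."""
    sums, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in user_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            sums[ingredient] += rating
            counts[ingredient] += 1
    return {ing: sums[ing] / counts[ing] for ing in sums}


def predict_recipe(recipe, scores, recipe_ingredients):
    """Predict a recipe rating from the scores of its known ingredients.

    Assumption: the prediction is the unweighted mean of ingredient scores.
    Returns None when no ingredient of the recipe has been scored yet.
    """
    known = [scores[i] for i in recipe_ingredients[recipe] if i in scores]
    return sum(known) / len(known) if known else None


recipe_ingredients = {
    "r1": ["tomato", "basil"],
    "r2": ["tomato", "rice"],
    "r3": ["basil", "rice"],
}
user_ratings = {"r1": 4, "r2": 2}
scores = ingredient_scores(user_ratings, recipe_ingredients)
prediction = predict_recipe("r3", scores, recipe_ingredients)
```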

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
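The short-term voting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: stories are sparse TF-IDF vectors stored as dictionaries, the threshold values `t_min` and `t_max` are arbitrary placeholders, and the function names are hypothetical rather than taken from [23].

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return (label, score) for a new story.

    rated_stories -- list of (tfidf_vector, user_score) pairs
    t_min, t_max  -- assumed similarity thresholds (placeholder values)
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim > t_max:
            # Too close to a story the user has seen: treat it as known.
            return "known", None
        if sim > t_min:
            voters.append((sim, score))
    if not voters:
        # No voters: the short-term model cannot classify this story.
        return "no_voters", None
    total = sum(sim for sim, _ in voters)
    return "scored", sum(sim * score for sim, score in voters) / total
```

In a food setting, the same structure would label a dish as "known" when it is nearly identical to a recently eaten one, instead of recommending it again.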



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{4.2}$$

In the formula, $\mathrm{pred}(u, a)$ is the prediction value to user $u$ for item $a$, $N$ is the set of items rated by user $u$, $r_{b,u}$ is the rating from user $u$ to item $b$ and $\bar{r}_b$ is the average rating of item $b$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of similarities.
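A small Python sketch of the item-to-item computation may help to make Eq. 4.1 and Eq. 4.2 concrete. It is not the YoLP implementation: the data layout and names are hypothetical, only positively correlated items are used as neighbors, and the average rating of the target item is added back to the mean-centred weighted sum so that the result lies on the rating scale. Both of these choices are assumptions of the sketch.

```python
import math


def item_mean(ratings, item):
    """Average rating received by an item; ratings is {item: {user: rating}}."""
    values = list(ratings[item].values())
    return sum(values) / len(values)


def pearson_items(ratings, a, b):
    """Pearson correlation between items a and b over their common raters."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(
        sum((ratings[a][p] - mean_a) ** 2 for p in common)
        * sum((ratings[b][p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, a):
    """Similarity-weighted, mean-centred prediction for user on item a."""
    num = den = 0.0
    for b in ratings:
        if b == a or user not in ratings[b]:
            continue
        sim = pearson_items(ratings, a, b)
        if sim <= 0:  # assumption: ignore dissimilar items
            continue
        num += sim * (ratings[b][user] - item_mean(ratings, b))
        den += sim
    base = item_mean(ratings, a)  # assumption: add the item mean back
    return base + num / den if den else base


ratings = {
    "a": {"u1": 5, "u2": 3, "u3": 1},
    "b": {"u1": 4, "u2": 3, "u3": 2, "u4": 4},
    "c": {"u1": 1, "u2": 3, "u3": 5},
}
prediction = predict(ratings, "u4", "a")
```

Because only the items rated by the target user have to be compared with the candidate recipes, the amount of work per recommendation stays bounded when the candidate set is fixed, which is the situation described for YoLP.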

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathit{avgTotal} + 0.5 & \text{if } \mathit{similarity} > 0.8 \\ \mathit{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. So it is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
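The similarity-to-rating conversion of Eq. 4.3 can be sketched as below. Binary feature vectors are represented as Python sets (a hypothetical layout, with made-up feature labels), and the combined average is taken as (UserAvg + ItemAvg)/2, following the definition used for the baselines in Section 5.2.

```python
import math


def cosine_binary(profile, recipe):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not profile or not recipe:
        return 0.0
    return len(profile & recipe) / math.sqrt(len(profile) * len(recipe))


def similarity_to_rating(similarity, user_avg, item_avg):
    """Map a similarity value to a rating as in Eq. 4.3."""
    avg_total = (user_avg + item_avg) / 2  # combined user and item average
    return avg_total + 0.5 if similarity > 0.8 else avg_total


# Hypothetical feature labels, for illustration only.
profile = {"ing:tomato", "ing:basil", "cat:pasta", "region:italian"}
recipe = {"ing:tomato", "cat:soup"}
rating = similarity_to_rating(cosine_binary(profile, recipe), 4.0, 3.0)
```

Since the rule only produces two possible outputs per user and recipe pair, any ranking information contained in the similarity value is largely lost, which is one source of the approximation error mentioned above.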

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$\mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
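The IRF weights can be computed in a single pass over the recipes, as in the following sketch. The input layout and function name are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """Compute IRF_k = log(M / M_k) for every feature k.

    recipes -- {recipe_id: set of features}
    The natural logarithm is an assumption of this sketch.
    """
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes that contain feature k
    counts = Counter(f for features in recipes.values() for f in features)
    return {f: math.log(total / m_k) for f, m_k in counts.items()}


recipes = {
    "r1": {"salt", "tomato", "basil"},
    "r2": {"salt", "tomato", "rice"},
    "r3": {"salt", "tomato"},
    "r4": {"salt", "rice"},
}
irf = inverse_recipe_frequency(recipes)
```

A feature that appears in every recipe receives a weight of zero, so very common ingredients contribute nothing to the prototype vector, while rare ingredients dominate it.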

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5, and the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
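The two ways of classifying observations and the resulting prototype vector can be sketched as follows. The function names and data layout are hypothetical; positive and negative observations both have weight 1, as stated above.

```python
def observation_fixed(rating, negative=(1, 2), positive=(4, 5)):
    """Fixed-threshold rule: +1 positive, -1 negative, 0 neutral (ignored).

    The defaults follow the 1-5 scale described for Food.com; for a 1-4
    scale one would pass negative=(1, 2) and positive=(3, 4).
    """
    if rating in positive:
        return 1
    if rating in negative:
        return -1
    return 0


def observation_user_avg(rating, user_avg):
    """User-average rule: ratings below the user's average are negative."""
    return 1 if rating >= user_avg else -1


def build_prototype(user_ratings, recipe_features, irf, observe):
    """Add or subtract the IRF weights of each rated recipe's features."""
    prototype = {}
    for recipe, rating in user_ratings.items():
        sign = observe(rating)
        if sign == 0:
            continue  # neutral observation
        for feature in recipe_features[recipe]:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf[feature]
    return prototype


irf = {"tomato": 0.5, "basil": 1.0, "rice": 2.0}
recipe_features = {"r1": {"tomato", "basil"}, "r2": {"tomato", "rice"}, "r3": {"rice"}}
user_ratings = {"r1": 5, "r2": 1, "r3": 3}

proto_fixed = build_prototype(user_ratings, recipe_features, irf, observation_fixed)
user_avg = sum(user_ratings.values()) / len(user_ratings)
proto_avg = build_prototype(
    user_ratings, recipe_features, irf, lambda r: observation_user_avg(r, user_avg)
)
```

In this toy example the rating of 3 is ignored by the fixed rule but counts as positive under the user-average rule, which changes the weight of "rice" in the prototype.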

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as default for the recommendation.
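The per-user Min-Max mapping can be sketched as follows. The function names are hypothetical, and the fallback to the user average when the similarity interval is empty mirrors the default described above.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the range [a_min, a_max] into the range [c, d] (Eq. 4.5)."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def min_max_rating(similarity, user_sims, user_ratings, user_avg):
    """Translate a similarity into a rating on the user's own rating scale.

    user_sims    -- similarity values observed for this user
    user_ratings -- ratings given by this user in the training set
    user_avg     -- fallback when the similarity interval cannot be computed
    """
    lo, hi = min(user_sims), max(user_sims)
    if hi == lo:
        return user_avg
    return min_max(similarity, lo, hi, min(user_ratings), max(user_ratings))
```

For example, a user whose similarities span 0.2 to 0.6 and whose ratings span 2 to 4 would get a rating of 2 for a similarity of 0.2 and a rating of 4 for a similarity of 0.6.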

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathit{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L \end{cases} \tag{4.6}$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                                Food.com    Epicurious
Number of users                             24741          8117
Number of food items                       226025         14976
Number of rating events                    956826         86574
Number of ratings above avg                726467         46588
Number of groups                              108            68
Number of ingredients                        5074           338
Number of categories                           28            14
Sparsity on the ratings matrix               0.02          0.07
Avg rating values                            4.68          3.34
Avg number of ratings per user              38.67         10.67
Avg number of ratings per item               4.23          5.78
Avg number of ingredients per item           8.57          3.71
Avg number of categories per item            2.33          0.60
Avg number of food groups per item           0.87          0.61
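The three-case rule of Eq. 4.6 is short enough to be written directly. In this sketch the average and standard deviation are passed in as arguments, so the same hypothetical function covers the user-based, recipe-based and combined variants; the default thresholds are the initial values of 0.75 and 0.25.

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity into a rating following Eq. 4.6."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

Setting `lower` to a value below the smallest possible similarity disables the third case, which is a convenient way to study the effect of each threshold separately.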


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered. All recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the web site users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a variant of the leave-p-out cross-validation method was used, holding out a subset of the observations as the validation set and using the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. In exhaustive leave-p-out, the process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 partitions, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-fold cross-validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
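The evaluation loop can be summarised with a short sketch. The fold construction below, a seeded shuffle followed by a round-robin split, is only one possible way of obtaining 5 folds and is an assumption; the MAE and RMSE functions follow the standard definitions of these measures. All names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Split the indices 0..n-1 into k disjoint validation folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


folds = k_fold_indices(10, k=5)
```

For each fold, the rating events whose indices are in that fold form the validation set and all remaining events form the training set; the MAE and RMSE of the 5 runs are then averaged.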

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                      Epicurious           Food.com
Method                               MAE      RMSE       MAE      RMSE
YoLP Content-based component       0.6389    0.8279    0.3590    0.6536
YoLP Collaborative component       0.6454    0.8678    0.3761    0.6834
User Average                       0.6315    0.8338    0.4077    0.6207
Item Average                       0.7701    1.0930    0.4385    0.7043
Combined Average                   0.6628    0.8572    0.4180    0.6250

Table 5.2: Test Results

Epicurious
                                                   Observation            Observation
                                                   User Average           Fixed Threshold
Method                                            MAE      RMSE          MAE      RMSE
User Avg + User Standard Deviation              0.8217    1.0606       0.7759    1.0283
Item Avg + Item Standard Deviation              0.8914    1.1550       0.8388    1.1106
User/Item Avg + User and Item Standard Dev.     0.8304    1.0296       0.7824    0.9927
Min-Max                                         0.8539    1.1533       0.7721    1.0705

Food.com
                                                   Observation            Observation
                                                   User Average           Fixed Threshold
Method                                            MAE      RMSE          MAE      RMSE
User Avg + User Standard Deviation              0.4448    0.6812       0.4287    0.6624
Item Avg + Item Standard Deviation              0.4561    0.7251       0.4507    0.7207
User/Item Avg + User and Item Standard Dev.     0.4390    0.6506       0.4324    0.6449
Min-Max                                         0.6648    0.9847       0.6303    0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering as positive observations the highest rating values and as negative the lowest. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned from Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                           Epicurious           Food.com
Features                                  MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries       0.7824    0.9927    0.4324    0.6449
Ingredients + Cuisine                   0.7915    1.0012    0.4384    0.6502
Ingredients + Dietary                   0.7874    0.9986    0.4342    0.6468
Cuisine + Dietary                       0.8266    1.0616    0.4324    0.7087
Ingredients                             0.7932    1.0054    0.4411    0.6537
Cuisine                                 0.8553    1.0810    0.5357    0.7431
Dietary                                 0.8772    1.0807    0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
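Storing one vector per feature group and merging them on demand can be sketched as below. The group names, the key prefixes that keep features from different groups distinct, and the function names are all assumptions of this illustration.

```python
import math


def merge(vectors, groups):
    """Merge the per-group sparse vectors selected for a feature test.

    vectors -- {group name: {feature: weight}}; feature keys are assumed
               to be unique across groups (e.g. by using a prefix).
    """
    merged = {}
    for group in groups:
        merged.update(vectors[group])
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


user = {
    "ingredients": {"ing:tomato": 1.0},
    "cuisine": {"cui:italian": 2.0},
    "dietary": {"diet:vegan": -1.0},
}
recipe = {
    "ingredients": {"ing:tomato": 1.0},
    "cuisine": {"cui:italian": 1.0},
    "dietary": {},
}
groups = ["ingredients", "cuisine"]
similarity = cosine(merge(user, groups), merge(recipe, groups))
```

Changing the `groups` list is then enough to evaluate another feature combination, without recomputing any prototype vector.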

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since we have more information available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathit{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathit{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathit{similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, this test studies the impact on the recommendation and identifies the similarity thresholds that return the lowest error values.
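The three-case mapping of Eq. 4.6 can be written as a short function. This is only a sketch: the function name and parameter names are hypothetical, and `avg_rating` and `std_dev` are assumed to be the already combined user and item values described in Section 4.3.3.

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a predicted rating following Eq. 4.6."""
    if similarity >= upper:
        return avg_rating + std_dev   # similarity >= U
    if similarity >= lower:
        return avg_rating             # L <= similarity < U
    return avg_rating - std_dev       # similarity < L

high = similarity_to_rating(0.8, avg_rating=3.5, std_dev=0.5)
middle = similarity_to_rating(0.5, avg_rating=3.5, std_dev=0.5)
low = similarity_to_rating(0.1, avg_rating=3.5, std_dev=0.5)
```

Exposing `upper` and `lower` as parameters is what makes the threshold variation tests possible without changing the rest of the pipeline.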

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy: subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
$$

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by repeating the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
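The opposite movement of MAE and RMSE observed in the upper threshold test can be reproduced with a small constructed example. The ratings below are invented for illustration and do not come from either dataset: predictor A is always slightly off, while predictor B is exact most of the time but misses badly once.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; squaring penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 4, 4, 2]
pred_a = [3.5, 3.5, 3.5, 2.5]   # every prediction off by 0.5
pred_b = [4, 4, 4, 3.5]         # three exact predictions, one off by 1.5

mae_a, rmse_a = mae(actual, pred_a), rmse(actual, pred_a)
mae_b, rmse_b = mae(actual, pred_b), rmse(actual, pred_b)
```

Here predictor B has the lower MAE (0.375 against 0.5) but the higher RMSE (0.75 against 0.5), which is the same pattern seen when the threshold U is lowered.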

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
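The coordinates of one point in such a plot could be computed as in the sketch below. The function name is hypothetical, and the use of the population standard deviation and of the mean absolute error per user are assumptions, since the text does not state which variants were used.

```python
import statistics

def user_point(actual, predicted):
    """Return (standard deviation of the user's ratings, mean absolute error)."""
    std_dev = statistics.pstdev(actual)
    abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return std_dev, abs_error

# A user who always gives the same rating, and a user with very spread ratings.
steady_user = user_point([4, 4, 4], [4, 4, 4])
varied_user = user_point([1, 5], [3, 3])
```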

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
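The round-by-round procedure described above, where one review moves from the validation set to the training set in each round, can be sketched as follows. The function names are hypothetical, and `mean_predictor` is a deliberately trivial stand-in used only to keep the example runnable; it is not the Rocchio-based predictor evaluated in this chapter.

```python
def learning_curve(ratings, predict, max_train):
    """Average absolute error per round for one user.

    ratings: list of (item, rating) pairs in the order they were made.
    predict: function (training_pairs, item) -> predicted rating.
    """
    errors = []
    for n in range(1, max_train + 1):
        train, validation = ratings[:n], ratings[n:]
        if not validation:
            break
        abs_errors = [abs(predict(train, item) - rating) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

def mean_predictor(train, item):
    """Stand-in predictor: the mean of the ratings seen so far."""
    return sum(rating for _, rating in train) / len(train)

ratings = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
curve = learning_curve(ratings, mean_predictor, max_train=3)
```

Averaging such per-user curves over all selected users would give one point per round, as plotted in the learning curve figures.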



Chapter 6

Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts of recommendation systems are presented in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.

2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems

• Content-based recommendation systems

• Collaborative recommendation systems

• Hybrid recommendation systems

In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches instead relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it translated into quantitative values so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$
TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}}
\tag{2.1}
$$

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by the maximum $f_{zj}$, which corresponds to the maximum frequency observed over all keywords z in the document j.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$
IDF_i = \log \frac{N}{n_i}
\tag{2.2}
$$

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

$$
w_{ij} = TF_{ij} \times IDF_i
\tag{2.3}
$$

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
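The TF-IDF weighting of Eq. 2.1 to 2.3 can be illustrated with a minimal sketch. The function name, the token lists, and the use of the natural logarithm are assumptions made for this example; documents are assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, following Eq. 2.1 to 2.3."""
    n_docs = len(documents)
    # n_i: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_count = max(counts.values())   # maximum frequency in this document
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["tomato", "basil", "tomato"], ["tomato", "rice"], ["rice", "beans"]]
w = tf_idf(docs)
```

In the first document, "tomato" has the highest term frequency but appears in two of the three documents, so its weight ends up lower than that of the rarer "basil".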

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$
w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}}
\tag{2.4}
$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$
\text{Similarity}(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}}
\tag{2.5}
$$
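Cosine normalization (Eq. 2.4) and cosine similarity (Eq. 2.5) can be sketched over sparse vectors as follows. The function names and the example weights are illustrative assumptions.

```python
import math

def normalize(vector):
    """Cosine normalization: divide every weight by the vector's Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    return {k: w / norm for k, w in vector.items()} if norm else dict(vector)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = {"tomato": 3.0, "basil": 4.0}
b = {"tomato": 1.0}

sim = cosine_similarity(a, b)
a_norm, b_norm = normalize(a), normalize(b)
# For already normalized vectors the similarity is simply the dot product.
dot_of_normalized = sum(w * b_norm.get(k, 0.0) for k, w in a_norm.items())
```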



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}
\tag{2.6}
$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
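The prototype weight formula of Eq. 2.6 can be sketched for a single class as shown below. The function name, the dict-based vectors, and the values chosen for β and γ are assumptions for illustration only; the text does not fix these parameters here.

```python
def rocchio_prototype(positive, negative, beta=1.0, gamma=1.0):
    """Build a prototype vector from positive and negative example vectors (Eq. 2.6).

    positive, negative: lists of {term: weight} dicts (for example TF-IDF weights).
    """
    terms = set().union(*positive, *negative)
    prototype = {}
    for term in terms:
        pos = sum(d.get(term, 0.0) for d in positive) / len(positive) if positive else 0.0
        neg = sum(d.get(term, 0.0) for d in negative) / len(negative) if negative else 0.0
        prototype[term] = beta * pos - gamma * neg
    return prototype

positive = [{"garlic": 1.0, "basil": 0.5}, {"garlic": 0.5}]
negative = [{"anchovy": 1.0, "garlic": 0.5}]
proto = rocchio_prototype(positive, negative, beta=1.0, gamma=0.5)
```

Terms that only occur in negative examples, such as "anchovy" above, receive a negative weight, which lowers the similarity of new documents containing them.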

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

$$
P(c|d) = \frac{P(c) P(d|c)}{P(d)}
\tag{2.7}
$$

When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$
\operatorname*{argmax}_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)}
\tag{2.8}
$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumed by this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that records whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times the words appear in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)}
\tag{2.9}
$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
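A minimal sketch of the multinomial model of Eq. 2.9 is shown below. The function names and the toy documents are invented for illustration. Two choices go beyond the text and are assumptions of this sketch: add-one (Laplace) smoothing is used to avoid zero probabilities, and scores are summed in log space to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label). Returns class counts, term counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def classify(tokens, model):
    """Return the class with the highest score under the multinomial model."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)           # log P(c_j)
        for term in tokens:                              # repeated terms count N(d_i, t_k) times
            if term in vocab:
                smoothed = (term_counts[label][term] + 1) / (total_terms + len(vocab))
                score += math.log(smoothed)              # log P(t_k | c_j), add-one smoothed
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    (["pasta", "tomato", "basil"], "relevant"),
    (["pasta", "garlic"], "relevant"),
    (["liver", "onion"], "irrelevant"),
])
first = classify(["pasta", "tomato"], model)
second = classify(["liver"], model)
```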

Decision trees and nearest neighbor methods are other examples of important learning algo-

rithms used in content-based recommendation systems Decision tree learners build a decision tree

by recursively partitioning training data into subgroups until those subgroups contain only instances

of a single class In the case of a document the treersquos internal nodes represent labelled terms

Branches originating from them are labelled according to tests done on the weight that the term

has in the document Leaves are then labelled by categories Instead of using weights a partition

can also be formed based on the presence or absence of individual words The attribute selection

criterion for learning trees for text classification is usually the expected information gain [10]

Nearest neighbor algorithms simply store all training data in memory. When classifying a new
unlabeled item, the algorithm compares it to all stored items using a similarity function and then
determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working
with structured data. For items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is their inefficiency at classification time: since they have no training phase, all
the computation is done when an item is classified.
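The procedure described above can be sketched as follows, assuming items are sparse term-weight vectors stored as dictionaries and that the class label is decided by a simple majority vote among the k nearest neighbors. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar stored items.

    training is a list of (vector, label) pairs; all of the work happens
    here, at classification time, since there is no training phase.
    """
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored items.
training = [
    ({"tomato": 1.0, "pasta": 1.0}, "liked"),
    ({"tomato": 1.0, "basil": 1.0}, "liked"),
    ({"sugar": 1.0, "cream": 1.0}, "disliked"),
]
```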

These algorithms represent some of the most important methods used in content-based recommendation
systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed
automatically by a computer they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user
profiles, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a
particular user based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various
algorithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the
recommendation. These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are
essentially heuristics that make rating predictions based on the entire collection of items previously
rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N users most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. Item5 is unknown to Alice, and the recommendation system needs to generate
a prediction for it. The set of ratings S previously mentioned contains the ratings given by User1,
User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed as the average of the values
contained in set S. However, the most common approach is to use a weighted sum, where the level
of similarity between users defines the weight to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an
identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items in common. The similarity measure results
from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\, \sqrt{\sum_{s \in S} r_{bs}^{2}}} \qquad (2.10)$$

In the formula, S is the set of items rated by both users, $r_{as}$ is the rating that user a gave to item s
and $r_{bs}$ is the rating that user b gave to the same item s. However, this measure does not take into
consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a
similar way. The difference in rating values between the four items is practically constant. With
the cosine similarity measure these users are considered highly similar, which may not always be
the case, since only the items they have in common are contemplated. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in
user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

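$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2} \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \qquad (2.11)$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.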
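To make the difference between the two measures concrete, the sketch below evaluates both on the ratings of Table 2.1. It assumes that the user averages in the Pearson coefficient are taken over the items the two users have in common; the function names are hypothetical.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def common_items(a, b):
    """Items rated by both users (the set S of the similarity formulas)."""
    return sorted(ratings[a].keys() & ratings[b].keys())


def cosine_sim(a, b):
    items = common_items(a, b)
    num = sum(ratings[a][s] * ratings[b][s] for s in items)
    den = (math.sqrt(sum(ratings[a][s] ** 2 for s in items))
           * math.sqrt(sum(ratings[b][s] ** 2 for s in items)))
    return num / den if den else 0.0


def pearson_sim(a, b):
    items = common_items(a, b)
    if not items:
        return 0.0
    # Assumption: averages are computed over the co-rated items only.
    mean_a = sum(ratings[a][s] for s in items) / len(items)
    mean_b = sum(ratings[b][s] for s in items) / len(items)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in items)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in items)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in items))
    return num / den if den else 0.0


for other in ("User1", "User2", "User3", "User4"):
    print(other, round(cosine_sim("Alice", other), 2),
          round(pearson_sim("Alice", other), 2))
```

On this data the cosine measure still rates Alice and User4 as fairly similar, whereas the Pearson coefficient is negative, which reflects their opposite preferences once each user's own rating level is removed.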

With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \qquad (2.12)$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users
most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's
unseen Item5 are higher or lower than their averages. The rating differences are combined using the
similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The
value obtained through this procedure corresponds to the predicted rating.
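The prediction function can be illustrated on the running example. The sketch below assumes a neighborhood of the two users most similar to Alice (User1 and User2) and takes their similarity values as given, rounded to two decimals; both choices, as well as the function names, are assumptions of the example.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict(target, item, ratings, sims):
    """Eq. (2.12): the target's mean plus the similarity-weighted average
    of the neighbours' deviations from their own means."""
    base = mean(ratings[target].values())
    neighbours = [b for b in sims if item in ratings[b]]
    num = sum(sims[b] * (ratings[b][item] - mean(ratings[b].values()))
              for b in neighbours)
    den = sum(sims[b] for b in neighbours)
    return base if den == 0 else base + num / den


ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}
# Assumed (rounded) Pearson similarities between Alice and her two neighbours.
sims = {"User1": 0.85, "User2": 0.71}

prediction = predict("Alice", "Item5", ratings, sims)
print(round(prediction, 2))
```

With these inputs the predicted rating for Item5 is slightly below 5, since both neighbors rated Item5 above their own averages.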

Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one
common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empirical
evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available user-based
collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative
methods. The system must first learn the user's preferences from previously rated items in order to
perform accurate recommendations. Several techniques have been proposed to address this
problem. Most of them use the hybrid recommendation approach presented in the next section. Other
techniques use strategies based on item popularity, item entropy, user personalization and
combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until
the new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered. The number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information such as age, gender and other attributes can
also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several
limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to
avoid some shortcomings and even reach desirable properties not present in individual approaches.
Monolithic, parallel and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or
knowledge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be
associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to
improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components have the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually but
complement each other in different situations (e.g., when few ratings exist one should recommend
popular items, otherwise use collaborative methods).
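A weighting scheme of this kind can be sketched as follows, assuming that each component returns a dictionary of predicted scores per item and that the weights are assigned manually. The component names, weights and scores are hypothetical.

```python
def weighted_hybrid(scores_by_component, weights):
    """Combine the scores of independent recommenders with fixed weights.

    Items missing from a component simply receive no contribution from it.
    """
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return combined


# Hypothetical outputs of a collaborative and a content-based component.
scores_by_component = {
    "collaborative": {"recipe_a": 4.0, "recipe_b": 2.0},
    "content_based": {"recipe_a": 3.0, "recipe_b": 5.0},
}
weights = {"collaborative": 0.75, "content_based": 0.25}

combined = weighted_hybrid(scores_by_component, weights)
ranking = sorted(combined, key=combined.get, reverse=True)
```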

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is exploited by the principal component
to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base
algorithms can be improved by being hybridized with other techniques. It is important that the
recommendation techniques used in the hybrid system complement each other's limitations. For instance,
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area
of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective many variables can be, and have been, studied: increase in number of sales,
profits and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall.

When using IR measures, the recommendation is viewed as an information retrieval task where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):

$$\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (2.13)$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \qquad (2.14)$$
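Both measures can be computed directly from the set of recommended items and the set of items the user actually likes. The sketch below assumes items are identified by hashable labels; the function name and the example data are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four recommended items, three relevant ones, two of them in common.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```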

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating
for item i and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on
larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \qquad (2.16)$$
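The two error measures can be sketched as follows, assuming the predicted and actual ratings are given as equally long lists. The function names and the example ratings are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error, Eq. (2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error, Eq. (2.16)."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual)
    )


predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```

In this example the single error of two rating points weighs more heavily in the RMSE than in the MAE, which is the emphasis on larger deviations mentioned above.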


The RMSE measure was used in the famous Netflix competition, where a prize of 1,000,000 dollars
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients $I^{+}_k$, an equation based on the idea of TF-IDF is used:

$$I^{+}_k = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe
Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain
ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$

The user's disliked ingredients $I^{-}_k$ are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
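The favourite-ingredient score can be sketched as follows. The example assumes that the period D is measured in days, that the natural logarithm is used, and that recipes are given as sets of ingredient names; all names and values are hypothetical.

```python
import math


def favourite_scores(cooked_counts, period_days, recipes):
    """Score each ingredient k as FF_k * IRF_k.

    cooked_counts maps an ingredient to F_k, the number of times the user
    cooked with it during the period; recipes is the full recipe collection.
    """
    total_recipes = len(recipes)                      # M
    scores = {}
    for ingredient, uses in cooked_counts.items():
        containing = sum(1 for r in recipes if ingredient in r)   # M_k
        if containing == 0:
            continue  # ingredient absent from the collection
        ff = uses / period_days                       # FF_k = F_k / D
        irf = math.log(total_recipes / containing)    # IRF_k = log(M / M_k)
        scores[ingredient] = ff * irf
    return scores


# Hypothetical collection of four recipes and a 30-day cooking history.
recipes = [
    {"rice", "egg"},
    {"rice", "salmon"},
    {"rice", "tofu"},
    {"egg", "tofu"},
]
scores = favourite_scores({"rice": 6, "salmon": 3}, 30, recipes)
```

Although rice is cooked twice as often as salmon in this example, salmon obtains the higher score because it appears in far fewer recipes, mirroring the role of IDF in text retrieval.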

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted
by $I^{+}_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3.4)$$

The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top
ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%.
However, the recall was very low, namely only 4.5%. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by $I^{+}_k$ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considered the ingredient quantity of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^{2}} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the
quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed
average quantity of the ingredient k in all the recipes in the database). According to the deviation
score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering
the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_k$ and $I^{-}_k$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization
problem. The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbors' means. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
those predicted by the content-based method otherwise:

$$v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \qquad (3.7)$$
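The construction of a pseudo user-ratings vector can be sketched as follows, assuming the actual ratings and the content-based predictions are both available as dictionaries keyed by item. The function name and the example data are hypothetical.

```python
def pseudo_ratings(user_ratings, content_predictions, items):
    """Build the dense pseudo user-ratings vector of Eq. (3.7).

    The user's own rating is used where available; otherwise the
    content-based prediction fills the gap.
    """
    vector = []
    for item in items:
        if item in user_ratings:
            vector.append(user_ratings[item])
        else:
            vector.append(content_predictions[item])
    return vector


items = ["movie_a", "movie_b", "movie_c"]
user_ratings = {"movie_a": 5}
content_predictions = {"movie_a": 4, "movie_b": 2, "movie_c": 3}
vector = pseudo_ratings(user_ratings, content_predictions, items)
```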

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has rated only a
few. Lastly, the prediction is computed using a hybrid correlation
weight that allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
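The breakdown and reconstruction steps of the content-based strategy can be sketched as follows. The ingredient scores are the average of the user's ratings for the recipes containing each ingredient, as described above; predicting a recipe as the plain average of its known ingredient scores is an assumption of this example, as are the function names and the data.

```python
from collections import defaultdict


def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Average the user's recipe ratings over each ingredient they contain."""
    collected = defaultdict(list)
    for recipe, rating in recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {ing: sum(vals) / len(vals) for ing, vals in collected.items()}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating from the scores of its known ingredients.

    Returns None when none of the ingredients has been scored yet.
    """
    known = [scores[ing] for ing in ingredients if ing in scores]
    return sum(known) / len(known) if known else None


# Hypothetical ratings from a single user.
recipe_ingredients = {
    "carbonara": ["pasta", "egg", "bacon"],
    "omelette": ["egg", "cheese"],
}
recipe_ratings = {"carbonara": 5, "omelette": 3}

scores = ingredient_scores(recipe_ratings, recipe_ingredients)
prediction = predict_recipe(["egg", "cheese", "pasta"], scores)
```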

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method
identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the recipe rating scores after the recipe is broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered. It is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based recommendation system for news articles. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help him to gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need to recommend a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
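The decision logic of the short-term model can be sketched as follows. The example assumes that the cosine similarities between the new story and the stored stories have already been computed; the threshold values, the function name and the returned labels are hypothetical.

```python
def short_term_predict(neighbours, min_sim=0.3, max_sim=0.8):
    """Classify a new story from (similarity, score) pairs of stored stories.

    Returns a (status, score) pair:
      ("unclassified", None) when no stored story is similar enough to vote,
      ("known", None)        when some voter is too similar to the new story,
      ("scored", value)      otherwise, with the similarity-weighted average.
    The threshold defaults are illustrative assumptions.
    """
    voters = [(sim, score) for sim, score in neighbours if sim >= min_sim]
    if not voters:
        return ("unclassified", None)   # would be passed to the long-term model
    if any(sim >= max_sim for sim, _ in voters):
        return ("known", None)          # the user already knows this event
    weighted = sum(sim * score for sim, score in voters)
    return ("scored", weighted / sum(sim for sim, _ in voters))


result = short_term_predict([(0.5, 1.0), (0.5, 0.0), (0.1, 0.0)])
```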

This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^{2} \sum_{p \in P} (r_{bp} - \bar{r}_b)^{2}}} \qquad (4.1)$$

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b,
respectively.

After the similarity is computed the rating prediction is calculated using the following equation

pred(u a) =sum

bisinN sim(a b) lowast (rbp minus rb)sumbisinN sim(a b)




In the formula, $\mathrm{pred}(u,a)$ is the predicted rating of user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
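As a minimal sketch of the two steps above, the following Python code computes the Pearson similarity between two recipes and a similarity-weighted prediction. The dictionary layout, the function names and the choice to ignore neighbours with non-positive similarity are assumptions made for illustration, not details of the YoLP implementation. For brevity the prediction weights the user's raw ratings, as the prose describes, rather than mean-centred ones.

```python
from math import sqrt

def pearson_item_similarity(ratings, a, b):
    """Pearson correlation between recipes a and b.

    ratings: dict mapping recipe -> {user: rating} (hypothetical layout).
    """
    common = set(ratings[a]) & set(ratings[b])   # P: users who rated both
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = (sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
           * sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common)))
    return num / den if den else 0.0

def predict_rating(ratings, user, target):
    """Similarity-weighted average of the user's ratings (None if no neighbour)."""
    weighted, total = 0.0, 0.0
    for item, by_user in ratings.items():
        if item == target or user not in by_user:
            continue
        sim = pearson_item_similarity(ratings, target, item)
        if sim > 0:                              # assumption: positive neighbours only
            weighted += sim * by_user[user]
            total += sim
    return weighted / total if total else None
```

Because only the recipes of the current restaurant are candidates, `predict_rating` would be called once per restaurant recipe, which is the efficiency argument made for the item-based approach.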

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
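The profile construction and ranking described above can be sketched as follows. Sparse binary vectors are stored here as Python sets of feature names, which is an assumption made for compactness; the feature labels and function names are hypothetical. For binary vectors the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes.

```python
from math import sqrt

def build_profile(user_ratings, recipe_features, positive=(4, 5)):
    """Union of the features of every recipe the user rated 4 or 5."""
    profile = set()
    for recipe_id, rating in user_ratings.items():
        if rating in positive:
            profile |= recipe_features[recipe_id]
    return profile

def cosine_binary(x, y):
    """Cosine similarity between two binary vectors stored as sets."""
    if not x or not y:
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))

def rank_recipes(profile, candidates, recipe_features):
    """Order candidate recipes from most to least similar to the profile."""
    return sorted(candidates,
                  key=lambda r: cosine_binary(profile, recipe_features[r]),
                  reverse=True)
```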

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. In the content-based method, however, the recipes are ordered by the similarity between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this way of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as defined in [3]:

$$\mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
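A short sketch of how the IRF weights could be computed over a whole dataset is given below. The input layout (a dictionary from recipe ID to a set of features) and the function name are assumptions, and the natural logarithm is used because the text does not state the base.

```python
from collections import Counter
from math import log

def inverse_recipe_frequency(recipe_features):
    """Return {feature: log(M / M_k)} for every feature in the dataset.

    recipe_features: dict mapping recipe id -> iterable of features
    (hypothetical layout). M is the number of recipes and M_k the number
    of recipes containing feature k.
    """
    m = len(recipe_features)
    counts = Counter(f for feats in recipe_features.values() for f in set(feats))
    return {feature: log(m / m_k) for feature, m_k in counts.items()}
```

A feature that occurs in every recipe receives weight 0, so very common ingredients contribute nothing to the prototype vectors, while rare ingredients receive the largest weights.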

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, except that ratings equal to 3 are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations the feature weights are subtracted.
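The two labelling approaches and the add/subtract update can be sketched as below. All function names and the dictionary-based sparse vector are illustrative assumptions; both observation types use a weight of 1, as stated in the text.

```python
def label_fixed(rating, scale_max):
    """Fixed threshold: +1 positive, -1 negative, 0 neutral (ignored)."""
    if scale_max == 4:                 # Epicurious: 1-2 negative, 3-4 positive
        return 1 if rating >= 3 else -1
    if rating == 3:                    # Food.com: a rating of 3 is neutral
        return 0
    return 1 if rating > 3 else -1

def label_user_average(rating, user_avg):
    """User-average threshold: equal or higher is positive, lower is negative."""
    return 1 if rating >= user_avg else -1

def build_prototype(user_ratings, recipe_features, irf, label):
    """Add IRF weights for positive observations, subtract them for negative ones.

    user_ratings: {recipe_id: rating}; recipe_features: {recipe_id: features};
    irf: {feature: weight}; label: function rating -> +1, -1 or 0.
    """
    prototype = {}
    for recipe_id, rating in user_ratings.items():
        sign = label(rating)
        if sign == 0:
            continue
        for feature in recipe_features[recipe_id]:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf[feature]
    return prototype
```

For example, `build_prototype(ratings, features, irf, lambda r: label_fixed(r, 5))` would apply the fixed-threshold approach to a Food.com user.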

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point the similarity scale of each user is mapped onto that user's rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
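A minimal per-user sketch of the Min-Max mapping follows. The function names and argument layout are assumptions; the fallback to the user's average when the similarity interval is empty follows the description above.

```python
def min_max(a, a_min, a_max, c, d):
    """Map value a from [a_min, a_max] onto [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def similarity_to_rating(sim, user_sims, user_train_ratings):
    """Translate one similarity value into a rating for a single user.

    user_sims: the user's similarity values (validation set);
    user_train_ratings: the user's ratings (training set).
    """
    a_min, a_max = min(user_sims), max(user_sims)
    if a_max == a_min:
        # Not enough data to form a similarity interval: use the user average.
        return sum(user_train_ratings) / len(user_train_ratings)
    return min_max(sim, a_min, a_max,
                   min(user_train_ratings), max(user_train_ratings))
```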

Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating for the recipe. Since the notion of a high rating varies between users and recipes, their averages and standard deviations can help determine the final rating predicted for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |
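The three-case rating rule of this subsection can be sketched in a few lines. The function names are hypothetical, and two details are assumptions not stated in the text: the population standard deviation is used, and the "combined" standard deviation is taken as the mean of the user and recipe standard deviations, mirroring the (UserAvg + ItemAvg)/2 combination used for the averages.

```python
from statistics import mean, pstdev

def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Three-case rule: avg + std, avg, or avg - std depending on similarity."""
    if sim >= upper:
        return avg + std
    if sim < lower:
        return avg - std
    return avg

def combined_stats(user_ratings, item_ratings):
    """Combined user/recipe average and standard deviation (assumed: plain means)."""
    avg = (mean(user_ratings) + mean(item_ratings)) / 2
    std = (pstdev(user_ratings) + pstdev(item_ratings)) / 2
    return avg, std
```

The default values of `upper` and `lower` are the initial thresholds given in the text.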


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online4 recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
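The sparsity filter described above could be sketched as follows. The tuple layout and function name are assumptions, and so is the order of the two passes (recipes first, then users, each applied once); the text does not say whether the filter was iterated.

```python
from collections import Counter

def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated at most 3 times, then users with at most 5 ratings.

    events: list of (user_id, recipe_id, rating) tuples (hypothetical layout).
    """
    item_counts = Counter(item for _, item, _ in events)
    events = [e for e in events if item_counts[e[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in events)
    return [e for e in events if user_counts[e[0]] >= min_user_ratings]
```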

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets graphically. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL6. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model built in the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. In the general leave-p-out scheme, p observations are used as the validation set and the remaining observations as the training set; to reduce variability the process is repeated multiple times using different observations as the validation set, ideally until all possible combinations of p observations are tested, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work the k-fold variant was used with k = 5: the data is partitioned into 5 folds and the process is repeated 5 times, each time with a different fold as the validation set. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
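A minimal sketch of a 5-fold split is shown below. The function name, the fixed random seed and the round-robin assignment of shuffled events to folds are assumptions made for illustration; the text does not describe how the folds were drawn.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    events: list of rating events; each fold holds roughly 1/k of the data.
    """
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Each event appears in exactly one validation set, so the error averaged over the 5 folds covers the whole dataset once.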

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated from the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
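For reference, the two error measures can be written directly from their standard definitions. The function names are illustrative.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Two prediction sets can share the same MAE and still differ in RMSE: four errors of 0.5 give MAE 0.5 and RMSE 0.5, whereas three exact predictions and one error of 2 give MAE 0.5 and RMSE 1.0.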

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average: MAE | Epicurious, Observation User Average: RMSE | Epicurious, Observation Fixed Threshold: MAE | Epicurious, Observation Fixed Threshold: RMSE | Food.com, Observation User Average: MAE | Food.com, Observation User Average: RMSE | Food.com, Observation Fixed Threshold: MAE | Food.com, Observation Fixed Threshold: RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |
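The three average baselines can be sketched as a single helper. The tuple layout and names are assumptions, and the sketch assumes every queried user and recipe appears in the training data; how unseen users or recipes were handled is not stated in the text.

```python
from collections import defaultdict

def average_baselines(training):
    """Build a predictor returning the user, item and combined averages.

    training: list of (user_id, recipe_id, rating) tuples (hypothetical layout).
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}

    def predict(user, item):
        u, i = user_avg[user], item_avg[item]
        return {"user": u, "item": i, "combined": (u + i) / 2}

    return predict
```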

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented by the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
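The idea of storing one vector per feature group and merging only the groups under test can be sketched as follows. The group names, the dictionary-based sparse vectors and the function names are illustrative assumptions.

```python
from math import sqrt

def merge_groups(stored, groups):
    """Merge the selected per-group vectors into one sparse vector.

    stored: {group: {feature: weight}}; keys are prefixed with the group
    name so that equal feature names in different groups cannot collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            merged[(group, feature)] = weight
    return merged

def cosine(x, y):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    norm_x = sqrt(sum(w * w for w in x.values()))
    norm_y = sqrt(sum(w * w for w in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity between user and recipe using only the selected groups."""
    return cosine(merge_groups(user_vectors, groups),
                  merge_groups(recipe_vectors, groups))
```

With this layout, each row of the feature test only changes the `groups` argument, for example `["ingredients", "cuisine"]`, and no prototype vector has to be rebuilt.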

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values for the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
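The upper-threshold sweep can be sketched as a loop over candidate values of U that re-applies the two-case rule and recomputes both error measures. The tuple layout and function name are assumptions, and the sample values in the usage example are toy numbers chosen only to show the MAE/RMSE trade-off, not data from the experiments.

```python
from math import sqrt

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for each candidate upper threshold U.

    samples: list of (similarity, average, std_dev, actual_rating) tuples.
    """
    results = {}
    for upper in thresholds:
        errors = []
        for sim, avg, std, actual in samples:
            predicted = avg + std if sim >= upper else avg   # two-case rule
            errors.append(predicted - actual)
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results

# Toy usage: lowering U makes more predictions exact but one miss larger.
toy = [(0.9, 4.0, 1.0, 5.0), (0.5, 4.0, 1.0, 5.0), (0.5, 4.0, 1.0, 5.0),
       (0.5, 4.0, 1.0, 2.0), (0.1, 4.0, 1.0, 4.0)]
results = sweep_upper_threshold(toy, [0.75, 0.4])
```

In this toy case the MAE falls from 0.8 to 0.6 when U is lowered from 0.75 to 0.4, while the RMSE rises, which is the same qualitative pattern described above.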

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation. The line in these two graphs indicates the average value of the points in that proximity.
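The per-user points of such a plot could be computed as sketched below. The data layout and function name are hypothetical, and two choices are assumptions: the standard deviation is the population standard deviation of the user's training ratings, and the user's absolute error is the mean over that user's validation predictions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def user_error_vs_std(training, results):
    """Return {user: (rating_std_dev, mean_absolute_error)}.

    training: list of (user_id, recipe_id, rating) tuples;
    results: list of (user_id, predicted_rating, actual_rating) tuples.
    """
    ratings = defaultdict(list)
    for user, _, rating in training:
        ratings[user].append(rating)
    errors = defaultdict(list)
    for user, predicted, actual in results:
        errors[user].append(abs(predicted - actual))
    return {user: (pstdev(ratings[user]), mean(errs))
            for user, errs in errors.items() if user in ratings}
```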

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since in a real system the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
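The round-by-round simulation can be sketched for a single user as below. The function and parameter names are hypothetical; `build_model` and `predict` stand in for building the prototype vector and producing a rating, and the per-round error is the mean absolute error over the reviews still in the validation set. Averaging the curves of all selected users would give one point per number of rated recipes.

```python
def learning_curve(user_reviews, build_model, predict, max_train):
    """Return the validation MAE after each additional training review.

    user_reviews: list of (recipe_id, rating) for one user, in the order
    they are moved from the validation set to the training set.
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break
        model = build_model(training)
        errors = [abs(predict(model, recipe) - rating)
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve
```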



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. These two approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contains its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained the user reviews needed to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, implementing this hybrid approach yielded a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to one specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
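One possible reading of this proposal is sketched below. It is an assumption-laden illustration, not a tested design: `build_class_vectors` and `predict_rating` are hypothetical names, each class vector is taken to be the plain average of the feature vectors of the recipes that received that rating, and cosine similarity is used for the comparison.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated_recipes):
    """rated_recipes: list of (feature_vector, rating). Returns one averaged vector per rating value."""
    groups = defaultdict(list)
    for vector, rating in rated_recipes:
        groups[rating].append(vector)
    classes = {}
    for rating, vectors in groups.items():
        features = {f for v in vectors for f in v}
        classes[rating] = {f: sum(v.get(f, 0.0) for v in vectors) / len(vectors) for f in features}
    return classes

def predict_rating(recipe, classes):
    """The predicted rating is the rating value of the most similar class vector."""
    return max(classes, key=lambda rating: cosine(recipe, classes[rating]))

rated = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "pasta": 1.0}, 5),
    ({"cream": 1.0, "bacon": 1.0}, 2),
]
classes = build_class_vectors(rated)
prediction = predict_rating({"tomato": 1.0, "onion": 1.0}, classes)
```

With these toy vectors the new recipe shares a feature only with the recipes rated 5, so that class is selected.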

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868.

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.


[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.

2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching the attributes of an object against a user profile, and then recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.

Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that keyword to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords z in the document j.

Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency (IDF) measure is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$
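As an illustration, the three definitions above can be computed in a few lines. This is a minimal sketch under stated assumptions: `tf_idf` is a hypothetical helper, documents are non-empty lists of keywords, and the natural logarithm is used because the text does not fix the base.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document, following Eqs. (2.1) to (2.3)."""
    n_docs = len(documents)                                              # N
    doc_freq = Counter(term for doc in documents for term in set(doc))  # n_i
    weights = []
    for doc in documents:
        counts = Counter(doc)               # f_ij
        max_count = max(counts.values())    # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["tomato", "basil", "tomato", "pasta"],
    ["pasta", "cream", "bacon"],
    ["tomato", "onion"],
]
w = tf_idf(docs)
```

In the first document, "basil" ends up with a higher weight than "tomato" even though it occurs less often, because it appears in only one of the three documents.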

It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
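The cosine normalization and the cosine similarity metric can be sketched as follows, assuming that vectors are stored sparsely as `{keyword: weight}` dictionaries. The function names and the toy weights are illustrative only.

```python
import math

def normalize(weights):
    """Cosine-normalize a sparse weight vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {k: w / norm for k, w in weights.items()} if norm else dict(weights)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

recipe = {"tomato": 0.8, "basil": 0.6}
profile = {"tomato": 0.5, "pasta": 0.5}
unit = normalize(recipe)
sim = cosine_similarity(recipe, profile)
```

Only keywords present in both vectors contribute to the numerator, which is why sparse dictionaries are convenient here.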



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
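A minimal sketch of Rocchio used as a classifier is given below. The function names are hypothetical, the documents are toy `{term: weight}` vectors rather than real TF-IDF output, and the values β = 1.0 and γ = 0.5 are arbitrary choices made only for the illustration.

```python
import math

def rocchio_prototype(positive, negative, beta=1.0, gamma=0.5):
    """Prototype vector of one class from its positive and negative example vectors."""
    terms = {t for doc in positive + negative for t in doc}
    prototype = {}
    for t in terms:
        pos = sum(d.get(t, 0.0) for d in positive) / len(positive) if positive else 0.0
        neg = sum(d.get(t, 0.0) for d in negative) / len(negative) if negative else 0.0
        prototype[t] = beta * pos - gamma * neg
    return prototype

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, prototypes):
    """Assign the document to the class whose prototype vector is most similar to it."""
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))

liked = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8, "pasta": 0.4}]
disliked = [{"cream": 1.0, "bacon": 0.7}]
prototypes = {
    "relevant": rocchio_prototype(liked, disliked),
    "irrelevant": rocchio_prototype(disliked, liked),
}
label = classify({"tomato": 0.9, "pasta": 0.2}, prototypes)
```

Terms that appear only in negative examples receive negative weights in the prototype, which lowers the similarity of documents that contain them.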

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}$$

When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.

In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier obtains very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, which indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
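The multinomial model can be sketched in a few lines. This is an illustration under assumptions that the text does not state: the probabilities are estimated with add-one (Laplace) smoothing, scores are computed in log space to avoid numerical underflow, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (list_of_terms, class_label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = defaultdict(Counter)
    for terms, label in labelled_docs:
        term_counts[label].update(terms)
    vocab = {t for terms, _ in labelled_docs for t in terms}
    log_prior = {c: math.log(n / len(labelled_docs)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # Add-one smoothing (an assumption) keeps unseen terms from zeroing the product.
        log_like[c] = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab))) for t in vocab}
    return log_prior, log_like, vocab

def classify_nb(terms, log_prior, log_like, vocab):
    counts = Counter(t for t in terms if t in vocab)  # N(d_i, t_k)
    def score(c):
        # Logarithm of P(c_j) times the product of P(t_k|c_j)^N(d_i, t_k).
        return log_prior[c] + sum(n * log_like[c][t] for t, n in counts.items())
    return max(log_prior, key=score)

training = [
    (["tomato", "basil", "pasta"], "relevant"),
    (["tomato", "tomato", "onion"], "relevant"),
    (["cream", "bacon", "cream"], "irrelevant"),
]
model = train_multinomial_nb(training)
label = classify_nb(["tomato", "pasta"], *model)
```

Terms outside the training vocabulary are ignored at classification time, which is one common convention among several.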

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
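The expected information gain of a split on the presence or absence of a word can be computed as in the sketch below. The helper names and the four toy documents are illustrative assumptions; a full decision tree learner would apply this criterion recursively to choose the term tested at each node.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Expected reduction in class entropy when splitting on presence or absence of `term`."""
    present = [label for doc, label in zip(docs, labels) if term in doc]
    absent = [label for doc, label in zip(docs, labels) if term not in doc]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - remainder

docs = [{"tomato", "basil"}, {"tomato", "pasta"}, {"cream", "bacon"}, {"cream", "pasta"}]
labels = ["liked", "liked", "disliked", "disliked"]
gain_tomato = information_gain(docs, labels, "tomato")
gain_pasta = information_gain(docs, labels, "pasta")
```

Here "tomato" separates the two classes perfectly, while "pasta" appears equally often in both and therefore provides no information.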

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when an item is classified.
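A nearest neighbor classifier over VSM vectors can be sketched as follows. It assumes cosine similarity and a simple majority vote among the k most similar stored items; `knn_classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """training: list of (vector, label). All the work happens here, at classification time."""
    neighbours = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

training = [
    ({"tomato": 1.0, "basil": 1.0}, "liked"),
    ({"tomato": 1.0, "pasta": 1.0}, "liked"),
    ({"cream": 1.0, "bacon": 1.0}, "disliked"),
    ({"cream": 1.0, "pasta": 1.0}, "disliked"),
]
label = knn_classify({"tomato": 1.0, "pasta": 0.5}, training, k=3)
```

The full scan over the stored items in `knn_classify` is the classification-time cost mentioned above.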

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, is now described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$Similarity(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$



In the formula, $r_{as}$ is the rating that user a gave to item s, and $r_{bs}$ is the rating that user b gave to the same item, with S here denoting the set of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.
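The difference between the two similarity measures can be checked on the ratings of Table 2.1. The sketch below is illustrative: the function names are hypothetical, and the user averages in the Pearson coefficient are taken over the co-rated items only, which is one possible reading of the formula.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def co_rated(a, b):
    """Ratings of both users restricted to the items they have both rated."""
    items = sorted(ratings[a].keys() & ratings[b].keys())
    return [ratings[a][s] for s in items], [ratings[b][s] for s in items]

def cosine_sim(a, b):
    ra, rb = co_rated(a, b)
    dot = sum(x * y for x, y in zip(ra, rb))
    return dot / (math.sqrt(sum(x * x for x in ra)) * math.sqrt(sum(y * y for y in rb)))

def pearson_sim(a, b):
    ra, rb = co_rated(a, b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)  # assumption: means over co-rated items
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in ra)) * math.sqrt(sum((y - mean_b) ** 2 for y in rb))
    return num / den if den else 0.0

cos_u4 = cosine_sim("Alice", "User4")
pear_u4 = pearson_sim("Alice", "User4")
pear_u1 = pearson_sim("Alice", "User1")
```

For Alice and User4 the cosine measure gives a similarity of about 0.80, while the Pearson coefficient is about -0.79: once each user's average is subtracted, their preferences turn out to be opposed. For Alice and User1 the Pearson coefficient is about 0.85.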

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \, (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
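Applied to Table 2.1, the prediction function can be sketched as follows. Several choices here are assumptions made for the illustration: the neighborhood N holds the two users most similar to Alice, similarities are Pearson coefficients computed over co-rated items, and each user's average in the prediction is taken over all the ratings that user has given.

```python
import math

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson_sim(a, b):
    items = sorted(ratings[a].keys() & ratings[b].keys())
    ra, rb = [ratings[a][s] for s in items], [ratings[b][s] for s in items]
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in ra)) * math.sqrt(sum((y - mean_b) ** 2 for y in rb))
    return num / den if den else 0.0

def mean_rating(user):
    return sum(ratings[user].values()) / len(ratings[user])

def predict(a, item, k=2):
    """Predicted rating of user a for item, using the k most similar users who rated it."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted(candidates, key=lambda b: pearson_sim(a, b), reverse=True)[:k]
    num = sum(pearson_sim(a, b) * (ratings[b][item] - mean_rating(b)) for b in neighbours)
    den = sum(pearson_sim(a, b) for b in neighbours)
    return mean_rating(a) + num / den if den else mean_rating(a)

prediction = predict("Alice", "Item5")
```

With User1 and User2 as neighbors, the prediction for Alice on Item5 is about 4.87. Because the denominator sums raw similarities, the sketch is only meaningful when the selected neighbors are positively correlated with the target user.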

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limita-

tions The idea behind hybrid systems [19] is to combine two or more different elements in order to

avoid some shortcomings and even reach desirable properties not present in individual approaches

Monolithic parallel and pipelined approaches are three different hybridization designs commonly

used in hybrid recommendation systems [2]

In monolithic hybridization (Figure 23) the components of each recommendation strategy are

not architecturally separate The objective behind this design is to exploit different features or knowl-

edge sources from each strategy to generate a recommendation This design is an example of

content-boosted collaborative filtering [20] where social features (eg movies liked by user) can be


Figure 23 Monolithic hybridization design [2]

Figure 24 Parallelized hybridization design [2]

Figure 25 Pipelined hybridization designs [2]

associated with content features (eg comedies liked by user or dramas liked by user) in order to

improve the results

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, else use collaborative methods).
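The weighting scheme of a parallelized hybrid can be sketched as follows. This is a minimal illustration, assuming each component returns a dictionary of per-item scores on a common scale; the function name `weighted_hybrid`, the fixed weights and the choice of treating a missing score as 0.0 are assumptions made for the example and are not taken from [2].

```python
def weighted_hybrid(scores_by_component, weights):
    """Combine per-item scores from independent recommenders with fixed weights.

    scores_by_component: list of {item: score} dictionaries, one per recommender.
    weights: one manually assigned weight per recommender (hypothetical values).
    Items a recommender did not score contribute 0.0 for that recommender.
    """
    items = set().union(*(scores.keys() for scores in scores_by_component))
    total_weight = sum(weights)
    return {
        item: sum(w * scores.get(item, 0.0)
                  for scores, w in zip(scores_by_component, weights)) / total_weight
        for item in items
    }


content_scores = {"a": 1.0, "b": 0.0}
collaborative_scores = {"a": 0.0, "b": 1.0, "c": 1.0}
combined = weighted_hybrid([content_scores, collaborative_scores], [3, 1])
```

With these weights the content-based component dominates, so item "a" ends up ranked above "b" and "c".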

Pipelined hybridization designs (Figure 25) implement a process in which several techniques

are used sequentially to generate recommendations Two types of strategies are used [2] cascade

and meta-level Cascade hybrids are based on a sequenced order of techniques in which each

succeeding recommender only refines the recommendations of its predecessor In a meta-level

hybridization design one recommender builds a model that is exploited by the principal component

to make recommendations

In the practical development of recommendation systems it is well accepted that all base algo-

rithms can be improved by being hybridized with other techniques It is important that the recom-

mendation techniques used in the hybrid system complement each otherrsquos limitations For instance


Figure 26 Popular evaluation measures in studies about recommendation systems from the areaof Computer Science (CS) or the area of Information Systems (IS) [4]

contentcollaborative hybrids regardless of type will always demonstrate the cold-start problem

since both techniques need a database of ratings [19]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives For instance from the

business perspective many variables can and have been studied increase in number of sales

profits and item popularity are some example measures that can be applied in practice From the

platform perspective the general interactivity with the platform and click-through-rates can be anal-

ysed From the customer perspective satisfaction levels loyalty and return rates represent valuable

feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.


When using IR measures the recommendation is viewed as an information retrieval task where

recommended items like retrieved items are predicted to be good or relevant Items are then

classified with one of four possible states as shown on Figure 27 Correct predictions also known

as true positives (tp) occur when the recommended item is liked by the user or established as

ldquoactually goodrdquo by a human expert in the item domain False negatives (fn) represent items liked by

the user that were not recommended by the system False positives (fp) designate recommended

items disliked by the user Finally correct omissions also known as true negatives (tn) represent

items correctly not recommended by the system

Precision measures the exactness of the recommendations ie the fraction of relevant items

recommended (tp) out of all recommended items (tp+ fp)


Figure 27 Evaluating recommended items [2]

\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
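As an illustration of Eqs. (2.13) and (2.14), the sketch below computes both measures for a single user. It assumes that the recommended items and the items the user actually likes are available as plain lists of identifiers; the function name and the example data are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four recommended items, three items the user actually likes.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the four recommendations are relevant (precision 0.5), and two of the three relevant items were recommended (recall 2/3).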

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are

also very popular in the evaluation of recommendation systems capturing the accuracy at the level

of the predicted ratings MAE computes the deviation between predicted ratings and actual ratings

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula n represents the total number of items used in the calculation pi the predicted rating

for item i and ri the actual rating for item i RMSE is similar to MAE but places more emphasis on

larger deviations


\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
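The difference between the two error measures can be seen in a short sketch. The rating lists below are made-up values chosen so that a single large deviation dominates; the function names are illustrative only.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings (Eq. 2.16)."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted)
    )


predicted = [4.0, 3.0, 5.0, 2.0]
actual = [4.0, 3.0, 5.0, 4.0]   # only the last prediction is off, by 2 stars
error_mae = mae(predicted, actual)
error_rmse = rmse(predicted, actual)
```

For this example the MAE is 0.5 while the RMSE is 1.0, since squaring the single two-star deviation gives it more weight.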


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches The

works described in this chapter contain interesting features to further explore in the context of per-

sonalized food recommendation using content-based methods

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^+$, an equation based on the idea of TF-IDF is used:

\[ I_k^+ = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]



The user's disliked ingredients $I_k^-$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
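The favourite-ingredient estimate of Eqs. (3.1) to (3.3) can be sketched as below. This is not the implementation from [3]: the function name, the representation of recipes as lists of ingredient names and the example data are assumptions, and the sketch further assumes that every cooked recipe also belongs to the full collection, so that $M_k$ is never zero.

```python
import math
from collections import Counter


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each used ingredient k with FF_k * IRF_k (Eqs. 3.1-3.3)."""
    m = len(all_recipes)
    # M_k: number of recipes in the collection that contain ingredient k.
    recipe_count = Counter(k for recipe in all_recipes for k in set(recipe))
    # F_k: number of cooked recipes in the period that used ingredient k.
    use_count = Counter(k for recipe in cooked_recipes for k in set(recipe))
    return {
        k: (f_k / period_days) * math.log(m / recipe_count[k])
        for k, f_k in use_count.items()
    }


all_recipes = [
    ["rice", "egg"],
    ["rice", "tofu"],
    ["rice", "egg", "leek"],
    ["tofu", "leek"],
]
cooked = [["rice", "egg"], ["rice", "egg", "leek"]]
scores = ff_irf(cooked, all_recipes, period_days=10)
favourite = max(scores, key=scores.get)
```

In this toy collection rice is used as often as egg, but it appears in three of the four recipes, so its lower IRF leaves egg as the top-ranked ingredient.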

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^+$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^+$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^+$ and $I_k^-$, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods

[20] showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure

content-based predictor a pure collaborative filtering method and a naive hybrid of the two It is

also shown that CBCF overcomes the first-rater problem from collaborative filtering and reduces

significantly the impact that sparse data has on the prediction accuracy The domain of movie rec-

ommendations was used to demonstrate this hybrid approach

In the pure content-based method the prediction task was treated as a text-categorization prob-

lem The movie content information was viewed as a document and the user ratings between 0

and 5 as one of six class labels A naive Bayesian text classifier was implemented and extended to

represent movies Each movie is represented by a set of features (eg title cast etc) where each

feature is viewed as a set of words The classifier was used to learn a user profile from a set of rated


movies ie labelled documents Finally the learned profile is used to predict the label (rating) of

unrated movies

The pure collaborative filtering method implemented in this study uses a neighborhood-based

algorithm User-to-user similarity is computed with the Pearson correlation coefficient and the pre-

diction is computed with the weighted average of deviations from the neighborrsquos mean Both these

methods were explained in more detail in Section 212

The naive hybrid approach uses the average of the ratings generated by the pure content-based

predictor and the pure collaborative method to generate predictions

CBCF basically consists in performing a collaborative recommendation with less data sparsity

This is achieved by creating a pseudo user-ratings vector vu for every user u in the database The

pseudo user-ratings vector consists of the item ratings provided by the user u where available and

those predicted by the content-based method otherwise

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]

Using the pseudo user-ratings vectors of all users the dense pseudo ratings matrix V is created

The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
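The construction of the pseudo user-ratings vector of Eq. (3.7) can be sketched in a few lines. The sketch assumes that the content-based predictor has already produced a prediction for every item in the database; the function name and the example values are hypothetical, and the hybrid correlation weight of [20] is not covered.

```python
def pseudo_ratings(user_ratings, content_predictions):
    """Build v_u: the actual rating where available, the content-based
    prediction otherwise (Eq. 3.7).

    user_ratings: {item: rating given by the user}
    content_predictions: {item: content-based prediction}, covering all items.
    """
    return {
        item: user_ratings.get(item, predicted)
        for item, predicted in content_predictions.items()
    }


actual = {"m1": 5, "m3": 2}
content_based = {"m1": 4.2, "m2": 3.1, "m3": 2.8}
v_u = pseudo_ratings(actual, content_based)
```

Only the unrated movie "m2" takes its value from the content-based predictor, so the resulting vector is dense while still preserving the user's real ratings.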

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations

of collaborative filtering and content-based methods since it has been shown to perform consistently

better than pure collaborative filtering


Figure 31 Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet

domain [22] The performance of collaborative filtering content-based and hybrid recommender al-

gorithms is evaluated based on a dataset of 43000 ratings from 512 users Although the main focus

of this article is the content or ingredients of a meal various other variables that impact a userrsquos

opinion in food recommendation are mentioned These other variables include cooking methods

ingredient costs and quantities preparation time and ingredient combination effects amongst oth-

ers The decomposition of recipes into ingredients (Figure 31) implemented in this experiment is

simplistic ingredient scores were computed by averaging the ratings of recipes in which they occur

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented namely hybrid recipe and hybrid ingr Both use a

simple pipelined hybrid design where the content-based approach provides predictions to missing

ingredient ratings in order to reduce the data sparsity of the ingredient matrix This matrix is then

used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach in this case has the best overall performance

with a significant accuracy improvement over the collaborative filtering algorithm Furthermore the

authors concluded that this work implemented a simplistic version of what a recipe recommender

needs to achieve As mentioned earlier there are many other factors that influence a userrsquos rating

that can be considered to improve content-based food recommendations

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods Still similar items are not

the only ones that matter when calculating a prediction In some cases items that are too similar

to others which have already been seen are not to be recommended as well This idea is used

in Daily-Learner [23] a well-known news article content-based recommendation system When

helping the user to obtain more knowledge about a news topic a certain variety should exist when

performing the recommendation Items too similar to others known by the user probably carry the

same information and will not help him to gather more information about a particular news topic

These items are then excluded from the recommendation On the other hand items similar in topic

but not similar in content should be great recommendations in the context of this system Therefore

the use of similarity can be adjusted according to the objectives of the recommendation system

In order to identify the current user interests Daily-Learner uses the nearest neighbor algorithm

to model the usersrsquo short-term interests As previously mentioned in Section 211 nearest neighbor

algorithms simply store all training data in memory When classifying a new unlabeled item the

algorithm compares it to all stored items using a similarity function and then determines the nearest

neighbor or the k nearest neighbors Daily-Learner uses this method as it can quickly adapt to a

userrsquos novel interests The main advantage of the nearest-neighbor approach is that only a single

story of a new topic is needed to allow the algorithm to identify future follow-up stories


Stories in Daily-Learner are converted to TF-IDF vectors and the cosine similarity measure is

used to quantify the similarity between two vectors When computing a prediction for a new story all

the stories that are closer than a minimum threshold (eg a minimum similarity value) to the story to

be classified become voting stories The predicted score is then computed as the weighted average

over all the voting storiesrsquo scores using the similarity values as the weights If a voter is closer than

a maximum threshold (eg a maximum similarity value) to the new story the story is labeled as

known because the system assumes that the user is already aware of the event reported in it and

does not need to recommend a story he already knows If the story does not have any voters it

cannot be classified by the short-term model and is passed to the long-term model explained with

more detail in [23]
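The short-term voting procedure described above can be sketched as follows. The sketch assumes that the cosine similarities between the new story and the stored stories have already been computed; the threshold values `t_min` and `t_max`, the function name and the returned labels are placeholders chosen for the example and are not the values used in Daily-Learner [23].

```python
def short_term_prediction(similarities, scores, t_min=0.3, t_max=0.9):
    """Classify a new story from its similarity to the stored stories.

    similarities[i]: cosine similarity between the new story and stored story i.
    scores[i]: the user's score for stored story i.
    Returns a (label, score) pair.
    """
    voters = [(s, y) for s, y in zip(similarities, scores) if s >= t_min]
    if not voters:
        # No voting stories: the short-term model cannot classify the story.
        return ("unclassified", None)
    if any(s >= t_max for s, _ in voters):
        # Too similar to a story already seen: assume the user knows it.
        return ("known", None)
    weighted = sum(s * y for s, y in voters) / sum(s for s, _ in voters)
    return ("scored", weighted)


label, score = short_term_prediction([0.5, 0.1, 0.6], [1.0, 0.0, 0.0])
```

In the example only the stories with similarity 0.5 and 0.6 vote, giving a similarity-weighted score of 0.5 / 1.1.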

This issue should be taken into consideration in food recommendations as usually users are not

interested in recommendations with contents too similar to dishes recently eaten



Chapter 4


In this chapter the modules that compose the architecture of the recommendation system are pre-

sented First an introduction to the recommendation module is made followed by the specification

of the methods used in the different recommendation components Afterwards the datasets chosen

to validate this work are analyzed and the database platform is described

The recommendation system contains three recommendation components (Fig 41) the YoLP

collaborative recommender the YoLP content-based recommender and an experimental recommen-

dation component where various approaches are explored to adapt the Rocchiorsquos algorithm for per-

sonalized food recommendations These provide independent recommendations for the same input

in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the

experimental component The evaluation module independently evaluates each recommendation

component by measuring the performance of the algorithms using different metrics The methods

used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collabo-

rative approach [24] This approach is very similar to the user-to-user approach explained in detail

in Section 212

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the



Figure 41 System Architecture

Figure 42 Item-to-item collaborative recommendation2

same group of users

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.
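A minimal sketch of the item-to-item similarity of Eq. (4.1) is given below. It is not the YoLP implementation: the nested-dictionary layout of the ratings, the function name and the example data are assumptions, and the recipe averages are taken here over all ratings of each recipe, which is one possible reading of the definition above.

```python
import math


def item_similarity(ratings, a, b):
    """Pearson correlation between recipes a and b (Eq. 4.1).

    ratings: {recipe: {user: rating}}
    Returns 0.0 when the recipes share no raters or a deviation sum is zero.
    """
    common = ratings[a].keys() & ratings[b].keys()   # P: users who rated both
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    numerator = sum(
        (ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common
    )
    denominator = math.sqrt(
        sum((ratings[a][p] - mean_a) ** 2 for p in common)
    ) * math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return numerator / denominator if denominator else 0.0


ratings = {
    "soup": {"u1": 5, "u2": 3, "u3": 1},
    "stew": {"u1": 4, "u2": 3, "u3": 2},
    "cake": {"u1": 1, "u2": 3, "u3": 5},
}
sim_soup_stew = item_similarity(ratings, "soup", "stew")
sim_soup_cake = item_similarity(ratings, "soup", "cake")
```

Soup and stew are rated in the same order by all three users and obtain a similarity close to 1, while soup and cake are rated in opposite order and obtain a similarity close to -1.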


After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) * (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]

In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.

The item-based approach was chosen in the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the restaurantrsquos

recipesrsquo features with the user profile using the cosine similarity measure (Eq 25) The recom-

mended recipes are ordered from most to least similar In this case instead of referring recipes as

vectors of words recipes are represented by vectors of different features The features that compose

a recipe are category region restaurant ID and ingredients Context features are also considered

in the moment of the recommendation these are temperature period of the day and season of the

year Each feature has a specific location attributed to it in the recipe and user profile sparse vectors

The user profile is composed by binary values of the recipe features that he positively rated ie

when a user rates a recipe with a value of 4 or 5 all the recipe features are added as binary values

to the profile vector

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
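The construction just described can be sketched as follows, using the second approach (the user's average rating as the threshold between positive and negative observations). This is an illustrative sketch rather than the code used in the experiments: the function names, the dictionary-based sparse vectors and the toy recipes are assumptions.

```python
import math
from collections import Counter, defaultdict


def irf_weights(recipes):
    """IRF_k = log(M / M_k) for every feature k in {recipe_id: [features]}."""
    m = len(recipes)
    counts = Counter(k for features in recipes.values() for k in set(features))
    return {k: math.log(m / m_k) for k, m_k in counts.items()}


def prototype_vector(user_ratings, recipes, weights, threshold):
    """Add the IRF weights of positively rated recipes and subtract those of
    negatively rated ones; both kinds of observation have a weight of 1."""
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = 1.0 if rating >= threshold else -1.0
        for k in set(recipes[recipe_id]):
            prototype[k] += sign * weights[k]
    return dict(prototype)


recipes = {
    "r1": ["beef", "rice"],
    "r2": ["tofu", "rice"],
    "r3": ["beef", "leek"],
    "r4": ["tofu", "leek"],
}
user_ratings = {"r1": 4, "r2": 1}
user_average = sum(user_ratings.values()) / len(user_ratings)   # 2.5
weights = irf_weights(recipes)
prototype = prototype_vector(user_ratings, recipes, weights, user_average)
```

In this example rice appears in one liked and one disliked recipe, so its contributions cancel out, while beef ends up with a positive weight and tofu with a negative one.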

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A to B, which fits in the range [C, D], as shown in the following formula:

\[ B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} * (D - C) + C \tag{4.5} \]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point the similarity scale is mapped, for each user, onto the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default prediction.
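The per-user mapping described above can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the function name `min_max_rating` and its parameters are hypothetical, and the fallback condition (an empty or zero-width similarity interval) is an assumption about what "not enough user ratings" means in practice.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity value onto one user's rating range using Eq. 4.5.

    sim_min / sim_max: the user's observed similarity range (A's min and max).
    rating_min / rating_max: the user's observed rating range (C and D).
    user_avg: fallback prediction when the similarity interval is degenerate.
    """
    span = sim_max - sim_min
    if span <= 0:
        # ASSUMPTION: a zero-width interval stands for "not enough ratings",
        # in which case the user's average rating is returned instead.
        return user_avg
    scaled = (similarity - sim_min) / span
    return scaled * (rating_max - rating_min) + rating_min
```

For a user whose similarities span [0, 1] and whose ratings span [1, 5], a similarity of 0.5 lands in the middle of the rating range.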

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \qquad (4.6)$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating predicted for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments.

Statistic                            | Food.com | Epicurious
Number of users                      |    24741 |       8117
Number of food items                 |   226025 |      14976
Number of rating events              |   956826 |      86574
Number of ratings above avg          |   726467 |      46588
Number of groups                     |      108 |         68
Number of ingredients                |     5074 |        338
Number of categories                 |       28 |         14
Sparsity on the ratings matrix       |    0.02% |      0.07%
Avg rating values                    |     4.68 |       3.34
Avg number of ratings per user       |    38.67 |      10.67
Avg number of ratings per item       |     4.23 |       5.78
Avg number of ingredients per item   |     8.57 |       3.71
Avg number of categories per item    |     2.33 |       0.60
Avg number of food groups per item   |     0.87 |       0.61
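A minimal sketch of the three-case rule in Eq. 4.6, using the initial thresholds U = 0.75 and L = 0.25. The function names are hypothetical. The text does not state how the user and item standard deviations are combined, so the plain arithmetic mean used in `combined_stats` is an assumption made only for illustration.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating following Eq. 4.6."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Combine user and item statistics.

    ASSUMPTION: both the averages and the standard deviations are combined
    with a simple arithmetic mean; the source does not specify this.
    """
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The same function covers all three tested variants: pass the user's statistics, the recipe's statistics, or the output of `combined_stats` as `avg` and `std`.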


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity it was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
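The sparsity filter can be sketched as below. The function name and the tuple layout are hypothetical, and two details are assumptions, since the text does not specify them: recipes are filtered before users, and the filter is applied in a single pass rather than repeated until stable.

```python
from collections import Counter


def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop rarely rated recipes and rarely rating users.

    ratings: list of (user_id, item_id, rating) tuples.
    A recipe rated no more than 3 times is removed (so at least 4 are needed);
    a user with no more than 5 ratings is removed (so at least 6 are needed).
    ASSUMPTION: single pass, recipes first, then users.
    """
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```

Note that a single pass can leave a few recipes below the minimum once sparse users are removed; whether the original filter compensated for this is not stated.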

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating value.

Figure 4.4: Distribution of Food.com rating events per rating value.

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results: the impact of the users' standard deviation on the recommendation error, and the learning curve of the algorithm.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model trained on the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, in each round a subset of the observations is held out as the validation set and the remaining observations form the training set. To reduce variability, this process is repeated multiple times, using a different subset as the validation set each time. (In the exhaustive leave-p-out variant, the process is repeated until all possible combinations of p observations have been tested.) The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
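The 5-fold split can be sketched as follows. This is an illustrative sketch rather than the code used in the experiments: the function name, the fixed random seed and the round-robin assignment of events to folds are all assumptions.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    events: sequence of rating events, e.g. (user_id, item_id, rating) tuples.
    Each event appears in exactly one validation fold.
    ASSUMPTION: events are shuffled once and dealt round-robin into k folds.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation fold holds about 20% of the events and the matching training set the remaining 80%; the reported error is the average over the five rounds.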

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
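For reference, the two error measures can be computed as in the sketch below. These are the standard definitions of MAE and RMSE; the function names are illustrative only.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; squaring penalizes large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

A predictor that is off by 1 on every item and a predictor that is exact on most items but off by 3 on one can rank differently under the two measures, because RMSE weights the single large miss more heavily.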

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

Method                         | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE
YoLP Content-based component   | 0.6389         | 0.8279          | 0.3590       | 0.6536
YoLP Collaborative component   | 0.6454         | 0.8678          | 0.3761       | 0.6834
User Average                   | 0.6315         | 0.8338          | 0.4077       | 0.6207
Item Average                   | 0.7701         | 1.0930          | 0.4385       | 0.7043
Combined Average               | 0.6628         | 0.8572          | 0.4180       | 0.6250

Table 5.2: Test results. For each dataset, results are shown for the two ways of building the prototype vectors: Observation User Average (UA) and Observation Fixed Threshold (FT).

Method                                        | Epicurious UA MAE | Epicurious UA RMSE | Epicurious FT MAE | Epicurious FT RMSE | Food.com UA MAE | Food.com UA RMSE | Food.com FT MAE | Food.com FT RMSE
User Avg + User Standard Deviation            | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624
Item Avg + Item Standard Deviation            | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207
User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449
Min-Max                                       | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384
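The three average-based baselines described in this section can be sketched as follows. The function names and the tuple layout are hypothetical, and the sketch does not handle users or recipes that are absent from the training set; how the thesis treated such cases is not stated.

```python
from collections import defaultdict
from statistics import mean


def build_averages(training):
    """Compute per-user and per-item average ratings from the training set.

    training: iterable of (user_id, item_id, rating) tuples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, item_id, rating in training:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    user_avg = {u: mean(r) for u, r in by_user.items()}
    item_avg = {i: mean(r) for i, r in by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user_id, item_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[item_id]) / 2
```

The user-average and item-average baselines are simply lookups in the two dictionaries returned by `build_averages`.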

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

Features                          | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE
Ingredients + Cuisine + Dietaries | 0.7824         | 0.9927          | 0.4324       | 0.6449
Ingredients + Cuisine             | 0.7915         | 1.0012          | 0.4384       | 0.6502
Ingredients + Dietary             | 0.7874         | 0.9986          | 0.4342       | 0.6468
Cuisine + Dietary                 | 0.8266         | 1.0616          | 0.4324       | 0.7087
Ingredients                       | 0.7932         | 1.0054          | 0.4411       | 0.6537
Cuisine                           | 0.8553         | 1.0810          | 0.5357       | 0.7431
Dietary                           | 0.8772         | 1.0807          | 0.4579       | 0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing a user's prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user (one per feature type). This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
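The idea of storing one vector per feature type and merging them on demand can be sketched as below. The dictionary-based sparse representation, the group names and both function names are assumptions made for illustration; the thesis stores these vectors in MySQL and does not show its data structures.

```python
import math


def merge_vectors(groups, selected):
    """Merge the stored per-feature-type vectors into one sparse vector.

    groups: dict such as {"ingredients": {...}, "cuisine": {...}, "dietary": {...}}
            mapping feature name -> weight (hypothetical layout).
    selected: names of the feature types to include in this test.
    Keys are (group, feature) pairs so equal names in different groups stay apart.
    """
    merged = {}
    for name in selected:
        for feature, weight in groups.get(name, {}).items():
            merged[(name, feature)] = weight
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Testing a feature combination then amounts to calling `merge_vectors` with the same `selected` list for both the user's prototype and the recipe, and passing the two results to `cosine`, without rebuilding any prototype vector.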

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } \text{similarity} < U \end{cases} \qquad (5.1)$$

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
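The threshold sweep can be sketched as follows, using the two-case rule with a single upper threshold. This is a simplified illustration under stated assumptions: the function names and the `samples` layout are hypothetical, and the sketch evaluates a fixed list of precomputed similarities instead of rerunning the full cross-validation for each threshold.

```python
import math


def predict(similarity, avg, std, upper):
    """Two-case rule: add the standard deviation only above the threshold."""
    return avg + std if similarity >= upper else avg


def sweep_upper_threshold(samples, thresholds):
    """Return (threshold, MAE, RMSE) for each candidate upper threshold.

    samples: list of (similarity, avg, std, actual_rating) tuples, assumed to be
             collected beforehand from the validation folds.
    """
    results = []
    for upper in thresholds:
        errors = [predict(s, avg, std, upper) - actual
                  for s, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results
```

Plotting the MAE and RMSE columns of the result against the threshold column gives curves of the kind shown in the upper-threshold figures.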

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is larger, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest errors registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation. The line in these two graphs indicates the average value of the points in its proximity.
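The per-user points plotted in these graphs could be derived as in the following sketch. The function name and input layout are hypothetical. Two choices are assumptions, since the text does not specify them: the standard deviation is computed over each user's actual ratings in the evaluated results, and the population form of the standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def user_error_vs_deviation(results):
    """Return {user_id: (rating standard deviation, mean absolute error)}.

    results: iterable of (user_id, predicted_rating, actual_rating) tuples.
    ASSUMPTION: population standard deviation over the user's actual ratings.
    """
    errors, ratings = defaultdict(list), defaultdict(list)
    for user_id, predicted, actual in results:
        errors[user_id].append(abs(predicted - actual))
        ratings[user_id].append(actual)
    return {u: (pstdev(ratings[u]), mean(errors[u])) for u in errors}
```

Each returned pair is one point in the scatter plot; a user who always gives the same rating has a standard deviation of zero.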

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Taking into account the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the dataset is small and does not contain enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
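The round-by-round simulation can be sketched for a single user as follows. This is a generic sketch, not the thesis code: `build_profile` and `predict` are hypothetical callbacks standing in for the prototype-vector construction and the rating prediction, and the order in which reviews are moved into the training set is assumed to be the order of the input list.

```python
def learning_curve(user_reviews, build_profile, predict, max_rounds):
    """Simulate incremental learning for one user.

    user_reviews: ordered list of (item_id, rating) pairs for the user.
    build_profile: callable(training_reviews) -> profile (hypothetical hook).
    predict: callable(profile, item_id) -> predicted rating (hypothetical hook).
    Returns a list of (number of training reviews, mean absolute error).
    """
    curve = []
    # Always leave at least one review in the validation set.
    last_round = min(max_rounds, len(user_reviews) - 1)
    for n in range(1, last_round + 1):
        training, validation = user_reviews[:n], user_reviews[n:]
        profile = build_profile(training)
        errors = [abs(predict(profile, item) - rating)
                  for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve
```

Averaging the per-user curves over the selected group of users would give one point per number of rated recipes, which is the kind of curve the learning-curve figures show.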



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients, presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. These two approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g. winter/fall or summer/spring), the time of the day (e.g. lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
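The proposed per-rating class vectors could look like the sketch below. This is purely illustrative of the future-work idea and was not implemented in the thesis; the function names are hypothetical, and building each class vector by summing the feature weights of the recipes that received that rating is an assumption.

```python
import math
from collections import defaultdict


def build_class_vectors(rated_recipes):
    """Build one feature vector per rating value for a single user.

    rated_recipes: iterable of (feature_weights: dict, rating) pairs.
    ASSUMPTION: a class vector is the sum of the feature weights of all
    recipes the user gave that rating.
    """
    classes = defaultdict(lambda: defaultdict(float))
    for features, rating in rated_recipes:
        for feature, weight in features.items():
            classes[rating][feature] += weight
    return {rating: dict(vector) for rating, vector in classes.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(class_vectors, recipe_features):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors,
               key=lambda rating: cosine(class_vectors[rating], recipe_features))
```

In this design the predicted rating is read directly from the winning class, so no similarity-to-rating conversion step is needed.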




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

$$TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \tag{2.1}$$

where, for a document j and a keyword i, $f_{ij}$ corresponds to the number of times that i appears in j. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords z in the document j.

Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2}$$

In the formula, N is the total number of documents and $n_i$ represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

$$w_{ij} = TF_{ij} \times IDF_i \tag{2.3}$$

It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document. Two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \tag{2.4}$$

With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \tag{2.5}$$
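The weighting and similarity steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy recipe documents, and the choice of the natural logarithm are assumptions, not part of the original work.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one cosine-normalized TF-IDF weight dict per tokenized document."""
    n_docs = len(docs)
    # n_i: number of documents in which keyword i occurs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())
        # TF (Eq. 2.1) times IDF (Eq. 2.2), combined as in Eq. 2.3
        raw = {t: (f / max_f) * math.log(n_docs / doc_freq[t])
               for t, f in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Cosine normalization (Eq. 2.4); an all-zero vector is left unchanged
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy corpus: each document is a list of keywords
docs = [
    ["grilled", "salmon", "lemon"],
    ["salmon", "rice", "soy"],
    ["chocolate", "cake", "butter"],
]
vecs = tf_idf_vectors(docs)
```

Documents that share no keyword get a similarity of zero, while the two salmon dishes get a small positive value, because "salmon" is down-weighted by its lower IDF.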



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where T is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters β and γ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
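A minimal sketch of the prototype computation and the classification step follows. The function names, the example weights, and the values chosen for β and γ are illustrative assumptions; the text does not prescribe them.

```python
import math

def rocchio_prototype(pos, neg, beta=1.0, gamma=0.5):
    """Prototype weights: beta * mean(positive) - gamma * mean(negative), per term.

    pos and neg are lists of sparse document vectors (term -> TF-IDF weight).
    The beta and gamma defaults are arbitrary illustrative values.
    """
    terms = {t for d in pos + neg for t in d}
    proto = {}
    for t in terms:
        p = sum(d.get(t, 0.0) for d in pos) / len(pos) if pos else 0.0
        n = sum(d.get(t, 0.0) for d in neg) / len(neg) if neg else 0.0
        proto[t] = beta * p - gamma * n
    return proto

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))

# Hypothetical feedback: documents the user liked and disliked
liked = [{"salmon": 0.8, "lemon": 0.6}, {"salmon": 0.7, "rice": 0.7}]
disliked = [{"liver": 0.9, "onion": 0.4}]
prototypes = {
    "relevant": rocchio_prototype(liked, disliked),
    "irrelevant": rocchio_prototype(disliked, liked),
}
```

With two classes, the positive examples of one class serve as the negative examples of the other, which is why the two prototypes are built from swapped arguments.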

Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes Classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c

• P(d|c): probability of observing the document d given a class c

• P(d): probability of observing the document d

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

$$P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)} \tag{2.7}$$


When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d \mid c_j)}{P(d)} \tag{2.8}$$

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d \mid c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the underlying conditional independence assumption is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute. This encoding relates to the appearance of words in a document. The second, typically referred to as the multinomial event model, identifies the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \tag{2.9}$$

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
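The multinomial model of Eq. (2.9) can be sketched as follows. The sketch works in log space to avoid numerical underflow and uses Laplace (add-one) smoothing; both are common practice but are assumptions here, as are the function names and the toy training data.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labelled_docs):
    """labelled_docs: list of (tokens, class_label) pairs.

    Returns class priors P(c), term probabilities P(t|c), and the vocabulary.
    """
    class_docs = Counter(c for _, c in labelled_docs)
    term_counts = defaultdict(Counter)
    for tokens, c in labelled_docs:
        term_counts[c].update(tokens)
    vocab = {t for tokens, _ in labelled_docs for t in tokens}
    priors = {c: n / len(labelled_docs) for c, n in class_docs.items()}
    cond = {}
    for c in class_docs:
        total = sum(term_counts[c].values())
        # Add-one smoothing (an assumption, not stated in the text)
        cond[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond, vocab

def classify(tokens, priors, cond, vocab):
    """Return the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k | c)."""
    counts = Counter(t for t in tokens if t in vocab)
    scores = {
        c: math.log(priors[c])
        + sum(n * math.log(cond[c][t]) for t, n in counts.items())
        for c in priors
    }
    return max(scores, key=scores.get)

# Hypothetical training data: recipe keywords labelled by user feedback
train = [
    (["salmon", "lemon", "salmon"], "liked"),
    (["rice", "salmon"], "liked"),
    (["liver", "onion"], "disliked"),
]
priors, cond, vocab = train_multinomial_nb(train)
```

Terms outside the training vocabulary are ignored at classification time, which is one simple way of handling unseen words.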

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. The most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made during the classification time.
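As a rough illustration of the nearest neighbor procedure over VSM vectors, the sketch below ranks stored items by cosine similarity and takes a majority vote. The function names, the value of k, and the example vectors are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(item, training, k=3):
    """Majority vote among the k stored items most similar to `item`.

    training is a list of (vector, label) pairs kept in memory; all the
    work happens here, at classification time.
    """
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled items
training = [
    ({"salmon": 1.0, "lemon": 0.5}, "liked"),
    ({"salmon": 0.8, "rice": 0.6}, "liked"),
    ({"liver": 1.0}, "disliked"),
    ({"liver": 0.7, "onion": 0.7}, "disliked"),
]
```

Every call compares the new item against the whole training set, which makes the classification-time cost mentioned above directly visible.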

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since the change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based approaches (or heuristic-based) and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$Similarity(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \tag{2.10}$$

In the formula, S is here the set of items rated by both users, $r_{as}$ is the rating that user a gave to item s, and $r_{bs}$ is the rating that user b gave to the same item s. However, this measure ignores an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only common items between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11}$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12}$$

In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
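To make the procedure concrete, the sketch below applies Eq. (2.11) and Eq. (2.12) to the ratings of Table 2.1. Two details are assumptions, since the text does not fix them: the means inside the similarity are taken over the co-rated items only, while the means inside the prediction are taken over all ratings of each user. The neighborhood size of two is also an arbitrary choice.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5)
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = [s for s in ratings[a] if s in ratings[b]]
    if not common:
        return 0.0
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
           * math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(a, item, n_neighbors=2):
    """Prediction function (Eq. 2.12) using the most similar users who rated the item."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:n_neighbors]
    num = sum(pearson(a, b) * (ratings[b][item] - mean(ratings[b].values()))
              for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    base = mean(ratings[a].values())
    return base + num / den if den else base
```

Under these assumptions, User1 and User2 are Alice's nearest neighbors (similarities of about 0.85 and 0.71), and `predict("Alice", "Item5")` evaluates to about 4.87.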

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. This design is an example of content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, else use collaborative methods).
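The two combination schemes mentioned for the parallelized design can be sketched as follows. The function names, the weights, and the rating-count threshold are hypothetical and only illustrate the idea.

```python
def weighted_hybrid(predictions, weights):
    """Weighted average of the scores produced by independent recommenders.

    predictions and weights are dicts keyed by recommender name.
    """
    total = sum(weights[name] for name in predictions)
    return sum(weights[name] * score for name, score in predictions.items()) / total

def switching_hybrid(n_user_ratings, popularity_score, collaborative_score,
                     min_ratings=5):
    """Use the popularity-based score when the user has few ratings,
    otherwise use the collaborative score. min_ratings is an arbitrary threshold."""
    if n_user_ratings < min_ratings:
        return popularity_score
    return collaborative_score
```

For example, `weighted_hybrid({"content": 4.0, "collaborative": 3.0}, {"content": 0.25, "collaborative": 0.75})` yields 3.25, so the collaborative component dominates the final score.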

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$Precision = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$Recall = \frac{tp}{tp + fn} \tag{2.14}$$

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
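The four measures defined in Eqs. (2.13) to (2.16) translate directly into code. The sketch below is a plain illustration; the function names and the convention of returning 0.0 for empty sets are assumptions.

```python
import math

def precision_recall(recommended, relevant):
    """Precision (Eq. 2.13) and Recall (Eq. 2.14) for a set of recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # relevant items that were recommended
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15) over paired predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16) over paired predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))
```

For the predictions [4, 3, 5] against the actual ratings [5, 3, 3], MAE is 1.0 while RMSE is about 1.29, which illustrates how RMSE penalizes the single large deviation more heavily.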


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^{+}_{k}$, an equation based on the idea of TF-IDF is used:

$$I^{+}_{k} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

$$IRF_k = \log\left(\frac{M}{M_k}\right) \tag{3.3}$$

The user's disliked ingredients $I^{-}_{k}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
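The favourite-ingredient score can be sketched as below. The function name and the example data are hypothetical, and counting $F_k$ as the number of cooked recipes containing ingredient k within the period is an assumption about how the frequency of use is measured.

```python
import math

def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each ingredient the user cooked with FF_k * IRF_k.

    cooked_recipes: ingredient lists of the recipes cooked during the period D.
    all_recipes: ingredient lists of every recipe in the collection (M recipes).
    """
    m = len(all_recipes)
    scores = {}
    for k in {k for recipe in cooked_recipes for k in recipe}:
        f_k = sum(1 for recipe in cooked_recipes if k in recipe)  # frequency of use
        m_k = sum(1 for recipe in all_recipes if k in recipe)     # recipes containing k
        if m_k == 0:
            continue  # ingredient unknown to the collection
        scores[k] = (f_k / period_days) * math.log(m / m_k)
    return scores

# Hypothetical data: four recipes in the collection, two cooked in ten days
all_recipes = [["salmon", "lemon"], ["salmon", "rice"], ["rice", "egg"], ["salt", "egg"]]
cooked = [["salmon", "lemon"], ["salmon", "rice"]]
scores = ff_irf(cooked, all_recipes, period_days=10)
```

As with TF-IDF, an ingredient that appears in every recipe of the collection gets an IRF of zero and therefore never ranks as a favourite.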

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I^{+}_{k}$ were computed. The F-measure is computed as follows:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{3.4}$$

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^{+}_{k}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
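As a quick consistency check, the F-measure definition can be applied to the reported values, reading them as percentages (precision 60.7% and recall 61% for N = 20, precision 83.3% and recall 4.5% for N = 1). The N = 1 result is computed here for comparison only and is not a figure quoted from the study.

```latex
\begin{align*}
F_{N=20} &= \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61}
          = \frac{0.74054}{1.217} \approx 0.608 \\
F_{N=1}  &= \frac{2 \times 0.833 \times 0.045}{0.833 + 0.045}
          = \frac{0.07497}{0.878} \approx 0.085
\end{align*}
```

The first value matches the reported 60.8%, and the second shows how the very low recall at N = 1 pulls the F-measure down despite the high precision.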

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_{k}$ and $I^{-}_{k}$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
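A rough sketch of Eqs. (3.5) and (3.6) is given below. The text does not specify how the deviation score is mapped to the weight $W_k$, so the weights are simply passed in as given values here. The function names, the use of negative preference values for disliked ingredients, and the defaults for missing entries are all assumptions.

```python
import math

def quantity_std(quantities):
    """Standard deviation (Eq. 3.5) of an ingredient's quantity across the
    recipes that contain it."""
    n = len(quantities)
    avg = sum(quantities) / n
    return math.sqrt(sum((g - avg) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Score(R) (Eq. 3.6): sum of I_k * W_k over the ingredients of recipe R.

    preference maps ingredient -> I_k (assumed positive if liked, negative if
    disliked); weight maps ingredient -> W_k. Ingredients without a known
    preference contribute nothing, and a missing weight defaults to 1.0.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)
```

For instance, with preferences `{"potato": 0.5, "pepper": -0.2}` and weights `{"potato": 1.0, "pepper": 2.0}`, a recipe containing both ingredients scores about 0.1, since the disliked pepper is weighted more heavily.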

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially performs a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
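The construction of the pseudo user-ratings vectors can be sketched as follows. This is only an illustration of the idea described above, not the implementation from [20]: the function name, the dictionary layout of the ratings and the `content_pred` callable (standing in for the content-based predictor) are assumptions.

```python
def pseudo_ratings(actual, content_pred, users, items):
    """Build one dense pseudo user-ratings vector per user.

    actual: {user: {item: rating}} with the ratings the users really gave.
    content_pred: hypothetical callable (user, item) -> predicted rating,
                  standing in for the content-based predictor.
    """
    return {
        u: [
            actual[u][i] if i in actual.get(u, {}) else content_pred(u, i)
            for i in items
        ]
        for u in users
    }


# Illustrative data: every missing rating is "predicted" as 3.0.
actual = {"u1": {"m1": 5}, "u2": {"m2": 2}}
V = pseudo_ratings(actual, lambda u, i: 3.0, ["u1", "u2"], ["m1", "m2"])
```

Every row of `V` is now dense, so a user-to-user Pearson correlation can be computed over all items instead of only the co-rated ones.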

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
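The content-based strategy can be illustrated with a short Python sketch. Ingredient scores are the average rating of the user's recipes in which each ingredient occurs, as described above. How the recipe prediction is derived from the ingredient scores is not detailed here, so averaging the known ingredient scores is an assumption, as are all function and variable names.

```python
from collections import defaultdict


def ingredient_scores(user_ratings, recipe_ingredients):
    """Average, per ingredient, the ratings of the rated recipes that contain it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for recipe, rating in user_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


def predict_recipe(ingredients, scores):
    """Assumed reconstruction step: mean of the known ingredient scores."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


# Illustrative data only.
recipes = {
    "soup": ["tomato", "onion"],
    "salad": ["tomato", "lettuce"],
    "stew": ["onion", "lettuce"],
}
scores = ingredient_scores({"soup": 4, "salad": 2}, recipes)
stew_prediction = predict_recipe(recipes["stew"], scores)
```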

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
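The short-term model described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the implementation from [23]: the sparse-dictionary vector layout, the function names and the threshold values `t_min` and `t_max` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return (label, score); t_min and t_max are hypothetical thresholds."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return "known", None  # too similar: the user already knows this story
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return "no_voters", None  # would be handed over to the long-term model
    total = sum(sim for sim, _ in voters)
    return "scored", sum(sim * score for sim, score in voters) / total


rated = [({"a": 1.0}, 1.0), ({"c": 1.0}, 0.0)]
result = short_term_predict({"a": 1.0, "b": 1.0}, rated)
```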

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the rating of user u for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
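A minimal sketch of the item-to-item computation is given below. The similarity function follows the Pearson formula above, with item means taken over all ratings of each item. The prediction step is a simplification and not necessarily the YoLP implementation: it takes a similarity-weighted average of the user's raw ratings and ignores items with non-positive similarity to keep the denominator well defined. Both choices, and all names, are assumptions.

```python
import math


def pearson_items(ratings, a, b):
    """Pearson correlation between items a and b; ratings is {item: {user: rating}}."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(
        sum((ratings[a][p] - mean_a) ** 2 for p in common)
        * sum((ratings[b][p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings (assumed variant)."""
    num = den = 0.0
    for b, by_user in ratings.items():
        if b == target or user not in by_user:
            continue
        sim = pearson_items(ratings, target, b)
        if sim <= 0:  # assumption: skip dissimilar items
            continue
        num += sim * by_user[user]
        den += sim
    return num / den if den else None


# Illustrative data only.
ratings = {
    "a": {"u1": 5, "u2": 1, "u3": 4},
    "b": {"u1": 4, "u2": 2, "u3": 5, "u4": 3},
    "c": {"u1": 1, "u2": 5, "u4": 4},
}
prediction = predict(ratings, "u4", "a")
```

In this toy data recipe c is negatively correlated with recipe a, so only recipe b contributes to the prediction for user u4.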

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that he positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
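The profile construction and ranking can be sketched as follows. Binary sparse vectors are represented here as Python sets of feature labels, for which the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The feature labels and function names are hypothetical, and context features are left out.

```python
import math


def build_profile(rated_recipes, like_threshold=4):
    """Binary user profile: union of the features of recipes rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= like_threshold:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank(profile, candidates):
    """Order candidate recipes from most to least similar to the profile."""
    return sorted(
        candidates,
        key=lambda name: cosine_binary(profile, candidates[name]),
        reverse=True,
    )


# Illustrative data only.
profile = build_profile([({"category:soup", "ingredient:tomato"}, 5), ({"ingredient:beef"}, 2)])
candidates = {"r2": {"ingredient:beef"}, "r1": {"ingredient:tomato", "category:soup"}}
ordering = rank(profile, candidates)
```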

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
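The Inverse Recipe Frequency can be computed in a single pass over the dataset, as in the sketch below. The dictionary layout and function name are assumptions, and the natural logarithm is used because the base of the logarithm is not stated.

```python
import math


def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k) for every feature k; recipes is {recipe_id: [features]}."""
    total_recipes = len(recipes)
    recipes_with_feature = {}
    for features in recipes.values():
        for k in set(features):  # count each feature once per recipe
            recipes_with_feature[k] = recipes_with_feature.get(k, 0) + 1
    return {k: math.log(total_recipes / m_k) for k, m_k in recipes_with_feature.items()}


# Illustrative data: "salt" occurs in every recipe, so its weight is zero.
irf = inverse_recipe_frequency({"r1": ["salt", "basil"], "r2": ["salt"]})
```

As with IDF, a feature that appears in every recipe receives a weight of zero, while rarer features receive larger weights.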

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
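The construction of a prototype vector can be sketched as follows, with both observation rules expressed as small functions. This is an illustration under stated assumptions rather than the code used in the experiments: the sparse-dictionary representation, the function names and the use of `None` for neutral ratings are all hypothetical.

```python
def build_prototype(rated, recipe_features, irf, classify):
    """Add IRF weights for positive observations and subtract them for negative ones.

    classify: callable rating -> True (positive), False (negative) or None (ignored).
    """
    prototype = {}
    for recipe, rating in rated.items():
        positive = classify(rating)
        if positive is None:
            continue  # neutral observation
        for k in recipe_features[recipe]:
            weight = irf.get(k, 0.0)
            prototype[k] = prototype.get(k, 0.0) + (weight if positive else -weight)
    return prototype


def fixed_threshold_1_to_5(rating):
    """First approach on a 1-5 scale: 3 is neutral, above is positive, below is negative."""
    return None if rating == 3 else rating > 3


def user_average_rule(user_avg):
    """Second approach: ratings equal to or above the user's average are positive."""
    return lambda rating: rating >= user_avg


# Illustrative data only.
irf = {"basil": 2.0, "salt": 0.5}
features = {"r1": ["basil", "salt"], "r2": ["salt"], "r3": ["basil"]}
rated = {"r1": 5, "r2": 1, "r3": 3}
proto_fixed = build_prototype(rated, features, irf, fixed_threshold_1_to_5)
proto_avg = build_prototype(rated, features, irf, user_average_rule(3.0))
```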

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
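The per-user mapping can be written as a small function. The parameter names are hypothetical; the fallback argument stands for the user average used when the similarity interval cannot be computed.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, fallback):
    """Map a similarity from [sim_min, sim_max] into [rating_min, rating_max].

    fallback is returned when the similarity interval is empty
    (for instance, the user's average rating).
    """
    if sim_max == sim_min:
        return fallback
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min


# A user whose similarities range over [0, 1] and whose ratings range over [1, 5].
mid_rating = min_max_rating(0.5, 0.0, 1.0, 1, 5, fallback=3.2)
```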

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \qquad (4.6)$$

Three different approaches were tested: using the user's rating average and the user's standard deviation, using the recipe's rating average and the recipe's standard deviation, and using the combined average of the user and recipe averages and standard deviations.
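The three-level rule can be sketched as below. The upper threshold of 0.75 is the initial value used in this work; the lower threshold of 0.25 in the sketch is only a placeholder chosen for illustration. Averaging the user and recipe statistics to obtain the combined variant is likewise an assumption about how the combination is computed.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-level mapping from similarity to rating.

    lower=0.25 is a hypothetical placeholder value.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: plain mean of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


avg, std = combined_stats(4.0, 0.5, 3.0, 1.5)
high = rating_from_similarity(0.8, avg, std)
middle = rating_from_similarity(0.5, avg, std)
low = rating_from_similarity(0.1, avg, std)
```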

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset has been filtered. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight on how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
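The evaluation procedure can be sketched as follows: the rating events are shuffled and split into k folds, each fold serves once as the validation set, and MAE and RMSE are computed from pairs of predicted and actual ratings. This is a generic illustration; the function names, the fixed random seed and the way the folds are sliced are assumptions, not the evaluation module of this work.

```python
import math
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# With 10 events and k=5, each fold holds 20% of the data for validation.
splits = list(k_fold_splits(range(10), k=5))
example_mae = mae([(3, 4), (5, 3)])
example_rmse = rmse([(3, 4), (5, 3)])
```

Because RMSE squares the errors before averaging, it penalises large individual deviations more heavily than MAE, which is why both are reported.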

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average, MAE | Epicurious, Observation User Average, RMSE | Epicurious, Observation Fixed Threshold, MAE | Epicurious, Observation Fixed Threshold, RMSE | Food.com, Observation User Average, MAE | Food.com, Observation User Average, RMSE | Food.com, Observation Fixed Threshold, MAE | Food.com, Observation Fixed Threshold, RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious         Food.com
                                      MAE     RMSE       MAE     RMSE
Ingredients + Cuisine + Dietaries     0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine                 0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary                 0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                     0.8266  1.0616     0.4324  0.7087
Ingredients                           0.7932  1.0054     0.4411  0.6537
Cuisine                               0.8553  1.0810     0.5357  0.7431
Dietary                               0.8772  1.0807     0.4579  0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. When computing the user prototype vector, the features were therefore separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
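The idea of storing one vector per feature and merging them on demand can be sketched as follows. This is only an illustration under assumed data structures: the dictionary layout, the function names `merge_vectors` and `feature_combinations`, and the sample weights are hypothetical and are not taken from the experimental component.

```python
from itertools import combinations

# Assumption: the three feature groups named in Section 4.4.
FEATURES = ("ingredients", "cuisine", "dietary")


def merge_vectors(stored, selected):
    """Build one prototype vector from the per-feature vectors of a user.

    Keys are (feature, term) pairs so that identical terms in different
    feature groups cannot collide.
    """
    merged = {}
    for feature in selected:
        for term, weight in stored[feature].items():
            merged[(feature, term)] = weight
    return merged


def feature_combinations(features=FEATURES):
    """Yield every non-empty combination of features, largest first."""
    for size in range(len(features), 0, -1):
        yield from combinations(features, size)


# Hypothetical stored vectors for a single user (weights are made up).
user_vectors = {
    "ingredients": {"tomato": 0.8, "basil": 0.4},
    "cuisine": {"italian": 0.9},
    "dietary": {"vegetarian": 0.5},
}

for combo in feature_combinations():
    prototype = merge_vectors(user_vectors, combo)
    print(" + ".join(combo), "->", len(prototype), "weights")
```

With three feature groups there are seven non-empty combinations, which matches the seven rows of Table 5.3; the prototype vectors themselves are computed only once.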

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, this test studies the impact on the recommendation and seeks the similarity thresholds that return the lowest error values.
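The three cases of Eq. 4.6 can be written as a small function, which makes it easy to see what varying the thresholds changes. This is a minimal sketch: the function and parameter names are illustrative, and which average and standard deviation are passed in (user, item, or their combination) is left to the caller.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the cases of Eq. 4.6.

    upper and lower play the role of the thresholds U and L; their
    defaults are the initial values quoted in the text.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


# Hypothetical user with an average rating of 4.0 and standard deviation 0.5.
for sim in (0.9, 0.5, 0.1):
    print(sim, rating_from_similarity(sim, average=4.0, std_dev=0.5))
```

Setting `lower` to 0 makes the third case unreachable for non-negative similarities, which is one way to reproduce the lower end of the threshold sweep.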

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

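As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset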


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by repeating the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the average absolute deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
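The opposite movement of MAE and RMSE discussed in this section can be reproduced with a toy example. The ratings below are invented purely for illustration and are not taken from either dataset; the sketch only shows that a predictor with more exact hits but one large miss can have a lower MAE and a higher RMSE than a predictor that is always slightly wrong.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; squaring penalizes large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


actual = [4, 4, 4, 4]
many_small_errors = [3.5, 4.5, 3.5, 4.5]  # every prediction is off by 0.5
one_large_error = [4, 4, 4, 2.5]          # three exact hits, one miss of 1.5

for name, predicted in (("small errors", many_small_errors),
                        ("one large error", one_large_error)):
    print(name, "MAE:", mae(predicted, actual), "RMSE:", rmse(predicted, actual))
```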

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes. Since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

\[
w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^{2}}} \tag{2.4}
\]

With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[
\text{Similarity}(a, b) = \frac{\sum_{k} w_{ka}\, w_{kb}}{\sqrt{\sum_{k} w_{ka}^{2}}\ \sqrt{\sum_{k} w_{kb}^{2}}} \tag{2.5}
\]
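As an illustration of the normalization and similarity formulas of this section, the sketch below stores each vector as a dictionary from keyword to weight. The representation and the function names are assumptions made for the example; keywords missing from a vector are treated as having weight 0.

```python
import math


def normalize(weights):
    """Scale a {keyword: TF-IDF} vector to unit length, as in Eq. 2.4."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {keyword: w / norm for keyword, w in weights.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors, as in Eq. 2.5."""
    dot = sum(w * b.get(keyword, 0.0) for keyword, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword weights for two items.
item_a = {"tomato": 3.0, "basil": 4.0}
item_b = {"tomato": 1.0, "garlic": 2.0}
print(normalize(item_a))
print(cosine_similarity(item_a, item_b))
```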



Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.

More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, where $T$ is the vocabulary composed of the set of distinct terms in the training set. The weight for each term is given by the following formula:

\[
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{2.6}
\]

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
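A minimal sketch of Rocchio used as a classifier is given below, assuming documents are dictionaries of term weights and using cosine similarity for the final assignment. The function names, the default values of `beta` and `gamma`, and the two-class "like"/"dislike" example are illustrative assumptions, not values taken from this work.

```python
import math
from collections import defaultdict


def build_prototype(positives, negatives, beta=1.0, gamma=1.0):
    """Prototype vector for one class: weighted mean of the positive
    examples minus weighted mean of the negative examples."""
    prototype = defaultdict(float)
    for doc in positives:
        for term, weight in doc.items():
            prototype[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in doc.items():
            prototype[term] -= gamma * weight / len(negatives)
    return dict(prototype)


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, prototypes):
    """Return the class whose prototype vector is most similar to doc."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))


# Hypothetical training data: term weights of liked and disliked recipes.
liked = [{"tomato": 1.0, "basil": 0.5}]
disliked = [{"liver": 1.0}]
prototypes = {
    "like": build_prototype(liked, disliked),
    "dislike": build_prototype(disliked, liked),
}
print(classify({"tomato": 0.8, "garlic": 0.3}, prototypes))
```

Each class uses the other class's examples as its negatives, so a term that is frequent in disliked recipes lowers the similarity of a new recipe to the "like" prototype.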

Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data, as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c;

• P(d|c): probability of observing the document d given a class c;

• P(d): probability of observing the document d.

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[
P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7}
\]


When performing classification, each document d is assigned to the class $c_j$ with the highest probability:

\[
\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8}
\]





The probability P(d) is usually removed from the equation as it is equal for all classes and thus

does not influence the final result Classes could simply represent for example relevant or irrelevant


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification clearly does not hold, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9}
\]

In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appeared in document $d_i$. Therefore, only the words from the vocabulary V that appear in the document, $t_k \in V_{d_i}$, are used.
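The multinomial model can be sketched in a few lines, working with log-probabilities to avoid numerical underflow. Documents are assumed to be lists of tokens; the Laplace smoothing parameter `alpha`, the function names, and the toy training data are assumptions added for the example and are not part of the formulation above.

```python
import math
from collections import Counter


def train(docs_by_class, alpha=1.0):
    """Estimate log P(c) and log P(t|c) from tokenized training documents.

    alpha is a Laplace smoothing constant (an assumption of this sketch)
    that keeps unseen terms from producing a zero probability.
    """
    vocabulary = {t for docs in docs_by_class.values() for d in docs for t in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(t for d in docs for t in d)
        total_terms = sum(counts.values())
        denominator = total_terms + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(len(docs) / total_docs),
            "log_likelihood": {
                t: math.log((counts[t] + alpha) / denominator) for t in vocabulary
            },
        }
    return model


def classify(model, doc):
    """Pick the class maximizing log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    term_counts = Counter(doc)

    def score(label):
        log_likelihood = model[label]["log_likelihood"]
        return model[label]["log_prior"] + sum(
            n * log_likelihood[t] for t, n in term_counts.items() if t in log_likelihood
        )

    return max(model, key=score)


# Hypothetical training data with two classes.
training = {
    "relevant": [["pasta", "tomato", "basil"], ["tomato", "garlic"]],
    "irrelevant": [["liver", "onion"], ["liver", "kidney"]],
}
model = train(training)
print(classify(model, ["tomato", "tomato", "pasta"]))
```

Terms that never occurred in training are simply skipped at classification time, which is one of several possible conventions.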

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when an item is classified.
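A k-nearest-neighbor classifier over items in the vector space model can be sketched as follows. The majority vote among the k neighbors is one common way to derive the label; the function names, the value of `k`, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar stored items.

    training is a list of (vector, label) pairs; nothing is learned in
    advance, so every stored item is compared at classification time.
    """
    neighbours = sorted(training, key=lambda ex: cosine(item, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical stored items with their labels.
training = [
    ({"tomato": 1.0, "basil": 1.0}, "like"),
    ({"tomato": 1.0, "garlic": 1.0}, "like"),
    ({"liver": 1.0, "onion": 1.0}, "dislike"),
    ({"liver": 1.0}, "dislike"),
]
print(knn_classify({"tomato": 1.0, "onion": 0.2}, training))
```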

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation.

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \, \sqrt{\sum_{s \in S} r_{bs}^2}} \]

In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item, with S here being the set of items rated by both users. However, this measure does not take differences in rating behaviour into account.
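To make the cosine measure concrete, the following minimal sketch computes it for Alice and User1 using the ratings of Table 2.1. The function name `cosine_similarity` and the dictionary layout are illustrative choices, not part of the original work, and the sum is taken over the items both users have rated.

```python
from math import sqrt

# Ratings from Table 2.1 (Alice has not rated Item5).
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
user1 = {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3}


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two rating vectors over co-rated items."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # no items in common: treat the users as unrelated
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    norm_a = sqrt(sum(ratings_a[s] ** 2 for s in common))
    norm_b = sqrt(sum(ratings_b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b)


sim_alice_user1 = cosine_similarity(alice, user1)
print(round(sim_alice_user1, 3))
```

The value is very close to 1 even though User1 rates every item lower than Alice does, which is the limitation discussed in the text.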

In Figure 2.2 it can be observed that Alice and User1 rated the same group of items in a similar way. The difference in rating values across the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2].

\[ sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \, \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \tag{2.11} \]

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.
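As an illustration, the sketch below applies the Pearson correlation to the ratings of Table 2.1. It assumes that each user's average is taken over the items co-rated with the other user; this is one common convention and may differ from the averaging used elsewhere. The names `ratings` and `pearson` are hypothetical.

```python
from math import sqrt

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items (means over those items)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[s] for s in common) / len(common)
    mean_b = sum(ratings_b[s] for s in common) / len(common)
    num = sum((ratings_a[s] - mean_a) * (ratings_b[s] - mean_b) for s in common)
    den = sqrt(
        sum((ratings_a[s] - mean_a) ** 2 for s in common)
        * sum((ratings_b[s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


# Similarity between Alice and every other user.
sims = {u: pearson(ratings["Alice"], r) for u, r in ratings.items() if u != "Alice"}
for user, sim in sims.items():
    print(user, round(sim, 2))
```

Under this convention User1 and User2 come out as Alice's closest neighbours, User3 is uncorrelated with her, and User4 is negatively correlated.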

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their own averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
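The prediction function can be sketched as follows for Alice and Item5. The neighbourhood is assumed to be the two most similar users, and the similarity values 0.85 and 0.71 are illustrative inputs (rounded Pearson values under one averaging convention); each neighbour's mean is taken over all of that neighbour's ratings. The names `predict` and `neighbour_sims` are hypothetical.

```python
# Ratings from Table 2.1 for Alice and her two assumed neighbours.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict(target, item, neighbour_sims, ratings):
    """Target user's mean plus the similarity-weighted neighbour deviations."""
    target_mean = mean(ratings[target].values())
    num = sum(
        sim * (ratings[b][item] - mean(ratings[b].values()))
        for b, sim in neighbour_sims.items()
    )
    den = sum(neighbour_sims.values())  # assumed non-zero in this sketch
    return target_mean + num / den


# Assumed similarity values for the two nearest neighbours.
neighbour_sims = {"User1": 0.85, "User2": 0.71}
predicted = predict("Alice", "Item5", neighbour_sims, ratings)
print(round(predicted, 2))
```

Both neighbours rated Item5 above their own averages, so the prediction ends up above Alice's average rating of 4.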

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].

The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered: the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be associated with content features (e.g., comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2].

Figure 2.4: Parallelized hybridization design [2].

Figure 2.5: Pipelined hybridization designs [2].

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4].

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp).


Figure 2.7: Evaluating recommended items [2].

\[ Precision = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ Recall = \frac{tp}{tp + fn} \tag{2.14} \]
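A minimal sketch of both measures is shown below, treating the recommended and relevant items as sets. The item identifiers and the function name `precision_recall` are made up for illustration.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


recommended = ["r1", "r2", "r3", "r4"]  # hypothetical system output
relevant = ["r2", "r4", "r5"]           # hypothetical items the user likes
precision, recall = precision_recall(recommended, relevant)
print(precision, round(recall, 3))
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended.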

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:

\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
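Both error measures are straightforward to compute. The sketch below uses made-up predicted and actual ratings; the function names `mae` and `rmse` are illustrative.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root of the mean squared deviation; penalises large errors more."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4.0, 3.0, 5.0, 2.0]  # hypothetical predictions
actual = [4.0, 4.0, 3.0, 2.0]     # hypothetical true ratings
print(mae(predicted, actual), round(rmse(predicted, actual), 3))
```

With errors of 0, 1, 2 and 0, the single error of 2 pulls the RMSE above the MAE, which illustrates the extra weight RMSE gives to larger deviations.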


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

\[ I^+_k = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use of ingredient k, obtained from the number of uses $F_k$ during a period D:

\[ FF_k = \frac{F_k}{D} \]

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \]

The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
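The favourite-ingredient score can be sketched as below. The counts are invented for illustration, the natural logarithm is an assumption (the base is not stated in the text), and the function and parameter names are hypothetical.

```python
from math import log


def favourite_score(times_used, period_days, total_recipes, recipes_with_ingredient):
    """I+_k = FF_k * IRF_k, with FF_k = F_k / D and IRF_k = log(M / M_k)."""
    ff = times_used / period_days                      # frequency of use
    irf = log(total_recipes / recipes_with_ingredient)  # inverse recipe frequency
    return ff * irf


M = 1000  # hypothetical total number of recipes
# Tomato: used often by the user, but also present in half of all recipes.
score_tomato = favourite_score(6, 30, M, 500)
# Saffron: used less often, but rare across the recipe collection.
score_saffron = favourite_score(3, 30, M, 10)
print(round(score_tomato, 3), round(score_saffron, 3))
```

As with TF-IDF, an ingredient that the user cooks with regularly but that is rare in the collection scores higher than a ubiquitous one.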

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

\[ F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
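As a quick consistency check of the figures reported for N = 20, the F-measure can be recomputed from the precision and recall, assuming they are percentages (60.7% and 61%). The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the top 20 ingredients, read as fractions.
f_top20 = f_measure(0.607, 0.61)
print(round(f_top20, 3))
```

The result agrees with the reported F-measure of 60.8% under that reading of the figures.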

In this work, the recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the deviation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

\[ Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
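The two formulas can be sketched as follows. The text does not state how the weight $W_k$ is derived from the deviation score, so the weights below are hypothetical inputs, as are the quantities, the preference values (+1 for liked, -1 for disliked) and all function names.

```python
from math import sqrt


def quantity_std(quantities):
    """Standard deviation of an ingredient's quantity across recipes (Eq. 3.5)."""
    n = len(quantities)
    mean_q = sum(quantities) / n
    return sqrt(sum((g - mean_q) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Score(R) as the sum of I_k * W_k over the recipe's ingredients (Eq. 3.6)."""
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in recipe_ingredients)


# Hypothetical quantities (grams) of each ingredient in three recipes.
sigma_pepper = quantity_std([1, 2, 3])
sigma_potato = quantity_std([100, 200, 300])

# Hypothetical preferences and quantity-based weights for a target recipe.
preference = {"potato": 1.0, "pepper": -1.0}
weight = {"potato": 1.0, "pepper": 2.0}
score = recipe_score(["potato", "pepper"], preference, weight)
print(round(sigma_pepper, 3), round(sigma_potato, 2), score)
```

In this toy case the disliked ingredient carries the larger weight, so the recipe ends up with a negative score.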

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and those predicted by the content-based method otherwise:

\[ v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases} \tag{3.7} \]
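The construction of a pseudo user-ratings vector can be sketched in a few lines. The movie identifiers, rating values and the function name `pseudo_ratings` are hypothetical, and the content-based predictions are simply given as input rather than produced by a classifier.

```python
def pseudo_ratings(actual, content_predictions, items):
    """Use the user's real rating where it exists, else the content-based one."""
    return {
        i: actual[i] if i in actual else content_predictions[i]
        for i in items
    }


items = ["m1", "m2", "m3"]
actual = {"m1": 4}                                       # the user rated only m1
content_predictions = {"m1": 3.5, "m2": 2.0, "m3": 4.5}  # assumed classifier output
v_u = pseudo_ratings(actual, content_predictions, items)
print(v_u)
```

Stacking these vectors for all users gives a matrix with no missing entries, on which the usual neighbourhood-based collaborative method can then operate.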

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction.

3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
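The breakdown and reconstruction steps of the content-based strategy can be sketched as follows. The recipes and ratings are invented, and averaging the known ingredient scores to rebuild a recipe prediction is an assumption made for this sketch; the text only states that a prediction is computed from the ingredient scores.

```python
from collections import defaultdict


def ingredient_scores(user_recipe_ratings, recipe_ingredients):
    """Score each ingredient by averaging the ratings of recipes containing it."""
    collected = defaultdict(list)
    for recipe, rating in user_recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            collected[ingredient].append(rating)
    return {k: sum(v) / len(v) for k, v in collected.items()}


def predict_recipe(ingredients, scores):
    """Rebuild a recipe prediction from its known ingredient scores (assumed: mean)."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None


recipe_ingredients = {
    "carbonara": ["pasta", "egg", "bacon"],
    "omelette": ["egg", "cheese"],
    "mac_and_cheese": ["pasta", "cheese"],
}
user_recipe_ratings = {"carbonara": 5, "omelette": 3}  # hypothetical user ratings

scores = ingredient_scores(user_recipe_ratings, recipe_ingredients)
prediction = predict_recipe(recipe_ingredients["mac_and_cheese"], scores)
print(scores["egg"], prediction)
```

Egg appears in both rated recipes and so receives the average of the two ratings, while the unseen recipe is scored from the pasta and cheese values inherited from the rated ones.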

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are presented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22].

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method because it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
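The short-term model described above can be sketched as follows. The threshold values, the sparse-dictionary vector representation, the toy term weights and the function names are all assumptions made for illustration; they are not taken from Daily-Learner itself.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return a score, 'known' for near-duplicates, or None if there are no voters."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return "known"          # too similar to a story already seen
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return None                 # would be handed to the long-term model
    return sum(s * r for s, r in voters) / sum(s for s, _ in voters)


# Hypothetical rated stories: (term weights, user score).
rated_stories = [
    ({"election": 1.0, "vote": 1.0}, 0.8),
    ({"football": 1.0, "goal": 1.0}, 0.2),
]
follow_up = short_term_predict({"election": 1.0, "debate": 1.0}, rated_stories)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated_stories)
unrelated = short_term_predict({"recipe": 1.0}, rated_stories)
print(follow_up, duplicate, unrelated)
```

The follow-up story has a single voter and inherits its score, the near-duplicate is flagged as known, and the unrelated story is left unclassified.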

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm to personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture.

Figure 4.2: Item-to-item collaborative recommendation.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \, \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.

After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bu} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \]

In the formula, pred(u, a) is the predicted rating of item a for user u, N is the set of items rated by user u, and $r_{bu}$ is the rating given by user u to item b. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.
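The item-to-item approach can be sketched as below, reusing the numbers of Table 2.1 with items playing the role of recipes. Two simplifications are assumptions of this sketch and not necessarily what YoLP does: item averages are taken over the users who rated both items, and the prediction is a plain similarity-weighted average of the user's own ratings over positively similar items, without the mean-centring that appears in the prediction formula. The function names are hypothetical.

```python
from math import sqrt

# ratings[user][recipe]; same numbers as Table 2.1.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def item_similarity(a, b, ratings):
    """Pearson correlation between two items over users who rated both."""
    users = [u for u, r in ratings.items() if a in r and b in r]
    if len(users) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in users) / len(users)
    mean_b = sum(ratings[u][b] for u in users) / len(users)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in users)
    den = sqrt(
        sum((ratings[u][a] - mean_a) ** 2 for u in users)
        * sum((ratings[u][b] - mean_b) ** 2 for u in users)
    )
    return num / den if den else 0.0


def predict(user, target, ratings):
    """Similarity-weighted average of the user's ratings (positive sims only)."""
    sims = {b: item_similarity(target, b, ratings) for b in ratings[user]}
    positive = {b: s for b, s in sims.items() if s > 0}
    if not positive:
        return None
    weighted = sum(s * ratings[user][b] for b, s in positive.items())
    return weighted / sum(positive.values())


sim_5_1 = item_similarity("Item5", "Item1", ratings)
predicted = predict("Alice", "Item5", ratings)
print(round(sim_5_1, 3), round(predicted, 3))
```

Only the similarities between the target item and the items the user has already rated are needed, which is why this approach suits recommending from a fixed, limited set of recipes.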

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].

42 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the restaurantrsquos

recipesrsquo features with the user profile using the cosine similarity measure (Eq 25) The recom-

mended recipes are ordered from most to least similar In this case instead of referring recipes as

vectors of words recipes are represented by vectors of different features The features that compose

a recipe are category region restaurant ID and ingredients Context features are also considered

in the moment of the recommendation these are temperature period of the day and season of the

year Each feature has a specific location attributed to it in the recipe and user profile sparse vectors

The user profile is composed by binary values of the recipe features that he positively rated ie

when a user rates a recipe with a value of 4 or 5 all the recipe features are added as binary values

to the profile vector

YoLP recipe recommendations take the form of a list, and a rating value is needed in order to use the Epicurious and Food.com datasets to validate the algorithms. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. In the content-based method, however, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
\text{Rating} =
\begin{cases}
\text{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\
\text{avgTotal} & \text{otherwise}
\end{cases}
\tag{4.3}
\]


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented first. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


\[
\mathrm{IRF}_k = \log \frac{M}{M_k}
\tag{4.4}
\]


Here, M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the weights of the recipes' features were computed using only the IRF, determined over the complete dataset.
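As an illustration of how these weights could be obtained, the following minimal sketch computes the IRF of every feature from a list of recipes. The function name and the input format are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math

def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for a list of per-recipe feature sets.

    M is the total number of recipes and M_k the number of recipes that
    contain feature k. The natural logarithm is assumed.
    """
    total = len(recipes)
    counts = {}
    for features in recipes:
        for feature in set(features):  # count each feature once per recipe
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: math.log(total / count)
            for feature, count in counts.items()}
```

A feature that appears in every recipe receives a weight of zero, so very common ingredients contribute nothing to the prototype vectors.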

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
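The two ways of classifying observations and the additive update of the prototype vector can be sketched as below. This is a simplified sketch under stated assumptions: the function names and data layout are hypothetical, the prototype is stored as a dictionary of feature weights, and both observation types use the weight of 1 mentioned above.

```python
def classify_fixed(rating, max_negative, min_positive):
    """Fixed-threshold approach: -1 negative, +1 positive, 0 neutral (ignored).

    Epicurious (ratings 1-4): max_negative=2, min_positive=3.
    Food.com (ratings 1-5):   max_negative=2, min_positive=4 (3 is neutral).
    """
    if rating <= max_negative:
        return -1
    if rating >= min_positive:
        return 1
    return 0

def classify_by_user_average(rating, user_average):
    """User-average approach: below the average is negative, otherwise positive."""
    return -1 if rating < user_average else 1

def build_prototype(rated_recipes, irf, classify):
    """Add (positive) or subtract (negative) the IRF weights of each rated recipe.

    rated_recipes: iterable of (feature_set, rating) pairs from the training set.
    irf: dict mapping a feature to its IRF weight.
    classify: function mapping a rating to -1, 0 or +1.
    """
    prototype = {}
    for features, rating in rated_recipes:
        sign = classify(rating)
        if sign == 0:
            continue  # neutral observations are ignored
        for feature in features:
            weight = irf.get(feature, 0.0)
            prototype[feature] = prototype.get(feature, 0.0) + sign * weight
    return prototype
```

Passing the classification rule as a function keeps the vector update identical for both approaches, so only the rule changes between experiments.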

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets that contain relevant information on the recipes as well as rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. Among the many normalization methods available, the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \min A}{\max A - \min A} \times (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items in the same way or have the same notion of high and low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale of each user is mapped onto that user's rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default prediction.
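A minimal sketch of this per-user mapping is given below. The function and parameter names are hypothetical, and treating a zero-width similarity interval as the "not enough ratings" case is an assumption made for the example.

```python
def min_max_rating(similarity, sim_range, rating_range, user_average):
    """Map a similarity value onto a user's rating range with Min-Max scaling.

    sim_range:    (min, max) similarity observed for this user.
    rating_range: (min, max) rating given by this user in the training set.
    Falls back to the user average when the similarity interval is empty.
    """
    sim_min, sim_max = sim_range
    rating_min, rating_max = rating_range
    if sim_max == sim_min:
        return user_average  # interval cannot be computed: use the default
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For example, a user whose similarities span [0.2, 0.6] and whose ratings span [2, 4] gets a predicted rating of 2 for a similarity of 0.2 and of 4 for a similarity of 0.6.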

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.
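The three-case rule above can be sketched as follows, using the initial thresholds U = 0.75 and L = 0.25 reported in this work as defaults. The function names are hypothetical. The combined average follows the (UserAvg + ItemAvg)/2 definition used for the baselines; combining the two standard deviations by a simple mean is an assumption of this sketch, since the exact combination is not spelled out.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating with the three-case rule."""
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average

def combined_stats(user_avg, user_std, item_avg, item_std):
    """Combine user and item statistics.

    Averaging the two standard deviations is an assumption of this example.
    """
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The same function covers the user-based, item-based and combined variants; only the average and standard deviation passed in change.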

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating predicted for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization of the datasets used in the experiments.

Statistic                               Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02%       0.07%
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but it was filtered in order to reduce data sparsity: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews of specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating value.

Figure 4.4: Distribution of Food.com rating events per rating value.

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, the impact of the users' standard deviation on the recommendation error and the learning curve of the algorithm, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k subsets (folds), one fold is used as the validation set and the remaining folds as the training set. To reduce variability, this process is repeated k times, each time using a different fold as the validation set, so that every observation is used for validation exactly once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5 (5-fold cross-validation), so for each fold the validation set represents 20% of the data and the training set the remaining 80%.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-fold cross-validation example.

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the actual rating values from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to compare directly the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
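The evaluation loop described in this section can be sketched as follows. This is an illustrative sketch rather than the evaluation module used in this work: the function names, the fixed random seed and the way folds are formed (shuffling and striding) are assumptions.

```python
import math
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

With k = 5, each validation fold holds 20% of the rating events, and the MAE and RMSE of the five folds are averaged to obtain the reported values.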

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test results.

Epicurious
                                                    Observation User Average   Observation Fixed Threshold
Method                                              MAE      RMSE              MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606            0.7759   1.0283
Item Avg + Item Standard Deviation                  0.8914   1.1550            0.8388   1.1106
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296            0.7824   0.9927
Min-Max                                             0.8539   1.1533            0.7721   1.0705

Food.com
                                                    Observation User Average   Observation Fixed Threshold
Method                                              MAE      RMSE              MAE      RMSE
User Avg + User Standard Deviation                  0.4448   0.6812            0.4287   0.6624
Item Avg + Item Standard Deviation                  0.4561   0.7251            0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.4390   0.6506            0.4324   0.6449
Min-Max                                             0.6648   0.9847            0.6303   0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which combination of methods had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

                                     Epicurious          Food.com
Features                             MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are contributing to the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the combination of methods that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive from negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this combination of methods.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. When computing the user prototype vector, the features were therefore separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
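The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the dictionary layout and the function names are hypothetical and only illustrate the mechanism.

```python
import math

def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group vectors for the selected feature groups.

    vectors_by_group: e.g. {"ingredients": {...}, "cuisine": {...}, "dietary": {...}}
    Keys of the merged vector are (group, feature) pairs, so equal names in
    different groups do not collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity restricted to the chosen feature groups."""
    return cosine(merge_groups(user_vectors, groups),
                  merge_groups(recipe_vectors, groups))
```

Each row of a feature test then corresponds to a different `groups` argument, for example `["ingredients", "cuisine"]`, without rebuilding any prototype vector.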

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items that the user dislikes, so it is important to test the impact of every new feature before including it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, this test studies the impact on the recommendations and looks for the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values for the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in the error values seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]


Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is larger, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 for MAE and 0.8601 for RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
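The threshold variation test can be sketched as a simple sweep over candidate values of U using the two-case rule. The sketch below is hypothetical: the function names and the sample format are assumptions, and a single list of samples stands in for the full cross-validation run used in the experiments.

```python
import math

def predict(similarity, average, std_dev, upper):
    """Two-case rule: add the standard deviation only above the threshold U."""
    return average + std_dev if similarity >= upper else average

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for every candidate upper threshold.

    samples: list of (similarity, average, std_dev, actual_rating) tuples.
    """
    results = {}
    for upper in thresholds:
        errors = [predict(sim, avg, std, upper) - actual
                  for sim, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```

Reporting both metrics for every threshold makes the trade-off discussed above visible, since the threshold that minimizes the MAE need not be the one that minimizes the RMSE.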

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should contribute the least to the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user and is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would not be a good sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in the error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
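The round-by-round simulation can be sketched for a single user as shown below. The `fit` and `predict` callables are hypothetical placeholders for building the prototype vector and producing a predicted rating, and the order of the reviews is simply the order in which they are stored, which is an assumption since the datasets carry no timestamps. In the experiments, the resulting errors are averaged over all selected users.

```python
def learning_curve(events, fit, predict, start=1):
    """Simulate incremental learning for one user.

    events:  list of (item, rating) pairs for the user.
    fit:     function(training_events) -> model (e.g. a prototype vector).
    predict: function(model, item) -> predicted rating.
    Returns a list of (training_size, MAE on the remaining events).
    """
    curve = []
    for size in range(start, len(events)):
        training, validation = events[:size], events[size:]
        model = fit(training)
        errors = [abs(predict(model, item) - rating)
                  for item, rating in validation]
        curve.append((size, sum(errors) / len(errors)))
    return curve
```

As a usage example, passing a `fit` that returns the mean training rating and a `predict` that returns that mean reproduces a user-average baseline curve against which the Rocchio-based curve could be compared.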



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contains its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors. Together with the major difference in the sizes of the datasets, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the recipes rated by the user, each class would contain the feature weights associated with a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; the vector with the highest similarity represents the class where the recipe fits best according to the user's preferences.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.

Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
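As an illustration of the assignment rule above, the following minimal Python sketch picks the class whose prototype vector has the highest cosine similarity to a document vector. The sparse dictionary representation, the function names and the toy weights are assumptions made for this example; they are not taken from the thesis implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(document, prototypes):
    """Return the label of the prototype vector most similar to the document."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], document))


# Hypothetical prototype vectors, e.g. built beforehand from TF-IDF weights.
prototypes = {
    "relevant": {"garlic": 0.9, "tomato": 0.7, "basil": 0.4},
    "irrelevant": {"sugar": 0.8, "cream": 0.6, "tomato": 0.1},
}
document = {"tomato": 0.5, "basil": 0.5, "cream": 0.1}
print(classify(document, prototypes))
```

The sketch only covers the classification step; how the prototype vectors are built from positive and negative examples is outside its scope.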


Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$ using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• $P(c)$: probability of observing a document in class $c$

• $P(d|c)$: probability of observing the document $d$ given a class $c$

• $P(d)$: probability of observing the document $d$

Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying Bayes' theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)}$$


When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)}$$

The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.


In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification, known as the conditional independence assumption, clearly does not hold in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary $V$, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9)$$

In the formula, $N(d_i,t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document, $t_k \in V_{d_i}$, are used.
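The multinomial model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the add-one (Laplace) smoothing, the log-space scoring, the function names and the toy training data are choices made for this example and are not described in the text above.

```python
import math
from collections import Counter


def train(labelled_docs):
    """Estimate P(c) and add-one smoothed P(t|c) from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    term_counts = {label: Counter() for label in class_counts}
    for tokens, label in labelled_docs:
        term_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in labelled_docs for t in tokens}
    priors = {c: n / len(labelled_docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c, counts in term_counts.items():
        total = sum(counts.values()) + len(vocabulary)  # add-one smoothing (assumption)
        likelihoods[c] = {t: (counts[t] + 1) / total for t in vocabulary}
    return priors, likelihoods


def classify(tokens, priors, likelihoods):
    """Pick the class maximising log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    counts = Counter(tokens)

    def log_score(c):
        # Terms outside the training vocabulary are ignored.
        return math.log(priors[c]) + sum(
            n * math.log(likelihoods[c][t])
            for t, n in counts.items()
            if t in likelihoods[c]
        )

    return max(priors, key=log_score)


# Hypothetical training data: recipes as bags of ingredient tokens.
training = [
    (["tomato", "basil", "pasta"], "liked"),
    (["tomato", "garlic", "pasta"], "liked"),
    (["sugar", "cream", "butter"], "disliked"),
]
priors, likelihoods = train(training)
print(classify(["tomato", "pasta", "cream"], priors, likelihoods))
```

Working with logarithms turns the product of Eq. (2.9) into a sum, which avoids numerical underflow for long documents without changing which class scores highest.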

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is done when classifying.
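The nearest neighbor procedure described above can be sketched as follows. This is a minimal example, assuming items are sparse term-weight dictionaries compared with cosine similarity and that the label is chosen by simple majority vote; the names and toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar stored items."""
    # All the work happens here, at classification time: every stored item is compared.
    ranked = sorted(training, key=lambda pair: cosine(item, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled items (feature vector, class label).
training = [
    ({"tomato": 1.0, "basil": 1.0}, "liked"),
    ({"tomato": 1.0, "garlic": 1.0}, "liked"),
    ({"sugar": 1.0, "cream": 1.0}, "disliked"),
    ({"cream": 1.0, "butter": 1.0}, "disliked"),
]
print(knn_classify({"tomato": 1.0, "cream": 0.2}, training, k=3))
```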

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

$$\mathrm{Similarity}(a,b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\,\sqrt{\sum_{s \in S} r_{b,s}^2}} \qquad (2.10)$$

In the formula, $S$ is the set of items rated by both users, $r_{a,s}$ is the rating that user $a$ gave to item $s$, and $r_{b,s}$ is the rating that user $b$ gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
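As a worked example, the cosine similarity between Alice and User1 can be computed from the ratings in Table 2.1. The sketch below is a minimal illustration; the dictionary layout and the function name are assumptions made for this example.

```python
import math

# Ratings of Alice and User1 taken from Table 2.1 (Item5 is unknown for Alice).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors over co-rated items."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("Alice", "User1"), 3))  # 0.975
```

Over the four co-rated items the numerator is 38 and the two norms are the square roots of 66 and 23, which gives a similarity of roughly 0.975 even though User1 rates consistently lower than Alice.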

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically constant. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

$$\mathrm{sim}(a,b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}} \qquad (2.11)$$

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

$$\mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)} \qquad (2.12)$$

In the formula, $\mathrm{pred}(a,p)$ is the predicted rating of user $a$ for item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
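The Pearson correlation and the prediction function can be combined into a short worked example on the data of Table 2.1. The sketch below rests on several assumptions that the text leaves open: each user's average is taken over all of that user's ratings, only neighbours with a positive correlation are used, and the neighbourhood size is fixed at two. The function names are hypothetical.

```python
import math

# Ratings from Table 2.1; Item5 is unknown for Alice.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean_rating(user):
    """Average over all ratings given by the user (an assumption of this sketch)."""
    values = ratings[user].values()
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = ratings[a].keys() & ratings[b].keys()
    mean_a, mean_b = mean_rating(a), mean_rating(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den_a = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
    den_b = math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (den_a * den_b)


def predict(a, item, k=2):
    """Similarity-weighted prediction using the k most similar positive neighbours."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return mean_rating(a)
    num = sum(sim * (ratings[b][item] - mean_rating(b)) for sim, b in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return mean_rating(a) + num / den


for other in ("User1", "User2", "User3", "User4"):
    print(other, round(pearson("Alice", other), 2))
print(round(predict("Alice", "Item5"), 2))
```

Under these assumptions User1 and User2 are the two nearest neighbours, User3 is uncorrelated and User4 is negatively correlated with Alice, and the predicted rating for Item5 is approximately 4.85. Other choices, such as averaging only over co-rated items, give slightly different similarity values.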

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [15, 16].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information such as age, gender and other attributes can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g. movies liked by the user) can be associated with content features (e.g. comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, otherwise use collaborative methods).
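A weighting scheme of the kind used in parallelized hybridization can be sketched as a weighted sum of the scores produced by independent recommenders. The example below is only an illustration under assumed inputs: the recommender names, the scores and the manually chosen weights are all hypothetical.

```python
def weighted_hybrid(scores_by_recommender, weights):
    """Rank items by the weighted sum of the scores given by each recommender."""
    combined = {}
    for name, scores in scores_by_recommender.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores produced independently by two recommenders.
scores = {
    "content": {"soup": 0.9, "salad": 0.4, "pizza": 0.2},
    "collaborative": {"soup": 0.3, "salad": 0.8, "pizza": 0.6},
}

# Manually assigned weights; they could also be learned dynamically.
print(weighted_hybrid(scores, {"content": 0.7, "collaborative": 0.3}))
print(weighted_hybrid(scores, {"content": 0.2, "collaborative": 0.8}))
```

Shifting the weights towards one component changes the final ranking, which is how such a design can favour, for instance, the content-based component while few ratings exist.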

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp).


$$\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (2.13)$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\mathrm{Recall} = \frac{tp}{tp + fn} \qquad (2.14)$$
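Precision and recall can be computed directly from the set of recommended items and the set of items the user actually likes. The following sketch is a minimal illustration; the function name and the toy item identifiers are assumptions made for this example.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended and a list of relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Hypothetical example: four recommendations, three items the user likes.
precision, recall = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(precision, recall)
```

In this example two of the four recommended items are relevant, so precision is 0.5, and two of the three relevant items were recommended, so recall is 2/3.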

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)$$
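Both error measures are straightforward to compute. The sketch below uses hypothetical predicted and actual ratings to show how a single large deviation affects RMSE more than MAE; the function names are assumptions made for this example.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted))


# Hypothetical ratings: the last prediction is off by three stars.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 5.0]
print(mae(predicted, actual))
print(rmse(predicted, actual))
```

Here the absolute errors are 1, 0, 1 and 3, so the MAE is 1.25, while the RMSE is the square root of 2.75 (about 1.66) because the single large error is squared before averaging.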


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$

The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
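The favourite-ingredient score described above (frequency of use multiplied by the inverse recipe frequency) can be sketched as follows. This is an illustration under stated assumptions: the period D is taken to be a number of days, the natural logarithm is used, and the function name and toy data are hypothetical rather than taken from [3].

```python
import math


def ff_irf(use_counts, period_days, recipes):
    """Score each ingredient k as (F_k / D) * log(M / M_k)."""
    total_recipes = len(recipes)  # M
    scores = {}
    for ingredient, uses in use_counts.items():
        containing = sum(1 for recipe in recipes if ingredient in recipe)  # M_k
        if containing == 0:
            continue  # ingredient absent from the recipe collection
        ff = uses / period_days
        irf = math.log(total_recipes / containing)  # log base is an assumption
        scores[ingredient] = ff * irf
    return scores


# Hypothetical recipe collection and cooking history over 30 days.
recipes = [
    {"salt", "tomato"},
    {"salt", "garlic"},
    {"salt", "basil", "tomato"},
    {"salt", "cream"},
]
use_counts = {"salt": 10, "tomato": 6, "basil": 2}
scores = ff_irf(use_counts, 30, recipes)
print(sorted(scores, key=scores.get, reverse=True))
```

As with IDF for text, an ingredient that appears in every recipe (salt in this toy data) receives a score of zero no matter how often it is used, so the ranking is driven by ingredients that are both frequently used and distinctive.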

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3.4)$$


The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
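The reported F-measure can be checked against the reported precision and recall. The short sketch below assumes the top-20 figures are read as a precision of 60.7% and a recall of 61%; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values for N = 20, read as fractions (assumption: 60.7% and 61%).
print(round(100 * f_measure(0.607, 0.61), 1))
```

Under that reading the result is approximately 60.8%, which agrees with the F-measure reported for N = 20.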

In this work, the recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$
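The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard deviation is taken as the population form (division by n), and because the text does not say how the weight $W_k$ is derived from the deviation score, the weights are simply passed in. The names `ingredient_std` and `recipe_score`, and the example values, are hypothetical.

```python
import math


def ingredient_std(quantities):
    """Population standard deviation of an ingredient's quantity
    over the recipes that contain it (assumes division by n)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients k of a recipe.

    preference: {ingredient: signed preference I_k}; unknown ingredients count as 0.
    weight:     {ingredient: weight W_k}; unknown ingredients default to 1.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)


# Hypothetical example: a disliked ingredient used in an unusually large quantity.
score = recipe_score(["pepper", "potato"],
                     preference={"pepper": -1.0, "potato": 0.5},
                     weight={"pepper": 2.0, "potato": 1.0})
```

With these inputs the disliked, heavily weighted ingredient dominates and the recipe score is negative.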

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
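The construction of the pseudo user-ratings vector can be illustrated with a short sketch. It assumes that missing ratings are encoded as `None` and that the content-based predictions are already available as a list aligned with the items; the function name `pseudo_ratings` and the example values are hypothetical, and the hybrid correlation weight from [20] is not reproduced here.

```python
def pseudo_ratings(actual, content_pred):
    """Build a dense pseudo user-ratings vector.

    actual:       per-item ratings given by the user, None where unrated.
    content_pred: per-item ratings predicted by the content-based method.
    Actual ratings are kept; gaps are filled with the content-based predictions.
    """
    return [r if r is not None else c for r, c in zip(actual, content_pred)]


# Hypothetical user who rated items 0 and 2 only.
v_u = pseudo_ratings([5.0, None, 1.0], [4.2, 3.1, 1.5])
```

Stacking one such vector per user gives the dense matrix V on which the Pearson correlation is then computed.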

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe-ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
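The short-term voting scheme described above can be sketched as follows. This is a simplified illustration, not Daily-Learner's actual implementation: stories are assumed to be sparse `{term: weight}` dictionaries that already hold TF-IDF weights, and the threshold values `t_min` and `t_max`, like all function names, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with the nearest-neighbor short-term model.

    rated_stories: list of (vector, score) pairs the user has already rated.
    Returns ("known", None), ("scored", score) or ("no_voters", None).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            return "known", None          # too close to a story the user has seen
        if sim >= t_min:
            voters.append((sim, score))   # close enough to vote
    if not voters:
        return "no_voters", None          # would be handed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "scored", sum(sim * s for sim, s in voters) / total
```

The same two-threshold idea could be reused to filter out dishes that are too close to something the user has eaten recently.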

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}$$




In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
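A compact sketch of the item-to-item procedure is given below. It is an illustration under stated assumptions rather than the YoLP implementation: ratings are held in a nested dictionary `{item: {user: rating}}`, only neighbors with positive similarity are used (to keep the denominator positive), and the target item's mean rating is added back to the weighted deviation so that the result lands on the rating scale. Those last two choices, and all names, are assumptions of this sketch.

```python
import math


def item_means(ratings):
    """ratings: {item: {user: rating}} -> {item: mean rating}."""
    return {i: sum(r.values()) / len(r) for i, r in ratings.items()}


def pearson(ratings, means, a, b):
    """Pearson correlation between items a and b over the users who rated both."""
    common = ratings[a].keys() & ratings[b].keys()
    num = sum((ratings[a][p] - means[a]) * (ratings[b][p] - means[b]) for p in common)
    den_a = math.sqrt(sum((ratings[a][p] - means[a]) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings[b][p] - means[b]) ** 2 for p in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def predict(ratings, user, a):
    """Predict the rating of `user` for item `a` from the items the user has rated."""
    means = item_means(ratings)
    num = den = 0.0
    for b in ratings:
        if b == a or user not in ratings[b]:
            continue
        s = pearson(ratings, means, a, b)
        if s <= 0:                      # assumption: ignore non-positive neighbors
            continue
        num += s * (ratings[b][user] - means[b])
        den += s
    # assumption: offset by the target item's mean; fall back to it with no neighbors
    return means[a] + num / den if den else means[a]


# Hypothetical data: user u4 has rated recipe "b" but not recipe "a".
ratings = {
    "a": {"u1": 5, "u2": 1, "u3": 3},
    "b": {"u1": 4, "u2": 2, "u3": 3, "u4": 5},
}
prediction = predict(ratings, "u4", "a")
```

Because the restaurant's menu is a fixed, small set of target items, the similarities only need to be computed between those items and the user's rated recipes.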

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that he positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
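The content-based pipeline of this section (binary profile, cosine similarity, then the similarity-to-rating rule) can be sketched as below. The sketch assumes that binary vectors are represented as Python sets of active feature names and that the constants in the rule are read as a 0.5 bonus above a 0.8 similarity; the function names and the feature labels in the example are hypothetical.

```python
import math


def cosine_binary(recipe_features, profile_features):
    """Cosine similarity of two binary vectors given as sets of active features."""
    if not recipe_features or not profile_features:
        return 0.0
    shared = len(recipe_features & profile_features)
    return shared / math.sqrt(len(recipe_features) * len(profile_features))


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Map a similarity to a rating: the combined user/item average,
    raised by `bonus` when the similarity exceeds `threshold`."""
    return avg_total + bonus if similarity > threshold else avg_total


# Hypothetical profile built from positively rated recipes, and one candidate recipe.
profile = {"ingredient:garlic", "ingredient:beef", "category:main", "region:italian"}
recipe = {"ingredient:garlic", "ingredient:beef", "category:main", "region:italian"}
rating = similarity_to_rating(cosine_binary(recipe, profile), avg_total=3.5)
```

Since the rule only ever returns one of two values per user-item pair, the reported MAE and RMSE for this component are necessarily coarse approximations.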

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as described in [3]:

$$IRF_k = \log \frac{M}{M_k}$$


where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
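Computing the IRF weights over a dataset takes only a few lines. The following is a minimal sketch, assuming each recipe is given as a collection of feature names and that the logarithm is the natural one (the text does not state the base); the function name `inverse_recipe_frequency` and the example recipes are hypothetical.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for a list of recipes.

    M is the total number of recipes and M_k the number of recipes
    that contain feature k. Natural logarithm is an assumption.
    """
    recipes = [set(r) for r in recipes]          # count each feature once per recipe
    m = len(recipes)
    counts = Counter(f for r in recipes for f in r)
    return {f: math.log(m / m_k) for f, m_k in counts.items()}


# Hypothetical four-recipe dataset.
irf = inverse_recipe_frequency([
    {"garlic", "beef"},
    {"garlic"},
    {"garlic", "mint"},
    {"beef"},
])
```

As with IDF, a feature that appears in nearly every recipe receives a weight close to zero, while a rare feature such as `mint` here receives the largest weight.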

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As was explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
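The construction of the prototype vector under both observation rules can be sketched as follows. This is an illustrative sketch only: the prototype is kept as a sparse `{feature: weight}` dictionary, positive and negative observations both have weight 1 as stated above, and the helper names `fixed_threshold`, `user_average` and `build_prototype` are hypothetical.

```python
def fixed_threshold(low, high):
    """Observation rule 1: ratings in `low` are negative, ratings in `high`
    are positive, anything else is neutral and ignored."""
    def observe(rating):
        if rating in high:
            return 1
        if rating in low:
            return -1
        return 0
    return observe


def user_average(avg):
    """Observation rule 2: ratings below the user's training average are
    negative, ratings equal to or above it are positive."""
    return lambda rating: 1 if rating >= avg else -1


def build_prototype(rated_recipes, irf, observe):
    """Add the IRF weights of a recipe's features for positive observations
    and subtract them for negative ones.

    rated_recipes: list of (features, rating) pairs from the training set.
    irf:           {feature: IRF weight}.
    """
    prototype = {}
    for features, rating in rated_recipes:
        sign = observe(rating)
        if sign == 0:
            continue
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf.get(f, 0.0)
    return prototype


# Hypothetical weights and ratings on a 1-5 scale, where 3 is neutral.
irf = {"garlic": 0.5, "beef": 1.0, "mint": 2.0}
rated = [({"garlic", "beef"}, 5), ({"garlic", "mint"}, 1), ({"beef"}, 3)]
proto_fixed = build_prototype(rated, irf, fixed_threshold({1, 2}, {4, 5}))
proto_avg = build_prototype(rated, irf, user_average(3.0))
```

In this example the neutral rating of 3 is skipped by the fixed-threshold rule but counts as positive under the user-average rule, so the two prototypes differ only in the weight of `beef`.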

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
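The per-user Min-Max mapping can be sketched as below. This is a minimal illustration that assumes the user's observed similarity values and training ratings are available as plain lists; the fallback to the user average when the similarity interval is empty follows the text, while the function names and example numbers are hypothetical.

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization of a value `a` from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def rating_from_min_max(similarity, user_sims, user_ratings, user_avg):
    """Map a similarity onto the user's own rating range.

    user_sims:    similarity values observed for this user.
    user_ratings: ratings given by this user in the training set.
    Falls back to the user average when the similarity interval is empty.
    """
    if len(user_sims) < 2 or not user_ratings:
        return user_avg
    sim_min, sim_max = min(user_sims), max(user_sims)
    if sim_max == sim_min:
        return user_avg
    return min_max(similarity, sim_min, sim_max,
                   min(user_ratings), max(user_ratings))


# Hypothetical user whose similarities span [0.25, 0.75] and ratings span [2, 4].
predicted = rating_from_min_max(0.75, [0.25, 0.75], [2, 4], user_avg=3.0)
```

A user who only ever gives ratings of 4 and 5 is therefore never predicted a rating below 4, which is the intended effect of computing the scales per user.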

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
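The three-band rule above translates directly into code. The sketch below is an illustration under assumptions: the standard deviation is the population form from Python's `statistics.pstdev` (the text does not say which form was used), both thresholds are passed in explicitly, and the function name is hypothetical. The same function covers the user-based and recipe-based variants, depending on which list of ratings is supplied.

```python
import statistics


def rating_from_avg_std(similarity, ratings, upper, lower):
    """Return average + std, average, or average - std depending on
    which similarity band the value falls into.

    ratings: training-set ratings of the user (or of the recipe).
    upper:   threshold U; lower: threshold L, with lower <= upper.
    """
    avg = statistics.mean(ratings)
    std = statistics.pstdev(ratings)   # assumption: population standard deviation
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

For the combined variant, one option would be to call the function once with the user's ratings and once with the recipe's ratings and average the two results, although the text does not spell out how the combination was done.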

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 partitions, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
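The 5-fold split described above can be sketched with the standard library alone. This is a simplified illustration, assuming that the rating events are shuffled once with a fixed seed and then dealt into five equally sized folds; the function name `k_fold_splits` and the seed are hypothetical, and the thesis does not say how its folds were actually drawn.

```python
import random


def k_fold_splits(observations, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation.

    Each observation appears in exactly one validation set; with k = 5
    every validation set holds roughly 20% of the data.
    """
    items = list(observations)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


# 100 dummy rating events, identified by their index.
splits = list(k_fold_splits(range(100), k=5))
```

The evaluation metrics are then computed on each validation fold and averaged over the five runs.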

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
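For reference, the two error measures can be computed as in the following minimal sketch, which assumes the actual and predicted ratings are given as two equally long lists; the function names are introduced here for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

Because RMSE squares the individual errors before averaging, it penalizes a few large mistakes more heavily than MAE does, which is why both are reported side by side.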

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous Chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious         Food.com
                                 MAE     RMSE       MAE     RMSE
YoLP Content-based component     0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component     0.6454  0.8678     0.3761  0.6834
User Average                     0.6315  0.8338     0.4077  0.6207
Item Average                     0.7701  1.0930     0.4385  0.7043
Combined Average                 0.6628  0.8572     0.4180  0.6250

Table 5.2: Test Results

                                                    Epicurious                           Food.com
                                                    Observation      Observation         Observation      Observation
                                                    User Average     Fixed Threshold     User Average     Fixed Threshold
                                                    MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                  0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                  0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation    0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                                             0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
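The three average-based baselines can be sketched as below. The function names are illustrative, and falling back to the global average for a user or recipe that does not appear in the training set is an assumption of this sketch, since the text does not state how that case was handled.

```python
from collections import defaultdict


def fit_average_baselines(training):
    """Build the three baseline predictors from (userID, recipeID, rating) tuples."""
    by_user, by_recipe = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_recipe[recipe_id].append(rating)
    global_avg = sum(rating for _, _, rating in training) / len(training)

    def mean_or_global(ratings):
        # Assumption: unseen users or recipes fall back to the global average.
        return sum(ratings) / len(ratings) if ratings else global_avg

    def user_average(user_id, recipe_id):
        return mean_or_global(by_user.get(user_id))

    def item_average(user_id, recipe_id):
        return mean_or_global(by_recipe.get(recipe_id))

    def combined_average(user_id, recipe_id):
        # (UserAvg + ItemAvg) / 2
        return (user_average(user_id, recipe_id) + item_average(user_id, recipe_id)) / 2

    return user_average, item_average, combined_average
```

Each returned function takes the userID and recipeID as inputs and ignores whichever identifier its baseline does not need, so all three can be evaluated through the same interface.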

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering as positive observations the highest rating values and as negative the lowest. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned from Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.
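The two ways of separating positive and negative observations can be illustrated with a generic Rocchio-style prototype. This is only a sketch under stated assumptions: the weights `beta` and `gamma`, the choice to ignore ratings equal to the threshold, and the dictionary representation of feature weights are all hypothetical and are not taken from Section 4.3.

```python
from collections import defaultdict


def user_average_threshold(rated_recipes):
    """Observation User Average: the threshold is the user's mean rating."""
    return sum(rating for _, rating in rated_recipes) / len(rated_recipes)


def build_prototype(rated_recipes, threshold=3, beta=1.0, gamma=1.0):
    """Rocchio-style prototype: beta * mean(positives) - gamma * mean(negatives).

    rated_recipes: list of (feature_weights, rating), feature_weights being a dict.
    Assumption: ratings equal to the threshold count as neither positive nor negative.
    """
    positives = [features for features, rating in rated_recipes if rating > threshold]
    negatives = [features for features, rating in rated_recipes if rating < threshold]
    prototype = defaultdict(float)
    for group, factor in ((positives, beta), (negatives, -gamma)):
        for features in group:
            for name, weight in features.items():
                prototype[name] += factor * weight / len(group)
    return dict(prototype)
```

Calling `build_prototype(rated, threshold=3)` corresponds to the fixed threshold, while `build_prototype(rated, threshold=user_average_threshold(rated))` corresponds to the user average variant.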

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious         Food.com
                                     MAE     RMSE       MAE     RMSE
Ingredients + Cuisine + Dietaries    0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine                0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary                0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                    0.8266  1.0616     0.4324  0.7087
Ingredients                          0.7932  1.0054     0.4411  0.6537
Cuisine                              0.8553  1.0810     0.5357  0.7431
Dietary                              0.8772  1.0807     0.4579  0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous Section we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the user's prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
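A possible way to store one vector per feature group and merge them on demand is sketched below. The group names, the dictionary layout and the use of (group, name) pairs as keys to keep groups from colliding are assumptions of this example, not a description of the actual implementation.

```python
import math


def merge(vectors_by_group, groups):
    """Merge the stored per-group vectors into a single vector for the chosen groups."""
    merged = {}
    for group in groups:
        for name, weight in vectors_by_group[group].items():
            merged[(group, name)] = weight
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(user_vectors, recipe_vectors, groups):
    """Similarity between a user and a recipe restricted to some feature groups."""
    return cosine(merge(user_vectors, groups), merge(recipe_vectors, groups))
```

Testing a feature combination then only changes the `groups` argument, for example `("ingredients", "cuisine")`, without rebuilding any stored vector.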

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by the Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\qquad (4.6)
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
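The piecewise mapping of Eq. 4.6 is simple enough to write down directly. In the sketch below the function and parameter names are illustrative; the default thresholds are the initial values 0.75 and 0.25 mentioned in the text.

```python
def similarity_to_rating(similarity, average_rating, standard_deviation,
                         upper=0.75, lower=0.25):
    """Map a similarity value to a predicted rating following Eq. 4.6."""
    if similarity >= upper:
        return average_rating + standard_deviation
    if similarity >= lower:
        return average_rating
    return average_rating - standard_deviation
```

Exposing `upper` and `lower` as parameters is what makes the threshold variation test possible: the same cross-validation run can be repeated while only these two values change.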

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy and that subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

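As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\qquad (5.1)
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset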


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates from all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
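The opposite movement of MAE and RMSE observed in this test can be reproduced with a small constructed example. The ratings below are invented purely for illustration and are not taken from either dataset: predictor B hits the exact rating more often than predictor A, but its single miss is large.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [4, 4, 4, 4]
predictor_a = [3, 3, 3, 3]  # always off by one
predictor_b = [4, 4, 4, 1]  # exact three times, off by three once

print(mae(actual, predictor_a), rmse(actual, predictor_a))  # 1.0 1.0
print(mae(actual, predictor_b), rmse(actual, predictor_b))  # 0.75 1.5
```

Predictor B has the lower MAE (0.75 against 1.0) and the higher RMSE (1.5 against 1.0), which is the same pattern seen when the upper threshold is lowered.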

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
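The round-by-round procedure can be sketched as a loop that grows each user's training set by one review at a time. The code is a simplified illustration: `fit_user` stands for whatever builds a per-user predictor (in this work, the Rocchio prototype vector), `fit_user_average` is only a stand-in used to keep the example self-contained, and averaging the per-user MAE across users in each round is an assumption of the sketch.

```python
def fit_user_average(training):
    """Stand-in predictor: always predicts the user's mean training rating."""
    average = sum(rating for _, rating in training) / len(training)
    return lambda recipe: average


def learning_curve(user_reviews, fit_user, max_training_size):
    """Average per-user MAE after 1, 2, ..., max_training_size training reviews.

    user_reviews: dict mapping a user to an ordered list of (recipe, rating).
    fit_user: callback, fit_user(training) -> predict(recipe).
    """
    curve = []
    for n in range(1, max_training_size + 1):
        user_errors = []
        for reviews in user_reviews.values():
            training, validation = reviews[:n], reviews[n:]
            if not validation:
                continue  # this user has no reviews left to validate on
            predict = fit_user(training)
            errors = [abs(rating - predict(recipe)) for recipe, rating in validation]
            user_errors.append(sum(errors) / len(errors))
        curve.append(sum(user_errors) / len(user_errors) if user_errors else None)
    return curve
```

Plotting the returned list against the number of training reviews gives a curve of the same kind as the ones in Figs. 5.8 to 5.10.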



Chapter 6

Conclusions


In this MSc dissertation the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vector and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results to transform the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset the recipe ingredient information only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that better describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost, total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
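One possible reading of this proposal is sketched below. Since this is future work, everything in the sketch is hypothetical: the class vector of each rating value is taken to be the average feature vector of the recipes the user gave that rating, and ties between classes are resolved by whichever class is found first.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_class_vectors(rated_recipes):
    """One averaged feature vector per rating value given by the user."""
    members = defaultdict(list)
    for features, rating in rated_recipes:
        members[rating].append(features)
    class_vectors = {}
    for rating, group in members.items():
        centroid = defaultdict(float)
        for features in group:
            for name, weight in features.items():
                centroid[name] += weight / len(group)
        class_vectors[rating] = dict(centroid)
    return class_vectors


def predict_rating(class_vectors, recipe_features):
    """The predicted rating is the rating class most similar to the recipe."""
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe_features))
```

With this representation the predicted rating is read directly from the winning class, so no threshold-based conversion from similarity to rating is involved.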




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.


[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.


[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.


[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.


[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A


[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770


[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant


In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences rather than the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once. Without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute that indicates whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[
P(c_j \mid d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k \mid c_j)^{N(d_i, t_k)} \qquad (2.9)
\]

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appeared in document d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k ∈ V_{d_i}, are used.
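Eq. 2.9 can be turned into a small classifier as sketched below. Two choices in the sketch are assumptions that the text does not prescribe: term probabilities are estimated with add-one (Laplace) smoothing, and the product is evaluated as a sum of logarithms to avoid numerical underflow. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents):
    """documents: list of (list_of_terms, class_label) pairs."""
    class_counts = Counter(label for _, label in documents)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in documents:
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary


def log_score(model, terms, label):
    """log of P(c_j) * prod over t_k of P(t_k | c_j) ** N(d_i, t_k)."""
    class_counts, term_counts, vocabulary = model
    total_terms = sum(term_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for term, occurrences in Counter(terms).items():
        if term not in vocabulary:
            continue  # terms never seen in training are ignored
        # Assumption: add-one smoothing for P(t_k | c_j).
        probability = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
        score += occurrences * math.log(probability)
    return score


def classify(model, terms):
    """Return the class with the highest score for the given document."""
    return max(model[0], key=lambda label: log_score(model, terms, label))
```

Because the logarithm is monotonic, the class that maximizes the summed log score is the same class that maximizes the product in Eq. 2.9.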

Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes are labelled by terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].

Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they do not have a training phase, all the computation is made when an item is classified.
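A k nearest neighbors classifier over VSM-style vectors can be sketched in a few lines. The majority vote used to derive the label and the dictionary representation of item vectors are assumptions of this example; other voting schemes, such as similarity-weighted votes, are equally possible.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(training, item, k=3):
    """Label an item by majority vote among its k most similar stored items.

    training: list of (feature_vector, class_label) pairs kept in memory.
    """
    neighbours = sorted(training, key=lambda pair: cosine(pair[0], item), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

The sketch also makes the drawback visible: every call to `knn_classify` computes the similarity against all stored items, since nothing is precomputed in a training phase.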

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice    5      3      4      4      ?
User1    3      1      2      3      3
User2    4      3      4      3      5
User3    3      3      1      5      4
User4    1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[
\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}}\;\sqrt{\sum_{s \in S} r_{bs}^{2}}}
\]



In the formula, $r_{as}$ is the rating that user a gave to item s and $r_{bs}$ is the rating that user b gave to the same item, with S being the set of items rated by both users. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common to both are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

\[
\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2}\; \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \tag{2.11}
\]

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[
\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{2.12}
\]

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
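The procedure above can be sketched in Python using the ratings from Table 2.1. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the similarity averages are taken over the items rated by both users, the averages in the prediction are taken over all items each user rated, and the neighborhood size of two is an arbitrary choice. The function names are hypothetical.

```python
from math import sqrt

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if not common:
        return 0.0
    # Assumption: user averages are computed over the co-rated items only.
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    dev_a = [ratings[a][s] - mean_a for s in common]
    dev_b = [ratings[b][s] - mean_b for s in common]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))
    return num / den if den else 0.0


def predict(a, item, n_neighbors=2):
    """Prediction function (Eq. 2.12) using the n most similar raters of item."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)
    neighbors = neighbors[:n_neighbors]
    # Assumption: here each user's average covers all items that user rated.
    num = sum(pearson(a, b) * (ratings[b][item] - mean(ratings[b].values()))
              for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    base = mean(ratings[a].values())
    return base + num / den if den else base


prediction = predict("Alice", "Item5")
print(round(prediction, 2))
```

With these choices, User1 and User2 are Alice's two nearest neighbors, and since both rated Item5 above their own averages, the prediction lands above Alice's average rating of 4.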

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by the user) can be associated with content features (e.g. comedies liked by the user or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist one should recommend popular items, otherwise collaborative methods should be used).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):


Figure 2.7: Evaluating recommended items [2]

\[
\mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13}
\]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[
\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14}
\]
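As a minimal sketch of Eqs. (2.13) and (2.14), Precision and Recall can be computed from a list of recommended items and the set of items the user actually likes. The function name and the example item identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four recommended recipes, of which two are among the three liked ones.
precision, recall = precision_recall(["r1", "r2", "r3", "r4"],
                                     ["r2", "r4", "r5"])
print(precision, recall)
```

In this toy case tp = 2, fp = 2 and fn = 1, so precision is 2/4 and recall is 2/3.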

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}
\]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:


\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}
\]
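The two error measures in Eqs. (2.15) and (2.16) can be sketched as follows. The example ratings are made up for illustration; the point is that a single large deviation raises RMSE more than MAE.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error; squaring penalizes large deviations more."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4.0, 3.0, 5.0, 2.0]   # hypothetical predicted ratings
actual = [5, 3, 3, 2]              # hypothetical true ratings
print(mae(predicted, actual), rmse(predicted, actual))
```

Here the absolute errors are 1, 0, 2 and 0, so the MAE is 0.75, while the RMSE is the square root of 1.25.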


The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

\[
I_k^{+} = FF_k \times IRF_k \tag{3.1}
\]

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

\[
FF_k = \frac{F_k}{D}
\]


The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

\[
IRF_k = \log \frac{M}{M_k}
\]
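The FF-IRF score of Eq. (3.1) can be illustrated with a small Python sketch. It is a reading of the formulas above under assumptions, not the code of [3]: recipes are modelled as sets of ingredient names, the natural logarithm is used because the base is not stated, and the toy recipes, the seven-day period and the function name are hypothetical.

```python
from math import log


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k with FF_k * IRF_k."""
    total = len(all_recipes)                       # M
    use_count = {}                                 # F_k per ingredient
    for recipe in cooked_recipes:
        for k in recipe:
            use_count[k] = use_count.get(k, 0) + 1
    scores = {}
    for k, f_k in use_count.items():
        m_k = sum(1 for recipe in all_recipes if k in recipe)  # M_k
        if m_k == 0:
            continue  # ingredient absent from the recipe database
        scores[k] = (f_k / period_days) * log(total / m_k)
    return scores


all_recipes = [{"rice", "egg"}, {"rice", "tomato"},
               {"rice", "tuna"}, {"tuna", "tomato"}]
cooked = [{"rice", "tuna"}, {"tuna", "tomato"}]    # cooked during the period
scores = ff_irf(cooked, all_recipes, period_days=7)
favourite = max(scores, key=scores.get)
print(favourite, scores)
```

In this example tuna is used most often and is relatively rare in the database, so it receives the highest score; rice is used once but appears in most recipes, so its IRF pulls the score down.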



The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

\[
\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]


The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^{2}} \tag{3.5}
\]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^{+}$ and $I_k^{-}$, respectively):

\[
\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}
\]

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

\[
v_{ui} =
\begin{cases}
r_{ui} & \text{if user } u \text{ rated item } i \\
c_{ui} & \text{otherwise}
\end{cases} \tag{3.7}
\]
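A minimal sketch of Eq. (3.7) in Python is shown below. The dictionaries, item identifiers and function name are hypothetical, and the content-based predictions $c_{ui}$ are simply given as input rather than produced by a classifier as in [20].

```python
def pseudo_ratings(user_ratings, content_predictions, items):
    """Build the pseudo user-ratings vector v_u over a fixed item order."""
    vector = []
    for i in items:
        if i in user_ratings:
            vector.append(user_ratings[i])         # r_ui: actual rating
        else:
            vector.append(content_predictions[i])  # c_ui: content-based guess
    return vector


items = ["m1", "m2", "m3"]
user_ratings = {"m1": 4}                               # only m1 was rated
content_predictions = {"m1": 3.2, "m2": 2.5, "m3": 4.1}
v_u = pseudo_ratings(user_ratings, content_predictions, items)
print(v_u)
```

The actual rating always takes precedence, so the content-based prediction for m1 is ignored and only the two unrated movies are filled in.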

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipe is broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
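The short-term voting scheme described above can be sketched as follows. This is an illustration of the description only, not Daily-Learner's code: stories are sparse term-weight dictionaries standing in for TF-IDF vectors, and the two threshold values, the label strings and the function names are assumptions made for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return (label, score) for a new story given (vector, score) pairs."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:
            return "known", None        # too similar to a seen story
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return "unclassified", None     # would be passed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "predicted", sum(sim * score for sim, score in voters) / total


rated = [({"election": 1.0, "debate": 1.0}, 0.8), ({"football": 1.0}, 0.2)]
follow_up = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "debate": 1.0}, rated)
new_topic = short_term_predict({"weather": 1.0}, rated)
print(follow_up, duplicate, new_topic)
```

The follow-up story shares one term with a rated story, so that story alone votes; the duplicate is flagged as known; and the story on an unseen topic has no voters at all.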

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[
\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^{2}\; \sum_{p \in P} (r_{bp} - \bar{r}_b)^{2}}} \tag{4.1}
\]

where a and b are recipes, $r_{ap}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)}
\]




In the formula, pred(u, a) is the predicted rating of user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is obtained by normalizing this weighted sum by the sum of similarities.
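The item-to-item scheme can be illustrated with the sketch below. It is not the YoLP implementation and rests on several assumptions: item averages in Eq. (4.1) are taken over the users who rated both items, the prediction is a plain similarity-weighted average of the user's own ratings, and only positively correlated items contribute. The data reuse the values of Table 2.1 purely as toy ratings, and the function names are hypothetical.

```python
from math import sqrt

# user -> {item: rating}; toy values borrowed from Table 2.1.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def item_pearson(a, b):
    """Pearson correlation between items a and b over their common raters."""
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if len(users) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in users) / len(users)
    mean_b = sum(ratings[u][b] for u in users) / len(users)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in users)
    den = sqrt(sum((ratings[u][a] - mean_a) ** 2 for u in users)) * sqrt(
        sum((ratings[u][b] - mean_b) ** 2 for u in users))
    return num / den if den else 0.0


def predict_item_based(user, target):
    """Similarity-weighted average of the user's ratings (assumed variant)."""
    num = den = 0.0
    for b, r_ub in ratings[user].items():
        if b == target:
            continue
        sim = item_pearson(target, b)
        if sim > 0:                     # assumption: ignore negative similarities
            num += sim * r_ub
            den += sim
    return num / den if den else None


item_prediction = predict_item_based("Alice", "Item5")
print(round(item_prediction, 2))
```

Because the item similarities do not depend on the target user, they can be computed once for a fixed set of recipes and reused for every user, which is the efficiency argument made for the item-based approach.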

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
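The profile construction and ranking described above can be sketched in Python. This is a simplified illustration rather than the YoLP code: binary sparse vectors are represented as sets of active features, for which the cosine similarity reduces to the overlap size divided by the square root of the product of the set sizes. The feature strings, recipe names and function names are hypothetical, and context features are left out.

```python
from math import sqrt


def build_profile(rated_recipes, threshold=4):
    """Union of the features of all recipes rated at or above the threshold."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity of two binary vectors given as sets of active features."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


rated = [
    ({"cat:soup", "region:north", "ing:kale", "ing:potato"}, 5),
    ({"cat:dessert", "ing:sugar"}, 2),   # rated below 4, so not added
]
menu = {
    "caldo_verde": {"cat:soup", "ing:kale", "ing:onion"},
    "cake": {"cat:dessert", "ing:sugar", "ing:egg"},
}
profile = build_profile(rated)
ranked = sorted(menu, key=lambda name: cosine_binary(profile, menu[name]),
                reverse=True)
print(ranked)
```

Only the positively rated soup contributes to the profile, so the soup on the menu shares two features with it and is ranked above the cake, which shares none.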

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
\mathrm{Rating} =
\begin{cases}
\mathrm{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\
\mathrm{avgTotal} & \text{otherwise}
\end{cases} \tag{4.3}
\]
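A short sketch of this transformation is given below. How the user and item averages are combined into avgTotal is not detailed here, so the simple mean of the two is an assumption of the example, as are the function and parameter names.

```python
def similarity_to_rating(similarity, user_avg, item_avg,
                         threshold=0.8, bonus=0.5):
    """Map a cosine similarity to a rating following Eq. (4.3)."""
    # Assumption: avgTotal is the plain mean of the user and item averages.
    avg_total = (user_avg + item_avg) / 2
    return avg_total + bonus if similarity > threshold else avg_total


high = similarity_to_rating(0.9, user_avg=4.0, item_avg=3.0)
low = similarity_to_rating(0.5, user_avg=4.0, item_avg=3.0)
print(high, low)
```

Since the output can only take two values per user and item pair, the resulting MAE and RMSE are coarse approximations of what a graded mapping would give.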


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to assign weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
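As an illustration of the weighting step, the following minimal sketch computes IRF weights over a toy recipe collection. The function name, the toy data and the use of the natural logarithm are assumptions made for this example; the text does not state the logarithm base.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for a dict {recipe_id: set of features}.

    M is the number of recipes and M_k the number of recipes containing
    feature k. The natural logarithm is an assumption of this sketch.
    """
    total = len(recipes)
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {f: math.log(total / m_k) for f, m_k in counts.items()}

# Hypothetical toy collection, not taken from either dataset.
recipes = {
    "r1": {"tomato", "basil", "pasta"},
    "r2": {"tomato", "beef"},
    "r3": {"tomato", "basil"},
    "r4": {"rice", "beef"},
}
irf = inverse_recipe_frequency(recipes)
```

Frequent features such as "tomato" receive a small weight, while rare ones such as "rice" receive a large weight, mirroring the role of IDF in text retrieval.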

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight assigned to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, except that ratings equal to 3 are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
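The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the sparse dictionary representation and the toy weights are hypothetical, and both observation rules use the equal weight of 1 mentioned in the text.

```python
from collections import defaultdict

def fixed_threshold_rule(low, high):
    """Ratings above the middle of [low, high] are positive, below are negative.
    A rating exactly in the middle (e.g. 3 on a 1-5 scale) is neutral: None."""
    mid = (low + high) / 2
    def classify(rating):
        if rating == mid:
            return None
        return rating > mid
    return classify

def user_average_rule(user_avg):
    """Ratings equal to or above the user's training average are positive."""
    return lambda rating: rating >= user_avg

def build_prototype(rated, recipe_features, weights, classify):
    """Add feature weights for positive observations, subtract for negative."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        positive = classify(rating)
        if positive is None:  # neutral observation, ignored
            continue
        sign = 1.0 if positive else -1.0
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * weights[feature]
    return dict(prototype)

# Hypothetical IRF weights and ratings on a 1-5 scale.
weights = {"tomato": 0.5, "basil": 2.0, "beef": 1.0}
recipe_features = {"r1": {"tomato", "basil"}, "r2": {"tomato", "beef"}, "r3": {"beef"}}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]
fixed = build_prototype(rated, recipe_features, weights, fixed_threshold_rule(1, 5))
by_avg = build_prototype(rated, recipe_features, weights, user_average_rule(3.0))
```

With the fixed rule the rating of 3 is ignored, whereas with the user-average rule (average 3.0 here) it counts as a positive observation, so the two prototypes differ in the weight of "beef".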

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A – minimum value of A), the user average was used as the default for the recommendation.
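A minimal sketch of the per-user Min-Max mapping, including the fallback to the user average, is given below. The function names and the argument layout are hypothetical, and treating "not enough ratings" as fewer than two distinct similarity values is an assumption of this example.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from [a_min, a_max] into [c, d] (Eq. 4.5)."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def predict_rating(similarity, user_similarities, user_ratings, user_avg):
    """Map a similarity into the user's own rating range.

    user_similarities: similarity values observed for this user.
    user_ratings: the user's ratings in the training set.
    Falls back to the user average when the similarity interval is empty.
    """
    if len(user_similarities) < 2 or max(user_similarities) == min(user_similarities):
        return user_avg
    return min_max(similarity,
                   min(user_similarities), max(user_similarities),
                   min(user_ratings), max(user_ratings))

# Hypothetical user whose similarities span [0, 1] and whose ratings span [2, 4].
mid_case = predict_rating(0.75, [0.0, 1.0], [2, 4], 3.0)
top_case = predict_rating(1.0, [0.0, 1.0], [2, 4], 3.0)
fallback = predict_rating(0.3, [0.3], [4], 4.0)
```

Because the ranges are computed per user, a user who only ever rates between 2 and 4 can never be predicted a 5, which reflects the individual scales described in the text.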

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, give good results and introduce only a very small error. To generate a rating value, the following formula was used:

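$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$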


Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
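The piecewise rule and its three variants can be sketched as follows. The function names are hypothetical; the defaults U = 0.75 and L = 0.25 are the initial thresholds reported for this method, and combining user and recipe statistics by a simple mean is an assumption of this sketch, since the exact combination is not spelled out.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating using average and std."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

def combined(user_value, item_value):
    """Assumed combination of a user statistic and a recipe statistic."""
    return (user_value + item_value) / 2

# Hypothetical statistics taken from a training set.
user_avg, user_std = 4.0, 0.5
item_avg, item_std = 3.0, 1.0

high = rating_from_similarity(0.8, user_avg, user_std)   # user variant
middle = rating_from_similarity(0.5, item_avg, item_std)  # recipe variant
low = rating_from_similarity(0.1,
                             combined(user_avg, item_avg),
                             combined(user_std, item_std))  # combined variant
```

Only three distinct ratings can be produced per user and recipe pair, so the choice of U and L directly controls how often each case fires.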

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 recipes. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients assigned to it. The main difference between the recipes' features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when writing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistics of the datasets are presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use this segment to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k segments, one of which is used as the validation set while the remaining observations form the training set. To reduce variability, this process is repeated, using a different segment as the validation set each time, until every segment has been used for validation once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
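The 5-fold protocol can be sketched as below. This is a minimal illustration, assuming the rating events are shuffled once with a fixed seed and split into equally sized folds; the function name and the seed are hypothetical, and the thesis does not state how the folds were drawn.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each fold is validation exactly once."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical rating events, identified here only by an index.
events = list(range(100))
splits = list(k_fold_splits(events))
```

With 100 events each validation set holds 20 of them and each training set the remaining 80, matching the 20%/80% proportion described above.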

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
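For reference, the two error measures can be computed as in the following minimal sketch, which uses their standard definitions; the function names and the toy ratings are chosen only for this example.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical predictions for three validation events.
predicted = [3, 4, 5]
actual = [4, 4, 3]
example_mae = mae(predicted, actual)
example_rmse = rmse(predicted, actual)
```

Here the absolute errors are 1, 0 and 2, so the MAE is 1.0, while the RMSE is the square root of 5/3 and therefore larger, because the deviation of 2 weighs more once squared.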

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average: MAE | Epicurious, Observation User Average: RMSE | Epicurious, Observation Fixed Threshold: MAE | Epicurious, Observation Fixed Threshold: RMSE | Food.com, Observation User Average: MAE | Food.com, Observation User Average: RMSE | Food.com, Observation Fixed Threshold: MAE | Food.com, Observation Fixed Threshold: RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |
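The three average-based baselines can be sketched as follows. The function names and the toy training events are hypothetical, and the sketch deliberately leaves out users or recipes that never appear in the training set, whose handling is not described in the text.

```python
from collections import defaultdict

def averages(training):
    """Return ({user: avg rating}, {item: avg rating}) from (user, item, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, item_id, rating in training:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    user_avg = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    item_avg = {i: sum(rs) / len(rs) for i, rs in by_item.items()}
    return user_avg, item_avg

def combined_baseline(user_id, item_id, user_avg, item_avg):
    """(UserAvg + ItemAvg) / 2 used directly as the predicted rating."""
    return (user_avg[user_id] + item_avg[item_id]) / 2

# Hypothetical training events.
training = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5)]
user_avg, item_avg = averages(training)
prediction = combined_baseline("u1", "a", user_avg, item_avg)
```

Since these predictors ignore recipe content entirely, any content-based method is only useful to the extent that it beats them.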

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
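The idea of storing one vector per feature group and merging them on demand can be sketched as below. The dictionary layout, the group names and the toy weights are assumptions made for illustration; keys are (group, feature) pairs so that equally named features from different groups do not collide.

```python
import math

def merge(vectors, groups):
    """Merge the selected per-group sparse vectors into a single sparse vector."""
    return {(g, f): w for g in groups for f, w in vectors[g].items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical stored prototype (3 vectors) and recipe description.
user = {"ingredients": {"tomato": 1.0, "beef": -1.0},
        "cuisines": {"italian": 2.0},
        "dietaries": {}}
recipe = {"ingredients": {"tomato": 1.0},
          "cuisines": {"italian": 2.0},
          "dietaries": {"vegan": 1.0}}

only_ingredients = cosine(merge(user, ["ingredients"]), merge(recipe, ["ingredients"]))
with_cuisine = cosine(merge(user, ["ingredients", "cuisines"]),
                      merge(recipe, ["ingredients", "cuisines"]))
```

Testing another feature combination only changes the list of groups passed to merge, so the stored prototype vectors never need to be rebuilt.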

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


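$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases}$$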

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to find the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest errors registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
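The opposite movement of the two measures can be seen in a small worked example. The numbers below are hypothetical and chosen only to illustrate the effect; they are not taken from the experiments. Suppose four actual ratings are all 4. Predictor A returns 3.5 for every item, while predictor B returns the exact value 4 three times and 2.5 once.

```latex
\begin{align*}
\text{MAE}_A  &= \tfrac{1}{4}(0.5 + 0.5 + 0.5 + 0.5) = 0.5
  & \text{RMSE}_A &= \sqrt{\tfrac{1}{4}(4 \cdot 0.5^2)} = 0.5 \\
\text{MAE}_B  &= \tfrac{1}{4}(0 + 0 + 0 + 1.5) = 0.375
  & \text{RMSE}_B &= \sqrt{\tfrac{1}{4}(1.5^2)} = 0.75
\end{align*}
```

Predictor B is exact more often and has the lower MAE, yet its single large miss gives it the higher RMSE, which is the same pattern observed when the upper threshold is lowered.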

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to determine how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
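The points plotted in these figures can be derived from the evaluation output roughly as in the following sketch. The function name and the input layout are hypothetical; computing the standard deviation from the user's actual ratings and using the population form (pstdev) are assumptions of this example, since the text does not specify either.

```python
import statistics
from collections import defaultdict

def per_user_points(results):
    """Return {user: (rating std, mean absolute error)}.

    results: iterable of (user_id, actual_rating, predicted_rating).
    """
    actuals, errors = defaultdict(list), defaultdict(list)
    for user_id, actual, predicted in results:
        actuals[user_id].append(actual)
        errors[user_id].append(abs(actual - predicted))
    return {u: (statistics.pstdev(actuals[u]), sum(errors[u]) / len(errors[u]))
            for u in actuals}

# Hypothetical evaluation output for two users.
results = [("u1", 4, 4.0), ("u1", 4, 3.5), ("u2", 1, 3.0), ("u2", 5, 3.0)]
points = per_user_points(results)
```

User u1 always gives the same rating and is easy to predict, whereas u2 has widely spread ratings and a correspondingly larger error, which is the relationship the scatter plots summarize.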

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
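The round-by-round procedure can be sketched for a single user as follows. This is an illustrative assumption of how such a simulation might be organized: the function names are hypothetical, the predictor is passed in as a callback, and a trivial mean-rating predictor stands in for the Rocchio component. In the experiments the errors are additionally averaged over all selected users.

```python
def learning_curve(reviews, train_and_predict, max_train):
    """Return [(n, mean absolute error)] after training on the first n reviews.

    reviews: list of (item_id, rating) for one user, in the order revealed.
    train_and_predict(train, item_id) -> predicted rating (hypothetical callback).
    """
    curve = []
    for n in range(1, max_train + 1):
        train, validation = reviews[:n], reviews[n:]
        if not validation:  # nothing left to evaluate on
            break
        errors = [abs(train_and_predict(train, item) - rating)
                  for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve

def mean_rating_predictor(train, item_id):
    """Stand-in predictor: the average of the training ratings."""
    return sum(rating for _, rating in train) / len(train)

# Hypothetical review history of one user.
reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(reviews, mean_rating_predictor, max_train=3)
```

Each round moves one review from the validation side to the training side, so the validation set shrinks as the curve advances, which is worth keeping in mind when reading the later points of the curves.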



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the recipes rated by the user, each class would contain the feature weights associated with a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; the vector with the highest similarity represents the class where, according to the user's preferences, the recipe fits best.
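The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy data, and the choice of building each class vector by summing the feature vectors of the recipes that received that rating are hypothetical and not part of this work.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_class_vectors(rated_recipes):
    """One vector per rating value (assumption: sum of the recipe feature vectors)."""
    classes = defaultdict(lambda: defaultdict(float))
    for features, rating in rated_recipes:
        for feature, weight in features.items():
            classes[rating][feature] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}


def predict_rating(class_vectors, recipe_features):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_features))


# Hypothetical rating history of one user: (recipe feature vector, rating).
history = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "mozzarella": 1.0}, 5),
    ({"liver": 1.0, "onion": 1.0}, 1),
]
vectors = build_class_vectors(history)
predicted = predict_rating(vectors, {"tomato": 1.0, "garlic": 1.0})
print(predicted)
```

In this toy case the new recipe shares a feature only with the recipes rated 5, so that class is selected and its rating value is returned directly.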

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi 101023A

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since they have no training phase, all the computation is performed at classification time.

These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer they have to be assigned manually, which is often not practical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.

2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.


Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[ \mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^{2}}\; \sqrt{\sum_{s \in S} r_{b,s}^{2}}} \tag{2.10} \]

In the formula, $r_{a,s}$ is the rating that user a gave to item s and $r_{b,s}$ is the rating that user b gave to the same item, with s ranging over the set S of items rated by both users. However, this measure ignores an important factor, namely the differences in rating behaviour between users.
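As an illustration, the cosine measure can be applied to the ratings of Table 2.1. The sketch below is a minimal Python version; the dictionary layout and the function name are assumptions made for the example, and only the items rated by both users are used.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def cosine_similarity(a, b):
    """Cosine of the angle between the rating vectors of users a and b,
    restricted to the items both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    dot = sum(ratings[a][s] * ratings[b][s] for s in common)
    norm_a = math.sqrt(sum(ratings[a][s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings[b][s] ** 2 for s in common))
    return dot / (norm_a * norm_b)


cos_alice_user1 = cosine_similarity("Alice", "User1")
cos_alice_user4 = cosine_similarity("Alice", "User4")
print(round(cos_alice_user1, 3), round(cos_alice_user4, 3))
```

Because all ratings are positive, every pair of users obtains a fairly high cosine value, including Alice and User4, whose rating patterns move in opposite directions. This is the limitation discussed next.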

In Figure 2.2, it can be observed that Alice and User1 rated the same group of items in a similar way. The difference in rating values between the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average rating of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.


Figure 2.2: Comparing user ratings [2]

\[ sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^{2}\, \sum_{s \in S} (r_{b,s} - \bar{r}_b)^{2}}} \tag{2.11} \]

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the result is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
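The two steps, Pearson similarity followed by the weighted prediction, can be sketched in Python on the ratings of Table 2.1. Several choices here are assumptions of the example rather than part of the text: the averages inside the similarity are taken over the co-rated items, the averages inside the prediction are taken over all items a user has rated, and the neighborhood is limited to the two most similar users.

```python
import math

# Ratings from Table 2.1 (Alice has not rated Item5).
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a = mean(ratings[a][s] for s in common)
    mean_b = mean(ratings[b][s] for s in common)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(
        sum((ratings[a][s] - mean_a) ** 2 for s in common)
        * sum((ratings[b][s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


def predict(a, item, k=2):
    """Weighted average of the neighbors' deviations from their own mean rating.
    Assumes the similarities of the selected neighbors do not sum to zero."""
    candidates = [b for b in ratings if b != a and item in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(a, b), reverse=True)[:k]
    num = sum(
        pearson(a, b) * (ratings[b][item] - mean(ratings[b].values()))
        for b in neighbors
    )
    den = sum(pearson(a, b) for b in neighbors)
    return mean(ratings[a].values()) + num / den


sim_user1 = pearson("Alice", "User1")
sim_user4 = pearson("Alice", "User4")
prediction = predict("Alice", "Item5")
print(round(sim_user1, 2), round(sim_user4, 2), round(prediction, 2))
```

Under these assumptions User1 and User2 are selected as neighbors, User4 receives a negative similarity, and the predicted rating for Item5 lies above Alice's average of 4.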

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by user) can be associated with content features (e.g. comedies liked by user or dramas liked by user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.

When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp).


Figure 2.7: Evaluating recommended items [2]

\[ \mathrm{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn).

\[ \mathrm{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
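A small Python sketch shows how both measures follow from the counts of true positives, false positives and false negatives. The function name and the toy item lists are hypothetical and serve only as an illustration.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the items the user likes."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: 4 recommended recipes, 5 recipes the user actually likes.
recommended = ["soup", "pasta", "salad", "curry"]
liked = ["pasta", "curry", "risotto", "stew", "paella"]
precision, recall = precision_recall(recommended, liked)
print(precision, recall)
```

Here two of the four recommendations are relevant (precision 0.5), while only two of the five liked recipes were recommended (recall 0.4).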

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average absolute deviation between predicted ratings and actual ratings:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:


\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16} \]
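Both error measures are straightforward to compute. The following Python sketch, with hypothetical function names and toy ratings, illustrates how a single large deviation weighs more heavily on the RMSE than on the MAE.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


# Hypothetical ratings: the third prediction is off by two points.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 3.0, 2.0]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
print(mae_value, round(rmse_value, 3))
```

With absolute errors of 1, 0, 2 and 0, the MAE is 0.75, while the RMSE is higher because the error of 2 is squared before averaging.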


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^{+}_{k}$, an equation based on the idea of TF-IDF is used:

\[ I^{+}_{k} = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]


The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain ingredient k ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]
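To make the weighting concrete, the following Python sketch computes the favourite-ingredient score for a toy cooking history. The function name, the data and the use of the natural logarithm are assumptions of this example; the base of the logarithm is not stated above, and every cooked recipe is assumed to belong to the recipe collection.

```python
import math
from collections import Counter


def favourite_scores(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k as FF_k * IRF_k.

    FF_k  = (number of cooked recipes using k) / period length
    IRF_k = log(M / M_k), with M recipes in total and M_k containing k
    """
    uses = Counter(k for recipe in cooked_recipes for k in set(recipe))
    containing = Counter(k for recipe in all_recipes for k in set(recipe))
    total = len(all_recipes)
    return {
        k: (f_k / period_days) * math.log(total / containing[k])
        for k, f_k in uses.items()
    }


# Hypothetical recipe collection and a user who cooked two recipes in 10 days.
all_recipes = [
    {"salt", "tomato"},
    {"salt", "egg"},
    {"salt", "tomato", "basil"},
    {"salt", "rice"},
    {"salt", "basil", "pasta"},
]
cooked = [all_recipes[0], all_recipes[2]]
scores = favourite_scores(cooked, all_recipes, period_days=10)
best = max(scores, key=scores.get)
print(best, scores)
```

Salt is used in every cooked recipe but also appears in every recipe of the collection, so its inverse recipe frequency, and therefore its score, is zero. Tomato is used as often as salt but is much rarer in the collection, which makes it the top-scoring ingredient.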



The user's disliked ingredients $I^{-}_{k}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^{+}_{k}$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.4} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^{+}_{k}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
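As a quick consistency check on the figures reported for N = 20, and assuming they are read as a precision of 0.607 and a recall of 0.61, substituting them into the F-measure gives:

\[ F = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61} = \frac{0.74054}{1.217} \approx 0.608 \]

which agrees with the highest F-measure reported above.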

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^{2}} \tag{3.5} \]

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I^{+}_{k}$ and $I^{-}_{k}$, respectively):

\[ Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
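The two formulas can be sketched in Python as follows. How the weight W_k is derived from the deviation score is not detailed above, so the weights are simply passed in as hypothetical inputs. The function names, the toy values, and the use of negative preference values for disliked ingredients are likewise assumptions of this example.

```python
import math


def quantity_std(quantities):
    """Population standard deviation of an ingredient's quantity across recipes."""
    n = len(quantities)
    mean_q = sum(quantities) / n
    return math.sqrt(sum((g - mean_q) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Sum of preference * weight over the ingredients of a recipe.
    Ingredients without a known preference or weight contribute nothing."""
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in recipe_ingredients)


# Hypothetical quantities (in grams) of one ingredient in eight recipes.
sigma = quantity_std([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical preferences (positive = liked, negative = disliked) and weights.
preference = {"tomato": 0.8, "liver": -0.5}
weight = {"tomato": 1.0, "liver": 2.0}
score = recipe_score(["tomato", "liver", "salt"], preference, weight)
print(sigma, round(score, 2))
```

In this toy recipe the disliked ingredient carries a larger weight than the liked one, so the overall score becomes negative.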

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
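The construction of the pseudo user-ratings vectors can be sketched in Python as follows. The function names, the toy ratings and the constant stand-in for the content-based predictor are hypothetical; in [20] the content-based prediction comes from the naive Bayesian classifier described above.

```python
def build_pseudo_ratings(actual, content_predict, users, items):
    """Dense user x item matrix: actual ratings where available,
    content-based predictions everywhere else."""
    pseudo = {}
    for u in users:
        row = {}
        for i in items:
            if i in actual.get(u, {}):
                row[i] = actual[u][i]            # r_ui: the user rated the item
            else:
                row[i] = content_predict(u, i)   # c_ui: content-based prediction
        pseudo[u] = row
    return pseudo


def stand_in_predictor(user, item):
    """Hypothetical placeholder for a learned content-based predictor."""
    return 3.0


# Hypothetical sparse ratings for two users and three movies.
actual = {"ana": {"m1": 5, "m3": 2}, "rui": {"m2": 4}}
V = build_pseudo_ratings(actual, stand_in_predictor, ["ana", "rui"], ["m1", "m2", "m3"])
print(V)
```

Every row of the resulting matrix is complete, so user similarities can be computed over all items instead of only over the few items that two users happen to have rated in common.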

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content or ingredients of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm random recommender was implemented which assigns a randomly

generated prediction score to a recipe Five different recommendation strategies were developed for

personalized recipe recommendations

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
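The breakdown and reconstruction steps of the content-based strategy can be sketched as below. The ingredient scoring follows the averaging described above; the final step (averaging the scores of a recipe's known ingredients) is an assumption made for this example, since the aggregation used in [22] is not detailed here. All names and data are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Score each ingredient as the mean rating of the rated recipes containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ing in recipe_ingredients[recipe]:
            totals[ing] += rating
            counts[ing] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


def predict_recipe(recipe, scores, recipe_ingredients):
    """Assumed reconstruction: mean score of the recipe's already scored ingredients."""
    known = [scores[i] for i in recipe_ingredients[recipe] if i in scores]
    return sum(known) / len(known) if known else None


# Toy data, invented for illustration only.
recipe_ingredients = {
    "r1": ["tomato", "basil"],
    "r2": ["tomato", "egg"],
    "r3": ["basil", "egg"],
}
user_ratings = {"r1": 5, "r2": 3}
scores = ingredient_scores(user_ratings, recipe_ingredients)
prediction = predict_recipe("r3", scores, recipe_ingredients)
```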

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are presented in Figure 3.2, using the normalized MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method because it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story on a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
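The short-term model described above can be sketched as follows. This is a simplified illustration and not the Daily-Learner implementation: the threshold values `t_min` and `t_max`, the function names and the toy vectors are assumptions chosen for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ("known", None), ("unclassified", None) or ("scored", score)."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:          # too similar: the user already knows this event
            return ("known", None)
        if sim >= t_min:          # close enough to vote
            voters.append((sim, score))
    if not voters:                # would be handed to the long-term model
        return ("unclassified", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


# Toy TF-IDF-like vectors, invented for illustration only.
rated = [({"election": 1.0}, 1.0)]
label, score = short_term_predict({"election": 1.0, "debate": 1.0}, rated)
```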

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm to personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity between a pair of users is measured from the way both users rate the same set of items, whereas in the item-to-item approach the similarity between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in item-to-item, two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
\tag{4.1}
\]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
\tag{4.2}
\]




In the formula, pred(u, a) is the predicted rating of user u for item a, and N is the set of items rated by user u (the ratings r_{b,p} in the numerator are therefore those given by the target user, i.e., p = u). Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of the similarities.
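A minimal sketch of the item-to-item computation is given below. The similarity function follows the Pearson correlation as defined above. The prediction function follows the prose description (a similarity-weighted average of the user's own ratings) and, as an assumption of this example, only lets positively correlated items vote so that the denominator stays positive; it is not the YoLP code. Names and data are hypothetical.

```python
import math


def pearson_items(ratings, a, b):
    """Pearson correlation between items a and b.

    ratings: dict item -> dict user -> rating
    """
    common = set(ratings[a]) & set(ratings[b])      # P: users who rated both
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = (math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
           * math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common)))
    return num / den if den else 0.0


def predict(ratings, user, a):
    """Similarity-weighted average of the user's ratings (positive neighbours only)."""
    num = den = 0.0
    for b in ratings:
        if b != a and user in ratings[b]:
            sim = pearson_items(ratings, a, b)
            if sim > 0:
                num += sim * ratings[b][user]
                den += sim
    return num / den if den else None


# Toy data, invented for illustration only.
ratings = {
    "a": {"u1": 5, "u2": 1, "u3": 3},
    "b": {"u1": 4, "u2": 2, "u3": 3, "u4": 3},
}
sim_ab = pearson_items(ratings, "a", "b")
pred_u4 = predict(ratings, "u4", "a")
```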

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the sparse recipe and user profile vectors. The user profile is composed of binary values for the recipe features that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
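The profile construction and ranking can be illustrated with the sketch below, which stores binary vectors as Python sets. It is a simplified stand-in for the YoLP component: the feature labels, function names and toy recipes are assumptions made for this example, and context features are left out.

```python
import math


def build_profile(rated, recipe_features, min_rating=4):
    """Binary user profile: union of the features of positively rated recipes."""
    profile = set()
    for recipe, rating in rated.items():
        if rating >= min_rating:
            profile |= recipe_features[recipe]
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank(profile, candidates, recipe_features):
    """Order candidate recipes from most to least similar to the profile."""
    return sorted(candidates,
                  key=lambda r: cosine_binary(profile, recipe_features[r]),
                  reverse=True)


# Toy data, invented for illustration only.
recipe_features = {
    "r1": {"ing:tomato", "cat:pasta"},
    "r2": {"ing:tomato", "cat:salad"},
    "r3": {"ing:beef", "cat:grill"},
}
profile = build_profile({"r1": 5, "r3": 2}, recipe_features)
ranking = rank(profile, ["r3", "r2"], recipe_features)
```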

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
Rating =
\begin{cases}
avgTotal + 0.5 & \text{if } similarity > 0.8 \\
avgTotal & \text{otherwise}
\end{cases}
\tag{4.3}
\]


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference for ingredients could be estimated through the prototype vector, which represents what is learned in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the TF-IDF-inspired approach shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results in extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


\[
IRF_k = \log \frac{M}{M_k}
\tag{4.4}
\]


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
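The IRF weights can be computed in a single pass over the dataset, as in the sketch below. The natural logarithm is an assumption of this example (the base is not stated in the text), and the function name and toy recipes are hypothetical.

```python
import math
from collections import Counter


def irf_weights(recipe_features):
    """IRF_k = log(M / M_k) for every feature k that occurs in the dataset."""
    m = len(recipe_features)                                   # M: total number of recipes
    counts = Counter(f for feats in recipe_features.values() for f in set(feats))
    return {k: math.log(m / m_k) for k, m_k in counts.items()}


# Toy data, invented for illustration only.
recipes = {
    "r1": ["tomato", "saffron"],
    "r2": ["tomato", "egg"],
    "r3": ["tomato", "egg"],
    "r4": ["tomato"],
}
weights = irf_weights(recipes)
```

A feature that appears in every recipe receives a weight of zero, while rare features receive the largest weights.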

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, except that ratings equal to 3 are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. For positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. For negative observations, the feature weights are subtracted.
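The fixed-threshold variant of this procedure can be sketched as follows, with positive and negative observations both weighted by 1. The function names, the `scale_max` parameter and the toy weights are assumptions made for this example rather than the code used in the experiments.

```python
from collections import defaultdict


def observation(rating, scale_max):
    """Fixed-threshold rule: +1 positive, -1 negative, 0 neutral (ignored)."""
    if scale_max == 4:                 # Epicurious scale: 1-2 negative, 3-4 positive
        return 1 if rating >= 3 else -1
    if rating == 3:                    # Food.com scale: 3 is a neutral observation
        return 0
    return 1 if rating > 3 else -1


def build_prototype(user_ratings, recipe_features, irf, scale_max=5):
    """Add IRF weights for positive observations, subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe, rating in user_ratings.items():
        sign = observation(rating, scale_max)
        if sign == 0:
            continue
        for feature in recipe_features[recipe]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)


# Toy data, invented for illustration only.
irf = {"tomato": 0.5, "beef": 1.0, "egg": 2.0}
recipe_features = {"r1": {"tomato", "beef"}, "r2": {"tomato", "egg"}, "r3": {"egg"}}
prototype = build_prototype({"r1": 5, "r2": 1, "r3": 3}, recipe_features, irf)
```

The user-average variant would only change `observation`, comparing each rating against the user's training-set average instead of a fixed value.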

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets that contain relevant information on the recipes as well as rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. Of the many normalization methods available, the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \min(A)}{\max(A) - \min(A)} \ast (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of a high or low rating. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale of each user is mapped onto that user's rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default prediction.
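The per-user mapping can be sketched as below. The fallback condition (fewer than two similarity values, or a zero-width interval) is an assumption about what "not enough user ratings" means in practice, and the function names are hypothetical.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the interval [a_min, a_max] to the interval [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def similarity_to_rating(sim, user_sims, user_ratings, user_avg):
    """Per-user Min-Max mapping; falls back to the user average on a degenerate range."""
    if len(user_sims) < 2 or max(user_sims) == min(user_sims):
        return user_avg
    return min_max(sim, min(user_sims), max(user_sims),
                   min(user_ratings), max(user_ratings))


# Toy values, invented for illustration only.
mapped = similarity_to_rating(0.5, [0.0, 1.0], [2, 4], user_avg=3.0)
fallback = similarity_to_rating(0.4, [0.4], [3], user_avg=3.0)
```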

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
Rating =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases}
\tag{4.6}
\]


Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
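The three-case rule can be written compactly as in the sketch below. The default thresholds are the initial values used in this work (0.75 and 0.25); the way the user and recipe statistics are combined (a plain mean of the two averages and of the two standard deviations) is an assumption of this example, as are the function names.

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Three-case rule: shift the average by one standard deviation at the extremes."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std


def combined(user_value, item_value):
    """Assumed combination of a user statistic and a recipe statistic: their mean."""
    return (user_value + item_value) / 2


# Toy statistics, invented for illustration only.
avg = combined(4.0, 3.0)        # user average 4.0, recipe average 3.0
std = combined(0.6, 0.4)        # user and recipe standard deviations
high = rating_from_similarity(0.80, avg, std)
mid = rating_from_similarity(0.50, avg, std)
low = rating_from_similarity(0.10, avg, std)
```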

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final predicted rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com   Epicurious
Number of users                            24741         8117
Number of food items                      226025        14976
Number of rating events                   956826        86574
Number of ratings above avg               726467        46588
Number of groups                             108           68
Number of ingredients                       5074          338
Number of categories                          28           14
Sparsity on the ratings matrix              0.02         0.07
Avg rating values                           4.68         3.34
Avg number of ratings per user             38.67        10.67
Avg number of ratings per item              4.23         5.78
Avg number of ingredients per item          8.57         3.71
Avg number of categories per item           2.33         0.60
Avg number of food groups per item          0.87         0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, Food.com, was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistics of the datasets are presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 segments, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
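A 5-fold split of the rating events can be sketched as follows. The shuffling, the fixed seed and the function name are assumptions of this example; the actual partitioning used in the experiments is not detailed in the text.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists; each event is validated exactly once."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]      # k disjoint folds
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


# Toy rating events, invented for illustration only.
events = list(range(100))
splits = list(k_fold_splits(events))
```

The error measures are then computed on each validation fold and averaged over the 5 repetitions.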

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
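For reference, both error measures can be computed from paired lists of actual and predicted ratings, as in this minimal sketch (function names and toy values are chosen for this example only):

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy ratings, invented for illustration only.
actual = [4, 3, 5]
predicted = [3, 3, 3]
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
```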

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                     Epicurious          Food.com
                                   MAE      RMSE      MAE      RMSE
YoLP Content-based component     0.6389   0.8279    0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678    0.3761   0.6834
User Average                     0.6315   0.8338    0.4077   0.6207
Item Average                     0.7701   1.0930    0.4385   0.7043
Combined Average                 0.6628   0.8572    0.4180   0.6250

Table 5.2: Test Results

                                                         Epicurious                            Food.com
                                               Observation       Observation        Observation       Observation
                                               User Average      Fixed Threshold    User Average      Fixed Threshold
                                               MAE     RMSE      MAE     RMSE       MAE     RMSE      MAE     RMSE
User Avg + User Standard Deviation            0.8217  1.0606    0.7759  1.0283     0.4448  0.6812    0.4287  0.6624
Item Avg + Item Standard Deviation            0.8914  1.1550    0.8388  1.1106     0.4561  0.7251    0.4507  0.7207
User/Item Avg + User and Item Std. Deviation  0.8304  1.0296    0.7824  0.9927     0.4390  0.6506    0.4324  0.6449
Min-Max                                       0.8539  1.1533    0.7721  1.0705     0.6648  0.9847    0.6303  0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
                                    MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries 0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine             0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary             0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                 0.8266   1.0616    0.4324   0.7087
Ingredients                       0.7932   1.0054    0.4411   0.6537
Cuisine                           0.8553   1.0810    0.5357   0.7431
Dietary                           0.8772   1.0807    0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
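The idea of storing one vector per feature type and merging them on demand can be sketched as below. The storage layout (a dictionary of sparse vectors keyed by feature type) and all names and values are assumptions made for this example; the stored vectors in this work live in the database.

```python
import math


def merge(stored, selected):
    """Merge the per-feature-type vectors named in `selected` into one sparse vector."""
    merged = {}
    for ftype in selected:
        for feature, weight in stored[ftype].items():
            merged[(ftype, feature)] = weight      # key by type to avoid name clashes
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy prototype vectors and recipe, invented for illustration only.
stored = {
    "ingredients": {"tomato": 1.0, "beef": -0.5},
    "cuisine": {"italian": 2.0},
    "dietary": {"vegan": -1.0},
}
recipe = {("ingredients", "tomato"): 1.0, ("cuisine", "italian"): 1.0}
sim_all = cosine(merge(stored, ["ingredients", "cuisine", "dietary"]), recipe)
sim_ing = cosine(merge(stored, ["ingredients"]), recipe)
```

Each row of a feature test then corresponds to a different `selected` list, with no need to rebuild the stored vectors.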

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:


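\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset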
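The mapping from similarity to rating can be sketched in a few lines. This is a minimal illustration, not the code of the experimental component: the function name `predict_rating` and its parameters are hypothetical, and how the average rating and standard deviation are combined from user and item statistics is left outside the sketch.

```python
def predict_rating(similarity, average_rating, standard_deviation,
                   upper=0.75, lower=None):
    """Turn a similarity value into a rating.

    With lower=None this follows the two-case rule (Eq. 5.1);
    passing a lower threshold (e.g. 0.25) gives the three-case rule (Eq. 4.6).
    """
    if similarity >= upper:
        return average_rating + standard_deviation
    if lower is not None and similarity < lower:
        return average_rating - standard_deviation
    return average_rating

# Example values are made up for illustration.
print(predict_rating(0.80, 3.5, 0.5))              # high similarity: average plus deviation
print(predict_rating(0.10, 3.5, 0.5))              # two-case rule: plain average
print(predict_rating(0.10, 3.5, 0.5, lower=0.25))  # three-case rule: average minus deviation
```

Keeping the lower threshold optional makes it easy to compare both variants of the rule in the same cross-validation run.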


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings, while RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
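The opposite movement of MAE and RMSE discussed in this section can be reproduced with a small worked example. The ratings below are invented purely for illustration and are not taken from either dataset; the function names `mae` and `rmse` are likewise only illustrative.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating lists."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

actual = [4, 4, 4, 4]
cautious = [3.5, 3.5, 3.5, 3.5]  # never exact, but never far off
bold = [4, 4, 4, 2.5]            # exact three times, one large miss

print("cautious:", mae(cautious, actual), rmse(cautious, actual))
print("bold:    ", mae(bold, actual), rmse(bold, actual))
```

The second predictor hits the exact rating more often and therefore has the lower MAE, but its single large miss is squared before averaging, so its RMSE is higher than that of the first predictor.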

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
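The round-by-round procedure described in this section can be sketched for a single user as follows. This is only an outline under stated assumptions: `learning_curve`, `train` and `predict` are hypothetical names, the stand-in model simply predicts the mean of the training ratings instead of using Rocchio prototype vectors, and the averaging of the per-round errors across users is omitted.

```python
def learning_curve(reviews, train, predict, max_rounds):
    """Return the mean absolute error on the validation set after each round.

    `reviews` is a list of (item, rating) pairs for one user. In round k the
    first k reviews form the training set and the remaining ones the validation set.
    """
    errors = []
    for k in range(1, max_rounds + 1):
        training, validation = reviews[:k], reviews[k:]
        if not validation:
            break  # nothing left to validate on
        model = train(training)
        abs_errors = [abs(predict(model, item) - rating) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

# Stand-in model (an assumption for this sketch): predict the mean training rating.
def train_mean(training):
    return sum(rating for _, rating in training) / len(training)

def predict_mean(model, item):
    return model

reviews = [("r1", 5), ("r2", 3), ("r3", 4), ("r4", 4)]  # made-up ratings
curve = learning_curve(reviews, train_mean, predict_mean, max_rounds=3)
print(curve)
```

Plugging in a function that builds the prototype vectors for `train` and one that applies the similarity-to-rating rule for `predict` would give the kind of curve shown in the figures, once the per-round errors are averaged over all selected users.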



Chapter 6

Conclusions


In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vector and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors and, together with the major difference in the dataset sizes, could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink


[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:

\[
\text{Similarity}(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^{2}}\; \sqrt{\sum_{s \in S} r_{b,s}^{2}}} \tag{2.10}
\]

In the formula, $r_{a,s}$ is the rating that user a gave to item s and $r_{b,s}$ is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.

In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way. The difference in rating values between the four items is practically consistent. With the cosine similarity measure these users are considered highly similar, which may not always be the case, since only the items they have in common are considered. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analysed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:


Figure 2.2: Comparing user ratings [2]

\[
\text{sim}(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^{2} \sum_{s \in S} (r_{b,s} - \bar{r}_b)^{2}}} \tag{2.11}
\]

In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[
\text{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \text{sim}(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \text{sim}(a, b)} \tag{2.12}
\]

In the formula, pred(a, p) is the predicted rating of user a for item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbours' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
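The two similarity measures and the prediction function can be tried on the ratings of Table 2.1 with the short sketch below. It is an illustration under stated assumptions: the sums run over the items rated by both users, each user's average is taken over all of that user's own ratings, and the neighbourhood N is assumed to be the two most similar users; none of these choices is prescribed by the formulas themselves, and the function names are hypothetical.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def user_mean(user):
    """Average of all ratings given by `user` (assumption: not only co-rated items)."""
    values = list(ratings[user].values())
    return sum(values) / len(values)

def cosine(a, b):
    """Cosine similarity over the items rated by both users."""
    common = [s for s in ratings[a] if s in ratings[b]]
    num = sum(ratings[a][s] * ratings[b][s] for s in common)
    den = (math.sqrt(sum(ratings[a][s] ** 2 for s in common))
           * math.sqrt(sum(ratings[b][s] ** 2 for s in common)))
    return num / den if den else 0.0

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [s for s in ratings[a] if s in ratings[b]]
    mean_a, mean_b = user_mean(a), user_mean(b)
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, k=2):
    """Predicted rating of user `a` for `item`, using the k most similar raters."""
    candidates = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    neighbours = sorted(candidates, reverse=True)[:k]
    num = sum(sim * (ratings[b][item] - user_mean(b)) for sim, b in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return user_mean(a) + num / den if den else user_mean(a)

for other in ("User1", "User2", "User3", "User4"):
    print(other, round(cosine("Alice", other), 3), round(pearson("Alice", other), 3))
print("Predicted rating for Item5:", round(predict("Alice", "Item5"), 2))
```

With these assumptions, User1 and User2 are the two users most correlated with Alice and both rated Item5 above their own averages, so the prediction lands above Alice's average rating of 4.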

Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].


The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in individual approaches. Monolithic, parallel and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective many variables can be, and have been, studied: increase in number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

\[
\text{Precision} = \frac{tp}{tp + fp} \tag{2.13}
\]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[
\text{Recall} = \frac{tp}{tp + fn} \tag{2.14}
\]
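As a small worked example of Eqs. 2.13 and 2.14, the sketch below counts true positives, false positives and false negatives from two sets of item identifiers. The function name `precision_recall` and the example items are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a set of recommended and a set of relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Made-up example: four recommended recipes, three recipes the user actually likes.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(precision, recall)
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended, so precision and recall differ even though the number of true positives is the same in both.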

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

\[
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}
\]

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating for item i and $r_i$ the actual rating for item i. RMSE is similar to MAE but places more emphasis on larger deviations:

\[
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \tag{2.16}
\]


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^+$, an equation based on the idea of TF-IDF is used:

\[ I_k^+ = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]



The user's disliked ingredients $I_k^-$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
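The favourite-ingredient score described above can be sketched as follows. This is a hedged illustration of the FF-IRF idea, not the implementation from [3]: the function name, the representation of recipes as sets of ingredient names, and the choice of the natural logarithm are assumptions.

```python
import math
from collections import Counter


def ff_irf_scores(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k as FF_k * IRF_k.

    cooked_recipes: ingredient sets of the recipes cooked during the period D.
    all_recipes: ingredient sets of every recipe in the collection.
    period_days: length of the period D (the unit is an assumption).
    """
    m = len(all_recipes)
    # M_k: number of recipes in the collection that contain ingredient k
    recipes_with = Counter(k for recipe in all_recipes for k in set(recipe))
    # F_k: number of times ingredient k was used during the period
    uses = Counter(k for recipe in cooked_recipes for k in set(recipe))

    scores = {}
    for k, f_k in uses.items():
        m_k = recipes_with.get(k, 0)
        if m_k == 0:
            continue  # ingredient unknown to the collection, IRF undefined
        ff_k = f_k / period_days
        irf_k = math.log(m / m_k)
        scores[k] = ff_k * irf_k
    return scores
```

As with TF-IDF, an ingredient that appears in almost every recipe receives an IRF close to zero, so frequent use alone does not make it a favourite.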

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I_k^+$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^+$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
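As a quick consistency check on the figures reported for N = 20, the F-measure can be recomputed from the precision and recall values quoted above. The decimal placement of those values is read from the text and should be treated as an assumption.

```latex
\[
\text{F-measure} = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61}
                 = \frac{0.74054}{1.217}
                 \approx 0.608
\]
```

The result agrees with the reported highest F-measure, up to rounding of the inputs.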

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5} \]

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score is computed for a recipe $R$ considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I_k^+$ and $I_k^-$, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
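The two formulas above can be sketched as follows. The text does not state how the weight $W_k$ is derived from the deviation score, so this sketch takes the weights as a given input; the function names and the dictionary representation are assumptions for illustration only.

```python
import math


def ingredient_std(quantities):
    """Standard deviation of the quantities g_k(i) of one ingredient
    over the n recipes that contain it (population form, dividing by n)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients k of a recipe.

    preference: ingredient -> I_k (positive if liked, negative if disliked).
    weight: ingredient -> W_k (assumed to be computed elsewhere).
    Ingredients missing from either mapping contribute nothing.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0)
               for k in recipe_ingredients)
```

With this form, a disliked ingredient carrying a large weight pulls the recipe score down more strongly than a liked ingredient carrying a small one can raise it.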

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
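A minimal sketch of how such a pseudo user-ratings vector could be assembled is shown below. The content-based predictor is passed in as a callable; its name and signature are hypothetical and stand in for the naive Bayesian classifier described in [20].

```python
def pseudo_ratings_vector(user_ratings, content_predictor, items):
    """Build v_u: the actual rating where the user rated the item,
    and the content-based prediction c_{u,i} otherwise.

    user_ratings: item -> rating given by the user.
    content_predictor: callable item -> predicted rating (hypothetical).
    items: every item in the catalogue.
    """
    vector = {}
    for item in items:
        if item in user_ratings:
            vector[item] = user_ratings[item]
        else:
            vector[item] = content_predictor(item)
    return vector
```

Stacking these vectors for all users yields a matrix with no missing entries, which is what allows the collaborative step to run without the usual sparsity problem.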

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe-ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
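The content-based strategy described above, breaking recipes down into ingredients and then reconstructing a recipe prediction, can be sketched as follows. Averaging the ingredient scores to obtain the recipe prediction is an assumption of this sketch, as are the function names and the data layout.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes, recipe_ingredients):
    """Score each ingredient as the average rating of the user's
    rated recipes in which it occurs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for recipe, rating in rated_recipes.items():
        for ingredient in recipe_ingredients[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean score of its known ingredients.

    Returns None when none of the ingredients has been scored yet.
    """
    known = [scores[ing] for ing in ingredients if ing in scores]
    return sum(known) / len(known) if known else None
```

For example, a user who rated an omelette (egg, cheese) with 5 and fried rice (egg, rice) with 3 obtains scores of 4 for egg, 5 for cheese, and 3 for rice, so an unseen risotto made of rice and cheese would be predicted at 4.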

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
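The short-term voting procedure described above can be sketched as follows. This is an illustrative reading of the description, not the Daily-Learner implementation: the threshold values, the function names, and the sparse dictionary representation of TF-IDF vectors are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return (score, label) for a new story.

    rated_stories: list of (vector, score) pairs already seen by the user.
    t_min and t_max are hypothetical minimum and maximum similarity thresholds.
    """
    similarities = [(cosine(new_story, vec), score) for vec, score in rated_stories]
    voters = [(sim, score) for sim, score in similarities if sim >= t_min]
    if not voters:
        return None, "unclassified"   # would be passed to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return None, "known"          # too similar to a story already seen
    total = sum(sim for sim, _ in voters)
    return sum(sim * score for sim, score in voters) / total, "predicted"
```

The same two-threshold idea transfers to food recommendation by treating dishes recently eaten as the stories already seen.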

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.


After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
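The item-to-item computation described above can be sketched as follows. This is an illustration under stated assumptions rather than the YoLP implementation: item averages are taken over all ratings of each item, only neighbours with positive similarity are used (a common practical choice that the text does not specify), and the function returns the normalized weighted deviation as written in the prediction formula, without adding any baseline.

```python
import math


def pearson_item_similarity(ratings, a, b):
    """Pearson correlation between items a and b.

    ratings: item -> {user -> rating}. Sums run over the users who
    rated both items; the means are each item's overall average rating.
    """
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den_a = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def weighted_deviation(ratings, user, a):
    """Similarity-weighted average of the user's rating deviations
    over the items the user has rated (positive similarities only)."""
    num = den = 0.0
    for b, by_user in ratings.items():
        if b == a or user not in by_user:
            continue
        sim = pearson_item_similarity(ratings, a, b)
        if sim <= 0:
            continue  # assumption: ignore dissimilar items
        mean_b = sum(by_user.values()) / len(by_user)
        num += sim * (by_user[user] - mean_b)
        den += sim
    return num / den if den else 0.0
```

Because the similarities involve only items, they can be computed once for the fixed set of restaurant recipes and reused across users.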

The item-based approach was chosen in the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
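The binary profile and the similarity ranking described above can be sketched as follows. Since all values are binary, the sparse vectors are represented here as sets of feature labels, for which the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The feature labels and function names are illustrative assumptions, and context features are left out.

```python
import math


def build_profile(rated_recipes, recipe_features, min_positive=4):
    """Collect the features of every recipe the user rated 4 or 5."""
    profile = set()
    for recipe, rating in rated_recipes.items():
        if rating >= min_positive:
            profile |= set(recipe_features[recipe])
    return profile


def binary_cosine(a, b):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_recipes(profile, candidates, recipe_features):
    """Order candidate recipes from most to least similar to the profile."""
    return sorted(candidates,
                  key=lambda r: binary_cosine(profile, set(recipe_features[r])),
                  reverse=True)
```

A recipe sharing category, region, and main ingredient with a positively rated dish therefore ranks above one that shares only the category.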

YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} \text{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\ \text{avgTotal} & \text{otherwise} \end{cases} \tag{4.3} \]


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
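The construction of the prototype vector can be sketched as follows, covering both ways of deciding whether a rating is a positive or a negative observation. The function names, the dictionary layout, and the way the classification rule is passed in as a callable are assumptions of this sketch; observation weights are fixed at 1, as in the experiments.

```python
from collections import defaultdict


def classify_fixed(rating, negative, positive):
    """First approach: fixed rating values. Returns +1, -1, or 0 (neutral)."""
    if rating in positive:
        return 1
    if rating in negative:
        return -1
    return 0


def classify_by_average(rating, user_average):
    """Second approach: compare the rating with the user's average rating."""
    return 1 if rating >= user_average else -1


def build_prototype(rated_recipes, recipe_features, irf, classify):
    """Add the IRF weights of a recipe's features for positive observations
    and subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe, rating in rated_recipes.items():
        sign = classify(rating)
        if sign == 0:
            continue  # neutral observation, ignored
        for feature in recipe_features[recipe]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)
```

For a dataset rated from 1 to 5, the first approach corresponds to `classify_fixed(rating, negative={1, 2}, positive={4, 5})`, which leaves ratings of 3 out of the vector.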

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to be mapped into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ which fits in the range $[C, D]$, as shown in the following formula³:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of $A$ minus minimum value of $A$), the user average was used as default for the recommendation.
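The per-user mapping described above can be sketched as follows. The function name and argument names are illustrative; the fallback to the user average when the similarity interval cannot be computed follows the description in the text.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_average):
    """Map a similarity value into the user's rating range [C, D].

    sim_min, sim_max: the user's observed similarity interval.
    rating_min, rating_max: the user's observed rating range (C and D).
    Falls back to the user average when the similarity interval is empty.
    """
    span = sim_max - sim_min
    if span <= 0:
        return user_average
    return (similarity - sim_min) / span * (rating_max - rating_min) + rating_min
```

For example, a user whose similarities range from 0 to 1 and whose ratings range from 1 to 5 would receive a predicted rating of 3 for a recipe with similarity 0.5.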

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6} \]

Three different approaches were tested: using the user's rating average and the user standard deviation, using the recipe's rating average and the recipe standard deviation, and using the combined average of the user and the recipe averages and standard deviations.
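The three-level rule above can be sketched as follows. The thresholds are explicit parameters with no defaults, so no particular threshold value is implied; the average and standard deviation may come from the user, the recipe, or their combination, depending on the approach being tested. The function name is an assumption.

```python
def rating_from_similarity(similarity, average, std_dev, upper, lower):
    """Translate a similarity value into a rating using two thresholds.

    upper, lower: the similarity thresholds U and L (with lower <= upper).
    average, std_dev: rating statistics taken from the training set.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```

With an average of 4.0 and a standard deviation of 0.5, for instance, the only possible outputs are 3.5, 4.0, and 4.5, which shows how coarse this translation is compared with the Min-Max mapping.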

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                                 Food.com   Epicurious
Number of users                              24741         8117
Number of food items                        226025        14976
Number of rating events                     956826        86574
Number of ratings above avg                 726467        46588
Number of groups                               108           68
Number of ingredients                         5074          338
Number of categories                            28           14
Sparsity on the ratings matrix                0.02         0.07
Avg rating values                             4.68         3.34
Avg number of ratings per user               38.67        10.67
Avg number of ratings per item                4.23         5.78
Avg number of ingredients per item            8.57         3.71
Avg number of categories per item             2.33         0.60
Avg number of food groups per item            0.87         0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered. All recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization for the two datasets is presented, after the filter was applied.
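The sparsity filter described above can be sketched as follows. The text does not say in which order the two conditions were applied or whether the filter was repeated until stable, so this sketch makes one pass, removing recipes first and users second; that order, the function name, and the tuple layout are assumptions.

```python
from collections import Counter


def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times, then users who rated
    no more than 5 times (single pass, recipes first).

    ratings: list of (user, recipe, rating) tuples.
    """
    item_counts = Counter(recipe for _, recipe, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```

The default thresholds mirror the values stated in the text: a recipe needs at least 4 ratings to stay, and a user at least 6.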

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating values for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used to estimate how accurately a predictive model will perform in practice [26]. The
main idea of cross-validation is to isolate a segment of the known data and, instead of using it to
train the model, use it to evaluate the predictions made by the model learned during the training
phase. This procedure provides an insight into how the model will generalize to an independent
dataset. More specifically, the data was partitioned into p segments (folds): one fold is held out
as the validation set and the remaining observations are used as the training set. To reduce
variability, this process is repeated p times, so that every fold is used once as the validation
set, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the
experiments performed in this work the chosen value for p was 5, so the process is repeated 5
times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20%
and the training set the remaining 80% of the data.
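The fold construction described above can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function name `k_fold_splits`, the shuffling seed and the synthetic rating events are all assumptions made for the example.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th event starting at offset i,
    # so fold sizes differ by at most one event.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical rating events in the form (userID, itemID, rating).
events = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
splits = list(k_fold_splits(events, k=5))
```

With 100 events and k = 5, each validation set holds 20 events and each training set the remaining 80, and every event is used for validation exactly once.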

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the
system (i.e. the prediction values). In the simplest case, the validation set presents information
in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by
the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2,
these measures compute the deviation between the predicted ratings and the actual ratings. The
results obtained from the evaluation module are used to directly compare the performance of the
different recommendation components, as well as to validate new variations of content-based
algorithms.
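As a small illustration of the two measures, the sketch below computes MAE and RMSE using their standard definitions. The rating lists are invented for the example and are not taken from the datasets.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; squaring penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical validation ratings and the corresponding predictions.
actual = [4, 3, 5, 2]
predicted = [4, 4, 3, 2]
print(mae(actual, predicted), rmse(actual, predicted))
```

Here the absolute errors are 0, 1, 2 and 0, so the MAE is 0.75, while the RMSE is the square root of 1.25 because the single error of 2 weighs more once squared.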

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some
baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation
components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these
methods, a few simple baseline metrics were also computed, using specific dataset averages
directly as the predicted rating for the recommendations. The averages computed were the
following: the user average rating, the recipe average rating, and the combined average of the
user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and
recipeID as inputs, the recommendation system simply returns the userID average, the recipeID
average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline
methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                  MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results
(UA = Observation User Average; FT = Observation Fixed Threshold)

                                                   Epicurious                        Food.com
                                                   UA               FT               UA               FT
                                                   MAE     RMSE     MAE     RMSE     MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                 0.8217  1.0606   0.7759  1.0283   0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                 0.8914  1.1550   0.8388  1.1106   0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation   0.8304  1.0296   0.7824  0.9927   0.4390  0.6506   0.4324  0.6449
Min-Max                                            0.8539  1.1533   0.7721  1.0705   0.6648  0.9847   0.6303  0.9384
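The three average-based baselines can be sketched as below. The helper names and the tiny training set are assumptions for illustration, and the sketch does not handle users or recipes that are absent from the training data.

```python
from collections import defaultdict

def build_averages(training):
    """Return per-user and per-recipe average ratings from (userID, recipeID, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def combined_average(user_avg, item_avg, user, item):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2

# Hypothetical training events.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = build_averages(training)
prediction = combined_average(user_avg, item_avg, "u1", "r1")
```

For user u1 (average 3.0) and recipe r1 (average 4.5), the combined baseline predicts 3.75.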

As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the users' prototype vectors were presented: using the user's average rating value as the
threshold between positive and negative observations, or simply using a fixed threshold in the
middle of the rating range, considering the highest rating values as positive observations and
the lowest as negative. These are referred to in Table 5.2 as Observation User Average and
Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to
convert the similarity value returned by Rocchio's algorithm into a rating value. These methods
correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation,
Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
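To make the fixed-threshold variant concrete, the following sketch builds a prototype vector with the textbook Rocchio form (weighted mean of positive vectors minus weighted mean of negative vectors) and compares it to a recipe with cosine similarity. It is not the formulation of Section 4.3: the weights `beta` and `gamma`, the choice to ignore ratings equal to the threshold, and the feature weights in the example are all assumptions.

```python
import math
from collections import defaultdict

def build_prototype(rated, threshold=3, beta=1.0, gamma=1.0):
    """rated: list of (feature_weights dict, rating). Returns a sparse prototype vector.

    Assumption: ratings above the threshold are positive observations, ratings
    below it are negative, and ratings equal to it are ignored.
    """
    pos = [vec for vec, rating in rated if rating > threshold]
    neg = [vec for vec, rating in rated if rating < threshold]
    proto = defaultdict(float)
    for group, signed_weight in ((pos, beta), (neg, -gamma)):
        for vec in group:
            for feature, w in vec.items():
                proto[feature] += signed_weight * w / len(group)
    return dict(proto)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical rated recipes with already-weighted features.
rated = [
    ({"tomato": 1.0, "basil": 0.5}, 5),
    ({"tomato": 0.5, "italian": 1.0}, 4),
    ({"anchovy": 1.0}, 1),
]
prototype = build_prototype(rated)
```

Features of well-rated recipes end up with positive weights and features of poorly rated recipes with negative weights, so a recipe containing only disliked features yields a negative similarity.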

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than
using the fixed threshold of 3 to separate the positive and negative observations. The second
conclusion that can be drawn from these results is that using the combination of both user and
item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
                                      MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients,
cuisine and dietary. In content-based methods it is important to determine whether all features
are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the
following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform
the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's
features, the composition of the prototype vector can be controlled, as the 3 stored vectors can
be easily merged. In the tests presented in the previous section the prototype vector was built
using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the
respective row of Table 5.3.
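The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The dictionary layout, the function names and the weights are assumptions for illustration and do not reflect the actual database schema.

```python
import math

def merge(stored, selected):
    """Merge the selected per-feature-type vectors into one sparse vector.

    Keys are namespaced by feature type so that, for example, a cuisine and an
    ingredient with the same name cannot collide.
    """
    merged = {}
    for feature_type in selected:
        for name, weight in stored[feature_type].items():
            merged[(feature_type, name)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity(user_vectors, recipe_vectors, selected):
    """Similarity restricted to the selected feature types."""
    return cosine(merge(user_vectors, selected), merge(recipe_vectors, selected))

# Hypothetical stored vectors: one per feature type for the user and for the recipe.
user_vectors = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": -1.0}}
recipe_vectors = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": 1.0}}

sim_ingredients = similarity(user_vectors, recipe_vectors, ("ingredients",))
sim_all = similarity(user_vectors, recipe_vectors, ("ingredients", "cuisine", "dietary"))
```

Each feature combination of Table 5.3 then corresponds to a different `selected` tuple, with no need to rebuild the stored vectors.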

Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them. Although this is confirmed
in this test (see Table 5.3), it may not always be the case. Some features, such as the price of
the meal, can increase the correlation between the user's preferences and items the user
dislikes, so it is important to test the impact of every new feature before including it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


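\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]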

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to
test this method, but other cases now need to be tested. By varying the case limits, the
objective of this test is to study the impact on the recommendation and discover the similarity
thresholds that return the lowest error values.
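The piecewise mapping of Eq. 4.6 can be written as a short function, sketched below with the initial thresholds as defaults. How the average and the standard deviation are obtained (user, item, or a combination of both) is left to the caller, and the numbers in the example are invented.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Hypothetical average rating and standard deviation.
high = similarity_to_rating(0.80, average=3.5, std_dev=0.5)
middle = similarity_to_rating(0.50, average=3.5, std_dev=0.5)
low = similarity_to_rating(0.10, average=3.5, std_dev=0.5)
```

Varying `upper` and `lower` reproduces the kind of threshold test described in this section; passing `lower=float("-inf")` disables the lowest case entirely.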

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range
[0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the
recommendation accuracy, and that subtracting the standard deviation does not help. The
pronounced drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating
minus standard deviation) is completely removed.

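As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset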


Using Eq. 5.1, the next step is to test the upper similarity threshold. Figures 5.4 and 5.5
present the test results for the Epicurious and Food.com datasets, respectively. For each
similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by
running the same cross-validation tests on the experimental recommendation component, adjusting
the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings. RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test.
Both datasets react in a very similar way when the threshold U is lowered within the interval
[0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the
recommendation system predicts the correct rating value more often, which results in a lower
average error, so the MAE is lower. However, although it predicts the exact rating value more
often, in the cases where it misses, the deviation between the predicted rating and the actual
rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values
increase. The best similarity threshold is therefore subjective: some systems may benefit more
from a higher rate of exact predictions, while in others a lower deviation between the predicted
ratings and the actual ratings is more suitable. In this test, the lowest results registered
using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the
lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest
error rates among the baselines, the experimental recommendation component showed better
results when using the Food.com dataset (0.3229 against 0.3590 MAE, and 0.6230 against 0.6536
RMSE), but not when using the Epicurious dataset.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
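The opposite movement of the two metrics can be reproduced with a toy example. The ratings below are invented purely to show the effect: one predictor is slightly off on every item, the other is exact on most items but misses one by a large margin.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical actual ratings and two sets of predictions.
actual = [4, 4, 4, 4]
always_close = [3.5, 3.5, 3.5, 3.5]   # off by 0.5 on every item
mostly_exact = [4, 4, 4, 2.5]         # exact three times, off by 1.5 once

print(mae(actual, always_close), rmse(actual, always_close))
print(mae(actual, mostly_exact), rmse(actual, mostly_exact))
```

The second predictor has the lower MAE (0.375 against 0.5) but the higher RMSE (0.75 against 0.5), which is the same pattern observed when the upper threshold is lowered.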

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that
attributed the same rating to all their reviews, should have the lowest impact on the
recommendation error. The objective of this test is to discover how significant the impact of
this variable is, and to check that the absolute error does not spike for users with higher
standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's
absolute error and standard deviation values. The line in these two graphs indicates the average
value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was
expected, since it is normal for the absolute error to slowly increase for users with higher
standard deviations. A spike in the absolute error towards the higher values of the standard
deviation would be a bad sign, since it would imply that the recommendation algorithm was having
a very small impact on the predicted ratings. Considering the small size of this dataset and the
lower density of points in the graph towards the higher values of standard deviation, there was
probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for
users with standard deviations higher than 1. This implies that the algorithm is learning the
users' preferences and returning good recommendations even to users with high standard
deviations.
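The points and the average line of Figures 5.6 and 5.7 could be derived along the following lines. This is only a sketch: the use of the population standard deviation, the fixed-width bins used to average neighbouring points, and the example events are assumptions, since the exact smoothing used for the plotted line is not specified.

```python
import statistics
from collections import defaultdict

def user_points(events):
    """events: (userID, actual, predicted). Returns {user: (std_dev, mean_abs_error)}."""
    by_user = defaultdict(list)
    for user, actual, predicted in events:
        by_user[user].append((actual, predicted))
    points = {}
    for user, pairs in by_user.items():
        std_dev = statistics.pstdev([a for a, _ in pairs])
        error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user] = (std_dev, error)
    return points

def binned_average(points, bin_width=0.5):
    """Average the users' absolute errors inside fixed-width standard deviation bins."""
    bins = defaultdict(list)
    for std_dev, error in points.values():
        bins[int(std_dev // bin_width)].append(error)
    return {b * bin_width: sum(e) / len(e) for b, e in sorted(bins.items())}

# Hypothetical (userID, actual rating, predicted rating) events.
events = [("u1", 4, 4), ("u1", 4, 4), ("u2", 1, 3), ("u2", 5, 3)]
points = user_points(events)
line = binned_average(points)
```

A user who always gives the same rating (u1) sits at standard deviation 0, while a user who alternates between extremes (u2) sits further right with a larger error.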

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm using the
datasets studied in this work, and to analyse whether the recommendation error starts to converge
after a certain number of reviews. In order to perform this test, the datasets were first
analysed to find a group of users with enough rated recipes to study the improvements in the
recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number
of rated recipes was the highest threshold chosen for this dataset in order to maintain a
considerable number of users over which to average the recommendation errors (see Fig. 5.8). In
Food.com, 1571 users were found that rated over 100 recipes and, since the results of this
experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was
made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so
for each round an additional review is added to the training set and removed from the validation
set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease
in error, and after 25 rated recipes the error fluctuates around the same values. Although it
would be interesting to perform this experiment with a higher number of rated recipes, to see the
progress of the recommendation error, due to the small size of the dataset there are not enough
users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a
constant improvement in the recommendation, although there is no clear number of rated recipes
that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
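The round-by-round simulation can be sketched as below. To keep the example self-contained, a simple mean-of-training-ratings predictor stands in for the Rocchio component; that stand-in, the function names and the review data are assumptions for illustration only.

```python
def learning_curve(user_reviews, predict, max_train):
    """Return the MAE obtained for each training size from 1 to max_train.

    user_reviews: {user: [(recipe, rating), ...]} in review order.
    predict(training, recipe) -> predicted rating.
    Every user needs more than max_train reviews so the validation set is never empty.
    """
    curve = []
    for size in range(1, max_train + 1):
        errors = []
        for reviews in user_reviews.values():
            # The first `size` reviews train the profile; the rest validate it.
            training, validation = reviews[:size], reviews[size:]
            for recipe, rating in validation:
                errors.append(abs(rating - predict(training, recipe)))
        curve.append(sum(errors) / len(errors))
    return curve

def mean_predictor(training, recipe):
    """Stand-in predictor: the mean of the ratings seen so far (ignores the recipe)."""
    return sum(rating for _, rating in training) / len(training)

# Hypothetical review history for a single user.
user_reviews = {"u1": [("a", 5), ("b", 3), ("c", 4), ("d", 4)]}
curve = learning_curve(user_reviews, mean_predictor, max_train=3)
```

Plotting `curve` against the training size gives a learning curve of the same kind as Figures 5.8 to 5.10.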



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were
tested to further explore the breaking down of recipes into ingredients presented in [22], and
to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food
recommendation, so various approaches were tested to build the users' prototype vectors and to
transform the similarity value returned by the algorithm into a rating value, which is needed to
compute the performance of the recommendation system. When building the prototype vectors, the
approach that returned the best results used a fixed threshold to differentiate positive and
negative observations. The combination of both user and item average ratings and standard
deviations gave the best results when transforming the similarity value into a rating value.
These approaches combined returned the best performance values of the experimental
recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the
similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since the two datasets have very different characteristics, not improving on the baseline
results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient
information only contains the main ingredients, which were chosen by the user at the moment of
the review, as opposed to the full ingredient information that recipes have in the Food.com
dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together
with the major difference in dataset sizes, this could be one of the reasons for the observed
difference in performance.

The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e. that contained user reviews, allowing the studied approaches to be validated.
Since there are very few studies related to food recommendations, the features that best describe
the recipes are still undefined. The feature study performed in this work, which explored all the
features available in both datasets (ingredients, cuisines and dietaries), shows that the use of
all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would
determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the
day (i.e. lunch or dinner), the total meal cost and the total calories, amongst others. Studying
the impact that these features have on the recommendation is another interesting point to
approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain
the feature weights related to a specific rating value observation. When recommending a recipe,
its feature vector is compared with the user's set of vectors, so that, according to the user's
preferences, the vector with the highest similarity represents the class where the recipe fits
best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a
predicted rating.
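One possible reading of this proposal is sketched below: one averaged feature vector per rating value, with the predicted rating being the class whose vector is most similar to the recipe. Averaging the feature weights inside each class, the function names and the example data are assumptions, since the proposal is left open as future work.

```python
import math
from collections import defaultdict

def build_class_vectors(rated):
    """rated: list of (feature_weights dict, rating). One averaged vector per rating value."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, rating in rated:
        counts[rating] += 1
        for feature, w in vec.items():
            sums[rating][feature] += w
    return {r: {f: w / counts[r] for f, w in fs.items()} for r, fs in sums.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(class_vectors, recipe):
    """Return the rating value whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe))

# Hypothetical rated recipes for one user.
rated = [({"tomato": 1.0}, 5), ({"tomato": 0.5, "basil": 1.0}, 5), ({"anchovy": 1.0}, 1)]
class_vectors = build_class_vectors(rated)
predicted = predict_rating(class_vectors, {"tomato": 1.0, "basil": 0.2})
```

In this toy case the recipe shares features only with the class built from 5-star reviews, so the predicted rating is 5.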




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868.
doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction,
volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and
Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87.
Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston,
MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web,
4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive
Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.
doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420,
1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.
URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.
URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey
of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data
Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems.
In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction
Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages
395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web,
pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on
Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl.
Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the
7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592.
doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based
Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and
User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the
International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In
Proceedings of the 18th International Conference on User Modeling, Adaptation and
Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696.
doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and
User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on
Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe
Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 2.2: Comparing user ratings [2]

\[
sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2 \sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}}
\tag{2.11}
\]

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
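Eq. 2.11 can be sketched in code as follows. The sketch assumes that S is the set of items rated by both users and that each user's average is taken over all of that user's ratings; both choices, like the example ratings, are assumptions made for illustration.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users, each given as {item: rating}."""
    shared = set(ratings_a) & set(ratings_b)   # S: items rated by both users
    if not shared:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[s] - mean_a) * (ratings_b[s] - mean_b) for s in shared)
    sum_sq_a = sum((ratings_a[s] - mean_a) ** 2 for s in shared)
    sum_sq_b = sum((ratings_b[s] - mean_b) ** 2 for s in shared)
    denominator = math.sqrt(sum_sq_a * sum_sq_b)
    return numerator / denominator if denominator else 0.0

# Hypothetical users: b rates like a but one point lower, c has the opposite taste.
user_a = {"item1": 5, "item2": 3, "item3": 4}
user_b = {"item1": 4, "item2": 2, "item3": 3}
user_c = {"item1": 1, "item2": 5, "item3": 3}
```

Because each rating is centred on the user's own average, user_b correlates perfectly with user_a even though all of user_b's ratings are one point lower.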

With the similarity values between Alice and the other users obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:

\[
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
\tag{2.12}
\]

In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of
users most similar to user a that rated item p. This function determines whether the neighbors'
ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences
are combined using the similarity scores as weights, and the resulting value is added to or
subtracted from Alice's average rating. The value obtained through this procedure corresponds to
the predicted rating.
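The prediction function of Eq. 2.12 can be sketched as below. The neighbour tuples and the numbers in the example are hypothetical, and returning the user's own average when the similarities sum to zero is an assumption added to avoid a division by zero.

```python
def predict(mean_a, neighbours):
    """Predict a rating from the user's average and the neighbourhood N.

    neighbours: list of (similarity, neighbour_rating_for_p, neighbour_mean).
    """
    norm = sum(sim for sim, _, _ in neighbours)
    if norm == 0:
        return mean_a
    # Each neighbour contributes its deviation from its own average, weighted by similarity.
    weighted = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    return mean_a + weighted / norm

# Hypothetical neighbourhood: two similar users who both rated the item above their average.
prediction = predict(4.0, [(0.85, 3, 2.4), (0.70, 5, 3.8)])
```

Both neighbours rated the item above their own averages, so the prediction ends up above the user's average of 4.0.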

Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one
common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them
only once in a while, since the network of peers usually does not change dramatically in a short
period of time. Then, when a user requires a recommendation, the ratings can be efficiently
calculated on demand using the precomputed similarities. Many other performance-improving
modifications have been proposed to extend the standard correlation-based and cosine-based
techniques [11, 13, 14].


The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to
compute similarities between items instead, later computing ratings from them [15]. Empirical
evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available
user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is
then used to make rating predictions. Probabilistic approaches estimate the probability of a
certain user c giving a particular rating to item s, given the user's previously rated items.
This estimation can be computed, for example, with cluster models, where like-minded users are
grouped into classes. The model structure is that of a Naive Bayesian model, where the number of
classes and the parameters of the model are learned from the data. Other collaborative filtering
methods include statistical models, linear regression, Bayesian networks and various
probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative
methods. The system must first learn the user's preferences from previously rated items in order
to make accurate recommendations. Several techniques have been proposed to address this problem.
Most of them use the hybrid recommendation approach presented in the next section. Other
techniques use strategies based on item popularity, item entropy, user personalization and
combinations of the above [12, 17, 18]. New items also present a problem in collaborative
systems. Until a new item is rated by a sufficient number of users, the recommender system will
not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem
that should be considered. The number of rated items is usually very small when compared to the
number of ratings that need to be predicted. User profile information, like age, gender and other
attributes, can also be used when calculating user similarities in order to overcome the problem
of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several
limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to
avoid some shortcomings and even reach desirable properties not present in individual approaches.
Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or
knowledge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be
associated with content features (e.g., comedies liked by the user or dramas liked by the user) in order to
improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components have the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually but
complement each other in different situations (e.g., when few ratings exist, one should recommend
popular items, and otherwise use collaborative methods).

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is exploited by the principal component
to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base
algorithms can be improved by being hybridized with other techniques. It is important that the
recommendation techniques used in the hybrid system complement each other's limitations. For instance,
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can be and have been studied: increase in number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall.

When using IR measures, the recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):

$$\text{Precision} = \frac{tp}{tp + fp} \qquad (2.13)$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):

$$\text{Recall} = \frac{tp}{tp + fn} \qquad (2.14)$$

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)$$

In the formula, n represents the total number of items used in the calculation, $p_i$ the predicted rating
for item i, and $r_i$ the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)$$
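The four measures defined in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example ratings are assumptions made for this example, not part of the evaluated system.

```python
import math

def precision(tp, fp):
    """Fraction of recommended items that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of relevant items that were recommended."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mae(predicted, actual):
    """Mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

def rmse(predicted, actual):
    """Like MAE, but squaring emphasises the larger deviations."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

# Hypothetical example: four predicted ratings against the actual ones.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]
print(mae(predicted, actual), rmse(predicted, actual))
```

In this example the single deviation of two rating points weighs more in RMSE than in MAE, which illustrates the emphasis on larger deviations.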


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient k during a period D:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency $IRF_k$, which uses the total number of recipes M and the number of recipes that contain
ingredient k ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \qquad (3.3)$$

The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
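As a rough illustration of how the favourite-ingredient score combines frequency of use with inverse recipe frequency, the following Python sketch scores the ingredients of a small collection. The function name, the toy recipes, the seven-day period, and the use of the natural logarithm are all assumptions made for this example.

```python
import math
from collections import Counter

def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each ingredient k the user cooked with as FF_k * IRF_k."""
    m = len(all_recipes)
    # M_k: number of recipes in the collection that contain ingredient k.
    m_k = Counter(k for recipe in all_recipes for k in set(recipe))
    # F_k: number of times the user cooked with ingredient k in the period.
    f_k = Counter(k for recipe in cooked_recipes for k in set(recipe))
    return {
        k: (f / period_days) * math.log(m / m_k[k])
        for k, f in f_k.items()
        if m_k[k]  # skip ingredients that are absent from the collection
    }

# Hypothetical collection and cooking history.
all_recipes = [
    {"rice", "egg"},
    {"rice", "tuna"},
    {"rice", "egg", "onion"},
    {"tuna", "onion"},
]
cooked = [{"rice", "tuna"}, {"tuna", "onion"}]
scores = ff_irf(cooked, all_recipes, period_days=7)
favourites = sorted(scores, key=scores.get, reverse=True)
print(favourites)
```

Here tuna ranks first because it is cooked often and is rarer in the collection than rice, which is the behaviour the TF-IDF analogy is meant to capture.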

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they liked to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted
by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.4)$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained from the evaluation were not satisfactory.

In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantity of a target
recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the
quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed
average quantity of the ingredient k in all the recipes in the database). According to the deviation
score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering
the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^{+}$ and $I_k^{-}$, respectively):

$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$
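A minimal Python sketch of the two formulas above follows. The text does not state how the weight of an ingredient is derived from its deviation score, so the sketch simply takes the weights as input. The preference values, weights, and quantities below are hypothetical.

```python
import math

def quantity_std(quantities):
    """Standard deviation of an ingredient's quantity over the recipes that use it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, preference, weight):
    """Sum of I_k * W_k over the ingredients k of the recipe.

    Assumption: ingredients without a known preference contribute 0,
    and ingredients without a known weight default to a weight of 1.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in recipe)

# Hypothetical quantities (in grams) of one ingredient across eight recipes.
sigma = quantity_std([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical user preferences (positive = liked) and precomputed weights.
preference = {"pepper": -1.0, "potato": 0.5}
weight = {"pepper": 2.0, "potato": 1.0}
score = recipe_score(["pepper", "potato", "salt"], preference, weight)
print(sigma, score)
```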

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization
problem. The movie content information was viewed as a document, and the user ratings between 0
and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbor's mean. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has only a
few rated items. Lastly, the prediction is computed using a hybrid correlation
weight that allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].
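The construction of a pseudo user-ratings vector can be sketched as follows. The content-based predictor is replaced here by a hypothetical stand-in function, since the naive Bayesian classifier itself is outside the scope of this illustration. The item identifiers and ratings are also invented for the example.

```python
def pseudo_ratings(user_ratings, content_predict, items):
    """Build the pseudo user-ratings vector for one user.

    The actual rating is used where the user rated the item, and the
    content-based prediction fills every remaining position.
    """
    return [
        user_ratings[i] if i in user_ratings else content_predict(i)
        for i in items
    ]

# Hypothetical data: three movies, two of them rated by the user.
items = ["m1", "m2", "m3"]
user_ratings = {"m1": 4, "m3": 2}

def content_predict(item):
    """Stand-in for the content-based predictor (assumption: constant 3.5)."""
    return 3.5

v_u = pseudo_ratings(user_ratings, content_predict, items)
print(v_u)
```

Stacking such vectors for all users yields a matrix without missing entries, which is what allows the collaborative step to run with less data sparsity.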

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others.
The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
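The breakdown and reconstruction steps of the content-based strategy can be illustrated with the Python sketch below. Ingredient scores follow the averaging described above. Predicting a recipe as the average of its known ingredient scores is an assumption of this sketch, as are the function names and the toy data.

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Average, per ingredient, the ratings of the recipes in which it occurs."""
    ratings_by_ingredient = defaultdict(list)
    for ingredients, rating in rated_recipes:
        for k in ingredients:
            ratings_by_ingredient[k].append(rating)
    return {k: sum(v) / len(v) for k, v in ratings_by_ingredient.items()}

def predict_recipe(ingredients, scores):
    """Assumption: a recipe's prediction is the mean of its known ingredient scores."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None

# Hypothetical ratings given by one user.
rated = [(["tomato", "basil"], 5), (["tomato", "beef"], 3)]
scores = ingredient_scores(rated)
prediction = predict_recipe(["tomato", "basil", "garlic"], scores)
print(scores, prediction)
```

The unseen ingredient (garlic) is simply ignored here. A recipe made only of unseen ingredients gets no prediction, which hints at why the hybrid strategies fill in missing ingredient ratings.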

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method
identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the scores obtained after the recipes are broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered. It is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based recommendation system for news articles. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help him to gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need to recommend a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
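The short-term voting scheme described above can be sketched in Python as follows. The two threshold values, the dictionary-based vector representation, and the toy stories are assumptions of this sketch and are not taken from [23].

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ("known", None), ("score", value) or ("no_voters", None).

    t_min and t_max are hypothetical minimum and maximum similarity thresholds.
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:
            return ("known", None)  # too close to a story the user already saw
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return ("no_voters", None)  # would be handed to the long-term model
    total = sum(sim for sim, _ in voters)
    return ("score", sum(sim * score for sim, score in voters) / total)

# Hypothetical rated stories: (term weights, user score).
rated = [
    ({"election": 1.0, "vote": 1.0}, 0.8),
    ({"football": 1.0}, 0.2),
]
follow_up = short_term_predict({"election": 1.0, "poll": 1.0}, rated)
duplicate = short_term_predict({"election": 1.0, "vote": 1.0}, rated)
unrelated = short_term_predict({"weather": 1.0}, rated)
print(follow_up, duplicate, unrelated)
```

A single rated story about the election is enough for the follow-up story to receive a score, which mirrors the advantage of the nearest-neighbor approach noted earlier.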

This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of
items is measured from the way they are rated by a shared set of users. In other words, in user-to-user
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a
and recipe b.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, pred(u, a) is the prediction value to user u for item a, $r_{b,u}$ is the rating from user u
to item b, and N is the set of items rated by user u. Using the set of the user's rated items, the user
rating for each item b is weighted according to the similarity between b and the target item a. The
weighted sum is then normalized by the sum of similarities.
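A minimal sketch of the item-to-item similarity and prediction steps is given below. It is not the YoLP implementation. The nested-dictionary layout, the toy ratings, and the decision to keep only positively correlated items (so that the denominator stays positive) are assumptions of this example. The prediction function returns the quantity of the prediction formula as written, using the ratings given by the target user.

```python
import math

def item_mean(ratings, item):
    """Average rating of an item over all the users who rated it."""
    values = list(ratings[item].values())
    return sum(values) / len(values)

def pearson(ratings, a, b):
    """Pearson correlation over the users who rated both a and b.

    ratings[item][user] holds the rating given by the user to the item.
    """
    common = ratings[a].keys() & ratings[b].keys()
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * \
        math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / den if den else 0.0

def predict(ratings, user, a):
    """Similarity-weighted deviations of the user's ratings, normalized by the similarities."""
    sims = {b: pearson(ratings, a, b)
            for b in ratings if b != a and user in ratings[b]}
    # Assumption: only positively correlated items are allowed to vote.
    sims = {b: s for b, s in sims.items() if s > 0}
    den = sum(sims.values())
    if not den:
        return 0.0
    num = sum(s * (ratings[b][user] - item_mean(ratings, b)) for b, s in sims.items())
    return num / den

# Hypothetical ratings: recipe -> {user: rating}.
ratings = {
    "a": {"u1": 5, "u2": 3, "u3": 1},
    "b": {"u1": 4, "u2": 3, "u3": 2, "u4": 4},
    "c": {"u1": 1, "u2": 3, "u3": 5, "u4": 5},
}
print(pearson(ratings, "a", "b"), predict(ratings, "u4", "a"))
```

Since the similarities depend only on item pairs, they can be precomputed once for the fixed set of restaurant recipes and reused for every user.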

The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant recipes, and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide better computational performance, with comparable or
better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the
restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation: these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the features of the recipes that he rated positively, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
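The profile construction and ranking described above can be sketched with binary feature vectors stored as Python sets, for which the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The feature naming scheme and the toy recipes are assumptions of this sketch, and context features are left out for brevity.

```python
import math

def cosine_binary(features_a, features_b):
    """Cosine similarity between two binary vectors stored as sets of active features."""
    if not features_a or not features_b:
        return 0.0
    shared = len(features_a & features_b)
    return shared / math.sqrt(len(features_a) * len(features_b))

def build_profile(rated_recipes, threshold=4):
    """Add every feature of the positively rated recipes (rating of 4 or 5) to the profile."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile |= features
    return profile

# Hypothetical rated recipes: (active features, rating).
rated = [
    ({"category:pasta", "region:italy", "ingredient:tomato"}, 5),
    ({"category:soup", "ingredient:onion"}, 2),
]
profile = build_profile(rated)

# Hypothetical restaurant recipes to rank for this user.
candidates = {
    "r1": {"category:pasta", "ingredient:tomato", "ingredient:basil"},
    "r2": {"category:soup", "ingredient:onion"},
}
ranking = sorted(candidates, key=lambda r: cosine_binary(candidates[r], profile),
                 reverse=True)
print(ranking)
```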

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if similarity} > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$

where avgTotal represents the combined user and item average for each recommendation. It
is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based
method are an approximation to the real values, since it is likely that this method of transforming a
similarity measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors,
and lastly the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to
be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it
impossible to determine the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF determined by the
complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5. The same
process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's
preferences. These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype
vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector. In negative observations, the feature weights are subtracted.
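The construction of a prototype vector from IRF weights can be sketched as below, using the second approach (comparison against the user's average rating) to separate positive from negative observations. The function names, the toy recipes, the natural logarithm, and the dictionary-based sparse vector are assumptions of this illustration rather than the code used in the experiments.

```python
import math
from collections import Counter, defaultdict

def irf_weights(recipes):
    """IRF_k = log(M / M_k), computed over the complete recipe collection."""
    m = len(recipes)
    m_k = Counter(k for features in recipes.values() for k in set(features))
    return {k: math.log(m / count) for k, count in m_k.items()}

def prototype_vector(user_ratings, recipes, weights):
    """Add feature weights for positive observations and subtract them for negative ones."""
    user_avg = sum(user_ratings.values()) / len(user_ratings)
    proto = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        # A rating equal to or above the user's average counts as positive.
        sign = 1.0 if rating >= user_avg else -1.0
        for k in recipes[recipe_id]:
            proto[k] += sign * weights[k]
    return dict(proto)

# Hypothetical recipes (feature sets) and training ratings of one user.
recipes = {
    "r1": {"tomato", "basil"},
    "r2": {"tomato", "beef"},
    "r3": {"beef", "onion"},
    "r4": {"rice"},
}
weights = irf_weights(recipes)
proto = prototype_vector({"r1": 5, "r3": 2}, recipes, weights)
print(proto)
```

Rare features such as basil receive a larger weight than common ones such as tomato, so they move the prototype vector further in either direction.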

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, and contain rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A to B, which fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \qquad (4.5)$$

In order the obtain the best results the similarity and rating scales were computed individually

for each user since not all users rate items the same way or have the same notion of high or low

rating values So the following steps were applied compute all the usersrsquo similarity variation from

the validation set and compute all the usersrsquo rating variation from the training set At this point

the similarity scale is mapped for each user into the rating range and the Min-Max Normalization

formula (Eq 45) can be applied to predict a rating value for the recipe to recommend In the cases

where there were not enough user ratings to compute the similarity interval (maximum value of A

ndash minimum value of A) the user average was used as default for the recommendation
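As a minimal sketch of the per-user Min-Max mapping, the function below applies Eq. 4.5 and falls back to a default when the similarity interval is empty. The function and parameter names are illustrative assumptions, not part of the original system.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, fallback):
    """Map a similarity from [sim_min, sim_max] onto [rating_min, rating_max].

    fallback is returned when the similarity interval is degenerate,
    mirroring the use of the user average as the default prediction.
    """
    if sim_max == sim_min:
        return fallback
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min


# A user whose similarities span [0.25, 0.75] and whose ratings span [2, 5].
predicted = min_max_rating(0.5, 0.25, 0.75, 2.0, 5.0, fallback=4.0)
default = min_max_rating(0.4, 0.4, 0.4, 2.0, 5.0, fallback=4.0)
```

A similarity in the middle of the user's similarity range lands in the middle of the user's rating range, which is the behaviour Eq. 4.5 is meant to provide.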

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6} \]

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
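The three-case rule can be sketched as below. The default thresholds are the initial values used in this work (0.75 and 0.25). The helper `combined_stats` is an assumption: the text does not spell out how the user and recipe statistics are combined, so plain arithmetic means are used here for illustration.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity into a rating using the three-case rule."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: plain means of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


avg, std = combined_stats(4.0, 0.5, 3.0, 1.0)
high = rating_from_similarity(0.8, 4.0, 0.5)
middle = rating_from_similarity(0.5, 4.0, 0.5)
low = rating_from_similarity(0.1, 4.0, 0.5)
```

The same function serves all three tested variants; only the `avg` and `std` arguments change (user statistics, recipe statistics, or the combined ones).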

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                               Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02        0.07
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating predicted for the recipe. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset (Food.com) was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 recipes. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
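The sparsity filter can be illustrated with the following sketch. It assumes a single pass that first drops rarely rated recipes and then drops users with few remaining ratings; the text does not state the order or whether the filter was repeated until stable, so both choices are assumptions, as are the names and the tuple layout of the events.

```python
from collections import Counter


def filter_sparse(events, min_item_ratings=3, min_user_ratings=5):
    """Drop recipes rated no more than min_item_ratings times, then users
    who rated no more than min_user_ratings of the remaining recipes.

    events: list of (user_id, recipe_id, rating) tuples (hypothetical layout).
    """
    item_counts = Counter(recipe for _, recipe, _ in events)
    events = [e for e in events if item_counts[e[1]] > min_item_ratings]
    user_counts = Counter(user for user, _, _ in events)
    return [e for e in events if user_counts[e[0]] > min_user_ratings]


# Four active users rate six recipes each; a fifth user rates only two.
users = ["u1", "u2", "u3", "u4"]
recipes = ["r1", "r2", "r3", "r4", "r5", "r6"]
events = [(u, r, 4) for u in users for r in recipes]
events += [("u5", "r1", 3), ("u5", "r7", 2)]
kept = filter_sparse(events)
```

In the toy data the recipe with a single rating and the user with too few ratings are both removed, leaving only the dense block of 24 rating events.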

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the k-fold cross-validation method was used: the data is partitioned into k folds of equal size, one fold is used as the validation set and the remaining k - 1 folds as the training set. To reduce variability, this process is repeated k times, each time with a different fold as the validation set, so that every observation is used for validation exactly once. The validation results are averaged over the k repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
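A 5-fold split of the rating events can be sketched as follows. This is a generic illustration, not the evaluation module of this work: the shuffling, the fixed seed and the function name are assumptions made so the example is reproducible.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th event starting at position i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


# 100 dummy rating events identified by their index.
splits = list(k_fold_splits(range(100), k=5))
```

With 100 events and k = 5, every round trains on 80 events and validates on the remaining 20, matching the 80%/20% proportion described above.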

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
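For reference, the two error measures can be computed as in the sketch below, using the standard definitions of MAE and RMSE. The function names and the toy ratings are illustrative only.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


predicted_ratings = [4, 3, 5]
actual_ratings = [5, 3, 3]
example_mae = mae(predicted_ratings, actual_ratings)
example_rmse = rmse(predicted_ratings, actual_ratings)
```

Here the absolute errors are 1, 0 and 2, so the MAE is 1.0, while the RMSE is larger because the error of 2 is squared before averaging.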

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline predictors were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2.

Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                     Epicurious                            Food.com
                                                     Observation       Observation         Observation       Observation
                                                     User Average      Fixed Threshold     User Average      Fixed Threshold
Method                                               MAE     RMSE      MAE     RMSE        MAE     RMSE      MAE     RMSE
User Avg + User Standard Deviation                   0.8217  1.0606    0.7759  1.0283      0.4448  0.6812    0.4287  0.6624
Item Avg + Item Standard Deviation                   0.8914  1.1550    0.8388  1.1106      0.4561  0.7251    0.4507  0.7207
User/Item Avg + User and Item Standard Deviation     0.8304  1.0296    0.7824  0.9927      0.4390  0.6506    0.4324  0.6449
Min-Max                                              0.8539  1.1533    0.7721  1.0705      0.6648  0.9847    0.6303  0.9384

In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
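The three average-based baselines can be sketched as a single predictor factory. The names, the tuple layout and the `kind` argument are assumptions for illustration; the sketch also does not handle users or recipes that never appear in the training set, which a real evaluation would need to address.

```python
from collections import defaultdict


def average_baselines(training):
    """Build a predictor from (user_id, recipe_id, rating) training events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, recipe, rating in training:
        by_user[user].append(rating)
        by_item[recipe].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}

    def predict(user, recipe, kind):
        if kind == "user":
            return user_avg[user]
        if kind == "item":
            return item_avg[recipe]
        # Combined baseline: (UserAvg + ItemAvg) / 2
        return (user_avg[user] + item_avg[recipe]) / 2

    return predict


predict = average_baselines([("u1", "r1", 5), ("u1", "r2", 3), ("u2", "r1", 4)])
by_user_avg = predict("u1", "r1", "user")
by_item_avg = predict("u1", "r1", "item")
by_combined = predict("u1", "r1", "combined")
```

None of these predictors looks at recipe content, which is why they are useful reference points for judging whether the content-based component adds information.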

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values: it yields the lowest RMSE in every column of Table 5.2, with MAE values close to the lowest.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user, one per feature type. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
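The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The group names, the `(group, feature)` key scheme and the toy weights are assumptions chosen for illustration; the cosine similarity itself follows the standard definition.

```python
import math


def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group vectors selected for one feature test."""
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            # Keying by (group, feature) avoids clashes between groups.
            merged[(group, feature)] = weight
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


user = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0},
        "dietary": {"vegan": -1.0}}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0},
          "dietary": {}}

only_ingredients = ["ingredients"]
all_groups = ["ingredients", "cuisine", "dietary"]
sim_ingredients = cosine(merge_groups(user, only_ingredients),
                         merge_groups(recipe, only_ingredients))
sim_all = cosine(merge_groups(user, all_groups),
                 merge_groups(recipe, all_groups))
```

Because the per-group vectors are built once, each row of a feature test only changes the `groups` list passed to `merge_groups`, so no prototype vector needs to be recomputed.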

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6} \]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \tag{5.1} \]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets respectively. Each point in the graphs corresponds to one value of the upper similarity threshold: its MAE and RMSE were obtained by running the same cross-validation test on the experimental recommendation component, and the threshold was adjusted between tests.
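The threshold sweep can be sketched as a loop over candidate values of U that re-scores the same validation samples with the two-case rule of Eq. 5.1. The sample layout and the toy numbers are assumptions; in the real experiment each point comes from a full cross-validation run.

```python
def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for the two-case rating rule.

    samples: list of (similarity, avg, std, actual_rating) tuples
    (hypothetical layout, one tuple per validation event).
    """
    results = {}
    for upper in thresholds:
        errors = [
            (avg + std if sim >= upper else avg) - actual
            for sim, avg, std, actual in samples
        ]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results[upper] = (mae, rmse)
    return results


# Toy user with average 4 and standard deviation 1.
samples = [
    (0.9, 4.0, 1.0, 5.0),
    (0.5, 4.0, 1.0, 5.0),
    (0.4, 4.0, 1.0, 5.0),
    (0.3, 4.0, 1.0, 2.0),
]
results = sweep_upper_threshold(samples, [0.75, 0.25])
```

On this toy data, lowering U from 0.75 to 0.25 turns two near misses into exact hits, which lowers the MAE, but the single remaining miss grows from 2 to 3 rating points, which raises the RMSE.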

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the average absolute deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
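The per-user points and the local-average line can be sketched as below. Several details are assumptions, since the text does not specify them: the population standard deviation is used, the per-user error is the mean absolute error, and the line is a centred moving average over neighbouring users in standard-deviation order.

```python
import statistics


def user_error_points(predictions):
    """Return sorted (std, mean absolute error) points, one per user.

    predictions: dict user_id -> list of (predicted, actual) pairs
    (hypothetical layout).
    """
    points = []
    for pairs in predictions.values():
        actual = [a for _, a in pairs]
        abs_err = sum(abs(p - a) for p, a in pairs) / len(pairs)
        points.append((statistics.pstdev(actual), abs_err))
    return sorted(points)


def local_average(points, window=3):
    """Average the error of each point with its neighbours in std order."""
    half = window // 2
    line = []
    for i, (std, _) in enumerate(points):
        nearby = points[max(0, i - half): i + half + 1]
        line.append((std, sum(err for _, err in nearby) / len(nearby)))
    return line


predictions = {
    "a": [(4, 4), (4, 4)],
    "b": [(4, 5), (4, 3)],
    "c": [(3, 1), (3, 5)],
}
points = user_error_points(predictions)
line = local_average(points)
```

A curve that flattens as the standard deviation grows would correspond to the stagnation discussed for these figures, whereas a curve that keeps rising steeply would indicate that the predictions stay close to the user average.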

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This suggests that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
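The round-by-round simulation can be sketched as follows. The callback `build_and_predict` stands in for building the prototype vector and predicting a rating; it, the ordering of each user's reviews and the use of MAE as the error are all assumptions made for illustration. Every user must have more than `max_train` reviews so that the validation set is never empty.

```python
def learning_curve(user_events, max_train, build_and_predict):
    """Return the average absolute error after 1..max_train training reviews.

    user_events: dict user_id -> ordered list of (recipe_id, rating).
    build_and_predict(train, recipe_id): hypothetical callback returning a
    predicted rating from the reviews seen so far.
    """
    curve = []
    for n in range(1, max_train + 1):
        errors = []
        for events in user_events.values():
            # Each round moves one more review from validation to training.
            train, validation = events[:n], events[n:]
            for recipe_id, rating in validation:
                predicted = build_and_predict(train, recipe_id)
                errors.append(abs(predicted - rating))
        curve.append(sum(errors) / len(errors))
    return curve


def mean_predictor(train, recipe_id):
    """Stand-in predictor: the mean of the ratings seen so far."""
    return sum(rating for _, rating in train) / len(train)


user_events = {"u": [("r1", 5), ("r2", 3), ("r3", 4), ("r4", 4)]}
curve = learning_curve(user_events, max_train=3, build_and_predict=mean_predictor)
```

Plotting `curve` against the number of training reviews gives the kind of learning curve shown in the figures; convergence would appear as the values settling around a constant.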



Chapter 6


In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually and the other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g. winter/fall or summer/spring), time of the day (e.g. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
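This proposal can be sketched as one accumulated vector per rating value, with the prediction being the rating whose vector is most similar to the recipe. It is only one possible reading of the idea: the function names, the data layout and the tie-breaking behaviour (the first class found wins) are assumptions.

```python
import math
from collections import defaultdict


def build_class_vectors(ratings, recipe_weights):
    """Accumulate one feature vector per rating value for a single user."""
    classes = defaultdict(lambda: defaultdict(float))
    for recipe_id, rating in ratings:
        for feature, weight in recipe_weights[recipe_id].items():
            classes[rating][feature] += weight
    return {rating: dict(vector) for rating, vector in classes.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(class_vectors, recipe_vector):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vector))


weights = {"r1": {"tomato": 1.0, "basil": 1.0}, "r2": {"bacon": 1.0}}
classes = build_class_vectors([("r1", 5), ("r2", 2)], weights)
liked = predict_rating(classes, {"basil": 1.0})
disliked = predict_rating(classes, {"bacon": 1.0})
```

With this design the predicted rating is always one of the rating values the user has actually given, so no similarity thresholds or normalization step are needed.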




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-

proved recommendations In Proceedings of the Eighteenth National Conference on Artificial

Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936

[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by

Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings

of the International MultiConference of Engineers and Computer Scientists pages 519ndash523

2014 ISBN 9789881925251

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi 101023A


[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi 101145963770


[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.

The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.

2.1.3 Hybrid Methods

Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2].

Figure 2.4: Parallelized hybridization design [2].

Figure 2.5: Pipelined hybridization designs [2].

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
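The weighting scheme of a parallelized hybrid can be illustrated with a minimal Python sketch. The function name, the component names, and the manually chosen weights are assumptions made for illustration only; they are not part of any system described in this work.

```python
def weighted_hybrid(component_scores, weights):
    """Combine per-item scores from independent recommenders.

    component_scores: dict mapping a (hypothetical) component name to a
                      dict of item -> predicted score.
    weights:          dict mapping the same component names to a weight.
    Each item is averaged only over the components that scored it.
    """
    items = set().union(*(scores.keys() for scores in component_scores.values()))
    combined = {}
    for item in items:
        numerator = 0.0
        denominator = 0.0
        for name, scores in component_scores.items():
            if item in scores:
                numerator += weights[name] * scores[item]
                denominator += weights[name]
        if denominator > 0:
            combined[item] = numerator / denominator
    return combined


example = weighted_hybrid(
    {"collaborative": {"a": 4.0, "b": 2.0},
     "content": {"a": 2.0, "b": 4.0, "c": 5.0}},
    {"collaborative": 0.75, "content": 0.25},
)
```

Learning the weights dynamically, as mentioned above, would replace the fixed `weights` dictionary with values fitted on held-out ratings.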

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4].

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2].

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\text{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\text{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
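The two measures can be computed directly from the set of recommended items and the set of items the user actually likes. The following Python sketch is only an illustration; the function name and the choice to return 0 when a denominator is empty are assumptions.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended items.

    recommended: items suggested by the system.
    relevant:    items the user actually likes.
    """
    recommended = set(recommended)
    relevant = set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Four recommendations, two of which are among the three liked items.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this example tp = 2, fp = 2 and fn = 1, so precision is 2/4 and recall is 2/3.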

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
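As an illustration of how MAE and RMSE differ, the following Python sketch computes both over the same pairs of predicted and actual ratings. The function names and the sample ratings are assumptions chosen for the example.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    n = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / n


def rmse(predicted, actual):
    """Root Mean Square Error; squaring penalizes large deviations more."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / n)


predicted = [4, 3, 5]
actual = [5, 3, 2]
errors = (mae(predicted, actual), rmse(predicted, actual))
```

Here the absolute errors are 1, 0 and 3, so the MAE is 4/3, while the RMSE is the square root of 10/3 and is dominated by the single large error.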



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^+$, an equation based on the idea of TF-IDF is used:

$$I_k^+ = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$
The user's disliked ingredients $I_k^-$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
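The FF-IRF score for favourite ingredients can be sketched in Python as follows. This is a minimal illustration and not the implementation from [3]: the function name, the data layout, the use of the natural logarithm, and the sample cooking history are all assumptions.

```python
import math


def ff_irf(use_counts, period, recipes):
    """Score ingredients by frequency of use times inverse recipe frequency.

    use_counts: dict ingredient -> number of times the user cooked with it (F_k).
    period:     length of the observation period D (e.g., in days).
    recipes:    list of sets, each holding the ingredients of one recipe.
    """
    total_recipes = len(recipes)                       # M
    scores = {}
    for ingredient, uses in use_counts.items():
        containing = sum(1 for r in recipes if ingredient in r)   # M_k
        if containing == 0:
            continue  # ingredient never appears in the collection
        ff = uses / period
        irf = math.log(total_recipes / containing)     # natural log (assumption)
        scores[ingredient] = ff * irf
    return scores


recipes = [{"rice", "egg"}, {"rice", "tofu"}, {"rice", "egg", "leek"}, {"leek"}]
scores = ff_irf({"rice": 6, "egg": 3, "tofu": 1}, 30, recipes)
favourites = sorted(scores, key=scores.get, reverse=True)
```

In this toy history rice is used most often, but because it appears in three of the four recipes its IRF is low, so egg ends up ranked above it.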

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I_k^+$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^+$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
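As a quick consistency check of the values reported for N = 20, and reading them as a precision of 60.7% and a recall of 61%, substituting into the F-measure formula gives:

$$\text{F-measure} = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61} = \frac{0.74054}{1.217} \approx 0.608$$

This agrees with the reported F-measure of about 60.8%, up to rounding of the inputs.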

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^+$ and $I_k^-$, respectively):

$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
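The two formulas above can be illustrated with a short Python sketch. The text does not specify how the weight $W_k$ is derived from the deviation score, so the weights are simply passed in as given values here; the function names, the default values, and the sample numbers are assumptions.

```python
import math


def ingredient_std(quantities):
    """Population standard deviation of one ingredient's quantities (Eq. 3.5).

    quantities: the amounts g_k(i) of ingredient k in the n recipes using it.
    """
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(ingredients, preference, weight):
    """Sum of preference times weight over a recipe's ingredients (Eq. 3.6).

    preference: dict ingredient -> I_k (positive if liked, negative if disliked).
    weight:     dict ingredient -> W_k (supplied by the caller; hypothetical values).
    Ingredients without a known preference contribute nothing.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in ingredients)


sigma = ingredient_std([2, 4, 4, 4, 5, 5, 7, 9])
score = recipe_score(
    {"pepper", "potato"},
    {"pepper": -1.0, "potato": 0.5},
    {"pepper": 2.0, "potato": 1.0},
)
```

With these sample values the disliked, heavily weighted pepper outweighs the liked potato, so the recipe receives a negative score.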

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
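The construction of the pseudo user-ratings vector in Eq. (3.7) can be sketched in a few lines of Python. The content-based predictor is represented here by an arbitrary callable, which is a placeholder assumption; the hybrid correlation weight of [20] is not reproduced.

```python
def pseudo_ratings(actual_ratings, content_predict, all_items):
    """Build one user's dense pseudo ratings vector.

    actual_ratings:  dict item -> rating given by the user (r_{u,i}).
    content_predict: callable item -> content-based prediction (c_{u,i});
                     a stand-in for any content-based predictor.
    all_items:       every item in the database.
    """
    return {
        item: actual_ratings[item] if item in actual_ratings else content_predict(item)
        for item in all_items
    }


# Hypothetical user who rated only one of three movies; the placeholder
# predictor returns a constant 3.0 for everything else.
vector = pseudo_ratings({"m1": 5}, lambda item: 3.0, ["m1", "m2", "m3"])
```

Stacking these vectors for all users yields the dense matrix V on which the collaborative step operates.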

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction.

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22].

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
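The short-term model described above can be sketched in Python as follows. This is a simplified illustration rather than the Daily-Learner implementation: the function names, the returned labels, the sparse-dictionary representation of TF-IDF vectors, and the threshold values are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(new_story, rated_stories, t_min, t_max):
    """Classify a new story with the nearest-neighbor short-term model.

    rated_stories: list of (vector, score) pairs already rated by the user.
    Returns ("known", None) if any stored story is closer than t_max,
    ("unclassified", None) if no story is closer than t_min (the story
    would then go to the long-term model), or ("scored", value) with the
    similarity-weighted average of the voters' scores.
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(new_story, vector)
        if similarity > t_max:
            return ("known", None)
        if similarity > t_min:
            voters.append((similarity, score))
    if not voters:
        return ("unclassified", None)
    total = sum(similarity for similarity, _ in voters)
    return ("scored", sum(similarity * score for similarity, score in voters) / total)


history = [({"a": 1.0, "b": 1.0}, 0.8), ({"a": 1.0, "c": 1.0}, 0.4)]
label, value = short_term_predict({"a": 1.0}, history, t_min=0.3, t_max=0.9)
```

In the example both stored stories are equally similar to the new one, so the prediction is the plain average of their scores. The same exclusion of near-duplicates could be applied to dishes that are too similar to ones recently eaten.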

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture.

Figure 4.2: Item-to-item collaborative recommendation.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, $r_{b,u}$ is the rating from user $u$ to item $b$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The predicted rating is obtained by normalizing this weighted sum by the sum of similarities.
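The similarity and prediction steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the YoLP implementation: the function names and data layout are hypothetical, each recipe's average is taken over all of its ratings, and the prediction function returns the similarity-weighted, mean-centred value written in the prediction formula (how that value is mapped back to the rating scale is not specified in the text).

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes (Eq. 4.1).

    ratings_a, ratings_b: dict user -> rating for recipe a and recipe b.
    Sums run over users who rated both recipes; each mean is taken over
    all ratings of the recipe (an assumption).
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    denominator = math.sqrt(
        sum((ratings_a[p] - mean_a) ** 2 for p in common)
        * sum((ratings_b[p] - mean_b) ** 2 for p in common)
    )
    return numerator / denominator if denominator else 0.0


def weighted_deviation(user_ratings, item_means, similarities):
    """Similarity-weighted average of the user's mean-centred ratings.

    user_ratings: dict item b -> rating given by the target user.
    item_means:   dict item b -> average rating of b.
    similarities: dict item b -> sim(a, b) for the target item a.
    """
    numerator = sum(similarities[b] * (r - item_means[b]) for b, r in user_ratings.items())
    denominator = sum(similarities[b] for b in user_ratings)
    return numerator / denominator if denominator else 0.0


sim = pearson_item_similarity({"u1": 5, "u2": 3, "u3": 1}, {"u1": 4, "u2": 3, "u3": 2})
dev = weighted_deviation({"b1": 5, "b2": 2}, {"b1": 4, "b2": 3}, {"b1": 0.8, "b2": 0.2})
```

In the example the two recipes are rated in the same order by all three users, so their correlation is 1.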

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
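The content-based pipeline described above, with binary feature vectors compared by cosine similarity and then mapped to a rating, can be illustrated with the following Python sketch. The function names and the feature labels are hypothetical, and avgTotal is passed in as a given value because its exact computation is not detailed here.

```python
import math


def cosine_binary(features_a, features_b):
    """Cosine similarity between two binary feature vectors given as sets.

    For binary vectors the dot product is the number of shared features
    and each norm is the square root of the number of features present.
    """
    a, b = set(features_a), set(features_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Map a similarity value to a rating as in Eq. (4.3)."""
    return avg_total + bonus if similarity > threshold else avg_total


# Hypothetical profile and recipe described by category, region and ingredients.
profile = {"cat:fish", "region:pt", "ing:cod", "ing:potato"}
recipe = {"cat:fish", "region:pt", "ing:cod", "ing:potato"}
rating = similarity_to_rating(cosine_binary(profile, recipe), avg_total=3.6)
```

Because the rating can only take two values per user and item pair, the error measures computed from it are coarse, which is consistent with the approximation noted above.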

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:

$$
\mathrm{IRF}_k = \log \frac{M}{M_k}
\tag{4.4}
$$

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
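As an illustration of Eq. 4.4, the following minimal sketch computes the IRF weight of every feature in a toy recipe collection. The function name, the dictionary layout and the use of the natural logarithm are assumptions made for the example; the text does not state the base of the logarithm or how the data is stored.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """Map each feature k to log(M / M_k).

    recipes: dict of recipe id -> iterable of feature names (hypothetical layout).
    """
    total_recipes = len(recipes)             # M
    recipe_counts = Counter()                # M_k for each feature k
    for features in recipes.values():
        recipe_counts.update(set(features))  # count each recipe at most once
    # Natural logarithm is an assumption; the base only rescales the weights.
    return {k: math.log(total_recipes / m_k) for k, m_k in recipe_counts.items()}


# Toy data, invented for the example.
recipes = {
    "r1": {"tomato", "basil", "pasta"},
    "r2": {"tomato", "garlic"},
    "r3": {"tomato", "beef"},
    "r4": {"basil", "garlic"},
}
irf = inverse_recipe_frequency(recipes)
```

In this toy collection "tomato" appears in three of the four recipes and receives a low weight, while "beef" appears in only one and receives the highest weight.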

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. To determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
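The construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the data layout and the toy IRF weights are hypothetical, and the fixed threshold is modelled as the middle of the rating scale, which reproduces the split described for both datasets (1 and 2 negative, 3 and 4 positive on a 1 to 4 scale; 3 neutral on a 1 to 5 scale).

```python
from collections import defaultdict


def fixed_threshold_label(rating, scale_min, scale_max):
    """Return +1, -1 or 0 (neutral) relative to the middle of the rating scale."""
    middle = (scale_min + scale_max) / 2
    if rating > middle:
        return 1
    if rating < middle:
        return -1
    return 0


def user_average_label(rating, user_average):
    """Return +1 if the rating is at or above the user's average, else -1."""
    return 1 if rating >= user_average else -1


def build_prototype(rated, recipe_features, irf, label):
    """Add IRF weights for positive observations, subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = label(rating)
        if sign == 0:          # neutral observation: ignored
            continue
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)


# Toy data, invented for the example (weights are not real IRF values).
irf = {"tomato": 1.0, "basil": 2.0, "beef": 3.0}
recipe_features = {"r1": ["tomato", "basil"], "r2": ["tomato", "beef"], "r3": ["basil"]}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]

fixed = build_prototype(rated, recipe_features, irf,
                        lambda r: fixed_threshold_label(r, 1, 5))
by_average = build_prototype(rated, recipe_features, irf,
                             lambda r: user_average_label(r, 3.0))
```

With the fixed threshold the rating of 3 is skipped, whereas with the user average (3.0 here) the same rating counts as a positive observation, so "basil" ends up with a larger weight in the second prototype.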

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$
B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C
\tag{4.5}
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale of each user is mapped onto that user's rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
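A minimal sketch of Eq. 4.5 applied per user is given below. The function and parameter names are hypothetical, and the fallback condition (an empty similarity interval) is one simple reading of "not enough user ratings to compute the similarity interval".

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, fallback):
    """Map a similarity from [sim_min, sim_max] onto [rating_min, rating_max] (Eq. 4.5).

    fallback is returned when the similarity interval is empty; in the text
    this default is the user's average rating.
    """
    if sim_max == sim_min:
        return fallback
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min


# A user whose similarities span [0.25, 0.75] and whose ratings span [2, 4].
top = min_max_rating(0.75, 0.25, 0.75, 2, 4, fallback=3.0)
bottom = min_max_rating(0.25, 0.25, 0.75, 2, 4, fallback=3.0)
default = min_max_rating(0.40, 0.40, 0.40, 1, 5, fallback=3.5)
```

The highest similarity observed for the user maps to that user's highest rating and the lowest similarity to the lowest rating, so the prediction never leaves the user's own rating range.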

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.
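The piecewise rule of Eq. 4.6 can be written as a short function. This is a sketch with hypothetical names; the default thresholds are the initial values of U and L used in this work, and the caller decides whether the average and standard deviation come from the user, the recipe, or a combination of both.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating following Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev   # very similar: above-average rating
    if similarity < lower:
        return average - std_dev   # very dissimilar: below-average rating
    return average                 # otherwise: the plain average


high = rating_from_similarity(0.90, average=4.0, std_dev=0.5)
middle = rating_from_similarity(0.50, average=4.0, std_dev=0.5)
low = rating_from_similarity(0.10, average=4.0, std_dev=0.5)
```

For a user with an average rating of 4.0 and a standard deviation of 0.5, the three calls return 4.5, 4.0 and 3.5 respectively.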

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance. Initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization of the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                          24741         8117
Number of food items                    226025        14976
Number of rating events                 956826        86574
Number of ratings above avg             726467        46588
Number of groups                           108           68
Number of ingredients                     5074          338
Number of categories                        28           14
Sparsity on the ratings matrix            0.02         0.07
Avg rating values                         4.68         3.34
Avg number of ratings per user           38.67        10.67
Avg number of ratings per item            4.23         5.78
Avg number of ingredients per item        8.57         3.71
Avg number of categories per item         2.33         0.60
Avg number of food groups per item        0.87         0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

In Figures 4.3, 4.4 and 4.5, some statistical data on the datasets is presented graphically. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a leave-p-out style of cross-validation was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times; this is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
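The 5-fold procedure can be sketched as follows. The function name and the shuffling with a fixed seed are assumptions made for the example; the text does not describe how the rating events were assigned to folds.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)       # assumption: random fold assignment
    folds = [shuffled[i::k] for i in range(k)]  # k folds of (almost) equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


# 100 hypothetical rating events, identified here by an integer.
events = list(range(100))
splits = list(k_fold_splits(events, k=5))
```

With 100 events and k = 5, every split holds 20 events for validation and 80 for training, matching the 20%/80% proportion described above; the error measures are then averaged over the five splits.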

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious           Food.com
                                 MAE      RMSE        MAE      RMSE
YoLP Content-based component     0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678      0.3761   0.6834
User Average                     0.6315   0.8338      0.4077   0.6207
Item Average                     0.7701   1.0930      0.4385   0.7043
Combined Average                 0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                            Food.com
                                                   Observation       Observation         Observation       Observation
                                                   User Average      Fixed Threshold     User Average      Fixed Threshold
                                                   MAE      RMSE     MAE      RMSE       MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606   0.7759   1.0283     0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550   0.8388   1.1106     0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296   0.7824   0.9927     0.4390   0.6506   0.4324   0.6449
Min-Max                                            0.8539   1.1533   0.7721   1.0705     0.6648   0.9847   0.6303   0.9384
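The three average baselines reported in Table 5.1 can be sketched as follows. The function names and the tuple layout of the rating events are hypothetical; only the formula (UserAvg + ItemAvg)/2 comes from the text.

```python
from collections import defaultdict


def average_baselines(training):
    """Return per-user and per-item average ratings from (user, item, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user, item):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2


# Toy training events, invented for the example.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = average_baselines(training)
prediction = combined_average(user_avg, item_avg, "u1", "r1")
```

Here user u1 has an average of 3.0 and recipe r1 an average of 4.5, so the combined baseline predicts 3.75 for that pair.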

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Looking at the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
Ingredients                          0.7932   1.0054      0.4411   0.6537
Cuisine                              0.8553   1.0810      0.5357   0.7431
Dietary                              0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. When computing the user prototype vector, the features were therefore separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
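The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the function names and the toy weights are assumptions made for the example; the actual storage format used in the MySQL database is not described in the text.

```python
import math


def merge_vectors(stored, groups):
    """Merge the selected per-group vectors into one sparse vector."""
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            merged[(group, feature)] = weight  # key by group to avoid name clashes
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors, invented for the example.
user = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"french": 1.0}, "dietary": {}}


def similarity(groups):
    return cosine(merge_vectors(user, groups), merge_vectors(recipe, groups))


only_ingredients = similarity(["ingredients"])
ingredients_and_cuisine = similarity(["ingredients", "cuisine"])
```

Each row of Table 5.3 corresponds to a different choice of `groups`; no prototype vector has to be rebuilt between tests.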

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The pronounced drop in error seen in the graph in Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
$$

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is larger, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were an MAE of 0.6544 and an RMSE of 0.8601. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
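The opposite movement of MAE and RMSE observed in this section can be reproduced with a small numeric example. The ratings below are invented for the illustration; the two functions follow the standard definitions of MAE and RMSE.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalises large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


actual = [4, 4, 4, 2]

cautious = [3.5, 3.5, 3.5, 3.5]  # never exact, but never far off
bold = [4, 4, 4, 4]              # exact three times, far off once

cautious_scores = (mae(cautious, actual), rmse(cautious, actual))
bold_scores = (mae(bold, actual), rmse(bold, actual))
```

The bold predictor hits the exact rating three times out of four and has the lower MAE (0.5 against 0.75), but its single large miss gives it the higher RMSE (1.0 against roughly 0.87), which is the same pattern seen when the threshold U is lowered.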

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
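The round-by-round procedure described in this section can be sketched for a single user as follows. Everything here is a simplified illustration: `build_model` and `predict` are hypothetical placeholders for the prototype construction and rating prediction steps, and in the example they are replaced by a trivial running-average model so that the code is self-contained.

```python
def learning_curve(user_events, build_model, predict, max_train):
    """Return the mean absolute error after training on 1, 2, ..., max_train reviews.

    user_events: list of (item, rating) pairs for one user, in the order in
    which they are moved from the validation set to the training set.
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = user_events[:n], user_events[n:]
        if not validation:
            break
        model = build_model(training)
        total = sum(abs(predict(model, item) - rating) for item, rating in validation)
        errors.append(total / len(validation))
    return errors


# Stand-in model for the example: predict the average of the training ratings.
def build_model(training):
    return sum(rating for _, rating in training) / len(training)


def predict(model, item):
    return model


events = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(events, build_model, predict, max_train=3)
```

Averaging such per-user curves over all the selected users would give one point per number of rated recipes, which is the kind of curve plotted in Figures 5.8 to 5.10.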



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, which allow the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

associated with content features (e.g., comedies liked by user or dramas liked by user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
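As a rough illustration of the parallelized design, the sketch below combines the scores of independent recommenders with a weighted average and also shows the switching behaviour from the example in the text. The function names, the weight values and the `min_ratings` cut-off are assumptions made for illustration only; they are not taken from [2].

```python
def weighted_hybrid(scores, weights):
    """Weighted average of the scores that several recommenders give to one item.

    scores:  dict mapping a recommender name to its predicted score
    weights: dict mapping a recommender name to its (manually chosen) weight
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights of the active recommenders must sum to a positive value")
    return sum(weights[name] * score for name, score in scores.items()) / total_weight


def switching_hybrid(n_user_ratings, popularity_score, collaborative_score, min_ratings=5):
    """Use the popularity score while the user has few ratings (hypothetical cut-off)."""
    return popularity_score if n_user_ratings < min_ratings else collaborative_score


# Example: the collaborative component is trusted three times as much as the content one.
combined = weighted_hybrid(
    {"collaborative": 4.0, "content": 2.0},
    {"collaborative": 0.75, "content": 0.25},
)
```

In this sketch the weights are fixed by hand; learning them dynamically, as the text mentions, would replace the `weights` dictionary with values fitted on held-out ratings.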

Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increases in the number of sales, profits and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall.
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

$$\text{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Figure 2.7: Evaluating recommended items [2]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\text{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$
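The two measures can be sketched in a few lines, assuming that the recommended items and the items the user actually likes are available as plain Python collections. The function name `precision_recall` and the example item identifiers are illustrative assumptions.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list.

    recommended: items returned by the system
    relevant:    items the user actually likes
    """
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Four recipes are recommended, two of which are among the three the user likes.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here precision is 2/4 and recall is 2/3; returning 0.0 for an empty list is a convention chosen for this sketch rather than part of the definitions.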

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$
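A minimal sketch of both error measures follows, assuming the predicted and actual ratings are given as two equally long lists. The function names and the example ratings are illustrative.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Square Error; squaring penalizes large deviations more strongly."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty rating lists of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted))


predicted = [4, 3, 5, 2]
actual = [5, 3, 3, 2]
mae_value = mae(predicted, actual)    # (1 + 0 + 2 + 0) / 4
rmse_value = rmse(predicted, actual)  # sqrt((1 + 0 + 4 + 0) / 4)
```

With these ratings the MAE is 0.75 while the RMSE is about 1.12, which shows how the single error of two rating points weighs more heavily in the RMSE.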


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$

The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
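The favourite-ingredient estimate described above can be sketched as follows. This is only an illustration under stated assumptions: recipes are represented as sets of ingredient names, $F_k$ is taken to be the number of cooked recipes containing ingredient $k$ during a period of `period_days` days, and the natural logarithm is used because the base is not stated in the text. The function name `ff_irf` and the toy data are hypothetical.

```python
import math
from collections import Counter


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each ingredient the user cooked with by FF_k * IRF_k.

    cooked_recipes: list of ingredient sets cooked by the user during the period
    all_recipes:    list of ingredient sets for every recipe in the collection
    period_days:    length D of the observation period (assumed to be in days)
    """
    total_recipes = len(all_recipes)                                     # M
    recipes_with = Counter(k for r in all_recipes for k in set(r))       # M_k
    times_used = Counter(k for r in cooked_recipes for k in set(r))      # F_k
    scores = {}
    for ingredient, uses in times_used.items():
        if recipes_with[ingredient] == 0:
            continue  # ingredient unknown to the collection: IRF is undefined
        ff = uses / period_days
        irf = math.log(total_recipes / recipes_with[ingredient])
        scores[ingredient] = ff * irf
    return scores


all_recipes = [{"rice", "egg"}, {"rice", "tofu"}, {"rice", "egg", "leek"}, {"tofu"}]
cooked = [{"rice", "egg"}, {"rice", "egg", "leek"}]
scores = ff_irf(cooked, all_recipes, period_days=10)
```

In this toy collection rice is used as often as egg but appears in three of the four recipes, so its IRF, and therefore its score, is lower. This mirrors how IDF discounts very common words.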

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$


The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
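As a quick consistency check on the figures reported above, the F-measure can be recomputed from the stated precision and recall. The values are read here as percentages (60.7% and 61% for N = 20, 83.3% and 4.5% for N = 1), which is an assumption about the decimal points; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f_top20 = f_measure(0.607, 0.61)   # top 20 ingredients
f_top1 = f_measure(0.833, 0.045)   # top ingredient only
```

Rounded to three decimals, `f_top20` is 0.608, which matches the reported 60.8%, while `f_top1` is only about 0.085: the very low recall dominates the harmonic mean even though precision is high.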

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient $k$ contained in both recipes, the recipe with the higher quantity of $k$ should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
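The two formulas above can be sketched as shown below. The text does not say how the weight $W_k$ is derived from the deviation, so the sketch simply takes the weights as an input; the default weight of 1.0 for a missing ingredient, the function names and the example values are assumptions made for illustration.

```python
import math


def quantity_std(quantities):
    """Standard deviation of the quantities of one ingredient across recipes."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(ingredients, preference, weight):
    """Sum of preference * weight over the ingredients of a recipe.

    preference: dict ingredient -> I_k (positive if liked, negative if disliked)
    weight:     dict ingredient -> W_k (assumed to be computed elsewhere)
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in ingredients)


sigma = quantity_std([2, 4, 4, 4, 5, 5, 7, 9])
score = recipe_score(
    ["pepper", "potato"],
    preference={"pepper": -1.0, "potato": 0.5},
    weight={"pepper": 2.0, "potato": 1.0},
)
```

In the example the disliked ingredient carries the larger weight, so the recipe ends up with a negative score, which is the behaviour the quantity-aware extension is aiming for.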

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$
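The construction of a pseudo user-ratings vector can be sketched as below. The content-based predictor is passed in as a plain function because its internals (the naive Bayesian classifier) are outside the scope of this illustration; the names `pseudo_ratings_vector` and `content_predict`, the movie identifiers and the constant prediction are hypothetical.

```python
def pseudo_ratings_vector(user_ratings, content_predict, items):
    """Fill the gaps in a user's ratings with content-based predictions.

    user_ratings:    dict item -> actual rating r_ui given by the user
    content_predict: function item -> content-based prediction c_ui for this user
    items:           every item that should appear in the dense vector
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }


actual = {"m1": 4, "m3": 2}
# Stand-in for the learned content-based profile: always predicts 3.5.
dense = pseudo_ratings_vector(actual, lambda item: 3.5, ["m1", "m2", "m3"])
```

Stacking one such vector per user yields a dense matrix, which is what allows the subsequent neighborhood-based step to find overlapping ratings between any pair of users.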

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix $V$ is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items have been rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
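The content-based strategy described above can be sketched as follows. Ingredient scores are obtained by averaging the ratings of the recipes in which each ingredient occurs, as stated in the text; averaging the ingredient scores again to obtain the recipe prediction is an assumption of this sketch, as are the function names and the toy data.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes):
    """Average, per ingredient, the ratings of the recipes that contain it.

    rated_recipes: list of (ingredients, rating) pairs for one user
    """
    totals = defaultdict(lambda: [0.0, 0])  # ingredient -> [rating sum, count]
    for ingredients, rating in rated_recipes:
        for ingredient in set(ingredients):
            totals[ingredient][0] += rating
            totals[ingredient][1] += 1
    return {k: rating_sum / count for k, (rating_sum, count) in totals.items()}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean score of its known ingredients."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None


rated = [(["tomato", "basil"], 5), (["tomato", "onion"], 3)]
scores = ingredient_scores(rated)
prediction = predict_recipe(["tomato", "basil", "garlic"], scores)
```

Unknown ingredients (garlic in the example) are simply ignored here; a recipe made only of unknown ingredients yields `None`, which a real system would have to handle with some fallback.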

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help the user to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
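The short-term model described above can be sketched as follows, assuming stories are already available as sparse TF-IDF vectors (dictionaries from term to weight). The threshold values `t_min` and `t_max`, the function names and the returned labels are hypothetical choices for this illustration and are not taken from [23].

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return (label, score) for a new story.

    rated_stories: list of (vector, score) pairs the user has already seen
    label is "known", "predicted" or "unclassified" (no voters found).
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(story, vector)
        if similarity >= t_max:
            return "known", None          # too similar to a story already seen
        if similarity >= t_min:
            voters.append((similarity, score))
    if not voters:
        return "unclassified", None       # would be handed to the long-term model
    weighted = sum(sim * score for sim, score in voters)
    return "predicted", weighted / sum(sim for sim, _ in voters)


new_story = {"a": 1.0, "b": 1.0}
history = [({"a": 1.0}, 1.0), ({"b": 1.0, "c": 1.0}, 0.0)]
label, value = short_term_predict(new_story, history)
```

The same two-threshold idea could be transferred to recipes by replacing the TF-IDF story vectors with recipe feature vectors, so that dishes nearly identical to recently eaten ones are filtered out.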

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture

In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$
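The item-to-item Pearson correlation can be sketched as below, assuming each recipe's ratings are stored as a dictionary from user identifier to rating. Computing each recipe's average over all of its ratings (rather than only over the shared users) is an assumption of this sketch, as are the function name and the example data.

```python
import math


def item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes over the users who rated both.

    ratings_a, ratings_b: dict user -> rating for recipe a and recipe b
    """
    shared_users = ratings_a.keys() & ratings_b.keys()   # the set P
    if not shared_users:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum(
        (ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared_users
    )
    spread_a = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared_users))
    spread_b = math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in shared_users))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # correlation is undefined when one recipe has no rating variation
    return numerator / (spread_a * spread_b)


recipe_a = {"u1": 5, "u2": 3, "u3": 1}
recipe_b = {"u1": 4, "u2": 3, "u3": 2}
recipe_c = {"u1": 2, "u2": 3, "u3": 4}
sim_ab = item_similarity(recipe_a, recipe_b)
sim_ac = item_similarity(recipe_a, recipe_c)
```

Recipes a and b are rated in the same order by the three users, so their similarity is close to 1, while recipe c is rated in the opposite order and obtains a similarity close to -1.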


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, $pred(u, a)$ is the predicted value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
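A minimal sketch of this component is given below. Binary sparse vectors are represented as Python sets of feature names, so the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The feature naming scheme (for instance `"ingredient:tomato"`), the function names and the example recipes are assumptions made for illustration and do not describe the actual YoLP code.

```python
import math


def build_profile(rated_recipes, min_rating=4):
    """Collect the features of every recipe the user rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= min_rating:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_recipes(profile, restaurant_recipes):
    """Order the restaurant's recipes from most to least similar to the profile."""
    return sorted(
        restaurant_recipes,
        key=lambda name: cosine_binary(profile, restaurant_recipes[name]),
        reverse=True,
    )


history = [
    ({"category:pasta", "ingredient:tomato"}, 5),
    ({"category:soup", "ingredient:leek"}, 2),
]
profile = build_profile(history)
menu = {
    "leek soup": {"category:soup", "ingredient:leek"},
    "pasta al pomodoro": {"category:pasta", "ingredient:tomato", "ingredient:basil"},
}
ranking = rank_recipes(profile, menu)
```

Only the positively rated pasta recipe contributes to the profile, so the pasta dish on the menu is ranked above the soup.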

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
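The similarity-to-rating transformation can be sketched as below. The text does not specify how the user and item averages are combined, so taking their arithmetic mean is an assumption of this sketch; the function and parameter names are also illustrative.

```python
def similarity_to_rating(similarity, user_avg, item_avg, threshold=0.8, boost=0.5):
    """Turn a cosine similarity into a rating estimate.

    The combined average is assumed here to be the mean of the user's average
    rating and the item's average rating; the boost is added only when the
    similarity exceeds the threshold.
    """
    avg_total = (user_avg + item_avg) / 2
    return avg_total + boost if similarity > threshold else avg_total


high = similarity_to_rating(0.9, user_avg=4.0, item_avg=3.0)
low = similarity_to_rating(0.5, user_avg=4.0, item_avg=3.0)
```

Because the output can only take two values per user and item pair, the predicted ratings are coarse, which is consistent with the remark above that this conversion introduces some error in the reported results.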

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to assign weights to words when using the Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented first. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to assign weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


$$
IRF_k = \log \frac{M}{M_k} \qquad (4.4)
$$


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
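As an illustration of the weighting above, the following sketch computes the IRF of every feature in a small recipe collection. The dictionary layout, the function name, and the choice of the natural logarithm are assumptions; the text does not state the base of the logarithm.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Return IRF_k = log(M / M_k) for every feature k found in the recipes.

    recipes: dict mapping a recipe id to its list of features (hypothetical layout).
    """
    total = len(recipes)              # M: total number of recipes
    counts = Counter()                # M_k: number of recipes containing feature k
    for features in recipes.values():
        counts.update(set(features))  # count each feature once per recipe
    return {k: math.log(total / m_k) for k, m_k in counts.items()}

# Example: a feature present in every recipe gets weight 0, rarer ones weigh more.
weights = inverse_recipe_frequency({"r1": ["garlic", "beef"], "r2": ["garlic"]})
```

Because F_k is fixed to 1 in this work, the IRF value alone serves as the feature weight.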

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight assigned to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are described in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
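The two observation rules and the add/subtract update can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary-based sparse vector are hypothetical, and both observation weights are fixed to 1 as in the experiments.

```python
def classify_fixed(rating, negative, positive):
    """Fixed-threshold rule: +1 positive, -1 negative, 0 neutral (ignored)."""
    if rating in positive:
        return 1
    if rating in negative:
        return -1
    return 0

def classify_user_average(rating, user_average):
    """User-average rule: ratings equal to or above the average are positive."""
    return 1 if rating >= user_average else -1

def build_prototype(rated_recipes, irf, classify):
    """Add IRF weights for positive observations, subtract them for negative ones.

    rated_recipes: iterable of (features, rating) pairs from the training set.
    irf: dict mapping each feature to its IRF weight.
    classify: function mapping a rating to +1, -1 or 0.
    """
    prototype = {}
    for features, rating in rated_recipes:
        sign = classify(rating)
        if sign == 0:
            continue  # neutral observation, e.g. a rating of 3 on Food.com
        for k in set(features):
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

# Food.com rule: 1-2 negative, 4-5 positive, 3 ignored.
foodcom_rule = lambda r: classify_fixed(r, negative={1, 2}, positive={4, 5})
```

For the Epicurious scale the same helper would be called with `negative={1, 2}` and `positive={3, 4}`.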

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets that combine relevant information on the recipes with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$
B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \qquad (4.5)
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
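A minimal sketch of the per-user Min-Max mapping follows. The function and parameter names are hypothetical, and treating a zero-width similarity interval as the "not enough ratings" case is an assumption about how the fallback is triggered.

```python
def min_max(value, src_min, src_max, dst_min, dst_max):
    """Map value from [src_min, src_max] to [dst_min, dst_max] (Min-Max)."""
    return (value - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

def predict_rating_min_max(similarity, user_similarities, user_ratings):
    """Map one similarity value into the user's own rating range.

    user_similarities: similarity values observed for this user (validation set).
    user_ratings: this user's ratings in the training set (assumed non-empty).
    """
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        # Similarity interval cannot be computed: fall back to the user average.
        return sum(user_ratings) / len(user_ratings)
    return min_max(similarity, sim_min, sim_max, min(user_ratings), max(user_ratings))
```

With this mapping, a user whose training ratings only span 2 to 4 never receives a prediction outside that interval, which reflects the idea that each user has a personal notion of a high or low rating.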

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

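$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases}
\qquad (4.6)
$$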


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02% | 0.07% |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |

This approach is intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
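The three-case rule of Eq. 4.6 can be sketched as below. The function names are hypothetical. The text does not specify how the user and recipe statistics are combined, nor whether the population or sample standard deviation is used, so the plain mean of the two averages, the plain mean of the two deviations, and `statistics.pstdev` are all assumptions.

```python
import statistics

def rating_from_similarity(similarity, average, deviation, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating using the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + deviation
    if similarity < lower:
        return average - deviation
    return average

def combined_stats(user_ratings, item_ratings):
    """Assumed combination: mean of the two averages and of the two deviations."""
    average = (statistics.mean(user_ratings) + statistics.mean(item_ratings)) / 2
    deviation = (statistics.pstdev(user_ratings) + statistics.pstdev(item_ratings)) / 2
    return average, deviation

# Usage: predict a rating for one (user, recipe) pair from training-set ratings.
avg, dev = combined_stats([4, 4], [2, 4])
predicted = rating_from_similarity(0.8, avg, dev)
```

The default thresholds correspond to the initial values of U and L, which are tuned later in Chapter 5.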


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries), and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model learned during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is split into k folds, one fold is held out as the validation set, and the remaining folds form the training set. To reduce variability, this process is repeated with a different fold as the validation set until every fold has been used once, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
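The splitting procedure can be sketched as follows. The function name, the fixed random seed, and the round-robin assignment of shuffled events to folds are assumptions made for illustration; the text does not describe how the folds were drawn.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, using each fold once for validation.

    events: list of rating events, e.g. (userID, itemID, rating) tuples.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation fold holds about 20% of the rating events and each training set the remaining 80%.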

Figure 5.1: 10 Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
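For reference, the two error measures can be written as a short sketch using their standard definitions. The function names are hypothetical, and both functions assume non-empty lists of equal length.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For predictions [4, 3] against actual ratings [5, 1], the MAE is (1 + 2) / 2 = 1.5, while the RMSE is the square root of (1 + 4) / 2 = 2.5, which is slightly larger because the bigger miss is squared.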

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results (UA = Observation User Average, FT = Observation Fixed Threshold)

| Method | Epicurious UA MAE | Epicurious UA RMSE | Epicurious FT MAE | Epicurious FT RMSE | Food.com UA MAE | Food.com UA RMSE | Food.com FT MAE | Food.com FT RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |
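The three average-based baselines of this section can be sketched as follows. The function names and the tuple layout of the training events are hypothetical. The sketch does not handle users or recipes that are absent from the training set, since the text does not say how such cases were treated.

```python
from collections import defaultdict

def build_averages(training):
    """Compute per-user and per-item average ratings from the training events.

    training: iterable of (userID, recipeID, rating) tuples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def combined_average(user_avg, item_avg, user, item):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2
```

Each baseline prediction is then compared against the validation rating with the MAE and RMSE measures, exactly as for the other components.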

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
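The per-group storage described above can be sketched as follows. The function names, the group labels, and the prefixed feature keys are hypothetical; the sketch assumes feature keys are unique across groups so that merging never overwrites a weight.

```python
import math

def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group sparse vectors selected for one feature test."""
    merged = {}
    for group in groups:
        merged.update(vectors_by_group.get(group, {}))
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_for_features(user_vectors, recipe_vectors, groups):
    """Similarity restricted to the chosen feature groups, without rebuilding."""
    return cosine(merge_groups(user_vectors, groups), merge_groups(recipe_vectors, groups))
```

Each row of Table 5.3 then corresponds to one choice of `groups`, for example `["ingredients", "cuisine"]`, evaluated against the same stored vectors.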

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases}
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph in Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } similarity < U
\end{cases}
\qquad (5.1)
$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset
Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. With a lower similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest errors registered using the Epicurious dataset were an MAE of 0.6544 and an RMSE of 0.8601. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
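The upper-threshold test can be sketched as a simple sweep over candidate values of U using the two-case rule of Eq. 5.1. The function names and the layout of the samples are hypothetical; in the actual experiments each point was obtained from a full cross-validation run rather than from a fixed list of samples.

```python
import math

def predict(similarity, average, deviation, upper):
    """Two-case rule of Eq. 5.1."""
    return average + deviation if similarity >= upper else average

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for each candidate upper threshold.

    samples: list of (similarity, average, deviation, actual_rating) tuples.
    """
    results = {}
    for upper in thresholds:
        errors = [predict(s, avg, dev, upper) - actual for s, avg, dev, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```

Reporting both measures for every value of U makes the trade-off discussed above visible, since the threshold minimizing the MAE need not be the one minimizing the RMSE.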

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user. The point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Given the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
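The incremental procedure described in this section can be sketched as a loop that moves one review per round from the validation set to the training set. The function and parameter names are hypothetical, and `build_model` and `predict` stand in for the prototype-vector construction and rating prediction. Reporting the mean absolute error per round is also an assumption about how the curve was measured.

```python
def learning_curve(user_events, max_train, build_model, predict):
    """Mean absolute error after training on the first n reviews of each user.

    user_events: dict mapping a user to an ordered list of (item, rating) pairs.
    build_model: function turning a list of training events into a user model.
    predict: function (model, item) -> predicted rating.
    """
    curve = []
    for n in range(1, max_train + 1):
        errors = []
        for events in user_events.values():
            training, validation = events[:n], events[n:]
            if not validation:
                continue  # nothing left to evaluate for this user
            model = build_model(training)
            errors.extend(abs(predict(model, item) - rating) for item, rating in validation)
        curve.append(sum(errors) / len(errors) if errors else None)
    return curve

# Toy usage: the "model" is just the average of the training ratings.
toy_build = lambda training: sum(r for _, r in training) / len(training)
toy_predict = lambda model, item: model
curve = learning_curve({"u": [("a", 4), ("b", 2), ("c", 3)]}, 2, toy_build, toy_predict)
```

Plotting the returned list against n gives a curve of the kind shown in Figures 5.8 to 5.10.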



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. The two datasets have very different characteristics, so not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

61 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method ex-

plored in this work would be an interesting experiment for a future work As mentioned in Section

32 by implementing this hybrid approach a performance increase of 92 as measured with the

MAE metric was obtained when compared to a pure content-based method [20] This experiment

would determine if a similar decrease in the MAE could be achieved by implementing this hybrid

approach in the food recommendation domain

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted



[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4].

content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

2.2 Evaluation Methods in Recommendation Systems

Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be and have been studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures such as Precision and Recall.


When using IR measures, the recommendation is viewed as an information retrieval task where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2].

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
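As a minimal sketch of Eqs. (2.13) and (2.14), the two measures can be computed from sets of liked and recommended items. The item names and the set-based representation below are illustrative assumptions, not data from this work.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of recommended items that are relevant (Eq. 2.13)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of relevant items that were recommended (Eq. 2.14)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical example: recipes the user liked vs. recipes recommended.
liked = {"soup", "pasta", "salad", "curry"}
recommended = {"pasta", "curry", "burger"}

tp = len(liked & recommended)   # recommended and liked
fp = len(recommended - liked)   # recommended but not liked
fn = len(liked - recommended)   # liked but not recommended

print(precision(tp, fp), recall(tp, fn))
```

True negatives are not needed by either measure, which is why the sketch never counts them.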

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE but places more emphasis on larger deviations:

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
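The two error measures can be sketched directly from Eqs. (2.15) and (2.16). The rating lists below are made-up values chosen only to show that a single large deviation raises RMSE more than MAE.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error; squaring emphasises larger deviations (Eq. 2.16)."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings on a 1-5 scale; absolute errors are 1, 0, 1 and 2.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]

print(mae(predicted, actual))   # 1.0
print(rmse(predicted, actual))  # sqrt(1.5), about 1.225
```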


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to explore further in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^+$, an equation based on the idea of TF-IDF is used:

\[ I_k^+ = FF_k \times IRF_k \tag{3.1} \]

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

\[ FF_k = \frac{F_k}{D} \]


The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

\[ IRF_k = \log \frac{M}{M_k} \]
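A minimal sketch of the FF-IRF weighting follows, assuming recipes are represented as sets of ingredient names, the natural logarithm is used, and every cooked recipe also belongs to the full collection (so that $M_k > 0$). The function name `ff_irf` and the sample data are hypothetical.

```python
import math
from collections import Counter


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Return {ingredient k: FF_k * IRF_k} for ingredients the user cooked with.

    F_k  = number of cooked recipes using k during the period D
    FF_k = F_k / D
    IRF_k = log(M / M_k), with M recipes in total and M_k containing k
    """
    usage = Counter(k for recipe in cooked_recipes for k in set(recipe))
    containing = Counter(k for recipe in all_recipes for k in set(recipe))
    total = len(all_recipes)
    return {
        k: (f_k / period_days) * math.log(total / containing[k])
        for k, f_k in usage.items()
    }


# Hypothetical collection of four recipes; the user cooked the first two in a week.
all_recipes = [{"rice", "egg"}, {"rice", "tofu"}, {"egg", "leek"}, {"rice", "leek"}]
cooked = all_recipes[:2]

scores = ff_irf(cooked, all_recipes, period_days=7)
for ingredient, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(ingredient, round(score, 3))
```

In this toy data, rice is used most often but appears in three of the four recipes, so its inverse recipe frequency pulls it below the rarer tofu, which mirrors the behaviour of TF-IDF on common words.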



The user's disliked ingredients $I_k^-$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by $I_k^+$ were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also, with N = 20 the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^+$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
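As a quick check of the F-measure formula, the sketch below plugs in the precision and recall reported above for N = 20. The function name is an illustrative choice.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the top 20 ingredients (N = 20).
f_top20 = f_measure(0.607, 0.61)
print(round(f_top20, 3))  # 0.608
```

The result agrees with the reported F-measure of 60.8%.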

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits; e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5} \]

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^+$ and $I_k^-$, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
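The sketch below illustrates Eqs. (3.5) and (3.6) under stated assumptions. How the weight $W_k$ is derived from the deviation is not detailed here, so the weights are simply passed in by the caller; the preference values, weights, and quantities are invented for illustration.

```python
import math


def ingredient_std(quantities):
    """Standard deviation of an ingredient's quantity over the n recipes using it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(ingredients, preference, weight):
    """Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6).

    `preference` holds I_k (positive for liked, negative for disliked);
    `weight` holds W_k, assumed to be computed elsewhere.
    Ingredients missing from either mapping contribute nothing.
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in ingredients)


# Hypothetical quantities (in grams) of one ingredient across eight recipes.
print(ingredient_std([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0

# Hypothetical preferences and weights for a two-ingredient recipe.
preference = {"potato": 0.5, "pepper": -0.2}
weight = {"potato": 1.0, "pepper": 2.0}
print(recipe_score(["potato", "pepper"], preference, weight))
```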

The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
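Eq. (3.7) can be sketched as a small function that fills the gaps in a user's ratings with content-based predictions. The content-based predictor is stubbed here with a constant; in [20] it is a naive Bayesian text classifier, and all names and values below are illustrative assumptions.

```python
def pseudo_ratings(user_ratings, items, content_predict):
    """Build the pseudo user-ratings vector v_u (Eq. 3.7).

    Keeps the actual rating r_ui where the user rated item i and falls back
    to the content-based prediction c_ui otherwise.
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }


# Hypothetical data: the user rated two of three movies.
user_ratings = {"m1": 4, "m3": 2}
items = ["m1", "m2", "m3"]


def stub_predictor(item):
    """Stand-in for a trained content-based predictor (assumption)."""
    return 3.5


v_u = pseudo_ratings(user_ratings, items, stub_predictor)
print(v_u)  # {'m1': 4, 'm2': 3.5, 'm3': 2}
```

Stacking these vectors for all users gives a ratings matrix with no missing entries, which is what lets the collaborative step run on dense data.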

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items have been rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction.

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
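The content-based strategy can be sketched as two steps: break rated recipes down into ingredient scores, then rebuild a prediction for an unseen recipe. The source states that ingredient scores are averages of the recipe ratings; predicting a recipe as the plain average of its known ingredient scores is an assumption made for this sketch, as are the recipe names and ratings.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes, recipe_ingredients):
    """Score each ingredient as the mean rating of the rated recipes containing it."""
    ratings_by_ingredient = defaultdict(list)
    for recipe, rating in rated_recipes.items():
        for ingredient in recipe_ingredients[recipe]:
            ratings_by_ingredient[ingredient].append(rating)
    return {k: sum(r) / len(r) for k, r in ratings_by_ingredient.items()}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean of its known ingredient scores (assumption)."""
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None


# Hypothetical recipes and user ratings.
recipe_ingredients = {
    "omelette": ["egg", "cheese"],
    "carbonara": ["egg", "pasta", "bacon"],
    "mac and cheese": ["pasta", "cheese"],
}
rated = {"omelette": 5, "carbonara": 3}

scores = ingredient_scores(rated, recipe_ingredients)
print(scores)  # egg: 4.0, cheese: 5.0, pasta: 3.0, bacon: 3.0
print(predict_recipe(recipe_ingredients["mac and cheese"], scores))  # 4.0
```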

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22].

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story that is already known. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
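The short-term voting scheme described above can be sketched as follows. Stories are represented as sparse term-weight dictionaries, and the threshold values `t_min` and `t_max`, the return labels, and the sample vectors are assumptions for illustration rather than the values used in Daily-Learner.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with the nearest-neighbor short-term model.

    Returns ("known", None) if any rated story is at least t_max similar,
    ("voted", score) with the similarity-weighted average of voter scores,
    or ("unclassified", None) when no story reaches t_min.
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(new_story, vector)
        if similarity >= t_max:
            return ("known", None)
        if similarity >= t_min:
            voters.append((similarity, score))
    if not voters:
        return ("unclassified", None)
    weighted = sum(s * score for s, score in voters)
    return ("voted", weighted / sum(s for s, _ in voters))


# Hypothetical rated stories: (TF-IDF-like vector, user score in [0, 1]).
rated = [({"a": 1.0}, 1.0), ({"b": 1.0, "c": 1.0}, 0.0), ({"z": 1.0}, 1.0)]

print(short_term_predict({"a": 1.0, "b": 1.0}, rated))  # voted, score about 0.586
print(short_term_predict({"a": 1.0}, rated))            # known
print(short_term_predict({"q": 1.0}, rated))            # unclassified
```

The "unclassified" outcome corresponds to handing the story over to the long-term model, which is outside the scope of this sketch.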

This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, the recommendation module is introduced, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm to personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture.

Figure 4.2: Item-to-item collaborative recommendation.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
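A minimal sketch of Eq. (4.1) follows, with each recipe's ratings stored as a `{user: rating}` dictionary. Computing the recipe averages over all of a recipe's ratings, rather than only over the shared users, is an assumption of this sketch, and the sample ratings are invented.

```python
import math


def item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes (Eq. 4.1).

    ratings_a and ratings_b map user -> rating. The sums run over P, the users
    who rated both recipes; the means are taken over each recipe's full set of
    ratings (assumption).
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    denominator = math.sqrt(
        sum((ratings_a[p] - mean_a) ** 2 for p in common)
        * sum((ratings_b[p] - mean_b) ** 2 for p in common)
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical ratings from three users.
recipe_a = {"u1": 5, "u2": 3, "u3": 1}
recipe_b = {"u1": 4, "u2": 3, "u3": 2}   # rated in the same order as recipe_a
recipe_c = {"u1": 1, "u2": 3, "u3": 5}   # rated in the opposite order

print(item_similarity(recipe_a, recipe_b))  # 1.0
print(item_similarity(recipe_a, recipe_c))  # -1.0
```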


After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \]

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of similarities to obtain the predicted rating.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day, and season of the year. Each feature has a specific position assigned to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
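The content-based pipeline of this section can be sketched in two steps: cosine similarity between binary feature vectors, followed by the similarity-to-rating rule of Eq. (4.3). Representing binary vectors as sets of feature labels is an implementation choice of this sketch, the feature labels are made up, and avgTotal is passed in as a number because the exact way the user and item averages are combined is not specified here.

```python
import math


def binary_cosine(profile, recipe):
    """Cosine similarity between two binary vectors stored as sets of active features."""
    if not profile or not recipe:
        return 0.0
    return len(profile & recipe) / math.sqrt(len(profile) * len(recipe))


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Eq. (4.3): add 0.5 to the combined average when similarity exceeds 0.8."""
    return avg_total + bonus if similarity > threshold else avg_total


# Hypothetical user profile built from positively rated recipes (ratings of 4 or 5).
profile = {"category:pasta", "region:italy", "ing:tomato", "ing:basil"}
recipe_close = {"category:pasta", "region:italy", "ing:tomato", "ing:basil"}
recipe_far = {"category:pasta", "ing:tomato", "ing:basil", "ing:garlic"}

avg_total = 3.6  # assumed combined user and item average

for recipe in (recipe_close, recipe_far):
    sim = binary_cosine(profile, recipe)
    print(round(sim, 2), similarity_to_rating(sim, avg_total))
```

With these inputs, the first recipe has similarity 1.0 and receives the bonus, while the second has similarity 0.75 and keeps the plain average.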

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


\[
IRF_k = \log \frac{M}{M_k}
\tag{4.4}
\]


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
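As an illustration of Eq. 4.4, the following sketch computes the IRF weight of every feature from a toy recipe collection. The data layout (a dictionary from recipe ID to a set of feature names) and the function name are assumptions made for this example, and the natural logarithm is assumed because the base is not stated in the text.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """recipes: dict recipe_id -> set of feature names. Returns dict feature -> IRF."""
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes that contain feature k
    counts = Counter(f for features in recipes.values() for f in set(features))
    # Assumption: natural logarithm
    return {f: math.log(total / m_k) for f, m_k in counts.items()}


toy_recipes = {
    "r1": {"garlic", "beef"},
    "r2": {"garlic"},
    "r3": {"garlic", "mint"},
    "r4": {"garlic", "beef"},
}
toy_irf = inverse_recipe_frequency(toy_recipes)
```

In this toy collection garlic appears in every recipe and therefore receives a weight of zero, while the rarer mint receives the largest weight, which is the behaviour expected from an inverse-frequency scheme.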

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, except that ratings equal to 3 are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations they are subtracted.
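The construction of the prototype vector described above can be sketched as follows. This is a minimal illustration under stated assumptions: sparse vectors are represented as dictionaries, all function and parameter names are hypothetical, and both observation types have weight 1 as in the experiments.

```python
from collections import defaultdict


def fixed_threshold(rating, positive_from, neutral=None):
    """First approach: True (positive), False (negative) or None (neutral, ignored)."""
    if rating == neutral:
        return None
    return rating >= positive_from


def user_average_rule(user_avg):
    """Second approach: positive when the rating is equal to or above the user's average."""
    return lambda rating: rating >= user_avg


def build_prototype(rated_recipes, recipe_features, irf, label_of):
    """rated_recipes: iterable of (recipe_id, rating); returns dict feature -> weight."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated_recipes:
        label = label_of(rating)
        if label is None:  # neutral observation, skipped
            continue
        sign = 1.0 if label else -1.0
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf[feature]  # add or subtract the IRF weight
    return dict(prototype)
```

Under these assumptions, the Epicurious rule corresponds to `lambda r: fixed_threshold(r, 3)` and the Food.com rule to `lambda r: fixed_threshold(r, 4, neutral=3)`.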

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute the range of each user's similarity values from the validation set, and compute the range of each user's rating values from the training set. At this point the similarity scale is mapped, for each user, onto the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A – minimum value of A), the user average was used as the default for the recommendation.
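The per-user Min-Max mapping can be sketched as below. The function names and argument layout are hypothetical, and the fallback condition (fewer than two similarity values, or a zero-width similarity interval) is an assumption about what "not enough user ratings" means in practice.

```python
def min_max(a, a_min, a_max, c, d):
    """Eq. 4.5: map a from the interval [a_min, a_max] onto [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_min_max(similarity, user_similarities, user_ratings, user_avg):
    """user_similarities: similarity values observed for this user;
    user_ratings: the ratings this user gave in the training set."""
    # Assumption: fall back to the user average when the interval cannot be computed.
    if len(user_similarities) < 2 or max(user_similarities) == min(user_similarities):
        return user_avg
    return min_max(similarity,
                   min(user_similarities), max(user_similarities),
                   min(user_ratings), max(user_ratings))
```

For a user whose similarities span [0.2, 0.6] and whose ratings span [2, 4], a similarity of 0.2 is mapped to 2 and a similarity of 0.6 to 4.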

Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, give good results and introduce only a very small error. To generate a rating value, the following formula was used:

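\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]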


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
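A minimal sketch of Eq. 4.6 and of the combined variant is given below. The names are hypothetical, and the combined statistics are assumed to be the plain mean of the user and recipe values, since the text does not state how the two standard deviations are combined.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average by one standard deviation at the extremes."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_avg, user_std, item_avg, item_std):
    # Assumption: "combined" is taken as the mean of the user and item values.
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The user-based and item-based variants simply pass the corresponding average and standard deviation to `rating_from_similarity`; the combined variant passes the pair returned by `combined_stats`.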

This approach is intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating predicted for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity it has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.



Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k folds, and in each round one fold is used as the validation set and the remaining observations as the training set. To reduce variability, the process is repeated k times, so that each fold is used exactly once as the validation set, and the validation results are averaged over the rounds (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5 (5-fold cross-validation), so in each round the validation set represents 20% and the training set the remaining 80% of the data.
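The 5-fold procedure can be sketched as follows. This is an illustrative split only: the function name is hypothetical, and shuffling with a fixed seed and assigning events to folds in round-robin order are assumptions, since the text does not describe how the folds were drawn.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    events = list(events)
    random.Random(seed).shuffle(events)  # assumption: random assignment to folds
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each round validates on roughly 20% of the rating events and trains on the remaining 80%; the error measures are then averaged over the five rounds.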

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
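For reference, the two error measures can be computed as in the sketch below, which uses the standard definitions of MAE and RMSE; the function names are chosen for this example only.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


example_predicted = [4, 4, 4, 4]
example_actual = [4, 4, 4, 2]
```

In the example, three predictions are exact and one misses by 2: the MAE is 0.5 while the RMSE is 1.0, which illustrates how a single large deviation affects the RMSE more strongly.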

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                   Epicurious          Food.com
                                   MAE      RMSE       MAE      RMSE
YoLP Content-based component       0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component       0.6454   0.8678     0.3761   0.6834
User Average                       0.6315   0.8338     0.4077   0.6207
Item Average                       0.7701   1.0930     0.4385   0.7043
Combined Average                   0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                    Epicurious                              Food.com
                                                    Observation         Observation         Observation         Observation
                                                    User Average        Fixed Threshold     User Average        Fixed Threshold
                                                    MAE      RMSE       MAE      RMSE       MAE      RMSE       MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606     0.7759   1.0283     0.4448   0.6812     0.4287   0.6624
Item Avg + Item Standard Deviation                  0.8914   1.1550     0.8388   1.1106     0.4561   0.7251     0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296     0.7824   0.9927     0.4390   0.6506     0.4324   0.6449
Min-Max                                             0.8539   1.1533     0.7721   1.0705     0.6648   0.9847     0.6303   0.9384
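The three average-based baselines can be sketched as below. The names and the tuple layout of the rating events are assumptions for this example, and users or recipes that do not appear in the training set are not handled, since the text does not say how such cases were treated.

```python
from collections import defaultdict


def average_baselines(training):
    """training: iterable of (user_id, recipe_id, rating).
    Returns a dict of predictor functions taking (user_id, recipe_id)."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return {
        "user": lambda u, i: user_avg[u],
        "item": lambda u, i: item_avg[i],
        "combined": lambda u, i: (user_avg[u] + item_avg[i]) / 2,
    }


toy_training = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5)]
toy_baselines = average_baselines(toy_training)
```

For user u1 and recipe a in the toy data, the user average is 3.0, the item average is 4.5 and the combined baseline predicts 3.75.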

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious          Food.com
                                     MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
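The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the group names and the function names are assumptions for this example; keys are prefixed with the group name so that a cuisine and an ingredient with the same label cannot collide.

```python
import math


def merge(vectors_by_group, groups):
    """Merge the stored per-group vectors of the selected feature groups."""
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group[group].items():
            merged[(group, feature)] = weight
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


toy_user = {"ingredients": {"garlic": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": -1.0}}
toy_recipe = {"ingredients": {"garlic": 1.0}, "cuisine": {}, "dietary": {"vegan": 1.0}}
```

Testing a feature combination then only requires changing the list of groups passed to `merge`, for example `["ingredients"]` or `["ingredients", "cuisine", "dietary"]`, without rebuilding any prototype vector.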

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


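\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\]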

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The pronounced drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. Each point in the graphs corresponds to one value of the upper similarity threshold: the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
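The threshold sweep can be sketched as a simple loop over candidate values of U, applying Eq. 5.1 and recomputing both error measures each time. The sample layout and function names are hypothetical, and the similarity values are assumed to have been computed beforehand by the recommendation component.

```python
import math


def predict_eq_5_1(similarity, avg, std, upper):
    """Eq. 5.1: add one standard deviation only when similarity reaches U."""
    return avg + std if similarity >= upper else avg


def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, avg, std, actual_rating).
    Returns a list of (U, MAE, RMSE), one entry per tested threshold."""
    results = []
    for upper in thresholds:
        errors = [predict_eq_5_1(s, avg, std, upper) - actual
                  for s, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results


toy_samples = [(0.6, 3.0, 1.0, 4.0), (0.2, 3.0, 1.0, 3.0)]
toy_sweep = sweep_upper_threshold(toy_samples, [0.5, 0.75])
```

In a full experiment each entry would be averaged over the five cross-validation folds; the toy data only illustrates how moving U changes which predictions receive the standard deviation bonus.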

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest errors registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
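The per-user points plotted in these figures can be derived as in the sketch below. It assumes that the absolute error of a user is the mean absolute error over that user's predictions and that the standard deviation is the population standard deviation of the user's actual ratings in the supplied list; neither detail is stated explicitly in the text, and the names are hypothetical.

```python
import statistics
from collections import defaultdict


def user_error_vs_std(predictions):
    """predictions: iterable of (user_id, predicted, actual).
    Returns dict user_id -> (rating standard deviation, mean absolute error)."""
    by_user = defaultdict(list)
    for user, predicted, actual in predictions:
        by_user[user].append((predicted, actual))
    points = {}
    for user, pairs in by_user.items():
        # Assumption: population standard deviation of the user's actual ratings
        std = statistics.pstdev([actual for _, actual in pairs])
        abs_error = sum(abs(p - a) for p, a in pairs) / len(pairs)
        points[user] = (std, abs_error)
    return points


toy_points = user_error_vs_std([("u1", 4, 4), ("u1", 4, 4), ("u2", 3, 1), ("u2", 3, 5)])
```

In the toy data, user u1 always gives the same rating and has zero error, while user u2 has a standard deviation of 2 and a mean absolute error of 2.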

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
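The incremental procedure of this section can be sketched as follows. The function names are hypothetical, `train_and_predict` stands in for building the prototype vector and predicting a rating, and every selected user is assumed to have at least `max_reviews` reviews, as in the user groups chosen above. The mean absolute error is used here as the error measure for simplicity.

```python
def mean_of_training(training, recipe_id):
    """Toy stand-in predictor: the mean rating of the training reviews."""
    return sum(rating for _, rating in training) / len(training)


def learning_curve(user_reviews, train_and_predict, max_reviews):
    """user_reviews: dict user -> ordered list of (recipe_id, rating).
    Returns a list whose entry n-1 is the mean absolute error after n training reviews."""
    curve = []
    for n in range(1, max_reviews):
        errors = []
        for reviews in user_reviews.values():
            # One more review moves from the validation set to the training set each round
            training, validation = reviews[:n], reviews[n:max_reviews]
            for recipe_id, rating in validation:
                errors.append(abs(train_and_predict(training, recipe_id) - rating))
        curve.append(sum(errors) / len(errors))
    return curve


toy_reviews = {"u": [("a", 4), ("b", 2), ("c", 3), ("d", 3)]}
toy_curve = learning_curve(toy_reviews, mean_of_training, max_reviews=4)
```

Replacing `mean_of_training` with a function that rebuilds the user's prototype vector from `training` and converts the resulting similarity into a rating would reproduce the shape of the experiment described in the text.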



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors, and together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually or other pairwise combinations.

61 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method ex-

plored in this work would be an interesting experiment for a future work As mentioned in Section

32 by implementing this hybrid approach a performance increase of 92 as measured with the

MAE metric was obtained when compared to a pure content-based method [20] This experiment

would determine if a similar decrease in the MAE could be achieved by implementing this hybrid

approach in the food recommendation domain

The experimental component can be configured to include more variables in the recommendation

process for example season of the year (ie winterfall or summerspring) time of the day (ie

lunch or dinner) total meal cost total calories amongst others The study of the impact that these

features have on the recommendation is also another interesting point to approach in the future

when datasets with more information are available

Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value. When recommending a recipe, its feature vector
is compared with the user's set of vectors so that, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031



Figure 2.7: Evaluating recommended items [2]

$$\text{Precision} = \frac{tp}{tp + fp} \tag{2.13}$$

Recall measures the completeness of the recommendations, i.e. the fraction of relevant items
recommended (tp) out of all relevant items, whether recommended or not (tp + fn):

$$\text{Recall} = \frac{tp}{tp + fn} \tag{2.14}$$

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15}$$

In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating
for item $i$ and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on
larger deviations:


$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16}$$


The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
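The four measures above can be illustrated with a minimal Python sketch. The function names and the choice of plain lists and sets as inputs are assumptions made for illustration; they are not taken from the evaluation module of this work.

```python
import math


def precision_recall(recommended, relevant):
    """Return (precision, recall) for a collection of recommended item ids."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # relevant items that were recommended
    fp = len(recommended - relevant)   # recommended items that are not relevant
    fn = len(relevant - recommended)   # relevant items that were missed
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings (Eq. 2.16)."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Because RMSE squares each deviation before averaging, a single prediction that is off by 2 contributes four times as much as one that is off by 1, whereas in MAE it contributes only twice as much.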



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and
cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$$I^+_k = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$


The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain
ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$



The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
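The favourite-ingredient score of Eqs. (3.1) to (3.3) can be sketched as follows. This is an illustrative reading of the formulas, not the implementation from [3]: the function name, the list-of-ingredient-lists input format, the use of the natural logarithm, and the assumption that every cooked recipe also belongs to the recipe collection are all choices made here.

```python
import math
from collections import Counter


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient as I+_k = FF_k * IRF_k.

    cooked_recipes: recipes (lists of ingredients) cooked during the period D
    all_recipes:    the whole recipe collection (assumed to include the cooked ones)
    period_days:    length of the period D
    """
    m = len(all_recipes)
    # M_k: number of recipes in the collection that contain ingredient k
    recipes_with = Counter(k for recipe in all_recipes for k in set(recipe))
    # F_k: number of cooked recipes that used ingredient k
    uses = Counter(k for recipe in cooked_recipes for k in set(recipe))
    return {
        k: (f_k / period_days) * math.log(m / recipes_with[k])
        for k, f_k in uses.items()
    }
```

An ingredient that appears in almost every recipe receives an IRF close to zero, so it scores low even when the user cooks with it often.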

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses
coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted
by $I^+_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of favourite ingredients per user is 19.2. The highest F-measure, 60.8%, was also recorded
with N = 20. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained from the evaluation were not satisfactory.
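As a quick consistency check on the figures reported for N = 20, substituting the precision (0.607) and recall (0.61) into the F-measure formula reproduces the reported F-measure, rounded to three decimals:

$$\text{F-measure} = \frac{2 \times 0.607 \times 0.61}{0.607 + 0.61} = \frac{0.74054}{1.217} \approx 0.608$$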

In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits: if a specific user does not like the ingredient $k$ contained in both recipes, the recipe
with the higher quantity of $k$ should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considered the ingredient quantity of a target
recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is
obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the
quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed
average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation
score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering
the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I^+_k$ and $I^-_k$, respectively):

$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
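Eqs. (3.5) and (3.6) can be sketched in a few lines of Python. The text above does not specify how the weight $W_k$ is derived from the deviation score, so in this sketch the weights are simply passed in as a dictionary; that input, the default values for unknown ingredients, and the function names are assumptions made for illustration.

```python
import math


def quantity_std(quantities):
    """Standard deviation of an ingredient's quantities across recipes (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n   # average quantity of the ingredient
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(recipe, preference, weight):
    """Score(R): sum of I_k * W_k over the ingredients k of the recipe (Eq. 3.6).

    preference: ingredient -> I_k (positive if liked, negative if disliked)
    weight:     ingredient -> W_k (hypothetical input, see the note above)
    Ingredients without a known preference contribute 0 (an assumption).
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in recipe)
```

With such weights, a disliked ingredient used in an unusually large quantity lowers the recipe's score more than the same ingredient used in a typical quantity.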

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization
problem. The movie content information was viewed as a document, and the user ratings between 0
and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbor's mean. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and
those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$
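The construction of a pseudo user-ratings vector in Eq. (3.7) amounts to a simple fallback rule, sketched below. The function name and the `content_predict` callable, which stands in for the content-based predictor, are hypothetical and used only for illustration.

```python
def pseudo_ratings(user_ratings, items, content_predict):
    """Build v_u: the actual rating r_ui where it exists, otherwise the
    content-based prediction c_ui.

    user_ratings:    item -> rating given by the user
    items:           every item in the catalogue
    content_predict: hypothetical callable, item -> content-based predicted rating
    """
    return {
        i: user_ratings[i] if i in user_ratings else content_predict(i)
        for i in items
    }
```

Stacking these vectors for all users yields a ratings matrix with no missing entries, which is what removes the sparsity from the subsequent collaborative step.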

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix $V$ is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if only a
few items have been rated. Lastly, the prediction is computed using a hybrid correlation
weight that allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
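The content-based strategy described above can be sketched as a two-step procedure: break rated recipes down into ingredient scores, then rebuild a recipe prediction from them. The text states that ingredient scores are the average rating of the recipes in which they occur; the reconstruction step is not detailed, so averaging the known ingredient scores is an assumption of this sketch, as are the function names.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes):
    """rated_recipes: list of (ingredients, rating) pairs for one user.

    Each ingredient is scored with the average rating of the recipes containing it.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for ingredients, rating in rated_recipes:
        for k in set(ingredients):
            totals[k] += rating
            counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


def predict_recipe(ingredients, scores):
    """Assumed aggregation: mean score of the ingredients the user has scores for.

    Returns None when none of the recipe's ingredients has been scored.
    """
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None
```

For example, a user who rated an egg-and-rice recipe 4 and an egg-and-leek recipe 2 ends up with egg scored 3, rice 4 and leek 2, so a new egg-and-rice dish is predicted at 3.5.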

Two hybrid strategies were also implemented, namely hybrid_recipe and hybrid_ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. The two strategies differ
in the approach used to compute user similarity. The hybrid_recipe method identifies
a set of neighbors with user similarity based on recipe scores. In hybrid_ingr, user similarity
is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered, under the assumption that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are presented in Figure 3.2, using the normalized
MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended. This idea is used
in Daily-Learner [23], a well-known content-based recommendation system for news articles. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help the user to gather more information about a particular news topic.
These items are therefore excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
there is no need to recommend a story the user already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
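The short-term model described above can be sketched as follows. This is an illustrative reading of the description, not the Daily-Learner implementation: the function names, the sparse dictionary representation of the TF-IDF vectors, the returned labels and the two threshold values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a new story against the user's rated stories.

    rated_stories: list of (vector, score) pairs
    t_min, t_max:  hypothetical minimum and maximum similarity thresholds
    Returns ("known", None), ("no_voters", None) or ("score", predicted_score).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:
            return "known", None        # too similar: the user already knows this event
        if sim >= t_min:
            voters.append((sim, score)) # close enough to vote
    if not voters:
        return "no_voters", None        # would be handed to the long-term model
    total = sum(sim for sim, _ in voters)
    return "score", sum(sim * score for sim, score in voters) / total
```

In a food setting, the same upper threshold could be used to filter out dishes that are nearly identical to ones the user has eaten recently.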

This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In the user-to-user approach, the similarity between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity between a pair of items
is measured by the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users
that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items
rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated
items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the
target item $a$. The weighted sum is then normalized by the
sum of similarities.
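The item-to-item similarity of Eq. (4.1) can be sketched as below. The function name and the dictionary-based representation of the ratings are assumptions of this sketch, not the YoLP implementation; returning 0 when the denominator vanishes is also a choice made here.

```python
import math


def pearson_item_sim(ratings_a, ratings_b, mean_a, mean_b):
    """Pearson correlation between two recipes.

    ratings_a, ratings_b: user -> rating for recipe a and recipe b
    mean_a, mean_b:       average ratings of recipe a and recipe b
    """
    shared = ratings_a.keys() & ratings_b.keys()   # P: users who rated both recipes
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared)
    den_a = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared))
    den_b = math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in shared))
    # Assumed convention: similarity is 0 when there is no overlap or no variance
    return num / (den_a * den_b) if den_a and den_b else 0.0
```

Only users who rated both recipes contribute, so two recipes with no raters in common get a similarity of 0 under this convention.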

The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the
restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID and ingredients. Context features are also considered
at the moment of the recommendation: these are temperature, period of the day and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


where $avgTotal$ represents the combined user and item average for each recommendation. It
is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation of the real values, since this method of transforming a similarity
measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
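The pipeline described in this section (binary profile, cosine similarity, similarity-to-rating rule) can be sketched as follows. This is not the YoLP code: the function names are hypothetical, binary vectors are represented as Python sets of feature names, recipe vectors are assumed to be binary as well, and the combined user and item average is taken as an input rather than computed, since its exact definition is not given here.

```python
import math


def build_profile(rated_recipes, positive=(4, 5)):
    """Binary user profile: the union of the features of positively rated recipes."""
    profile = set()
    for features, rating in rated_recipes:
        if rating in positive:
            profile |= set(features)
    return profile


def binary_cosine(a, b):
    """Cosine similarity between two binary vectors represented as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Similarity-to-rating rule: add 0.5 to the combined average above the threshold."""
    return avg_total + bonus if similarity > threshold else avg_total
```

Note that this rule can only produce two distinct values per user and item pair, which is one way to see why it introduces an approximation error in the MAE and RMSE figures.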

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors.
Lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 31 of the related work the approach inspired on TF-IDF shown in equation

Eq(31) used to estimate the userrsquos favourite ingredients is an interesting point to further explore

in food recommendation Since Rocchiorsquos algorithm uses feature weights to build the prototype

vectors representing the userrsquos preferences and FF-IRF has shown good results for extracting the

userrsquos favourite ingredients this measure could be used to attribute weights to the recipersquos features

and build the prototype vectors In this work the Frequency of use of the feature Fk is assumed to

be always 1 The main reason is the inexistence of timestamps in the datasetrsquos reviews which does

not allow to determine the number of times that a feature is preferred during a period D The Inverse

Recipe Frequency is used exactly as mentioned in [3]


IRFk = logM


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. In essence, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
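The IRF weighting described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the dictionary layout, the function name `inverse_recipe_frequency`, the toy recipes and the use of the natural logarithm are all assumptions, since the text does not state the logarithm base or the data structures used.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Map each feature k to IRF_k = log(M / M_k).

    recipes: dict {recipe_id: iterable of feature names} (hypothetical layout).
    The natural logarithm is an assumption; the source does not give the base.
    """
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes containing feature k (each recipe counted once)
    containing = Counter(f for feats in recipes.values() for f in set(feats))
    return {f: math.log(total / m_k) for f, m_k in containing.items()}

# Toy data, invented for illustration only.
recipes = {
    "r1": {"tomato", "basil", "pasta"},
    "r2": {"tomato", "cheese"},
    "r3": {"tomato", "egg"},
    "r4": {"egg", "cheese"},
}
irf = inverse_recipe_frequency(recipes)
# Rare features (basil, in 1 of 4 recipes) weigh more than common ones (tomato, in 3 of 4).
```

Because the frequency term F_k is fixed at 1, the weight of a feature depends only on how rare it is across the whole dataset, so it can be computed once and reused for every user.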

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. Two different approaches were studied to determine whether a rating event is a positive or a negative observation. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, except that ratings equal to 3 are considered neutral observations and are ignored. Both datasets are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, they are subtracted.
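The construction of the prototype vector can be sketched as follows. The function names, the sparse dictionary representation and the toy values are assumptions made for illustration; only the rule itself (add IRF weights for positive observations, subtract them for negative ones, skip neutral ones, all with weight 1) comes from the text.

```python
from collections import defaultdict

def observation_sign(rating, threshold, neutral=None):
    """+1 for a positive observation, -1 for a negative one, 0 if ignored."""
    if neutral is not None and rating == neutral:
        return 0
    return 1 if rating >= threshold else -1

def build_prototype(rated, recipe_features, irf, threshold, neutral=None):
    """Add or subtract IRF feature weights according to each rating event.

    rated: list of (recipe_id, rating) taken from the training set.
    threshold: fixed value (e.g. 3) or the user's average rating.
    """
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = observation_sign(rating, threshold, neutral)
        if sign == 0:
            continue  # neutral observations are ignored
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)

# Toy values, invented for illustration only.
irf = {"tomato": 0.5, "egg": 1.0, "cheese": 2.0}
recipe_features = {"r1": {"tomato", "egg"}, "r2": {"egg", "cheese"}, "r3": {"cheese"}}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]

# Fixed threshold on a 1-5 scale, with rating 3 treated as neutral.
fixed = build_prototype(rated, recipe_features, irf, threshold=3, neutral=3)
# User-average threshold: ratings equal to or above the average are positive.
user_avg = sum(r for _, r in rated) / len(rated)
by_average = build_prototype(rated, recipe_features, irf, threshold=user_avg)
```

With these helpers, the fixed-threshold approach for a 1 to 4 scale corresponds to `threshold=3` with no neutral value, while the second approach only changes the `threshold` argument.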

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe's feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is important to explore, since the translation can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. Many normalization methods are available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's range of similarity values from the validation set, and compute each user's range of rating values from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user's average rating was used as the default for the recommendation.
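The per-user Min-Max mapping of Eq. 4.5, including the fallback to the user's average, can be sketched as below. The function name and argument names are hypothetical, and the degenerate case is detected here simply as an empty similarity interval, which is one possible reading of "not enough user ratings".

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, default):
    """Map a similarity A from [sim_min, sim_max] into [rating_min, rating_max].

    Returns `default` (e.g. the user's average rating) when the similarity
    interval is empty, so the formula would divide by zero.
    """
    if sim_max == sim_min:
        return default
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min

# A user whose similarities span [0.0, 1.0] and whose ratings span [1, 5]:
midpoint = min_max_rating(0.5, 0.0, 1.0, 1, 5, default=3.0)
# A user with a single observed similarity value falls back to the default:
fallback = min_max_rating(0.3, 0.3, 0.3, 1, 5, default=3.7)
```

Here `rating_min` and `rating_max` play the role of C and D, and both they and the similarity bounds are looked up per user before the call.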

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

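$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases} \tag{4.6}$$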


Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
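The three-case rule and the combined variant can be sketched as follows. The function names are hypothetical. The default thresholds 0.75 and 0.25 are the initial values reported for U and L; averaging the two standard deviations in `combined_stats` is an assumption, since the text does not spell out how the user and recipe deviations are combined.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: shift the average by one standard deviation
    when the similarity is at or above U, or below L."""
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average

def combined_stats(user_avg, user_std, item_avg, item_std):
    """Combined variant: plain means of the user and recipe statistics
    (an assumption made for this sketch)."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2

# Illustrative values only.
average, std_dev = combined_stats(user_avg=4.0, user_std=0.5, item_avg=3.0, item_std=1.0)
high = rating_from_similarity(0.80, average, std_dev)
middle = rating_from_similarity(0.50, average, std_dev)
low = rating_from_similarity(0.10, average, std_dev)
```

The user-only and recipe-only variants call `rating_from_similarity` directly with the corresponding average and standard deviation.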

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
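The sparsity filter described above can be sketched as a single pass over the rating events. The tuple layout and the function name are hypothetical, and the order of the two steps (recipes first, then users, without iterating until stable) is an assumption, since the text does not specify it.

```python
from collections import Counter

def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times, then users with no more than 5 ratings.

    events: list of (user_id, item_id, rating) tuples (hypothetical layout).
    """
    item_counts = Counter(item for _, item, _ in events)
    kept = [e for e in events if item_counts[e[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [e for e in kept if user_counts[e[0]] >= min_user_ratings]

# Tiny invented example, using lower limits so the effect is visible.
events = [("u1", "a", 4), ("u2", "a", 3), ("u1", "b", 5), ("u3", "c", 2), ("u2", "b", 4)]
filtered = filter_sparse(events, min_item_ratings=2, min_user_ratings=2)
```

Removing users can push some recipes back below the limit, so a stricter variant would repeat both steps until nothing changes.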

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar: a decrease in the number of users as the number of rated items increases is a normal characteristic of rating datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the system after the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, a non-exhaustive form of leave-p-out cross-validation was used, in which p observations form the validation set and the remaining observations form the training set. To reduce variability, this process is repeated multiple times with different observations in the validation set; an exhaustive procedure would repeat it until all possible combinations of p observations are tested. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Given the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures quantify the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
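The evaluation procedure of this section can be sketched as a 5-fold split plus the two error measures. The function names, the shuffling seed and the way folds are cut are assumptions for illustration; the MAE and RMSE definitions are the standard ones.

```python
import math
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; larger deviations are penalized more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Ten placeholder events: each fold holds 20% for validation and 80% for training.
splits = list(k_fold_splits(list(range(10))))
example_mae = mae([4, 2], [3, 4])
example_rmse = rmse([4, 2], [3, 4])
```

In a full run, each `training` list would be used to build the prototype vectors, each `validation` list would supply the (userID, itemID, rating) triples to predict, and the two measures would be averaged over the 5 folds.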

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results (columns are grouped as dataset / observation type / metric)

| Rating method | Epicurious / Observation User Average / MAE | Epicurious / Observation User Average / RMSE | Epicurious / Observation Fixed Threshold / MAE | Epicurious / Observation Fixed Threshold / RMSE | Food.com / Observation User Average / MAE | Food.com / Observation User Average / RMSE | Food.com / Observation Fixed Threshold / MAE | Food.com / Observation Fixed Threshold / RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |
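The three average-based baselines can be sketched as below. The tuple layout and function names are hypothetical, and the sketch deliberately leaves out users or recipes that never appear in the training set, since the text does not say how such cases were handled.

```python
from collections import defaultdict

def average_baselines(training):
    """Return ({user: average rating}, {item: average rating}) from the training set.

    training: list of (user_id, item_id, rating) tuples (hypothetical layout).
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def predict_combined(user, item, user_avg, item_avg):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2

# Invented ratings for illustration.
training = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5)]
user_avg, item_avg = average_baselines(training)
combined = predict_combined("u1", "a", user_avg, item_avg)
```

The user-average and item-average baselines simply return `user_avg[user]` or `item_avg[item]` as the predicted rating.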

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values: it has the lowest RMSE in all four configurations, although its MAE is not always the lowest (for example, 0.7824 against 0.7721 for Min-Max on Epicurious with the fixed threshold).

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features help to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. For this reason, when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
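Storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the dictionary layout and the toy weights are assumptions; the cosine similarity is the standard definition.

```python
import math

def merge(vectors, groups):
    """Concatenate the stored per-group vectors selected for a feature test."""
    merged = {}
    for group in groups:
        for feature, weight in vectors[group].items():
            merged[(group, feature)] = weight  # the key keeps groups from colliding
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented weights: three stored vectors for one user, and one recipe.
user = {"ingredients": {"tomato": 1.0, "egg": -1.0},
        "cuisine": {"italian": 2.0},
        "dietary": {"vegan": 0.5}}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 2.0}, "dietary": {}}

only_ingredients = ["ingredients"]
ingredients_cuisine = ["ingredients", "cuisine"]
sim_ingredients = cosine(merge(user, only_ingredients), merge(recipe, only_ingredients))
sim_both = cosine(merge(user, ingredients_cuisine), merge(recipe, ingredients_cuisine))
```

Each row of a feature test then corresponds to one choice of `groups`, with no need to rebuild the stored vectors.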

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


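$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases}$$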

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, this test studies the impact on the recommendation and seeks the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

This test shows that the lower threshold only has a negative effect on the recommendation accuracy: subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher; since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
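The opposite movement of MAE and RMSE when the upper threshold is lowered can be reproduced with a small worked example. All numbers below are invented for illustration and are not taken from the experiments; the function names are hypothetical.

```python
import math

def rating_two_case(similarity, average, std_dev, upper):
    """Two-case rule: add the standard deviation only when similarity reaches U."""
    return average + std_dev if similarity >= upper else average

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# One user with average 3.5 and standard deviation 0.5; four validation recipes.
actual = [4, 4, 4, 2]
similarities = [0.6, 0.5, 0.4, 0.3]
average, std_dev = 3.5, 0.5

results = {}
for upper in (0.75, 0.25):
    predicted = [rating_two_case(s, average, std_dev, upper) for s in similarities]
    results[upper] = (mae(actual, predicted), rmse(actual, predicted))
# U = 0.75: every prediction is 3.5 -> MAE 0.75, RMSE about 0.866
# U = 0.25: every prediction is 4.0 -> MAE 0.50, RMSE 1.0
```

With the lower threshold, three of the four predictions become exact, which reduces the MAE, but the single miss grows from 1.5 to 2.0, and squaring that larger deviation is enough to raise the RMSE.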

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among the baselines, the experimental recommendation component showed better results on the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
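The per-user points plotted in these figures can be computed as sketched below. The tuple layout and function name are hypothetical. The sketch uses the population standard deviation of the evaluated ratings; which deviation formula and which set of ratings were used in the experiments is not stated, so both are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def user_error_points(results):
    """Return {user: (rating standard deviation, mean absolute error)}.

    results: list of (user_id, actual_rating, predicted_rating) tuples.
    """
    by_user = defaultdict(list)
    for user, actual, predicted in results:
        by_user[user].append((actual, predicted))
    return {
        user: (pstdev([a for a, _ in pairs]),
               mean([abs(a - p) for a, p in pairs]))
        for user, pairs in by_user.items()
    }

# Invented results: u1 always gives the same rating, u2 uses the whole scale.
results = [("u1", 4, 4.0), ("u1", 4, 3.5), ("u2", 1, 3.0), ("u2", 5, 3.0)]
points = user_error_points(results)
```

Plotting the first coordinate against the second, one point per user, gives a scatter of the kind described above; the trend line would then be a local average of those points.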

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors (Fig. 5.9), another test was made using the 269 users that rated over 500 recipes (Fig. 5.10).

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
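The round-by-round simulation can be sketched for a single user as below. The function names are hypothetical, and `mean_predictor` is only a stand-in for the Rocchio-based component so that the sketch stays self-contained; in the experiments the errors of each round would also be averaged over all selected users.

```python
def learning_curve(reviews, train_and_predict, max_train):
    """Return the mean absolute error after each round for one user.

    reviews: ordered list of (item_id, rating) for the user.
    train_and_predict: callable that takes the training reviews and returns
        a function item_id -> predicted rating.
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to validate against
        predict = train_and_predict(training)
        abs_errors = [abs(rating - predict(item)) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

def mean_predictor(training):
    """Stand-in model: always predicts the mean of the training ratings."""
    average = sum(rating for _, rating in training) / len(training)
    return lambda item: average

reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(reviews, mean_predictor, max_train=4)
```

Each round moves one review from the validation side to the training side, which mirrors the procedure described above.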



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews with which to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, implementing this hybrid approach yielded a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact of these features on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
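The proposed extension can be sketched as one class vector per rating value, with the prediction taken from the most similar class. This is only one possible reading of the idea: the function names, the accumulation of IRF weights per class and the tie-breaking behaviour are all assumptions, and the approach was not evaluated in this work.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated, recipe_features, irf):
    """One vector per rating value, accumulating the IRF weights of its recipes."""
    classes = defaultdict(lambda: defaultdict(float))
    for recipe_id, rating in rated:
        for feature in recipe_features[recipe_id]:
            classes[rating][feature] += irf[feature]
    return classes

def predict_rating(recipe_vector, classes):
    """The rating whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(recipe_vector, classes[rating]))

# Invented data for illustration.
irf = {"tomato": 1.0, "egg": 1.0, "cheese": 1.0}
recipe_features = {"r1": {"tomato", "egg"}, "r2": {"cheese"}}
classes = build_class_vectors([("r1", 5), ("r2", 2)], recipe_features, irf)
tomato_prediction = predict_rating({"tomato": 1.0}, classes)
cheese_prediction = predict_rating({"cheese": 1.0}, classes)
```

When two classes are equally similar, `max` returns the first one encountered, so a real implementation would need an explicit tie-breaking rule.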




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



The RMSE measure was used in the famous Netflix competition, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation


Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^{+}_k$, an equation based on the idea of TF-IDF is used:

$$I^{+}_k = FF_k \times IRF_k \qquad (3.1)$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \qquad (3.2)$$

The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log\frac{M}{M_k} \qquad (3.3)$$
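The FF-IRF score described above can be illustrated with a minimal Python sketch. The function name, the argument layout (recipes as lists of ingredient names) and the choice of counting an ingredient at most once per recipe are assumptions made for illustration; they are not taken from [3].

```python
import math
from collections import Counter


def ff_irf_scores(cooked_recipes, all_recipes, period_days):
    """Score each ingredient k the user cooked with as FF_k * IRF_k.

    cooked_recipes: ingredient lists of the recipes cooked during the period
    all_recipes:    ingredient lists of every recipe in the collection
    period_days:    length D of the observation period (hypothetical unit: days)
    """
    # F_k: number of cooked recipes that used ingredient k (assumption:
    # an ingredient counts once per recipe).
    use_counts = Counter(k for recipe in cooked_recipes for k in set(recipe))
    # M_k: number of recipes in the collection that contain ingredient k.
    recipe_counts = Counter(k for recipe in all_recipes for k in set(recipe))
    total_recipes = len(all_recipes)  # M

    scores = {}
    for k, f_k in use_counts.items():
        m_k = recipe_counts.get(k, 0)
        if m_k == 0:
            continue  # ingredient unknown to the collection, IRF undefined
        ff = f_k / period_days
        irf = math.log(total_recipes / m_k)
        scores[k] = ff * irf
    return scores
```

Under these assumptions, an ingredient that appears in every recipe (such as salt in a small toy collection) receives an IRF of zero, so it never ranks as a favourite no matter how often it is used.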



The user's disliked ingredients $I^{-}_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^{+}_k$ were computed. The F-measure is computed as follows:

$$F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (3.4)$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20 the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^{+}_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
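As a quick check of the reported figures, the F-measure can be recomputed from the precision and recall quoted for N = 20. The snippet below is only a worked example; the function name is hypothetical, and returning zero when both inputs are zero is a convention chosen here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0  # convention chosen here for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the top 20 ingredients.
f_top20 = f_measure(0.607, 0.61)
```

With these inputs the result rounds to 0.608, which agrees with the F-measure reported for N = 20.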

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(g_k(i) - \bar{g}_k\right)^2} \qquad (3.5)$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_k$ and $I^{-}_k$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \qquad (3.6)$$
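The two formulas above can be sketched in Python as follows. The text does not state how the weight $W_k$ is derived from the deviation score, so the sketch takes the weights as an input rather than guessing a rule; the function names and the dictionary-based representation are likewise assumptions.

```python
import math


def ingredient_std(quantities):
    """Population standard deviation of an ingredient's quantities.

    quantities: the quantity g_k(i) of ingredient k in each of the n recipes
    that contain it.
    """
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients k of a recipe.

    preference: dict ingredient -> I_k (positive if liked, negative if disliked)
    weight:     dict ingredient -> W_k (assumed to be computed elsewhere)
    Ingredients missing from either dict contribute nothing (assumption).
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in ingredients)
```

For example, a recipe containing a liked ingredient with a low weight and a disliked ingredient with a high weight ends up with a negative score, which matches the intuition described for pepper and potato.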

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)$$
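The construction of the pseudo user-ratings vector can be sketched in a few lines of Python. The content-based predictor is represented here by an arbitrary callable, which is an assumption made for illustration; in [20] it is the naive Bayesian text classifier described earlier.

```python
def pseudo_ratings(user_ratings, content_predictor, items):
    """Build the pseudo user-ratings vector for one user.

    user_ratings:      dict item -> actual rating given by the user
    content_predictor: callable item -> rating predicted by the content-based
                       method (hypothetical stand-in for the real classifier)
    items:             every item in the database
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predictor(item)
        for item in items
    }
```

Stacking these vectors for all users yields a dense matrix, which is what allows the collaborative step to run without the sparsity of the original ratings.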

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
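The content-based strategy just described can be sketched as a breakdown step followed by a reconstruction step. The text states that ingredient scores are the average of the ratings of the recipes in which they occur; predicting a recipe as the plain average of its known ingredient scores is an assumption made here, as are the function names.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes):
    """Break rated recipes down into per-ingredient scores.

    rated_recipes: list of (ingredients, rating) pairs for one user.
    Each ingredient receives the average rating of the recipes containing it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ingredients, rating in rated_recipes:
        for k in set(ingredients):
            totals[k] += rating
            counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


def predict_recipe(ingredients, scores):
    """Rebuild a recipe prediction from ingredient scores.

    Assumption: the prediction is the mean score of the ingredients the user
    has a score for; None is returned when no ingredient is known.
    """
    known = [scores[k] for k in ingredients if k in scores]
    return sum(known) / len(known) if known else None
```

A recipe sharing no ingredient with the user's history cannot be scored this way, which is one of the gaps the hybrid strategies try to fill.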

Two hybrid strategies were also implemented, namely hybrid_recipe and hybrid_ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid_recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid_ingr, user similarity is based on the recipe ratings scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
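The short-term model described above can be sketched as follows. This is a simplified illustration, not the implementation of [23]: the threshold values, the function names and the representation of TF-IDF vectors as sparse dictionaries are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a story with the nearest-neighbor short-term model.

    rated_stories: list of (vector, score) pairs the user already rated
    t_min, t_max:  hypothetical minimum and maximum similarity thresholds
    Returns ("unclassified", None) when there are no voters,
            ("known", None) when some voter is too similar,
            ("scored", value) otherwise.
    """
    voters = [(cosine(new_story, vec), score) for vec, score in rated_stories]
    voters = [(sim, score) for sim, score in voters if sim >= t_min]
    if not voters:
        return ("unclassified", None)  # deferred to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return ("known", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)
```

In a food setting, the same structure could treat a dish that is nearly identical to one recently eaten as "known" and leave it out of the recommendation list.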

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a,b) = \frac{\sum_{p \in P}(r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P}(r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P}(r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \qquad (4.2)$$

In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.
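The similarity and prediction formulas above can be sketched in Python as follows. The data layout (a dictionary from recipe to per-user ratings) and the function names are assumptions. The second function follows the prediction formula as printed, so it returns a similarity-weighted average of the user's deviations from each rated recipe's mean; how YoLP turns that value into a final rating is not stated in this section and is not assumed here.

```python
import math


def item_mean(ratings, item):
    """Average rating of an item over all users who rated it."""
    values = ratings[item].values()
    return sum(values) / len(values)


def pearson_items(ratings, a, b):
    """Pearson correlation between items a and b over their common raters.

    ratings: dict item -> dict user -> rating
    """
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * math.sqrt(
        sum((ratings[b][p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict_deviation(ratings, user, target):
    """Similarity-weighted average of the user's deviations from item means."""
    rated = [b for b in ratings if b != target and user in ratings[b]]
    num = den = 0.0
    for b in rated:
        sim = pearson_items(ratings, target, b)
        num += sim * (ratings[b][user] - item_mean(ratings, b))
        den += sim
    return num / den if den else 0.0  # zero when no usable neighbor exists
```

Because recommendations are restricted to one restaurant's recipes, only the similarities between those recipes and the user's rated recipes need to be computed, which is the efficiency argument made for the item-based approach.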

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that he rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
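The similarity-to-rating rule used for the YoLP content-based component can be written as a short function. The text does not spell out how the user and item averages are combined, so taking their arithmetic mean is an assumption of this sketch, as are the function and parameter names.

```python
def similarity_to_rating(similarity, user_avg, item_avg, threshold=0.8, bonus=0.5):
    """Turn a cosine similarity into an approximate rating.

    Assumption: the combined average is the arithmetic mean of the user's
    average rating and the item's average rating.
    """
    avg_total = (user_avg + item_avg) / 2
    if similarity > threshold:
        return avg_total + bonus
    return avg_total
```

Since the rule produces only two possible values per user and item pair, it gives a coarse estimate, which is consistent with the remark that the reported results for this component are approximations.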

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log\frac{M}{M_k} \qquad (4.4)$$

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
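The construction of the prototype vector can be sketched in Python as follows. This is a minimal illustration under stated assumptions: recipes are represented as sets of feature names, observation weights are fixed at 1 as in the experiments, and all function names are hypothetical rather than taken from the thesis code.

```python
import math
from collections import defaultdict


def irf_weights(recipes):
    """IRF weight log(M / M_k) for every feature k in the dataset.

    recipes: dict recipe_id -> set of features
    """
    counts = defaultdict(int)
    for features in recipes.values():
        for k in features:
            counts[k] += 1
    total = len(recipes)
    return {k: math.log(total / m_k) for k, m_k in counts.items()}


def foodcom_rule(rating):
    """Fixed rule for a 1-5 scale: 3 is neutral (None), above is positive."""
    if rating == 3:
        return None
    return rating > 3


def user_average_rule(user_ratings):
    """Rule based on the user's own average: equal or higher is positive."""
    average = sum(user_ratings.values()) / len(user_ratings)
    return lambda rating: rating >= average


def build_prototype(user_ratings, recipes, weights, is_positive):
    """Add feature weights for positive observations, subtract for negative.

    user_ratings: dict recipe_id -> rating from the training set
    is_positive:  callable rating -> True, False, or None (ignored)
    """
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        label = is_positive(rating)
        if label is None:
            continue
        sign = 1.0 if label else -1.0
        for k in recipes[recipe_id]:
            prototype[k] += sign * weights[k]
    return dict(prototype)
```

A feature that appears in one liked and one disliked recipe cancels out to zero, while features seen only in liked recipes keep a positive weight.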

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
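As a small illustration of Eq. 4.5 applied per user, the sketch below maps one similarity value into a user's rating range and falls back to a default (here meant to be the user average) when the similarity interval is empty. The function and parameter names are assumptions made for this example.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, default):
    """Map a similarity into [rating_min, rating_max] following Eq. 4.5.

    sim_min / sim_max: the user's similarity range.
    rating_min / rating_max: the user's rating range from the training set.
    default: value returned when the similarity interval cannot be computed.
    """
    span = sim_max - sim_min
    if span <= 0:
        return default
    return (similarity - sim_min) / span * (rating_max - rating_min) + rating_min
```

For example, a similarity of 0.5 in a similarity range of [0, 1], mapped into a rating range of [1, 5], yields a rating of 3.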

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                         Food.com    Epicurious
Number of users                          24741       8117
Number of food items                     226025      14976
Number of rating events                  956826      86574
Number of ratings above avg              726467      46588
Number of groups                         108         68
Number of ingredients                    5074        338
Number of categories                     28          14
Sparsity on the ratings matrix           0.02        0.07
Avg rating values                        4.68        3.34
Avg number of ratings per user           38.67       10.67
Avg number of ratings per item           4.23        5.78
Avg number of ingredients per item       8.57        3.71
Avg number of categories per item        2.33        0.60
Avg number of food groups per item       0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
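The piecewise rule of this method can be written down directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the default thresholds are the initial values of 0.75 and 0.25 quoted in the text, and the way the combined variant merges the standard deviations (a plain mean of the user and item values) is a guess, since the text only defines the combined average.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating using average and standard deviation."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed reading of the combined variant: mean of the user and item values."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

A usage example: with an average of 4.0 and a standard deviation of 0.5, similarities of 0.8, 0.5 and 0.1 give ratings of 4.5, 4.0 and 3.5 respectively.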


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used, a practical relative of the leave-p-out method: the observations are split into k disjoint folds, and each fold is used once as the validation set while the remaining observations form the training set. Repeating the process with a different validation set each time reduces variability, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold the validation set represents 20% and the training set the remaining 80% of the data.
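The 5-fold procedure with a 20%/80% split per fold can be sketched as below. The helper name, the shuffling seed and the round-robin assignment to folds are assumptions made for illustration; the thesis does not state how the folds were drawn.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    events: list of rating events, e.g. (user_id, item_id, rating) tuples.
    Each event appears in exactly one validation set.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The error metrics are then computed on each validation set and averaged over the k folds.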

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
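For reference, the two error measures can be computed from paired lists of predicted and actual ratings as in the following sketch, which uses the standard definitions of MAE and RMSE; the function names are chosen for this example.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between prediction and rating."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but larger deviations weigh more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For predictions [4, 3, 5] against actual ratings [5, 3, 3], the absolute deviations are 1, 0 and 2, so the MAE is 1.0 while the RMSE is the square root of 5/3, roughly 1.29.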

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                     Epicurious          Food.com
                                     MAE      RMSE       MAE      RMSE
YoLP Content-based component         0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component         0.6454   0.8678     0.3761   0.6834
User Average                         0.6315   0.8338     0.4077   0.6207
Item Average                         0.7701   1.0930     0.4385   0.7043
Combined Average                     0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results (column groups: Observation User Average and Observation Fixed Threshold)

                                                     Epicurious                          Food.com
                                                     User Average     Fixed Threshold    User Average     Fixed Threshold
                                                     MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                   0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                   0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation     0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                                              0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
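The three average-based baselines of Table 5.1 could be computed along these lines. This is a sketch, not the evaluation code used in the thesis: the function names are hypothetical, and falling back to the global average for users or items that are absent from the training set is an assumption, since the text does not say how such cases were handled.

```python
from collections import defaultdict

def average_baselines(training_events):
    """Build a predict(user_id, item_id, kind) function from (user, item, rating) triples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, item_id, rating in training_events:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    all_ratings = [r for _, _, r in training_events]
    global_avg = sum(all_ratings) / len(all_ratings)

    def mean_or_global(values):
        # Assumed fallback for users or items without training ratings.
        return sum(values) / len(values) if values else global_avg

    def predict(user_id, item_id, kind="combined"):
        user_avg = mean_or_global(by_user.get(user_id, []))
        item_avg = mean_or_global(by_item.get(item_id, []))
        if kind == "user":
            return user_avg
        if kind == "item":
            return item_avg
        return (user_avg + item_avg) / 2  # (UserAvg + ItemAvg) / 2

    return predict
```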

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious          Food.com
                                     MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
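The idea of storing one vector per feature group and merging only the groups under test can be sketched as follows. The dictionary layout, the group names and the function names are assumptions made for this illustration; the cosine similarity is the standard definition.

```python
import math

def merge_groups(vectors_by_group, groups):
    """Merge the stored per-group sparse vectors selected in `groups` into one vector."""
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight  # key by group to avoid name clashes
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity using only the selected feature groups, without rebuilding profiles."""
    return cosine(merge_groups(user_vectors, groups), merge_groups(recipe_vectors, groups))
```

Testing a feature combination such as Ingredients + Cuisine then only requires calling `similarity_for` with `["ingredients", "cuisine"]`, leaving the stored prototype vectors untouched.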

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as U is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
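The opposite movement of MAE and RMSE when the upper threshold is lowered can be reproduced with a toy example. The numbers below are invented for illustration and are not taken from either dataset; the prediction rule is the two-case formula with the standard deviation added at or above the threshold U.

```python
import math

def predict(similarity, avg, std, upper):
    """Two-case rule: average plus standard deviation at or above U, average otherwise."""
    return avg + std if similarity >= upper else avg

def errors(similarities, actual, avg, std, upper):
    """Return (MAE, RMSE) for one threshold value."""
    diffs = [predict(s, avg, std, upper) - a for s, a in zip(similarities, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse

# Hypothetical user: average 4.0, standard deviation 1.0, four validation recipes.
sims = [0.9, 0.6, 0.5, 0.4]
actual = [5, 5, 5, 2]
high = errors(sims, actual, avg=4.0, std=1.0, upper=0.75)  # deviations 0, 1, 1, 2
low = errors(sims, actual, avg=4.0, std=1.0, upper=0.30)   # deviations 0, 0, 0, 3
```

With U = 0.75 the MAE is 1.0 and the RMSE is about 1.22; with U = 0.30 three predictions become exact, so the MAE drops to 0.75, but the single miss grows to 3 and the RMSE rises to 1.5.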

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
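The per-user points plotted in these graphs could be derived as in the sketch below. It is only an illustration: the record layout and function name are hypothetical, the error is taken as the user's mean absolute error, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def user_error_profile(records):
    """Return {user_id: (rating standard deviation, mean absolute error)}.

    records: iterable of (user_id, actual_rating, predicted_rating) triples.
    """
    by_user = {}
    for user_id, actual, predicted in records:
        by_user.setdefault(user_id, []).append((actual, predicted))
    profile = {}
    for user_id, pairs in by_user.items():
        ratings = [a for a, _ in pairs]
        std = statistics.pstdev(ratings)  # assumption: population standard deviation
        mean_abs_error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        profile[user_id] = (std, mean_abs_error)
    return profile
```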

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
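The round-by-round simulation for a single user can be sketched as follows. The function signature is hypothetical: `build_profile` and `predict` stand in for the prototype construction and rating prediction steps, and in the experiments the resulting errors would additionally be averaged over all selected users.

```python
def learning_curve(user_events, build_profile, predict, max_train):
    """For n = 1..max_train, train on the first n events and report MAE on the rest.

    user_events: list of (item, rating) pairs for one user, in review order.
    build_profile: callable(training_events) -> profile (placeholder).
    predict: callable(profile, item) -> predicted rating (placeholder).
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = user_events[:n], user_events[n:]
        if not validation:
            break  # nothing left to validate against
        profile = build_profile(training)
        error = sum(abs(predict(profile, item) - rating) for item, rating in validation)
        curve.append((n, error / len(validation)))
    return curve
```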



Chapter 6


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
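One possible reading of this proposal is sketched below. Since it describes future work, everything here is hypothetical: the function names, the sparse dictionary representation, and the choice of simply summing feature weights per rating value are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_class_vectors(rating_events, recipe_weights):
    """Build one vector per rating value from the recipes the user gave that rating."""
    classes = defaultdict(lambda: defaultdict(float))
    for recipe_id, rating in rating_events:
        for feature, weight in recipe_weights[recipe_id].items():
            classes[rating][feature] += weight
    return {rating: dict(vector) for rating, vector in classes.items()}

def predict_rating(class_vectors, recipe_vector):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe_vector))
```

In this sketch ties are resolved in favour of the class encountered first; a real system would need an explicit rule for that case.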




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



Chapter 3

Related Work

This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.

3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation

Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I_k^{+}$, an equation based on the idea of TF-IDF is used:

$$I_k^{+} = FF_k \times IRF_k \tag{3.1}$$

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \tag{3.2}$$

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \tag{3.3}$$

The user's disliked ingredients $I_k^{-}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
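The favourite-ingredient score of Eqs. (3.1) to (3.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [3]: the function name `ff_irf`, the representation of recipes as lists of ingredient names, and counting an ingredient once per cooked recipe are all hypothetical choices.

```python
import math
from collections import Counter


def ff_irf(cooked_recipes, all_recipes, period_days):
    """Score each cooked ingredient k as FF_k * IRF_k (Eq. 3.1).

    cooked_recipes: recipes (lists of ingredient names) cooked during the period
    all_recipes:    every recipe in the collection (assumed to include the cooked ones)
    period_days:    length D of the observation period (assumed unit: days)
    """
    # F_k: number of cooked recipes that use ingredient k
    use_counts = Counter(k for recipe in cooked_recipes for k in set(recipe))
    # M_k: number of recipes in the collection that contain ingredient k
    recipe_counts = Counter(k for recipe in all_recipes for k in set(recipe))
    total_recipes = len(all_recipes)  # M

    scores = {}
    for k, f_k in use_counts.items():
        if recipe_counts[k] == 0:
            continue  # ingredient unknown to the collection: IRF is undefined
        ff_k = f_k / period_days                               # Eq. 3.2
        irf_k = math.log(total_recipes / recipe_counts[k])     # Eq. 3.3
        scores[k] = ff_k * irf_k
    return scores
```

An ingredient that appears in every recipe gets an IRF of zero, so it never ranks as a favourite no matter how often it is cooked; this is the same effect IDF has on very common words.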

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I_k^{+}$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I_k^{+}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
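As a quick consistency check on the figures reported above, Eq. (3.4) can be evaluated directly. The helper name `f_measure` is an assumption made for this sketch; the inputs are the precision and recall values quoted in the text.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (Eq. 3.4)."""
    if precision + recall == 0:
        return 0.0  # convention used here when both values are zero
    return 2 * precision * recall / (precision + recall)


# Values reported for N = 20 and N = 1, expressed as fractions
top20 = f_measure(0.607, 0.61)
top1 = f_measure(0.833, 0.045)
```

With the N = 20 values the result is close to the 60.8% stated above, while for N = 1 the very low recall pulls the F-measure well below the precision, which is why the larger N is preferred.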

In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient $k$ contained in both recipes, the recipe with the higher quantity of $k$ should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5}$$

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^{+}$ and $I_k^{-}$, respectively):

$$Score(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix $V$ is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
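The construction of the pseudo user-ratings vector in Eq. (3.7) can be illustrated as follows. This is a minimal sketch, assuming ratings are held in a dictionary and that the content-based predictor is available as a callable; `pseudo_ratings` and `content_predict` are hypothetical names, and the hybrid correlation weight of [20] is not covered.

```python
def pseudo_ratings(user_ratings, content_predict, items):
    """Build the pseudo user-ratings vector v_u of Eq. 3.7.

    user_ratings:    dict item -> actual rating r_ui given by the user
    content_predict: callable item -> content-based prediction c_ui (assumed interface)
    items:           every item in the catalogue
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }


# Usage: a user who rated two of four movies; the placeholder predictor
# always returns 3.0 and stands in for the naive Bayes classifier.
vector = pseudo_ratings({"m1": 5, "m3": 2}, lambda item: 3.0, ["m1", "m2", "m3", "m4"])
```

Because every entry of the resulting vector is filled, stacking the vectors of all users gives a dense matrix on which the usual Pearson-based neighbourhood method can run.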

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients

A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
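The breakdown and reconstruction performed by the content-based strategy can be sketched as below. The ingredient scores follow the averaging rule described above; how the ingredient scores are recombined into a recipe prediction is not detailed in this section, so averaging them again is an assumption of this sketch, as are the function names.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes):
    """Average, per ingredient, the ratings of the recipes in which it occurs.

    rated_recipes: list of (ingredients, rating) pairs for one user
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ingredients, rating in rated_recipes:
        for ingredient in set(ingredients):
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ingredient: totals[ingredient] / counts[ingredient] for ingredient in totals}


def predict_recipe(ingredients, scores):
    """Predict a recipe rating as the mean score of its known ingredients.

    Returns None when no ingredient of the recipe has a score yet.
    The mean is an assumed aggregation, chosen for illustration only.
    """
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


# Usage: two rated recipes, then a prediction for an unseen combination
scores = ingredient_scores([(["rice", "egg"], 4), (["egg", "bacon"], 2)])
prediction = predict_recipe(["rice", "bacon"], scores)
```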

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to be recommended a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
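The short-term model described above can be sketched as follows. This is an illustration under assumptions, not Daily-Learner's code: stories are represented as sparse dictionaries of term weights, the function names and return labels are hypothetical, and the threshold values are left to the caller because they are not given here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def short_term_predict(new_story, rated_stories, t_min, t_max):
    """Classify a story with the nearest-neighbor voting scheme.

    rated_stories: list of (vector, score) pairs already rated by the user
    Returns ("known", None), ("no_voters", None) or ("predicted", score).
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(new_story, vector)
        if similarity > t_max:
            return ("known", None)  # too close to a story the user has seen
        if similarity > t_min:
            voters.append((similarity, score))
    if not voters:
        return ("no_voters", None)  # would be handed to the long-term model
    total_weight = sum(similarity for similarity, _ in voters)
    weighted = sum(similarity * score for similarity, score in voters)
    return ("predicted", weighted / total_weight)
```

The same two-threshold idea carries over to food: a dish almost identical to one eaten recently could be filtered out with the upper threshold, while dishes that are merely related still vote.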

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture

In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
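Eq. (4.1) can be written as a short function. The sketch below is not the YoLP code: it assumes each recipe's ratings are stored as a dictionary from user to rating, and it takes the recipe averages over all ratings of the recipe rather than only over the co-rating users, which is one possible reading of the definition above. Returning 0 when the similarity is undefined is also a choice made here.

```python
import math


def item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes (Eq. 4.1).

    ratings_a, ratings_b: dict user -> rating for recipe a and recipe b
    """
    common_users = set(ratings_a) & set(ratings_b)  # the set P
    if not common_users:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)

    numerator = sum(
        (ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common_users
    )
    sum_sq_a = sum((ratings_a[p] - mean_a) ** 2 for p in common_users)
    sum_sq_b = sum((ratings_b[p] - mean_b) ** 2 for p in common_users)
    denominator = math.sqrt(sum_sq_a * sum_sq_b)
    if denominator == 0:
        return 0.0  # no variation among the co-raters: similarity undefined
    return numerator / denominator
```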


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}$$

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated items, the user's rating for each item $b$, taken as a deviation from the average rating $\bar{r}_b$, is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that he positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$

where $avgTotal$ represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
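The two steps just described, cosine similarity over binary feature vectors followed by the rule of Eq. (4.3), can be sketched as below. The representation of binary vectors as Python sets and the function names are assumptions of this sketch, and so is the way the user and item averages are combined (a plain mean of the two), since the exact combination is not spelled out in this section.

```python
import math


def cosine_binary(recipe_features, profile_features):
    """Cosine similarity between two binary vectors stored as sets of features."""
    if not recipe_features or not profile_features:
        return 0.0
    shared = len(recipe_features & profile_features)
    return shared / math.sqrt(len(recipe_features) * len(profile_features))


def yolp_rating(similarity, user_avg, item_avg):
    """Turn a similarity into a rating following Eq. 4.3.

    avg_total is assumed here to be the mean of the user and item averages.
    """
    avg_total = (user_avg + item_avg) / 2
    return avg_total + 0.5 if similarity > 0.8 else avg_total


# Usage: a recipe sharing both of its features with a small profile
similarity = cosine_binary({"italian", "tomato"}, {"italian", "tomato"})
rating = yolp_rating(similarity, user_avg=4.0, item_avg=3.0)
```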

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
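The procedure above can be put together in a short sketch: IRF weights computed from the complete recipe collection, then a prototype vector built with the second labelling approach (comparison against the user's average rating). The data layout (dictionaries of feature sets and ratings) and the function names are assumptions made for illustration; positive and negative observations both carry a weight of 1, as stated in the text.

```python
import math
from collections import Counter, defaultdict


def irf_weights(recipes):
    """IRF_k = log(M / M_k) for every feature k.

    recipes: dict recipe_id -> set of features (ingredients, cuisines, ...)
    """
    total = len(recipes)  # M
    counts = Counter(f for features in recipes.values() for f in features)  # M_k
    return {feature: math.log(total / count) for feature, count in counts.items()}


def build_prototype(user_ratings, recipes, weights):
    """Build a user's prototype vector from the training ratings.

    A rating at or above the user's average is a positive observation (weights
    are added); a rating below the average is negative (weights are subtracted).
    """
    if not user_ratings:
        return {}
    average = sum(user_ratings.values()) / len(user_ratings)
    prototype = defaultdict(float)
    for recipe_id, rating in user_ratings.items():
        sign = 1.0 if rating >= average else -1.0
        for feature in recipes[recipe_id]:
            prototype[feature] += sign * weights[feature]
    return dict(prototype)
```

A feature that occurs equally often in liked and disliked recipes cancels out to zero, so only features that separate the two groups end up shaping the profile.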

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A – minimum value of A), the user average was used as the default for the recommendation.
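The per-user mapping described above can be sketched as follows. The function names and argument layout are hypothetical, and the condition used to trigger the fallback (fewer than two similarity values, or a zero-width interval) is an assumption, since the text only says that the user average is used when the interval cannot be computed.

```python
def min_max(value, old_min, old_max, new_min, new_max):
    """Min-Max normalization of value from [old_min, old_max] to [new_min, new_max] (Eq. 4.5)."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def similarity_to_rating(similarity, user_similarities, user_ratings, user_average):
    """Map one similarity value onto the user's own rating range.

    user_similarities: similarity values observed for this user (validation set)
    user_ratings:      ratings given by this user (training set)
    user_average:      fallback when the similarity interval cannot be computed
    """
    if len(user_similarities) < 2 or not user_ratings:
        return user_average
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        return user_average  # zero-width interval: Eq. 4.5 is undefined
    return min_max(similarity, sim_min, sim_max, min(user_ratings), max(user_ratings))
```

Under this mapping the highest similarity seen for a user always receives that user's highest training rating, which is what makes the scale personal.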

Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \ge U \\ \text{average rating} & \text{if } L \le similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
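The three-way rule of Eq. (4.6) is simple to express in code. In this sketch the thresholds are plain parameters with no defaults, the function name is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample one is an assumption; the same function serves the user-based, recipe-based and combined variants, which differ only in which average and deviation are passed in.

```python
import statistics


def rating_from_similarity(similarity, average, deviation, upper, lower):
    """Eq. 4.6: shift the average rating by one standard deviation.

    upper, lower: similarity thresholds U and L, supplied by the caller
    """
    if similarity >= upper:
        return average + deviation
    if similarity >= lower:
        return average
    return average - deviation


# Usage with a user's training ratings (user-based variant)
training_ratings = [3, 4, 5]
user_average = statistics.mean(training_ratings)
user_deviation = statistics.pstdev(training_ratings)  # population deviation, assumed
```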

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                               Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data that the recommendation system requires in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This second dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as all users who submitted no more than 5 ratings. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
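The filtering step can be illustrated with the following sketch. It assumes a single pass that removes recipes first and users second; the text does not state the order or whether the filter was iterated. The helper `density_percent` is likewise an assumption: it computes the percentage of filled cells in the ratings matrix, which appears to be consistent with the values reported in the "Sparsity" row of Table 4.1.

```python
from collections import Counter


def filter_ratings(ratings, min_item_ratings=4, min_user_ratings=6):
    """ratings: list of (user_id, item_id, rating) tuples.

    Keeps recipes rated more than 3 times, then users with more than 5 ratings.
    """
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]


def density_percent(n_ratings, n_users, n_items):
    """Percentage of user-item pairs that actually have a rating."""
    return 100.0 * n_ratings / (n_users * n_items)
```

With the figures from Table 4.1, `density_percent(956826, 24741, 226025)` rounds to 0.02 and `density_percent(86574, 8117, 14976)` rounds to 0.07.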

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation

This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, to use it to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds and, in each round, one fold is used as the validation set and the remaining folds as the training set. To reduce variability, this process is repeated until every fold has been used once as the validation set. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% of the data and the training set the remaining 80%.
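As an illustration of the 5-fold procedure, the sketch below splits a list of rating events into training and validation sets. The function name, the shuffling step and the fixed seed are assumptions made for the example; the text does not describe how the folds were actually drawn.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists, one pair per fold."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)          # assumption: random assignment to folds
    folds = [shuffled[i::k] for i in range(k)]     # k folds of (almost) equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With 100 rating events and k = 5, each round uses 20 events for validation and the other 80 for training, and every event is used for validation exactly once.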

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
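For reference, both measures can be computed from paired lists of predicted and actual ratings as in the minimal sketch below (the function names are chosen for the example only).

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; larger deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For predictions [4, 3, 5] against actual ratings [5, 3, 3], the absolute errors are 1, 0 and 2, so the MAE is 1.0, while the RMSE is the square root of 5/3.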

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                Epicurious           Food.com
                                MAE      RMSE        MAE      RMSE
YoLP Content-based component    0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component    0.6454   0.8678      0.3761   0.6834
User Average                    0.6315   0.8338      0.4077   0.6207
Item Average                    0.7701   1.0930      0.4385   0.7043
Combined Average                0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                              Food.com
                                                   Observation       Observation           Observation       Observation
                                                   User Average      Fixed Threshold       User Average      Fixed Threshold
                                                   MAE      RMSE     MAE      RMSE         MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606   0.7759   1.0283       0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550   0.8388   1.1106       0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296   0.7824   0.9927       0.4390   0.6506   0.4324   0.6449
Min-Max                                            0.8539   1.1533   0.7721   1.0705       0.6648   0.9847   0.6303   0.9384
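The average-based baselines can be sketched as follows. The function names are hypothetical, and the sketch assumes that every user and recipe in the validation set also appears in the training set; how unseen users or recipes were handled is not stated in the text.

```python
from collections import defaultdict


def build_averages(training):
    """training: iterable of (user_id, recipe_id, rating).

    Returns two dicts: average rating per user and average rating per recipe.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, recipe, rating in training:
        by_user[user].append(rating)
        by_item[recipe].append(rating)
    user_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_avg = {i: sum(v) / len(v) for i, v in by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user, recipe):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[recipe]) / 2
```

For instance, if user u1 has an average rating of 3.0 and recipe r1 has an average rating of 4.5, the combined baseline predicts 3.75 for that pair.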

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
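The two ways of separating positive and negative observations can be sketched as below. This is a generic Rocchio-style sketch, not the exact formulation of Section 4.3: the weights `beta` and `gamma`, the use of centroids, and the choice to treat a rating equal to the threshold as positive are all assumptions made for illustration.

```python
def split_observations(rated, threshold):
    """rated: list of (feature_vector, rating); vectors are dicts feature -> weight."""
    positive = [vec for vec, rating in rated if rating >= threshold]  # assumption: >= is positive
    negative = [vec for vec, rating in rated if rating < threshold]
    return positive, negative


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for feature, weight in vec.items():
            total[feature] = total.get(feature, 0.0) + weight
    return {f: w / len(vectors) for f, w in total.items()} if vectors else {}


def rocchio_prototype(rated, threshold, beta=1.0, gamma=1.0):
    """Prototype = beta * centroid(positive) - gamma * centroid(negative)."""
    positive, negative = split_observations(rated, threshold)
    pos_c, neg_c = centroid(positive), centroid(negative)
    return {f: beta * pos_c.get(f, 0.0) - gamma * neg_c.get(f, 0.0)
            for f in set(pos_c) | set(neg_c)}
```

Passing `threshold=3` corresponds to the fixed-threshold variant, while passing the user's own average rating corresponds to the user-average variant.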

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values clearly show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest overall error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious           Food.com
                                    MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616      0.4324   0.7087
Ingredients                         0.7932   1.0054      0.4411   0.6537
Cuisine                             0.8553   1.0810      0.5357   0.7431
Dietary                             0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
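The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the group names and the key prefixing are assumptions made for the example; the actual storage format used in the database is not described here.

```python
import math


def merge(stored, groups):
    """Merge the selected per-group vectors into a single sparse vector.

    Keys are prefixed with the group name so that features from different groups never clash.
    """
    return {f"{g}:{feat}": w for g in groups for feat, w in stored[g].items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict feature -> weight)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical stored vectors for one user and one recipe.
user = {"ingredients": {"garlic": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": 1.0}}
recipe = {"ingredients": {"garlic": 1.0}, "cuisine": {"french": 1.0}, "dietary": {}}

only_ingredients = cosine(merge(user, ["ingredients"]), merge(recipe, ["ingredients"]))
with_cuisine = cosine(merge(user, ["ingredients", "cuisine"]),
                      merge(recipe, ["ingredients", "cuisine"]))
```

Each feature combination in Table 5.3 then corresponds to a different `groups` argument, with no need to rebuild the stored vectors.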

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy and that subtracting the standard deviation does not help. The sharp drop in the error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

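As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset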


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
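The threshold variation test amounts to re-evaluating the two-case rating rule for a range of upper threshold values. The sketch below assumes that the similarity, average rating, standard deviation and actual rating of each validation event have already been computed; the function names and the input layout are hypothetical.

```python
import math


def predict(similarity, avg_rating, std_dev, upper):
    """Two-case rule: add the standard deviation only when similarity >= U."""
    return avg_rating + std_dev if similarity >= upper else avg_rating


def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, avg_rating, std_dev, actual_rating).

    Returns {threshold: (MAE, RMSE)} over all samples.
    """
    results = {}
    for upper in thresholds:
        errors = [predict(s, avg, std, upper) - actual for s, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```

Plotting the returned MAE and RMSE values against the thresholds gives curves of the kind shown in the threshold variation figures.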

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
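The opposite movement of MAE and RMSE discussed in this section can be reproduced with a small made-up example. The two error lists below are invented purely for illustration: the first misses every prediction by a small amount, while the second is exact three times out of four but misses once by a large amount.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


steady = [0.5, 0.5, 0.5, 0.5]   # always slightly off
spiky = [0.0, 0.0, 0.0, 1.6]    # mostly exact, one large miss

print(mae(steady), rmse(steady))
print(mae(spiky), rmse(spiky))
```

The second list has the lower MAE (about 0.4 against 0.5) but the higher RMSE (about 0.8 against 0.5), which is the same pattern observed when the upper threshold is lowered.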

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
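The per-user mapping and the averaging line can be approximated as in the sketch below. Several details are assumptions: the standard deviation is computed here from the ratings passed in (population standard deviation), the error is averaged per user, and the "average of the points in that proximity" is approximated with fixed-width bins rather than whatever smoothing was used for the figures.

```python
import statistics
from collections import defaultdict


def user_error_profile(predictions):
    """predictions: list of (user_id, predicted, actual).

    Returns {user: (rating standard deviation, mean absolute error)}.
    """
    ratings, errors = defaultdict(list), defaultdict(list)
    for user, predicted, actual in predictions:
        ratings[user].append(actual)
        errors[user].append(abs(predicted - actual))
    return {u: (statistics.pstdev(ratings[u]), statistics.fmean(errors[u])) for u in ratings}


def binned_average(profile, bin_width=0.25):
    """Average the users' errors inside fixed-width standard deviation bins."""
    bins = defaultdict(list)
    for std_dev, error in profile.values():
        bins[int(std_dev // bin_width)].append(error)
    return {b * bin_width: statistics.fmean(v) for b, v in sorted(bins.items())}
```

Each entry of `user_error_profile` corresponds to one point in the scatter plot, and `binned_average` gives a coarse version of the trend line.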

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with a high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews have been made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
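The round-by-round simulation can be sketched for a single user as shown below. The `fit` and `predict` callables are placeholders standing in for building the prototype vector and producing a rating; their signatures are assumptions, and in the experiment the resulting curves would additionally be averaged over all selected users.

```python
def learning_curve(user_reviews, fit, predict, max_train):
    """user_reviews: list of (item, rating) for one user, in review order.

    Round n trains on the first n reviews and validates on the remaining ones.
    Returns the mean absolute error of each round.
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break
        model = fit(training)
        errors = [abs(predict(model, item) - rating) for item, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve


# Toy stand-in model: always predict the mean of the training ratings.
def mean_fit(training):
    return sum(rating for _, rating in training) / len(training)


def mean_predict(model, item):
    return model
```

With the toy model and the reviews `[("a", 4), ("b", 2), ("c", 3), ("d", 3)]`, the error drops from about 1.33 in the first round to 0.0 once the second review has been added to the training set.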



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, which made it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), the total meal cost and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.



Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

The userrsquos disliked ingredients Iminusk are estimated by considering the ingredients in the browsing

history with which the user has never cooked

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$


When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, at only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of favourite ingredients per user is 19.2. The highest F-measure, 60.8%, was also recorded with N = 20. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
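The reported figures can be checked with a few lines of Python. The sketch below is only illustrative: the function names and the toy ingredient lists are assumptions introduced here, not part of the evaluated system.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at_n(ranked_ingredients, favourites, n):
    """Precision and recall of the top-n ranked ingredients.

    ranked_ingredients: ingredients sorted by estimated preference (best first).
    favourites: set of ingredients the user actually labelled as favourites.
    """
    top_n = ranked_ingredients[:n]
    hits = sum(1 for ingredient in top_n if ingredient in favourites)
    precision = hits / n if n else 0.0
    recall = hits / len(favourites) if favourites else 0.0
    return precision, recall


# Precision 60.7% and recall 61% give an F-measure of roughly 60.8%.
print(round(f_measure(0.607, 0.61), 3))
```

With N = 20 the precision and recall are nearly balanced, which is why the harmonic mean peaks there.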

In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g. if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.


When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5}$$

In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e. the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e. $I^+_k$ and $I^-_k$, respectively):

$$\mathrm{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6}$$
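A minimal Python sketch of Eqs. 3.5 and 3.6 follows. How the weight $W_k$ is derived from the deviation score is not detailed here, so the weights are simply passed in as a dictionary; the function names, the default values for unknown ingredients and the sample values are assumptions made for illustration.

```python
import math


def ingredient_std(quantities):
    """Standard deviation of one ingredient's quantities over the n recipes
    that contain it (Eq. 3.5, population form with a 1/n factor)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)


def recipe_score(recipe_ingredients, preference, weight):
    """Sum of I_k * W_k over the ingredients k of the recipe (Eq. 3.6).

    preference: ingredient -> I_k (positive if liked, negative if disliked).
    weight: ingredient -> W_k (hypothetical values; the mapping from the
            deviation score to W_k is not specified here).
    """
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)


print(ingredient_std([2, 4, 4, 4, 5, 5, 7, 9]))
print(recipe_score(["pepper", "potato"],
                   {"pepper": -1.0, "potato": 1.0},
                   {"pepper": 2.0, "potato": 0.5}))
```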

The approach inspired by TF-IDF, shown in Eq. 3.1 and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7}$$

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
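The construction of a pseudo user-ratings vector can be sketched in Python as follows. This is only an illustration of the idea of filling missing ratings with content-based predictions: the function name and the dictionaries are hypothetical, and the hybrid correlation weight of [20] is not reproduced.

```python
def pseudo_ratings_vector(actual_ratings, content_predictions, items):
    """Build v_u: the user's own rating where available, otherwise the
    rating predicted by the content-based method.

    actual_ratings: item -> rating given by the user (r_ui).
    content_predictions: item -> content-based prediction (c_ui).
    items: ordered list of all items, defining the vector positions.
    """
    return [actual_ratings[i] if i in actual_ratings else content_predictions[i]
            for i in items]


items = ["m1", "m2", "m3"]
rated = {"m1": 4}
predicted = {"m1": 3.5, "m2": 2.0, "m3": 4.5}
print(pseudo_ratings_vector(rated, predicted, items))
```

Stacking these vectors for all users yields the dense matrix V on which the user-to-user similarity is then computed.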

The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe-ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
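The content-based strategy described above can be sketched as follows. The ingredient scores are the average of the ratings of the recipes in which each ingredient occurs; predicting a recipe as the mean of its known ingredient scores is an assumption made here for illustration, since the exact aggregation used in [22] is not detailed in this summary. All names are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(user_recipe_ratings, recipe_ingredients):
    """Average, per ingredient, of the user's ratings of recipes containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for recipe, rating in user_recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ingredient: totals[ingredient] / counts[ingredient]
            for ingredient in totals}


def predict_recipe(ingredients, scores):
    """Assumed aggregation: mean of the known ingredient scores, or None
    when the user has no score for any ingredient of the recipe."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


ratings = {"r1": 4, "r2": 2}
contents = {"r1": ["egg", "rice"], "r2": ["egg", "pork"]}
scores = ingredient_scores(ratings, contents)
print(scores)
print(predict_recipe(["egg", "rice"], scores))
```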

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the scores obtained from the recipe ratings after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
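The short-term model described above can be sketched in Python. This is a simplified reading of the description, not the Daily-Learner implementation: the sparse-dictionary vector format, the function names, the returned labels and the threshold values in the example are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def short_term_predict(story, rated_stories, t_min, t_max):
    """Return ("known", None), ("unclassified", None) or ("predicted", score).

    rated_stories: list of (vector, score) pairs already seen by the user.
    t_min / t_max: minimum and maximum similarity thresholds (hypothetical values).
    """
    voters = []
    for vector, score in rated_stories:
        similarity = cosine(story, vector)
        if similarity >= t_max:
            return "known", None  # too similar: user already knows this event
        if similarity >= t_min:
            voters.append((similarity, score))
    if not voters:
        return "unclassified", None  # would be handed to the long-term model
    total = sum(similarity for similarity, _ in voters)
    return "predicted", sum(s * score for s, score in voters) / total


seen = [({"election": 1.0, "vote": 1.0}, 0.8)]
print(short_term_predict({"election": 1.0}, seen, t_min=0.5, t_max=0.9))
```

The same two-threshold idea transfers to food: a dish nearly identical to one eaten recently could be filtered out in the same way that a known story is.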

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture

In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \tag{4.2}$$




In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.
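A small Python sketch of the item-to-item similarity and of the weighted prediction is given below. It follows the form of the two equations as described in the text (a similarity-weighted average of the user's deviations from each item's mean rating); the function names, the dictionary layout and the handling of a zero denominator are assumptions made for illustration, not the YoLP implementation.

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two items.

    ratings_a, ratings_b: user -> rating for items a and b. The sums run over
    the users who rated both items; the means are the items' average ratings.
    """
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    common = set(ratings_a) & set(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b)
                    for p in common)
    denominator = (math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
                   * math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common)))
    return numerator / denominator if denominator else 0.0


def predict(user_ratings, item_means, sims_to_target):
    """Similarity-weighted average of the user's deviations from item means.

    user_ratings: item b -> rating given by the user.
    item_means: item b -> average rating of b.
    sims_to_target: item b -> sim(a, b) for the target item a.
    """
    rated = [b for b in user_ratings if b in sims_to_target]
    numerator = sum(sims_to_target[b] * (user_ratings[b] - item_means[b])
                    for b in rated)
    denominator = sum(sims_to_target[b] for b in rated)
    return numerator / denominator if denominator else 0.0


a = {"u1": 5, "u2": 3, "u3": 1}
b = {"u1": 4, "u2": 3, "u3": 2}
print(pearson_item_similarity(a, b))
```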

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if similarity} > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
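The pipeline described in this section (binary profile, cosine similarity, similarity-to-rating rule) can be sketched in Python. Binary sparse vectors are represented as sets of feature names, which is an implementation choice made here; the function names and the sample features are hypothetical, and the 0.8 threshold and 0.5 increment are the values read from Eq. 4.3.

```python
import math


def build_profile(rated_recipes, positive_threshold=4):
    """Union of the features of all recipes the user rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile |= set(features)
    return profile


def binary_cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Eq. 4.3: add a fixed bonus to the combined average above the threshold."""
    return avg_total + bonus if similarity > threshold else avg_total


history = [({"italian", "tomato"}, 5), ({"fish"}, 2)]
profile = build_profile(history)
sim = binary_cosine(profile, {"italian", "tomato"})
print(sim, similarity_to_rating(sim, avg_total=3.5))
```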

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:


$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$


where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
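Computing the IRF weights over a dataset can be sketched as below. The base of the logarithm is not stated here, so the natural logarithm is an assumption; the function name and the toy recipes are hypothetical.

```python
import math


def inverse_recipe_frequency(recipes):
    """Return feature -> log(M / M_k).

    recipes: iterable of feature collections, one per recipe.
    M is the number of recipes, M_k the number of recipes containing feature k.
    """
    recipes = list(recipes)
    total = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):  # count each feature once per recipe
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(total / m_k) for k, m_k in counts.items()}


dataset = [{"egg", "rice"}, {"egg"}, {"egg", "basil"}, {"rice"}]
print(inverse_recipe_frequency(dataset))
```

Features that occur in almost every recipe receive a weight close to zero, while rare features receive the largest weights.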

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
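The construction of the prototype vector under both labelling approaches can be sketched as follows. This is an illustrative reading of the description, with positive and negative observations weighted equally; the function names, the sparse-dictionary representation and the sample weights are assumptions.

```python
def build_prototype(rated_recipes, feature_weights, is_positive):
    """Add the feature weights of positively rated recipes and subtract
    those of negatively rated ones.

    rated_recipes: list of (features, rating) pairs from the training set.
    feature_weights: feature -> weight (e.g. its IRF value).
    is_positive: rating -> True, False, or None for a neutral rating to skip.
    """
    prototype = {}
    for features, rating in rated_recipes:
        label = is_positive(rating)
        if label is None:
            continue
        sign = 1.0 if label else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * feature_weights.get(k, 0.0)
    return prototype


def fixed_threshold_foodcom(rating):
    """First approach on a 1-5 scale: 1-2 negative, 3 neutral, 4-5 positive."""
    if rating == 3:
        return None
    return rating > 3


def make_user_average_rule(user_average):
    """Second approach: positive when the rating is at or above the user's mean."""
    return lambda rating: rating >= user_average


history = [({"egg", "rice"}, 5), ({"egg", "pork"}, 1), ({"basil"}, 3)]
weights = {"egg": 1.0, "rice": 2.0, "pork": 3.0, "basil": 4.0}
print(build_prototype(history, weights, fixed_threshold_foodcom))
```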

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
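The per-user Min-Max mapping can be sketched in a few lines of Python. The function and parameter names are hypothetical; the fallback to a default value when the similarity interval is empty mirrors the use of the user average described above.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, default):
    """Map a similarity from [sim_min, sim_max] into [rating_min, rating_max].

    Falls back to `default` (e.g. the user's average rating) when the
    similarity interval is empty and the formula cannot be applied.
    """
    span = sim_max - sim_min
    if span == 0:
        return default
    return (similarity - sim_min) / span * (rating_max - rating_min) + rating_min


# A user whose similarities range over [0, 1] and whose ratings range over [1, 5].
print(min_max_rating(0.5, 0.0, 1.0, 1, 5, default=3.0))
```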

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

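$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if similarity} < L \end{cases}$$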


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
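The threshold rule can be sketched in Python as follows. The same function covers the user-based and recipe-based variants, depending on which list of training ratings is passed in. The use of the population standard deviation, the function name and the threshold values in the example are assumptions made for illustration.

```python
import statistics


def rating_from_similarity(similarity, ratings, upper, lower):
    """Average rating shifted by one standard deviation according to the
    similarity thresholds U (upper) and L (lower).

    ratings: training-set ratings of the user (or of the recipe).
    """
    average = statistics.mean(ratings)
    deviation = statistics.pstdev(ratings)  # population form, an assumption
    if similarity >= upper:
        return average + deviation
    if similarity < lower:
        return average - deviation
    return average


user_ratings = [3, 5, 3, 5]  # mean 4, standard deviation 1
for sim in (0.8, 0.5, 0.1):
    print(sim, rating_from_similarity(sim, user_ratings, upper=0.75, lower=0.25))
```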

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                               Food.com   Epicurious
Number of users                            24741         8117
Number of food items                      226025        14976
Number of rating events                   956826        86574
Number of ratings above avg               726467        46588
Number of groups                             108           68
Number of ingredients                       5074          338
Number of categories                          28           14
Sparsity on the ratings matrix              0.02         0.07
Avg rating values                           4.68         3.34
Avg number of ratings per user             38.67        10.67
Avg number of ratings per item              4.23         5.78
Avg number of ingredients per item          8.57         3.71
Avg number of categories per item           2.33         0.60
Avg number of food groups per item          0.87         0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation

This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

51 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to set aside a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions of the model built during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
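The 5-fold procedure described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `k_fold_splits`, the random shuffling and the fixed seed are assumptions made for the example.

```python
import random

def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, validation) pairs; each validation fold holds about 1/k of the data."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # assumption: folds are drawn at random
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, validation
```

Each rating event appears in exactly one validation fold, so averaging the error over the k rounds uses every observation once for evaluation.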

Figure 5.1: 10 Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
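For reference, the two error measures can be computed as in the sketch below, using their standard definitions; the function names are chosen for this example only.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between prediction and true rating."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but larger deviations weigh more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Both functions take the predictions and the validation ratings in the same order.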

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious           Food.com
                                    MAE      RMSE        MAE      RMSE
YoLP Content-based component        0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component        0.6454   0.8678      0.3761   0.6834
User Average                        0.6315   0.8338      0.4077   0.6207
Item Average                        0.7701   1.0930      0.4385   0.7043
Combined Average                    0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                             Food.com
                                                   Observation        Observation        Observation        Observation
                                                   User Average       Fixed Threshold    User Average       Fixed Threshold
                                                   MAE      RMSE      MAE      RMSE      MAE      RMSE      MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606    0.7759   1.0283    0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550    0.8388   1.1106    0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296    0.7824   0.9927    0.4390   0.6506    0.4324   0.6449
Min-Max                                            0.8539   1.1533    0.7721   1.0705    0.6648   0.9847    0.6303   0.9384
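The average-based baselines can be sketched as below. This is an illustration only: the function names are hypothetical, and the sketch assumes every user and recipe in the validation set also appears in the training set (how unseen users or recipes were handled is not stated here).

```python
from collections import defaultdict

def build_averages(train):
    """train: iterable of (user_id, recipe_id, rating) rating events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in train:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def combined_average(user_avg, item_avg, user_id, recipe_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```

The user and item baselines simply return `user_avg[user_id]` or `item_avg[recipe_id]` as the predicted rating.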

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
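The two ways of separating positive and negative observations can be illustrated with a generic Rocchio-style prototype. This is a sketch under assumptions, not the formulation of Section 4.3: the weights `beta` and `gamma`, the averaging over each group, and treating ratings equal to the threshold as positive are all choices made for the example.

```python
from collections import defaultdict

def build_prototype(rated, threshold, beta=1.0, gamma=1.0):
    """rated: list of (feature_weights: dict, rating) for one user.

    Ratings >= threshold count as positive observations (assumption).
    """
    positive = [vec for vec, rating in rated if rating >= threshold]
    negative = [vec for vec, rating in rated if rating < threshold]
    prototype = defaultdict(float)
    for group, weight in ((positive, beta), (negative, -gamma)):
        for vec in group:
            for feature, value in vec.items():
                # Add (or subtract) the group's average weight for this feature.
                prototype[feature] += weight * value / len(group)
    return dict(prototype)

def user_average(rated):
    """Threshold used by the Observation User Average variant."""
    return sum(rating for _, rating in rated) / len(rated)
```

Calling `build_prototype(rated, 3)` corresponds to the fixed-threshold variant, while `build_prototype(rated, user_average(rated))` corresponds to the user-average variant.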

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious           Food.com
                                    MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616      0.4324   0.7087
Ingredients                         0.7932   1.0054      0.4411   0.6537
Cuisine                             0.8553   1.0810      0.5357   0.7431
Dietary                             0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
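A minimal sketch of this idea is shown below: one stored vector per feature type, merged on demand before computing the cosine similarity. The dictionary layout and the names `merge` and `cosine` are assumptions made for the example, not the storage format used in the thesis.

```python
import math

def merge(vectors_by_feature, selected):
    """Combine the stored per-feature vectors for the selected feature types."""
    merged = {}
    for name in selected:
        for key, weight in vectors_by_feature[name].items():
            merged[(name, key)] = weight  # prefix keys so feature types never collide
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Testing a feature combination then only requires changing the `selected` list, for instance `["ingredients", "cuisine"]`, without rebuilding any prototype vector.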

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.
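The mapping in Eq. 4.6 translates directly into a small function. The sketch below uses the initial thresholds as defaults; the function and parameter names are chosen for the example, and which average and standard deviation are passed in (user, item, or both combined) depends on the method variant being tested.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity < lower:
        return average - std_dev
    return average
```

Varying `upper` and `lower` between runs of the cross-validation is then enough to reproduce a threshold variation test.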

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

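As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset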


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
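The opposite movement of MAE and RMSE can be seen in a small constructed example. The error values below are hypothetical and chosen only to illustrate the effect; they are not taken from the experiments.

```python
import math

# Hypothetical absolute errors on the same four validation ratings.
errors_high_u = [0.5, 0.5, 0.5, 0.5]  # never exact, never far off
errors_low_u = [0.0, 0.0, 0.0, 1.6]   # exact three times, far off once

def mean_abs(errors):
    return sum(errors) / len(errors)

def root_mean_sq(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Here the second list has the lower mean absolute error (0.4 against 0.5) but the higher root mean squared error (0.8 against 0.5), which is the pattern described above: more exact hits, but larger misses.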

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
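The points in such a graph could be computed as in the sketch below. It assumes that a user's absolute error is the mean absolute error over that user's validation ratings and that the standard deviation is the population standard deviation of the same ratings; the thesis does not spell out either detail here, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict

def per_user_points(results):
    """results: iterable of (user_id, actual_rating, predicted_rating).

    Returns {user_id: (rating standard deviation, mean absolute error)}.
    """
    actual_by_user = defaultdict(list)
    error_by_user = defaultdict(list)
    for user_id, actual, predicted in results:
        actual_by_user[user_id].append(actual)
        error_by_user[user_id].append(abs(predicted - actual))
    return {
        user: (statistics.pstdev(actual_by_user[user]),
               sum(error_by_user[user]) / len(error_by_user[user]))
        for user in actual_by_user
    }
```

Each resulting pair gives the horizontal and vertical position of one user in the scatter plot.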

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
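The round-by-round procedure can be sketched for a single user as follows. The profile-building and prediction steps are passed in as functions, since the sketch is meant to show only the simulation loop; `build_profile` and `predict` are placeholders for the prototype construction and rating prediction, and all names are hypothetical.

```python
def learning_curve(user_reviews, build_profile, predict, max_train):
    """user_reviews: list of (recipe, rating) for one user.

    For n = 1..max_train, train on the first n reviews and measure the
    mean absolute error on the remaining ones.
    """
    curve = []
    for n in range(1, max_train + 1):
        train, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break  # nothing left to evaluate against
        profile = build_profile(train)
        errors = [abs(predict(profile, recipe) - rating) for recipe, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve
```

Averaging the per-round errors over all selected users would give one point of the curve per number of rated recipes.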



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations produced the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
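One possible reading of this proposal is sketched below: one averaged feature vector per rating value, with the predicted rating taken from the most similar class. This is only an illustration of the idea; how the class vectors would actually be weighted is left open in the text, and averaging the recipe vectors per rating is an assumption.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated):
    """rated: list of (feature_weights: dict, rating). One averaged vector per rating value."""
    groups = defaultdict(list)
    for vec, rating in rated:
        groups[rating].append(vec)
    classes = {}
    for rating, vectors in groups.items():
        total = defaultdict(float)
        for vec in vectors:
            for feature, weight in vec.items():
                total[feature] += weight / len(vectors)
        classes[rating] = dict(total)
    return classes

def predict_rating(classes, recipe):
    """The rating whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))
```

With this representation, no similarity-to-rating mapping such as Eq. 4.6 is needed, since the class label is itself the prediction.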




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\[
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \tag{3.5}
\]

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of the ingredient $k$ in all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I_k^+$ and $I_k^-$, respectively):

\[
\text{Score}(R) = \sum_{k \in R} \left( I_k \cdot W_k \right) \tag{3.6}
\]
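Eqs. 3.5 and 3.6 can be illustrated with the sketch below. How the weight $W_k$ is derived from the deviation score, and how $I_k$ is encoded, are not detailed in this summary, so the weights are passed in directly and the encoding +1 for liked, -1 for disliked and 0 otherwise is an assumption made for the example.

```python
import math

def ingredient_std(quantities):
    """Eq. 3.5: standard deviation of an ingredient's quantity over the n recipes containing it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Eq. 3.6: sum of I_k * W_k over the ingredients k of the recipe.

    preference: ingredient -> +1 (liked) or -1 (disliked); missing means 0 (assumption).
    weight: ingredient -> W_k, supplied by the caller.
    """
    return sum(preference.get(k, 0) * weight.get(k, 0.0) for k in recipe_ingredients)
```

For example, a recipe containing a liked ingredient with weight 2.0 and a disliked ingredient with weight 0.5 would score 1.5 under this encoding.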

The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.

3.2 Content-Boosted Collaborative Recommendation

A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

\[
v_{u,i} =
\begin{cases}
r_{u,i} & \text{if user } u \text{ rated item } i \\
c_{u,i} & \text{otherwise}
\end{cases}
\tag{3.7}
\]

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items have been rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
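The construction of a pseudo user-ratings vector (Eq. 3.7) and the Pearson similarity computed over the resulting dense vectors can be sketched as follows. The content-based predictor is passed in as a function, and the hybrid correlation weight from [20] is deliberately left out; all names are hypothetical.

```python
import math

def pseudo_ratings(user_ratings, content_prediction, items):
    """Eq. 3.7: the actual rating where available, the content-based prediction otherwise."""
    return {i: user_ratings[i] if i in user_ratings else content_prediction(i) for i in items}

def pearson(a, b):
    """Pearson correlation between two dense pseudo-rating vectors over the same items."""
    items = list(a)
    mean_a = sum(a[i] for i in items) / len(items)
    mean_b = sum(b[i] for i in items) / len(items)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(var_a * var_b)
```

Because every pseudo vector covers all items, the correlation no longer depends on the two users having co-rated the same movies, which is how the approach reduces the effect of sparsity.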

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations

of collaborative filtering and content-based methods since it has been shown to perform consistently

better than pure collaborative filtering


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
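The recipe-to-ingredient breakdown and the reconstruction of a recipe score can be sketched as follows. Ingredient scores are the average rating of the recipes in which each ingredient occurs, as described above. Averaging the ingredient scores to predict a recipe is an assumption of this sketch, since the aggregation step used in [22] is not detailed here; all names and data are hypothetical.

```python
from collections import defaultdict


def ingredient_scores(rated_recipes, recipe_ingredients):
    """Score each ingredient with the mean rating of the rated recipes containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in rated_recipes.items():
        for ingredient in recipe_ingredients[recipe]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


def predict_recipe(ingredients, scores):
    """Assumed aggregation: mean score of the ingredients already known to the user."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None


recipes = {"r1": ["egg", "rice"], "r2": ["egg", "beef"], "r3": ["rice", "beef"]}
scores = ingredient_scores({"r1": 4, "r2": 2}, recipes)
print(predict_recipe(recipes["r3"], scores))
```

Returning `None` when no ingredient is known leaves the fallback (for example, a global average) to the caller.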

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is considered that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
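The short-term model described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the Daily-Learner implementation: the threshold values `t_min` and `t_max`, the function names and the sparse dictionary representation of TF-IDF vectors are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def short_term_predict(story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ('known', None), ('unclassified', None) or ('scored', value)."""
    voters = []
    for vector, score in rated_stories:
        sim = cosine(story, vector)
        if sim >= t_max:          # too close to a story the user already saw
            return ("known", None)
        if sim >= t_min:          # close enough to vote
            voters.append((sim, score))
    if not voters:                # would be handed to the long-term model
        return ("unclassified", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


rated = [({"election": 1.0, "vote": 1.0}, 0.8), ({"football": 1.0}, 0.2)]
print(short_term_predict({"election": 1.0, "poll": 1.0}, rated))
```

The same two-threshold idea could be applied to dishes, with recently eaten recipes playing the role of known stories.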

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
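A minimal sketch of this item-to-item Pearson similarity is given below. It is not the YoLP code: the function name and the dictionary layout are hypothetical, and the recipe averages are assumed to be taken over all ratings of each recipe rather than only over the co-rating users.

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """ratings_x maps user -> rating for recipe x; returns a value in [-1, 1]."""
    common = set(ratings_a) & set(ratings_b)   # users who rated both recipes
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den = math.sqrt(
        sum((ratings_a[p] - mean_a) ** 2 for p in common)
        * sum((ratings_b[p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


a = {"u1": 1, "u2": 2, "u3": 3}
b = {"u1": 2, "u2": 4, "u3": 6}
print(pearson_item_similarity(a, b))
```

Returning 0.0 when there are no common users or when the denominator vanishes is a design choice of this sketch; such pairs simply do not contribute to a prediction.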


After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \tag{4.2}$$




In the formula, $pred(u,a)$ is the prediction value to user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The predicted rating is then obtained by normalizing this weighted sum by the sum of the similarities.
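The weighting step described in the paragraph above can be sketched as a similarity-weighted average of the user's own ratings. This is the common item-based form and may differ in detail from the YoLP implementation; ignoring non-positive similarities is an additional assumption of the sketch, and all names are hypothetical.

```python
def predict_rating(user_ratings, similarities):
    """user_ratings: recipe -> rating given by the user.
    similarities: recipe -> similarity between that recipe and the target recipe."""
    num = den = 0.0
    for recipe, rating in user_ratings.items():
        sim = similarities.get(recipe, 0.0)
        if sim > 0:               # assumption: only positively correlated items vote
            num += sim * rating
            den += sim
    return num / den if den else None


print(predict_rating({"b1": 4, "b2": 2}, {"b1": 0.75, "b2": 0.25}))
```

When no rated recipe is similar to the target, the function returns `None` so that the caller can fall back to a default such as the user average.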

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that he positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
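A binary profile of this kind can be sketched with Python sets, where set membership plays the role of a 1 in the sparse vector. The feature labels (such as `"ing:rice"`), the function names and the sample recipes are hypothetical and only serve to illustrate the idea.

```python
import math


def build_profile(rated_recipes, recipe_features, min_rating=4):
    """Union of the features of every recipe the user rated 4 or 5."""
    profile = set()
    for recipe, rating in rated_recipes.items():
        if rating >= min_rating:
            profile |= recipe_features[recipe]
    return profile


def binary_cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_recipes(profile, candidates, recipe_features):
    """Order candidate recipes from most to least similar to the profile."""
    return sorted(candidates,
                  key=lambda r: binary_cosine(profile, recipe_features[r]),
                  reverse=True)


features = {
    "r1": {"cat:main", "ing:rice", "ing:egg"},
    "r2": {"cat:dessert", "ing:egg"},
    "r3": {"cat:main", "ing:rice"},
    "r4": {"cat:dessert", "ing:apple"},
}
profile = build_profile({"r1": 5, "r2": 2}, features)
print(rank_recipes(profile, ["r4", "r3"], features))
```

For binary vectors the dot product is the size of the intersection and each norm is the square root of the set size, which is why the cosine reduces to the expression in `binary_cosine`.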

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:


$$IRF_k = \log \frac{M}{M_k} \tag{4.4}$$


where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
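As a rough illustration, the Inverse Recipe Frequency of every feature can be computed in one pass over the dataset. The natural logarithm is an assumption of this sketch (the base is not stated here), and the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipe_features):
    """recipe_features: recipe -> iterable of features. Returns feature -> log(M / M_k)."""
    m = len(recipe_features)                       # M: total number of recipes
    counts = Counter(f for feats in recipe_features.values() for f in set(feats))
    return {f: math.log(m / m_k) for f, m_k in counts.items()}


recipes = {"r1": ["egg", "rice"], "r2": ["egg"], "r3": ["egg", "beef"], "r4": ["egg"]}
irf = inverse_recipe_frequency(recipes)
print(irf["egg"], irf["rice"])
```

A feature present in every recipe receives a weight of zero, so ubiquitous ingredients do not influence the prototype vector, while rare ingredients weigh the most.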

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
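The two labelling approaches and the add/subtract update can be sketched as follows. This is a simplified illustration with equal weights of 1 for positive and negative observations; the function names, the way the middle of the scale is derived and the sample IRF values are assumptions.

```python
from collections import defaultdict


def fixed_threshold_label(rating, scale_max):
    """+1 above the middle of a 1..scale_max scale, -1 below it, 0 exactly on it."""
    middle = (1 + scale_max) / 2
    if rating > middle:
        return 1
    if rating < middle:
        return -1
    return 0  # neutral observation (e.g. a rating of 3 on a 1-5 scale)


def user_average_label(rating, user_avg):
    """+1 if the rating is at or above the user's average, -1 otherwise."""
    return 1 if rating >= user_avg else -1


def build_prototype(rated_recipes, recipe_features, irf, label):
    """Add IRF weights for positive observations and subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe, rating in rated_recipes.items():
        sign = label(rating)
        for feature in recipe_features[recipe]:
            prototype[feature] += sign * irf.get(feature, 0.0)
    return dict(prototype)


features = {"r1": ["egg", "rice"], "r2": ["egg", "beef"]}
irf = {"egg": 0.5, "rice": 1.0, "beef": 2.0}   # hypothetical weights
print(build_prototype({"r1": 5, "r2": 1}, features, irf,
                      lambda r: fixed_threshold_label(r, 5)))
```

A feature that appears in both a liked and a disliked recipe, such as `egg` here, cancels out, so the prototype is dominated by the features that distinguish the two.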

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as default for the recommendation.
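The per-user Min-Max mapping can be sketched as below. The function names and the list-based inputs are hypothetical, and treating a single similarity value or a zero-width interval as "not enough data" is an assumption about how the fallback to the user average is triggered.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a from the interval [a_min, a_max] into the interval [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def similarity_to_rating(sim, user_sims, user_ratings):
    """user_sims: similarities observed for this user.
    user_ratings: the user's ratings in the training set (must be non-empty)."""
    user_avg = sum(user_ratings) / len(user_ratings)
    if len(user_sims) < 2 or max(user_sims) == min(user_sims):
        return user_avg            # similarity interval cannot be computed
    return min_max(sim, min(user_sims), max(user_sims),
                   min(user_ratings), max(user_ratings))


print(similarity_to_rating(0.6, [0.2, 0.6], [2, 4]))
```

With this mapping the recipe most similar to the user's profile receives the user's own highest rating, and the least similar one receives the lowest.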

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

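$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}$$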


Three different approaches were tested: using the user's rating average and the user standard deviation, using the recipe's rating average and the recipe standard deviation, and using the combined average of the user and the recipe averages and standard deviations.
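The three-case rule can be written as a small function that receives whichever average and standard deviation are being tested. The default thresholds of 0.75 and 0.25 are the initial values used in this work; combining the user and recipe statistics with a simple mean is an assumption of this sketch, and the function names are hypothetical.

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Shift the average by one standard deviation when the similarity is extreme."""
    if sim >= upper:
        return avg + std
    if sim < lower:
        return avg - std
    return avg


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: plain mean of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


avg, std = combined_stats(4.0, 1.0, 3.0, 0.5)
print(rating_from_similarity(0.8, avg, std))
```

Keeping the statistics as parameters lets the same function serve the user-based, recipe-based and combined variants.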

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                       24741      8117
Number of food items                  226025     14976
Number of rating events               956826     86574
Number of ratings above avg           726467     46588
Number of groups                      108        68
Number of ingredients                 5074       338
Number of categories                  28         14
Sparsity on the ratings matrix        0.02       0.07
Avg rating values                     4.68       3.34
Avg number of ratings per user        38.67      10.67
Avg number of ratings per item        4.23       5.78
Avg number of ingredients per item    8.57       3.71
Avg number of categories per item     2.33       0.60
Avg number of food groups per item    0.87       0.61


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe's features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, a k-fold cross-validation method was used: the data is partitioned into k segments (folds), one of which is used as the validation set while the remaining observations form the training set. To reduce variability, this process is repeated k times, using a different fold as the validation set each time. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
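A 5-fold split of the rating events can be sketched with the standard library alone. The shuffling seed, the function name and the use of a flat list of rating events are assumptions made for illustration; the actual partitioning used in the experiments may differ.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    folds = [shuffled[i::k] for i in range(k)]     # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


events = list(range(10))                           # stand-ins for rating events
for training, validation in k_fold_splits(events):
    print(len(training), len(validation))
```

With k = 5 each validation fold holds 20% of the events and the corresponding training set holds the remaining 80%; the error metrics are then averaged over the five runs.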

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
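For reference, both error measures can be computed from two aligned lists of actual and predicted ratings, as in the following sketch (the function names and sample values are illustrative).

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


print(mae([4, 2], [3, 4]), rmse([4, 2], [3, 4]))
```

Because the deviations are squared before averaging, the RMSE is never lower than the MAE on the same predictions, and the gap between the two grows when a few predictions are far off.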

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
YoLP Content-based component     0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678     0.3761   0.6834
User Average                     0.6315   0.8338     0.4077   0.6207
Item Average                     0.7701   1.0930     0.4385   0.7043
Combined Average                 0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                              Food.com
                                                   Observation         Observation         Observation         Observation
                                                   User Average        Fixed Threshold     User Average        Fixed Threshold
                                                   MAE      RMSE       MAE      RMSE       MAE      RMSE       MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606     0.7759   1.0283     0.4448   0.6812     0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550     0.8388   1.1106     0.4561   0.7251     0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296     0.7824   0.9927     0.4390   0.6506     0.4324   0.6449
Min-Max                                            0.8539   1.1533     0.7721   1.0705     0.6648   0.9847     0.6303   0.9384
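The average-based baselines can be sketched as follows. The tuple layout of the training events and the function names are assumptions; users or recipes absent from the training set are not handled in this simplified version.

```python
from collections import defaultdict


def average_baselines(training):
    """training: iterable of (user, item, rating). Returns (user_avg, item_avg) dicts."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def combined_average(user, item, user_avg, item_avg):
    """(UserAvg + ItemAvg) / 2 for one user-recipe pair."""
    return (user_avg[user] + item_avg[item]) / 2


training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = average_baselines(training)
print(combined_average("u1", "r1", user_avg, item_avg))
```

Each baseline ignores the recipe content entirely, which is what makes it a useful reference point for the content-based methods.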

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering as positive observations the highest rating values and as negative the lowest. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
Ingredients                         0.7932   1.0054     0.4411   0.6537
Cuisine                             0.8553   1.0810     0.5357   0.7431
Dietary                             0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
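A minimal sketch of this representation is given below, assuming each feature group is stored as its own NumPy array and merged by concatenation before the cosine similarity is computed. The dictionary keys, vector sizes and values are hypothetical.

```python
import numpy as np

def merge(parts, selected):
    """Concatenate the stored per-feature vectors for the selected feature groups."""
    return np.concatenate([parts[name] for name in selected])

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# One stored vector per feature group, for a user prototype and for a recipe.
user_parts = {
    "ingredients": np.array([0.8, 0.0, 0.3]),
    "cuisine": np.array([1.0, 0.0]),
    "dietary": np.array([0.0, 0.5]),
}
recipe_parts = {
    "ingredients": np.array([1.0, 0.0, 0.0]),
    "cuisine": np.array([1.0, 0.0]),
    "dietary": np.array([0.0, 1.0]),
}

def similarity_for(combo):
    return cosine(merge(user_parts, combo), merge(recipe_parts, combo))

# Each feature combination reuses the same stored vectors; nothing is rebuilt.
for combo in [("ingredients", "cuisine", "dietary"), ("ingredients", "cuisine"), ("cuisine",)]:
    print(combo, round(similarity_for(combo), 4))
```

Because only the merge step changes between feature combinations, the expensive prototype construction is done once per user and fold.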

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
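The three-case mapping can be sketched as a small function. This is only an illustration: the function and parameter names are hypothetical, and `average` and `deviation` stand for the combined user and item average rating and standard deviation, which are assumed to be computed elsewhere.

```python
def similarity_to_rating(similarity, average, deviation, upper=0.75, lower=0.25):
    """Map a similarity value to a predicted rating using two case thresholds."""
    if similarity >= upper:
        return average + deviation
    if similarity < lower:
        return average - deviation
    return average

# Sweeping the lower threshold for one example similarity value.
for lower in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
    print(lower, similarity_to_rating(0.12, average=3.5, deviation=0.5, lower=lower))
```

Setting `lower` to a value below every observed similarity effectively disables the third case, which is one way to run the threshold variation test without changing the function.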

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy and that subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

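As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset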


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
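The opposite movement of MAE and RMSE discussed in this section can be reproduced with a toy example. The ratings below are invented for illustration only: one predictor is always slightly off, while the other is exact three times out of four but misses badly once.

```python
import math

def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [4, 4, 4, 4]
always_close = [3.5, 3.5, 3.5, 3.5]   # never exact, never far off
often_exact = [4, 4, 4, 2.5]          # exact three times, one large miss

print(mae(always_close, actual), rmse(always_close, actual))  # 0.5 0.5
print(mae(often_exact, actual), rmse(often_exact, actual))    # 0.375 0.75
```

The second predictor has the lower MAE but the higher RMSE, because the single large miss is squared before averaging.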

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
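The round-by-round procedure can be sketched for a single user as follows. This is a simplified illustration, not the implementation used in this work: `build_profile` and `predict` are hypothetical callbacks standing in for the prototype construction and rating prediction steps, and the toy stand-ins below simply predict the mean training rating.

```python
def learning_curve(reviews, build_profile, predict, max_train):
    """Return the MAE on the remaining reviews after training on the first k reviews, k = 1..max_train."""
    errors = []
    for k in range(1, max_train + 1):
        train, validation = reviews[:k], reviews[k:]
        if not validation:
            break
        profile = build_profile(train)
        abs_errors = [abs(predict(profile, recipe) - rating) for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

# Toy stand-ins (assumptions): the "profile" is just the mean training rating.
def mean_rating_profile(train):
    return sum(rating for _, rating in train) / len(train)

def predict_mean(profile, recipe):
    return profile

reviews = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
curve = learning_curve(reviews, mean_rating_profile, predict_mean, max_train=3)
print(curve)
```

Averaging such per-user curves over all selected users would give one point per training-set size, which is the quantity plotted in the learning-curve figures.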



Chapter 6

Conclusions


In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breakdown of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews with which to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
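A minimal sketch of this idea is shown below. It assumes each class vector is the mean of the feature vectors of the recipes the user rated with that value, and that cosine similarity is used for the comparison; both choices, as well as the function names, are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def build_class_vectors(recipe_vectors, ratings):
    """One vector per rating value: the mean of the vectors of recipes given that rating (assumption)."""
    grouped = {}
    for vector, rating in zip(recipe_vectors, ratings):
        grouped.setdefault(rating, []).append(np.asarray(vector, dtype=float))
    return {rating: np.mean(vectors, axis=0) for rating, vectors in grouped.items()}

def predict_rating(class_vectors, recipe_vector):
    """Return the rating value whose class vector is most similar to the recipe."""
    recipe_vector = np.asarray(recipe_vector, dtype=float)
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe_vector))

rated_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
rated_values = [5, 5, 1]
classes = build_class_vectors(rated_vectors, rated_values)
print(predict_rating(classes, [1.0, 0.05]))  # 5
print(predict_rating(classes, [0.0, 1.0]))   # 1
```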

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.

CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

\[
v_{u,i} =
\begin{cases}
r_{u,i} & \text{if user } u \text{ rated item } i \\
c_{u,i} & \text{otherwise}
\end{cases}
\tag{3.7}
\]
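The construction of a pseudo user-ratings vector can be sketched in a few lines. The function name and the `content_predict` callback are hypothetical; the callback stands in for whatever content-based predictor supplies the missing ratings.

```python
def pseudo_ratings(user_ratings, items, content_predict):
    """Actual rating where the user rated the item, content-based prediction otherwise."""
    return [user_ratings[i] if i in user_ratings else content_predict(i) for i in items]

user_ratings = {"item_a": 4}
items = ["item_a", "item_b", "item_c"]
vector = pseudo_ratings(user_ratings, items, content_predict=lambda item: 3.0)
print(vector)  # [4, 3.0, 3.0]
```

Stacking these vectors for all users gives a dense matrix, since every entry is either an observed or a predicted rating.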

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
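As a quick check of what these figures mean in relative terms, the reduction in MAE achieved by CBCF over each pure method can be worked out directly from the values reported above (rounded to three decimal places):

\[
\frac{1.059 - 0.962}{1.059} = \frac{0.097}{1.059} \approx 0.092,
\qquad
\frac{1.002 - 0.962}{1.002} = \frac{0.040}{1.002} \approx 0.040
\]

That is, roughly a 9.2% lower MAE than the pure content-based method and roughly a 4.0% lower MAE than pure collaborative filtering.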

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.


Figure 3.1: Recipe - ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
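The content-based strategy can be sketched as follows. Ingredient scores are computed by averaging the ratings of the recipes in which they occur, as described above; reconstructing the recipe prediction as the plain average of its known ingredient scores is an assumption made here for illustration, as are the function names and sample data.

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Average, per ingredient, the ratings of the recipes in which it occurs."""
    collected = defaultdict(list)
    for ingredients, rating in rated_recipes:
        for ingredient in ingredients:
            collected[ingredient].append(rating)
    return {ingredient: sum(rs) / len(rs) for ingredient, rs in collected.items()}

def predict_recipe(scores, ingredients):
    """Assumption: the recipe prediction is the mean of its known ingredient scores."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None

rated = [(["tomato", "basil"], 5), (["tomato", "onion"], 3)]
scores = ingredient_scores(rated)
print(scores["tomato"])                              # 4.0
print(predict_recipe(scores, ["tomato", "onion"]))   # 3.5
```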

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe rating scores after the recipe is broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered. It is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendation. Items too similar to others known by the user probably carry the same information and will not help him to gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
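The voting scheme of the short-term model can be sketched as below. The threshold values, the function name and the returned labels are hypothetical choices made for illustration; the similarities are assumed to have been computed beforehand (for example, as cosine similarities between TF-IDF vectors).

```python
def short_term_prediction(voters_input, min_sim=0.3, max_sim=0.9):
    """voters_input: list of (similarity_to_new_story, score_of_stored_story) pairs."""
    voters = [(sim, score) for sim, score in voters_input if sim >= min_sim]
    if not voters:
        return ("no_voters", None)   # would be handed to the long-term model
    if any(sim >= max_sim for sim, _ in voters):
        return ("known", None)       # too similar to a story already seen
    total = sum(sim for sim, _ in voters)
    return ("predicted", sum(sim * score for sim, score in voters) / total)

print(short_term_prediction([(0.5, 1.0), (0.5, 0.0), (0.1, 1.0)]))  # ('predicted', 0.5)
print(short_term_prediction([(0.95, 1.0)]))                         # ('known', None)
print(short_term_prediction([(0.1, 1.0)]))                          # ('no_voters', None)
```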

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
\tag{4.1}
\]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.
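A minimal sketch of this similarity is given below. It assumes the ratings of each recipe are held in a dictionary from user to rating and that the recipe averages are taken over all ratings of that recipe; the function name and the sample data are hypothetical.

```python
import math

def pearson_item_similarity(ratings_a, ratings_b):
    """ratings_a, ratings_b: dicts mapping user -> rating for recipes a and b."""
    shared = set(ratings_a) & set(ratings_b)   # users P that rated both recipes
    if not shared:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared)
    denominator = math.sqrt(
        sum((ratings_a[p] - mean_a) ** 2 for p in shared)
        * sum((ratings_b[p] - mean_b) ** 2 for p in shared)
    )
    return numerator / denominator if denominator else 0.0

recipe_a = {"u1": 5, "u2": 3, "u3": 1}
recipe_b = {"u1": 4, "u2": 3, "u3": 2}
recipe_c = {"u1": 2, "u2": 3, "u3": 4}
print(pearson_item_similarity(recipe_a, recipe_b))  # 1.0
print(pearson_item_similarity(recipe_a, recipe_c))  # -1.0
```

Returning 0.0 when there are no shared users or no variance is a convention chosen for this sketch so that such pairs simply carry no weight.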


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
\]

In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.
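The weighting step can be sketched as follows. This illustration follows the prose description, weighting the user's ratings by the item similarities and normalizing by the sum of similarities; restricting the neighbors to items with positive similarity is an extra assumption made here to keep the denominator well behaved, and the names are hypothetical.

```python
def predict_rating(similarities_to_target, user_ratings):
    """similarities_to_target: item b -> sim(a, b); user_ratings: item b -> rating by user u."""
    neighbors = [
        (similarities_to_target[b], rating)
        for b, rating in user_ratings.items()
        if similarities_to_target.get(b, 0.0) > 0.0   # assumption: positive similarities only
    ]
    total = sum(sim for sim, _ in neighbors)
    if not total:
        return None
    return sum(sim * rating for sim, rating in neighbors) / total

sims = {"recipe_b": 0.5, "recipe_c": 0.5, "recipe_d": -0.2}
rated = {"recipe_b": 4, "recipe_c": 2, "recipe_d": 5}
print(predict_rating(sims, rated))  # 3.0
```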

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, with comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant’s recipes with the user profile using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of being represented as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user rated positively, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. In the content-based method, however, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if } \mathrm{similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user’s favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user’s favourite ingredients using the TF-IDF variation [3], the user’s overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio’s algorithm. These vectors would contain the user’s preferences, where the positive and negative examples are obtained directly from the user’s rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio’s algorithm is presented. Next, two different approaches to build the users’ prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio’s Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user’s favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio’s algorithm uses feature weights to build the prototype vectors representing the user’s preferences, and FF-IRF has shown good results for extracting the user’s favourite ingredients, this measure could be used to attribute weights to the recipe’s features and build the prototype vectors. In this work, the Frequency of use of the feature $F_k$ is assumed to be always 1. The main reason is the absence of timestamps in the datasets’ reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:


$$\mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4}$$


where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes’ feature weights were computed using only the IRF determined from the complete dataset.
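As a minimal sketch of this weighting, assuming recipes are held in a dictionary from recipe id to a list of feature names and that the logarithm is the natural one (the text does not state the base), the IRF of every feature could be computed as follows. The function name and data layout are hypothetical.

```python
from collections import Counter
from math import log


def inverse_recipe_frequency(recipes):
    """Return feature -> log(M / M_k) over the complete dataset.

    recipes maps recipe id -> iterable of feature names (assumed layout).
    """
    total = len(recipes)  # M
    # M_k: number of recipes containing feature k (count each recipe once).
    containing = Counter(f for feats in recipes.values() for f in set(feats))
    return {f: log(total / m_k) for f, m_k in containing.items()}
```

A feature present in every recipe receives weight 0, so it contributes nothing to the prototype vectors, while rare features receive the largest weights.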

4.3.2 Building the Users’ Prototype Vector

The prototype vector is built directly from the user’s rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user’s average rating value computed from the training set. If a rating is lower than the user’s average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user’s preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe’s feature weights are added to or subtracted from the user prototype vector. In positive observations the recipe’s feature weights, determined by the IRF value, are added to the vector. In negative observations the feature weights are subtracted.
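The construction described above can be illustrated with the following sketch. It assumes sparse vectors stored as dictionaries, equal observation weights of 1, and a caller-supplied function that labels a rating as positive, negative or neutral; all names are hypothetical and the code is not taken from the thesis implementation.

```python
def build_prototype(user_ratings, recipe_features, irf, classify):
    """Accumulate IRF weights into a sparse prototype vector.

    user_ratings: recipe id -> rating (training set only).
    recipe_features: recipe id -> iterable of feature names.
    classify: rating -> True (positive), False (negative) or None (ignored).
    """
    prototype = {}
    for recipe_id, rating in user_ratings.items():
        label = classify(rating)
        if label is None:  # neutral observation, skipped
            continue
        sign = 1.0 if label else -1.0
        for feature in recipe_features[recipe_id]:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf[feature]
    return prototype


def fixed_threshold_five_point(rating):
    """First approach on a 1 to 5 scale: 3 is neutral, above is positive."""
    if rating == 3:
        return None
    return rating > 3


def user_average_threshold(user_avg):
    """Second approach: ratings at or above the user's average are positive."""
    return lambda rating: rating >= user_avg
```

For example, `build_prototype(ratings, features, irf, user_average_threshold(4.2))` would apply the second approach for a user whose training-set average is 4.2.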

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes and with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio’s algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula, where $\min(A)$ and $\max(A)$ are the minimum and maximum values of $A$:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user’s similarity range from the validation set, and compute each user’s rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
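A per-user Min-Max mapping along these lines might look like the sketch below. The function name and parameters are illustrative assumptions, and the fallback to the user average is applied here whenever the similarity interval has zero width.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity from [sim_min, sim_max] onto [rating_min, rating_max].

    sim_min / sim_max: this user's similarity range (assumed precomputed).
    rating_min / rating_max: this user's rating range from the training set.
    """
    if sim_max == sim_min:
        # Not enough data to define the interval: default to the user average.
        return user_avg
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```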

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathrm{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathrm{similarity} < L \end{cases} \tag{4.6}$$


Three different approaches were tested: using the user’s rating average and the user’s standard deviation; using the recipe’s rating average and the recipe’s standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |

This approach is very intuitive: when the similarity value between the recipe’s features and the user profile is high, the recipe’s features are similar to the user’s preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
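The three-case rule of this method is simple enough to state directly in code. The sketch below is illustrative only: the function name is hypothetical, and the average and standard deviation passed in may be the user's, the recipe's, or a combination, depending on which of the three variants is being tested.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity into a rating using two thresholds.

    upper / lower default to the initial values U = 0.75 and L = 0.25.
    """
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

Setting `lower` to a value no similarity can fall below removes the third case, which makes it easy to experiment with different threshold configurations.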


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered. All recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes’ features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistical data on the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users’ prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data which, instead of being used to train the model, is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
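The 80/20 splitting scheme can be sketched as follows. This is a generic k-fold partition written for illustration; how the thesis actually assigned rating events to folds is not described in the text, so the shuffling, the seed and the function name are assumptions.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    events: list of rating events, e.g. (userID, itemID, rating) tuples.
    Each event lands in exactly one validation set.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The error measures are then computed on each validation set and averaged over the k folds.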

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user’s previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
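For reference, the two error measures can be written out using their standard definitions. The sketch assumes two equally long, non-empty sequences of predicted and actual ratings; the function names are chosen for the example.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

With predictions `[4, 3, 5]` against actual ratings `[5, 3, 3]` the absolute errors are 1, 0 and 2, so the MAE is 1.0 while the RMSE is the square root of 5/3, roughly 1.29; the single large miss weighs more in the RMSE.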

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results (UA = Observation User Average, FT = Observation Fixed Threshold)

| Method | Epicurious UA MAE | Epicurious UA RMSE | Epicurious FT MAE | Epicurious FT RMSE | Food.com UA MAE | Food.com UA RMSE | Food.com FT MAE | Food.com FT RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user’s prototype vectors were presented: using the user’s average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio’s algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users’ prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user’s prototype vector and the recipe’s features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
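The idea of storing one vector per feature group and merging them on demand can be sketched as below. The storage layout (a dictionary of sparse vectors keyed by group name) and the prefixed feature keys are assumptions made so that the groups never share a key; they are not taken from the thesis database schema.

```python
from math import sqrt


def merge_vectors(stored, groups):
    """Combine the selected per-group sparse vectors into one prototype.

    stored: group name -> {feature key: weight}; keys are assumed unique
    across groups (e.g. prefixed with the group name).
    """
    merged = {}
    for group in groups:
        merged.update(stored[group])
    return merged


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing the "Ingredients + Cuisine" combination then amounts to calling `merge_vectors(stored, ["ingredients", "cuisine"])` and restricting the recipe vector to the same groups, with no need to revisit the rating events.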

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user’s preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio’s algorithm into a rating:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathrm{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathrm{similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The pronounced drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

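As a result of these tests, Eq. 4.6 was updated to

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } \mathrm{similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset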


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user’s absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user’s standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned on the graph according to the user’s absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user’s absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user’s preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio’s Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user’s preferences and the recipe features. Since the user’s preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users’ prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes



Chapter 6


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users’ prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail from both the recipes and the prototype vectors; together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi 1011164936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36

[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling

and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A


[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-

ommendation volume 8444 2014

[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-

lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN

10450823 doi 101067mod2000109031



Figure 3.1: Recipe-ingredient breakdown and reconstruction

3.3 Recommending Food: Reasoning on Recipes and Ingredients


A previous article studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
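The content-based strategy described above can be illustrated with a minimal sketch. The function names and the data layout (plain dictionaries) are hypothetical, and averaging the ingredient scores to obtain the recipe prediction is an assumption, since the text only states that a prediction is computed from them.

```python
from collections import defaultdict

def ingredient_scores(user_recipe_ratings, recipe_ingredients):
    """Score each ingredient as the mean rating of the user's recipes containing it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for recipe, rating in user_recipe_ratings.items():
        for ingredient in recipe_ingredients[recipe]:
            sums[ingredient] += rating
            counts[ingredient] += 1
    return {ing: sums[ing] / counts[ing] for ing in sums}

def predict_recipe(ingredients, scores):
    """Assumed aggregation: mean score of the ingredients the user has a score for."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None

# Usage with made-up data
ratings = {"r1": 4, "r2": 2}
contents = {"r1": ["egg", "milk"], "r2": ["egg", "beef"]}
scores = ingredient_scores(ratings, contents)
prediction = predict_recipe(["egg", "milk"], scores)
```

Returning None when no ingredient is known leaves the fallback (for instance a user average) to the caller.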

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, on the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]

This work shows that the content-based approach in this case has the best overall performance

with a significant accuracy improvement over the collaborative filtering algorithm Furthermore the

authors concluded that this work implemented a simplistic version of what a recipe recommender

needs to achieve As mentioned earlier there are many other factors that influence a userrsquos rating

that can be considered to improve content-based food recommendations

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests Daily-Learner uses the nearest neighbor algorithm

to model the usersrsquo short-term interests As previously mentioned in Section 211 nearest neighbor

algorithms simply store all training data in memory When classifying a new unlabeled item the

algorithm compares it to all stored items using a similarity function and then determines the nearest

neighbor or the k nearest neighbors Daily-Learner uses this method as it can quickly adapt to a

userrsquos novel interests The main advantage of the nearest-neighbor approach is that only a single

story of a new topic is needed to allow the algorithm to identify future follow-up stories


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendations as usually users are not

interested in recommendations with contents too similar to dishes recently eaten
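The short-term model described in this section can be sketched as follows. This is only an illustration under stated assumptions: stories are sparse TF-IDF vectors stored as dictionaries, the threshold values 0.3 and 0.9 are placeholders rather than the values used in [23], and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Return ("known", None), ("unclassified", None) or ("scored", score).

    rated_stories is a list of (vector, score) pairs; t_min and t_max are
    illustrative placeholder thresholds.
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= t_max:
            # Too close to a story the user has seen: assume it is already known.
            return ("known", None)
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        # No voters: the story would be handed to the long-term model.
        return ("unclassified", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)

# Usage with made-up vectors
history = [({"a": 1.0, "b": 1.0}, 1.0), ({"c": 1.0}, 0.0)]
result = short_term_predict({"a": 1.0}, history)
```

In a food setting, the "known" branch would correspond to filtering out dishes that are too similar to ones eaten recently.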



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collabo-

rative approach [24] This approach is very similar to the user-to-user approach explained in detail

in Section 212

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$ \mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} $$

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$ \mathrm{pred}(u,a) = \frac{\sum_{b \in N} \mathrm{sim}(a,b) \, (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)} \tag{4.2} $$

In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is this weighted sum normalized by the sum of similarities.
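A minimal sketch of the item-to-item procedure is given below. It is not the YoLP implementation: the data layout (a dictionary mapping each item to its users' ratings) and the function names are hypothetical, and the prediction uses a common textbook variant, the similarity-weighted average of the user's own ratings over positively correlated items, rather than the exact expression above.

```python
import math

def item_mean(ratings, item):
    """Average rating received by an item."""
    values = ratings[item].values()
    return sum(values) / len(values)

def pearson_sim(ratings, a, b):
    """Pearson correlation between items a and b over the users who rated both."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = math.sqrt(
        sum((ratings[a][p] - mean_a) ** 2 for p in common)
        * sum((ratings[b][p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0

def predict(ratings, user, target):
    """Assumed variant: weighted average of the user's ratings on similar items."""
    num = den = 0.0
    for b in ratings:
        if b == target or user not in ratings[b]:
            continue
        sim = pearson_sim(ratings, target, b)
        if sim > 0:  # ignore negatively correlated items
            num += sim * ratings[b][user]
            den += sim
    return num / den if den else None

# Usage with made-up ratings: item -> {user: rating}
ratings = {
    "soup": {"u1": 5, "u2": 1, "u3": 4},
    "stew": {"u1": 4, "u2": 2, "u3": 5, "u4": 5},
    "cake": {"u1": 1, "u2": 5, "u4": 2},
}
predicted = predict(ratings, "u4", "soup")
```

Because only the fixed set of restaurant recipes has to be compared against the user's rated recipes, the loop stays short even when the user base is large.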

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of being vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$ \mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if } \mathrm{similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3} $$

where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
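The binary profile and the similarity-to-rating rule of this section can be sketched as below. The function names are hypothetical, feature vectors are modelled as Python sets (equivalent to binary sparse vectors), and taking the combined average as the mean of the user and item averages is an assumption, since the text does not spell out how the two are combined.

```python
import math

def build_profile(rated_recipes, positive_from=4):
    """Binary user profile: union of the features of recipes rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_from:
            profile |= set(features)
    return profile

def binary_cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def similarity_to_rating(similarity, user_avg, item_avg):
    """Thresholded rule: bump the combined average by 0.5 above 0.8 similarity."""
    avg_total = (user_avg + item_avg) / 2.0  # assumed way of combining the averages
    return avg_total + 0.5 if similarity > 0.8 else avg_total

# Usage with made-up data
profile = build_profile([({"tomato", "basil"}, 5), ({"beef"}, 2)])
sim = binary_cosine(profile, {"tomato", "basil"})
rating = similarity_to_rating(sim, user_avg=4.0, item_avg=3.0)
```

The rule can only output two values per user and item pair, which is one way to see why the text treats the resulting error figures as an approximation.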

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe like words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as in [3]:

$$ \mathrm{IRF}_k = \log \frac{M}{M_k} \tag{4.4} $$

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
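As an illustration of the weighting just described, the sketch below computes the IRF of every feature in a collection of recipes and uses it to build a recipe vector. The function names are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k), with M recipes in total and M_k containing feature k."""
    m = len(recipes)
    # set() guards against a feature being listed twice in the same recipe
    counts = Counter(f for features in recipes for f in set(features))
    return {f: math.log(m / m_k) for f, m_k in counts.items()}

def recipe_vector(features, irf):
    """Sparse recipe vector: each present feature weighted by its IRF (frequency fixed at 1)."""
    return {f: irf[f] for f in set(features) if f in irf}

# Usage with a made-up collection
recipes = [["egg", "flour"], ["egg", "milk"], ["egg"], ["flour"]]
irf = inverse_recipe_frequency(recipes)
vector = recipe_vector(["egg", "milk"], irf)
```

Rare features such as "milk" receive a larger weight than ubiquitous ones such as "egg", mirroring the role of IDF in text retrieval.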

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
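The construction described above can be sketched as follows, with both labelling approaches passed in as functions. All names are hypothetical; recipe vectors are assumed to be dictionaries of IRF weights, and both observation types carry a weight of 1 as stated in the text.

```python
def is_positive_fixed(rating, scale_max):
    """Fixed threshold: True (positive), False (negative) or None (neutral, ignored)."""
    if scale_max == 4:   # Epicurious: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:      # Food.com: a rating of 3 is neutral
        return None
    return rating > 3    # Food.com: 1-2 negative, 4-5 positive

def is_positive_user_avg(rating, user_avg):
    """User-average threshold: ratings equal to or above the average are positive."""
    return rating >= user_avg

def build_prototype(rated_recipes, label):
    """Add feature weights for positive observations, subtract them for negative ones."""
    prototype = {}
    for vector, rating in rated_recipes:
        positive = label(rating)
        if positive is None:
            continue
        sign = 1.0 if positive else -1.0
        for feature, weight in vector.items():
            prototype[feature] = prototype.get(feature, 0.0) + sign * weight
    return prototype

# Usage with made-up IRF-weighted vectors on a 1-5 scale
rated = [
    ({"egg": 0.5, "milk": 1.0}, 5),
    ({"egg": 0.5, "beef": 2.0}, 1),
    ({"rice": 1.0}, 3),
]
prototype = build_prototype(rated, lambda r: is_positive_fixed(r, 5))
```

A feature that appears in both liked and disliked recipes, such as "egg" here, cancels out and ends with a weight of zero.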

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes and with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$ B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5} $$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
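The per-user mapping described above can be sketched as follows. The function names and argument layout are hypothetical: user_sims stands for the similarity values observed for one user in the validation set, and user_ratings for that user's ratings in the training set.

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization of a from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def similarity_to_rating(sim, user_sims, user_ratings):
    """Map a similarity into the user's own rating range, per user."""
    user_avg = sum(user_ratings) / len(user_ratings)
    s_min, s_max = min(user_sims), max(user_sims)
    if s_max == s_min:
        # Similarity interval cannot be computed: fall back to the user average.
        return user_avg
    return min_max(sim, s_min, s_max, min(user_ratings), max(user_ratings))

# Usage with made-up values
mapped = similarity_to_rating(0.2, user_sims=[0.2, 0.6], user_ratings=[2, 4])
fallback = similarity_to_rating(0.3, user_sims=[0.3], user_ratings=[2, 4])
```

Using each user's own minimum and maximum rating as [C, D] means a user who never gives the top score will never be predicted to give it.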

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$ \mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathrm{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathrm{similarity} < L \end{cases} \tag{4.6} $$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
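A minimal sketch of this rule is shown below. The function names are hypothetical; combining user and recipe statistics by taking their mean, and using the population standard deviation, are assumptions. The lower threshold has no default because its initial value is not recoverable from this text, so the 0.25 used in the example is only a placeholder.

```python
import statistics

def rating_from_similarity(similarity, average, std_dev, lower, upper=0.75):
    """Three-band rule: average plus or minus one standard deviation, or the average."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

def combined_stats(user_ratings, item_ratings):
    """Assumed combination: mean of the user and recipe averages and deviations."""
    avg = (statistics.mean(user_ratings) + statistics.mean(item_ratings)) / 2
    std = (statistics.pstdev(user_ratings) + statistics.pstdev(item_ratings)) / 2
    return avg, std

# Usage with made-up training ratings
avg, std = combined_stats(user_ratings=[4, 4], item_ratings=[2, 4])
predicted = rating_from_similarity(0.8, avg, std, lower=0.25)
```

The user-only and recipe-only variants follow by passing that side's average and standard deviation directly to rating_from_similarity.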

Table 4.1: Statistical characterization for the datasets used in the experiments

                                          Food.com    Epicurious
Number of users                              24741          8117
Number of food items                        226025         14976
Number of rating events                     956826         86574
Number of ratings above avg                 726467         46588
Number of groups                               108            68
Number of ingredients                         5074           338
Number of categories                            28            14
Sparsity on the ratings matrix                0.02          0.07
Avg rating values                             4.68          3.34
Avg number of ratings per user               38.67         10.67
Avg number of ratings per item                4.23          5.78
Avg number of ingredients per item            8.57          3.71
Avg number of categories per item             2.33          0.60
Avg number of food groups per item            0.87          0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but it was filtered in order to reduce data sparsity: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
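The filtering step described above can be sketched as follows. This is a minimal illustration rather than the code used in the thesis: the function name `filter_sparse`, the tuple layout, and the order of the two passes (recipes first, then users, with no repetition) are assumptions.

```python
from collections import Counter

def filter_sparse(ratings, max_item_ratings=3, max_user_ratings=5):
    """Drop rarely rated recipes, then rarely active users.

    ratings: list of (user_id, recipe_id, rating) tuples.
    """
    # Remove recipes rated no more than max_item_ratings times.
    item_counts = Counter(recipe for _, recipe, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] > max_item_ratings]
    # Remove users with no more than max_user_ratings remaining ratings
    # (assumption: counted after the recipe filter, in a single pass).
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] > max_user_ratings]
```

Removing users can push some recipes back under the recipe limit, so a stricter variant would repeat both passes until nothing changes; the text does not say whether this was done.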

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar: a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The core idea is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions of the model trained on the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds, each fold is used once as the validation set, and the remaining observations form the training set. This is a non-exhaustive approximation of leave-p-out cross-validation, which ideally repeats the process until all possible combinations of p validation observations are tested. Repeating the process with different validation sets reduces variability, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
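As an illustration of the 5-fold procedure, the sketch below splits a list of rating events into training and validation sets. The function name `k_fold_splits`, the shuffling and the fixed seed are assumptions made for this example; the thesis does not state how folds were assigned.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold."""
    shuffled = events[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled events into k folds of (almost) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds 20% of the events and each event is used for validation exactly once.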

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
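For reference, both measures can be computed from paired lists of predicted and actual ratings. The sketch below uses the standard definitions of MAE and RMSE; the function names are chosen for this example only.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```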

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                     Epicurious           Food.com
                                   MAE      RMSE       MAE      RMSE
YoLP Content-based component     0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678     0.3761   0.6834
User Average                     0.6315   0.8338     0.4077   0.6207
Item Average                     0.7701   1.0930     0.4385   0.7043
Combined Average                 0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results (Obs. = observation threshold used to build the prototype vectors)

                                         Epicurious                            Food.com
                              Obs. User Average  Obs. Fixed Threshold  Obs. User Average  Obs. Fixed Threshold
                                MAE      RMSE      MAE      RMSE         MAE      RMSE      MAE      RMSE
User Avg + User Std. Dev.     0.8217   1.0606    0.7759   1.0283       0.4448   0.6812    0.4287   0.6624
Item Avg + Item Std. Dev.     0.8914   1.1550    0.8388   1.1106       0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and
Item Std. Dev.                0.8304   1.0296    0.7824   0.9927       0.4390   0.6506    0.4324   0.6449
Min-Max                       0.8539   1.1533    0.7721   1.0705       0.6648   0.9847    0.6303   0.9384
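The three average-based baselines can be sketched as below. The names are illustrative, and falling back to the global average for users or recipes absent from the training set is an assumption of this example; the text does not describe how such cases were handled.

```python
from collections import defaultdict

def rating_averages(training):
    """training: list of (user_id, recipe_id, rating) tuples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, recipe, rating in training:
        by_user[user].append(rating)
        by_item[recipe].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    global_avg = sum(r for _, _, r in training) / len(training)
    return user_avg, item_avg, global_avg

def predict_baseline(user, recipe, user_avg, item_avg, global_avg, kind="combined"):
    # Assumption: unseen users or recipes fall back to the global average.
    u = user_avg.get(user, global_avg)
    i = item_avg.get(recipe, global_avg)
    if kind == "user":
        return u
    if kind == "item":
        return i
    return (u + i) / 2  # (UserAvg + ItemAvg) / 2
```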

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
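The two ways of separating positive and negative observations differ only in the threshold passed to the prototype builder. The sketch below follows the classic Rocchio form (weighted positive centroid minus weighted negative centroid). The weights `beta` and `gamma`, the use of centroids, and treating ratings equal to the threshold as positive are assumptions of this example, not details confirmed by the text.

```python
from collections import defaultdict

def centroid(vectors):
    """Average a list of sparse feature vectors (dicts)."""
    total = defaultdict(float)
    for vec in vectors:
        for feature, weight in vec.items():
            total[feature] += weight
    return {f: w / len(vectors) for f, w in total.items()} if vectors else {}

def build_prototype(rated, threshold, beta=1.0, gamma=1.0):
    """rated: list of (feature_vector, rating) pairs for one user."""
    # Assumption: ratings equal to the threshold count as positive.
    positive = centroid([v for v, r in rated if r >= threshold])
    negative = centroid([v for v, r in rated if r < threshold])
    prototype = defaultdict(float)
    for f, w in positive.items():
        prototype[f] += beta * w
    for f, w in negative.items():
        prototype[f] -= gamma * w
    return dict(prototype)
```

Passing `threshold=3` corresponds to the fixed-threshold variant, while passing the user's own average rating corresponds to the user-average variant.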

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                       Epicurious           Food.com
                                     MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616     0.4324   0.7087
Ingredients                        0.7932   1.0054     0.4411   0.6537
Cuisine                            0.8553   1.0810     0.5357   0.7431
Dietary                            0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features contribute to the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing a user's prototype vector, the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
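The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The dictionary layout and the function names are assumptions made for this example; only the use of cosine similarity comes from the text.

```python
import math

def merge_vectors(parts, selected):
    """Merge the stored per-feature vectors named in `selected`.

    parts: e.g. {"ingredients": {...}, "cuisine": {...}, "dietary": {...}}
    Keys are prefixed with the feature type so that names cannot collide.
    """
    merged = {}
    for name in selected:
        for feature, weight in parts[name].items():
            merged[(name, feature)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Testing a feature combination then only requires merging the same subset on both sides, for example `cosine(merge_vectors(user, ["ingredients", "cuisine"]), merge_vectors(recipe, ["ingredients", "cuisine"]))`, without rebuilding any prototype vector.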

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before including it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]
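The piecewise rule of Eq. 4.6 translates directly into a small function. The sketch below is only an illustration: the function and parameter names are chosen for this example, and the defaults are the initial thresholds U = 0.75 and L = 0.25 mentioned in the text.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```

Here `average` and `std_dev` stand for whichever statistics are being tested (user, item, or the combination of both).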

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, this test studies the impact on the recommendations and seeks the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values for the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by repeating the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which was the strongest baseline overall (Table 5.1), the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
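The opposite movement of MAE and RMSE can be reproduced with a small worked example. The error values below are invented purely for illustration and are not taken from the experiments.

```python
import math

def mae_from_errors(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical prediction errors for four ratings.
spread = [0.5, 0.5, 0.5, 0.5]        # always slightly off
concentrated = [0.0, 0.0, 0.0, 1.6]  # mostly exact, one large miss

print(mae_from_errors(spread), rmse_from_errors(spread))
print(mae_from_errors(concentrated), rmse_from_errors(concentrated))
```

The second error profile has the lower MAE (0.4 against 0.5) but the higher RMSE (about 0.8 against 0.5), which mirrors the behaviour described above: more exact predictions, but larger misses.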

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation. The line in these two graphs indicates the average value of the points in that proximity.
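The points and the averaged line in such a plot could be derived as sketched below. This is a hedged illustration: the use of the population standard deviation, computing it from the ratings in the evaluated results, and the fixed bin width standing in for "proximity" are all assumptions of this example.

```python
from collections import defaultdict
from statistics import pstdev

def user_error_points(results):
    """results: list of (user_id, actual_rating, predicted_rating).

    Returns one (standard deviation, mean absolute error) point per user.
    """
    by_user = defaultdict(list)
    for user, actual, predicted in results:
        by_user[user].append((actual, predicted))
    points = []
    for pairs in by_user.values():
        std = pstdev([a for a, _ in pairs])  # assumption: population std
        abs_err = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points.append((std, abs_err))
    return points

def binned_average(points, bin_width=0.25):
    """Average the absolute error of users with similar standard deviation."""
    bins = defaultdict(list)
    for std, err in points:
        bins[int(std // bin_width)].append(err)
    return {b * bin_width: sum(e) / len(e) for b, e in sorted(bins.items())}
```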

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This suggests that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since in a real system the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set contains the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
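The round-by-round procedure can be sketched for a single user as follows. The `fit` and `predict` callables are hypothetical placeholders for building the prototype vector and producing a rating; measuring the error with the mean absolute error over the remaining reviews is also an assumption of this example.

```python
def learning_curve(user_reviews, max_train, fit, predict):
    """Simulate incremental learning for one user.

    user_reviews: list of (recipe, rating) pairs in the order they are learned.
    fit(train) builds a user model; predict(model, recipe) returns a rating.
    Returns the mean absolute error on the remaining reviews for each round.
    """
    errors = []
    for n in range(1, max_train + 1):
        # Each round moves one more review from validation to training.
        train, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break
        model = fit(train)
        abs_errors = [abs(predict(model, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors
```

Averaging the returned lists over all selected users gives one curve of error against the number of rated recipes.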



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations produced the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
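A possible shape for this proposal is sketched below. It is speculative, since the idea is only outlined as future work: building each class vector by summing the feature vectors of the recipes that received that rating, and the function names, are assumptions of this example.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated):
    """rated: list of (feature_vector, rating) pairs for one user.

    Returns one vector per rating value (assumption: summed feature weights).
    """
    classes = defaultdict(lambda: defaultdict(float))
    for vec, rating in rated:
        for feature, weight in vec.items():
            classes[rating][feature] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}

def predict_rating(class_vectors, recipe_vector):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors,
               key=lambda r: cosine(class_vectors[r], recipe_vector))
```

The predicted rating is then simply the label of the best-matching class, so no threshold such as U or L is needed.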




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868.

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188.

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 3.2: Normalized MAE score for recipe recommendation [22]

MAE as an evaluation metric

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.

3.4 User Modeling for Adaptive News Access

Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user learn more about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method because it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.


Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
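The voting scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Daily-Learner implementation: the function names, the dictionary-based sparse vectors and the threshold values 0.3 and 0.9 are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Score a new story from previously rated (tfidf_vector, score) pairs.

    t_min and t_max are hypothetical threshold values.
    Returns ("known", None), ("unclassified", None) or ("scored", value).
    """
    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim > t_max:
            # Too close to a story the user has already seen.
            return ("known", None)
        if sim > t_min:
            voters.append((sim, score))
    if not voters:
        # No voters: the story would be handed to the long-term model.
        return ("unclassified", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)
```

In a food setting, the same maximum threshold could be used to filter out dishes that are nearly identical to ones the user has just eaten.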

This issue should be taken into consideration in food recommendations, since users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.¹

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}
$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$ and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.


After the similarity is computed, the rating prediction is calculated using the following equation:

$$
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{4.2}
$$

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating given by user $u$ to item $b$. Using the set of the user's rated items, the user rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
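As an illustration, the item-to-item scheme can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the YoLP code: the nested-dictionary layout and the function names are hypothetical, recipe averages are taken over all ratings of each recipe, and the prediction weights the user's raw ratings (as the prose describes) while keeping only neighbours with positive similarity.

```python
import math

def pearson_item_sim(ratings, a, b):
    """Pearson correlation between items a and b.

    ratings: dict item -> dict user -> rating (hypothetical layout).
    """
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den_a = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(ratings, user, a):
    """Similarity-weighted average of the user's ratings on other items.

    Assumption of this sketch: only positively correlated items vote.
    Returns None when no rated item is similar to the target item.
    """
    num = den = 0.0
    for b, by_user in ratings.items():
        if b == a or user not in by_user:
            continue
        sim = pearson_item_sim(ratings, a, b)
        if sim > 0:
            num += sim * by_user[user]
            den += sim
    return num / den if den else None
```

Because the candidate items are limited to one restaurant's menu, the similarities between a user's rated recipes and that menu are the only ones that need to be evaluated at recommendation time.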

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
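A binary profile of this kind can be sketched with Python sets, where each set element stands for a position holding a 1 in the sparse vector. The function names and the prefixed feature labels below are assumptions made for illustration; they are not taken from the YoLP code.

```python
import math

def build_profile(rated_recipes, like_threshold=4):
    """Union of the features of every recipe rated 4 or 5 (binary profile).

    rated_recipes: list of (feature set, rating) pairs.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= like_threshold:
            profile |= set(features)
    return profile

def binary_cosine(x, y):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))

def rank_recipes(profile, menu):
    """Order the restaurant's recipes from most to least similar.

    menu: dict recipe name -> feature set.
    """
    return sorted(menu, key=lambda name: binary_cosine(profile, menu[name]), reverse=True)
```

For binary vectors the dot product reduces to the number of shared features, which is why set intersection is sufficient here.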

YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3}
$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe like words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as described in [3]:


$$
IRF_k = \log \frac{M}{M_k} \tag{4.4}
$$


where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
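The IRF weights can be computed in a single pass over the dataset. The sketch below is illustrative only: the function name and the dictionary layout are assumptions, and the natural logarithm is used because the base of the logarithm is not stated in the text.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Map each feature k to log(M / M_k).

    recipes: dict recipe id -> set of features (hypothetical layout).
    Assumption: natural logarithm.
    """
    total = len(recipes)  # M, the total number of recipes
    counts = Counter(f for features in recipes.values() for f in features)  # M_k
    return {f: math.log(total / n) for f, n in counts.items()}
```

A feature that appears in every recipe receives a weight of zero, so it cannot influence the prototype vector, while rare features receive the largest weights.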

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5, and the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
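The two labelling rules and the additive update can be sketched as follows. The function names and data layout are assumptions for illustration; the fixed rule is expressed through the midpoint of the rating scale, which reproduces the behaviour described for both the 1 to 4 and the 1 to 5 scales.

```python
def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule; returns None for a neutral rating."""
    middle = (1 + scale_max) / 2  # 2.5 on a 1-4 scale, 3 on a 1-5 scale
    if rating == middle:
        return None  # neutral observation, ignored
    return rating > middle

def is_positive_user_avg(rating, user_avg):
    """User-average rule: equal or higher than the average is positive."""
    return rating >= user_avg

def build_prototype(rated_recipes, irf, label):
    """Add IRF weights for positive observations, subtract for negative ones.

    rated_recipes: list of (feature set, rating) pairs from the training set.
    irf: dict feature -> weight.
    label: function rating -> True, False or None.
    Both observation types carry an equal weight of 1.
    """
    prototype = {}
    for features, rating in rated_recipes:
        positive = label(rating)
        if positive is None:
            continue
        sign = 1.0 if positive else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf.get(f, 0.0)
    return prototype
```

For example, `build_prototype(rated, irf, lambda r: is_positive_fixed(r, 5))` applies the fixed rule on a 1 to 5 scale, while `lambda r: is_positive_user_avg(r, avg)` applies the second rule.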

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:³

$$
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \tag{4.5}
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
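The per-user mapping can be sketched as below. The function names are hypothetical, and the fallback condition (an empty similarity interval) is one possible reading of "not enough user ratings"; the original implementation may have used a different criterion.

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization: map a from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def similarity_to_rating(similarity, user_similarities, user_ratings):
    """Map a similarity into the user's own rating range.

    user_similarities: similarities observed for this user.
    user_ratings: this user's ratings in the training set.
    Assumption: fall back to the user average when the similarity
    interval has zero width.
    """
    user_avg = sum(user_ratings) / len(user_ratings)
    s_min, s_max = min(user_similarities), max(user_similarities)
    if s_max == s_min:
        return user_avg
    return min_max(similarity, s_min, s_max, min(user_ratings), max(user_ratings))
```

With similarities observed in [0.25, 0.75] and ratings in [2, 4], a similarity of 0.5 sits in the middle of the first interval and is therefore mapped to the middle of the second, a rating of 3.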

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6}
$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02 | 0.07 |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
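The three-case rule based on the average and the standard deviation can be sketched as follows. The function names are hypothetical, the combined standard deviation is taken as the mean of the user and item standard deviations, and the population standard deviation is used; these choices are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev

def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case mapping from a similarity value to a rating."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

def combined_stats(user_ratings, item_ratings):
    """Combined user/item average and standard deviation.

    Assumption: both statistics are simple means of the user and item
    values, and the population standard deviation is used.
    """
    avg = (mean(user_ratings) + mean(item_ratings)) / 2
    std = (pstdev(user_ratings) + pstdev(item_ratings)) / 2
    return avg, std
```

The user-only and item-only variants are obtained by passing the corresponding average and standard deviation directly to `rating_from_similarity`.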


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious.⁵ This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL.⁶ Being a relational database, MySQL is well suited for representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after training on the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, a leave-p-out style of cross-validation was used, holding out p observations as the validation set and using the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set. Ideally, this process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
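A 5-fold split of the rating events can be sketched with the standard library alone. The function name, the shuffling seed and the round-robin assignment of events to folds are assumptions of this sketch rather than details taken from the original evaluation module.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists for k-fold cross-validation.

    Every event appears in exactly one validation set, so with k=5 each
    validation set holds roughly 20% of the data.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable folds
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]
```

The error measures computed on each validation fold are then averaged over the 5 repetitions.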

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10 Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
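Both measures follow directly from their standard definitions, as in the short sketch below (the function names are chosen for the example).

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For predictions [4, 3, 5] against actual ratings [4, 4, 3], the absolute errors are 0, 1 and 2, so the MAE is 1.0, while the RMSE is the square root of 5/3 (about 1.29) because the error of 2 is squared before averaging.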

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average, MAE | Epicurious, Observation User Average, RMSE | Epicurious, Observation Fixed Threshold, MAE | Epicurious, Observation Fixed Threshold, RMSE | Food.com, Observation User Average, MAE | Food.com, Observation User Average, RMSE | Food.com, Observation Fixed Threshold, MAE | Food.com, Observation Fixed Threshold, RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented by the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values (in particular, the lowest RMSE on both datasets).

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
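Keeping one stored vector per feature group and merging only the groups under test can be sketched as follows. The group names, the function names and the use of (group, feature) pairs as keys are assumptions made for this example.

```python
import math

def merge_groups(vectors, groups):
    """Merge the stored per-group vectors selected for a test.

    vectors: dict group name -> dict feature -> weight (hypothetical layout).
    Keys of the merged vector are (group, feature) pairs, so features from
    different groups never collide.
    """
    return {(g, f): w for g in groups for f, w in vectors.get(g, {}).items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_for(user_vectors, recipe_vectors, groups):
    """Similarity restricted to the chosen feature groups."""
    return cosine(merge_groups(user_vectors, groups), merge_groups(recipe_vectors, groups))
```

Each row of a feature test then corresponds to one call with a different `groups` list, for example `["ingredients", "cuisine"]`, without touching the stored vectors.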

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \tag{5.1}
$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation test on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component which obtained the overall lowest

error rates from all the baselines the experimental recommendation component showed better re-

sults when using the Foodcom dataset
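The opposite movement of MAE and RMSE discussed in this section can be reproduced on a toy example. The sketch below is only an illustration: the function names and the rating lists are made up for this purpose and are not taken from the experimental component.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; squaring penalizes large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical data: four reviews that all have the true rating 4.
actual = [4, 4, 4, 4]
many_small_misses = [3.5, 4.5, 3.5, 4.5]  # every prediction is off by 0.5
few_large_misses = [4, 4, 4, 2.5]         # three exact hits, one miss of 1.5

for name, predicted in [("many small misses", many_small_misses),
                        ("few large misses", few_large_misses)]:
    print(name, mae(predicted, actual), rmse(predicted, actual))
```

The second predictor hits the exact rating more often and therefore has the lower MAE, but its single large miss gives it the higher RMSE, which is the same trade-off observed when the upper threshold is lowered.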

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
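The points and the averaging line of such a graph could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names, the use of the population standard deviation, and the fixed bin width used to approximate "the average value of the points in that proximity" are all choices made for this example, not details given in the text.

```python
from statistics import mean, pstdev


def user_points(ratings_by_user, predictions_by_user):
    """Return one (standard deviation, mean absolute error) point per user."""
    points = []
    for user, ratings in ratings_by_user.items():
        predictions = predictions_by_user[user]
        abs_error = mean(abs(p - r) for p, r in zip(predictions, ratings))
        points.append((pstdev(ratings), abs_error))
    return points


def binned_average(points, bin_width=0.25):
    """Average the absolute error of the points falling in each deviation bin."""
    bins = {}
    for deviation, abs_error in points:
        bins.setdefault(int(deviation // bin_width), []).append(abs_error)
    return {index * bin_width: mean(errors)
            for index, errors in sorted(bins.items())}


# Hypothetical users: u1 always gives the same rating, u2 varies a lot.
ratings = {"u1": [4, 4, 4], "u2": [1, 5]}
predictions = {"u1": [4, 4, 3], "u2": [3, 3]}
points = user_points(ratings, predictions)
print(points)
print(binned_average(points))
```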

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews have been made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
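The round-by-round procedure described in this section, where one review at a time moves from the validation set to the training set, can be sketched for a single user as follows. The function and parameter names are hypothetical, and the profile builder and predictor are passed in as callables because the sketch does not reproduce the Rocchio implementation itself.

```python
def learning_curve(reviews, build_profile, predict, max_train):
    """Mean absolute error on the remaining reviews after each training round.

    reviews: list of (recipe, rating) pairs of one user, in learning order.
    build_profile: callable turning a list of training reviews into a profile.
    predict: callable returning a predicted rating for (profile, recipe).
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to validate against
        profile = build_profile(training)
        abs_errors = [abs(predict(profile, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


# Toy stand-ins (assumptions): the "profile" is just the mean training rating.
def mean_rating_profile(training):
    return sum(rating for _, rating in training) / len(training)


def predict_mean(profile, recipe):
    return profile


toy_reviews = [("a", 4), ("b", 2), ("c", 3)]
curve = learning_curve(toy_reviews, mean_rating_profile, predict_mean, 3)
print(curve)
```

Averaging such per-user curves over all users with enough rated recipes would give a curve of the kind plotted in the learning curve figures.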



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325-341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551-585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127-134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381-386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the Users' Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes they have eaten recently.
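The short-term voting scheme described above can be sketched as follows. This is an illustrative sketch rather than the Daily-Learner implementation: the function names, the dictionary representation of the TF-IDF vectors, the returned labels, and the threshold values 0.3 and 0.9 are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def short_term_predict(new_story, rated_stories, t_min=0.3, t_max=0.9):
    """Classify a new story from the user's previously rated stories.

    rated_stories: list of (vector, score) pairs.
    Returns ("unclassified", None) when there are no voters,
    ("known", None) when some voter is too similar, and
    ("scored", value) with the similarity-weighted average otherwise.
    """
    similarities = [(cosine(new_story, vector), score)
                    for vector, score in rated_stories]
    voters = [(sim, score) for sim, score in similarities if sim >= t_min]
    if not voters:
        return ("unclassified", None)  # would be passed to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return ("known", None)
    total = sum(sim for sim, _ in voters)
    return ("scored", sum(sim * score for sim, score in voters) / total)


story = {"election": 1.0}
history = [({"election": 1.0, "vote": 1.0}, 1.0), ({"football": 1.0}, 0.0)]
print(short_term_predict(story, history))
```

In a food setting, the "known" branch would correspond to a recipe that is nearly identical to one the user rated recently.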



Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
\tag{4.1}
\]

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.
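The item-to-item Pearson correlation can be sketched in a few lines. The function name and the dictionary representation of the ratings are assumptions made for this example, and computing each recipe's average over all of its ratings (rather than only over the shared users) is one reading of the definition above; the YoLP component may differ.

```python
import math


def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes.

    ratings_a, ratings_b: dicts mapping a user id to that user's rating
    of recipe a and recipe b respectively.
    """
    shared_users = set(ratings_a) & set(ratings_b)  # the set P
    if not shared_users:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b)
                    for p in shared_users)
    spread_a = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared_users))
    spread_b = math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in shared_users))
    denominator = spread_a * spread_b
    return numerator / denominator if denominator else 0.0


# Hypothetical ratings: three users rated both recipes in the same order.
lasagna = {"u1": 5, "u2": 3, "u3": 1}
cannelloni = {"u1": 4, "u2": 3, "u3": 2}
print(pearson_item_similarity(lasagna, cannelloni))
```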


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \ast (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
\tag{4.2}
\]

In the formula, $pred(u, a)$ is the prediction value to user $u$ for item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The result is normalized by the sum of the similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
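A binary profile of this kind, and its cosine similarity to a recipe, can be sketched with plain sets. The function names and the feature strings are hypothetical; representing a binary sparse vector as the set of its non-zero features is a simplification chosen for the example, not the YoLP data structure.

```python
import math


def build_binary_profile(rated_recipes, positive_from=4):
    """Collect the features of every recipe rated 4 or 5 into a binary profile.

    rated_recipes: list of (features, rating) pairs, features being a set.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_from:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the overlap size and each norm
    # is the square root of the number of features present.
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical user history and candidate recipe.
history = [({"region:italian", "ingredient:pasta"}, 5),
           ({"ingredient:beef"}, 2)]
profile = build_binary_profile(history)
candidate = {"ingredient:pasta", "ingredient:tomato"}
print(cosine_binary(profile, candidate))
```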

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
\text{Rating} =
\begin{cases}
avgTotal + 0.5 & \text{if similarity} > 0.8 \\
avgTotal & \text{otherwise}
\end{cases}
\tag{4.3}
\]


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, Fk, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:


\[
IRF_k = \log \frac{M}{M_k}
\tag{4.4}
\]

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
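As a small illustration, the Inverse Recipe Frequency of every feature in a collection of recipes can be computed as below. The function name, the dictionary layout, and the use of the natural logarithm are assumptions of this sketch; the text does not state which logarithm base was used.

```python
import math


def inverse_recipe_frequency(recipes):
    """Return IRF_k = log(M / M_k) for every feature k.

    recipes: dict mapping a recipe id to the set of features it contains.
    M is the number of recipes, M_k the number of recipes containing k.
    """
    total = len(recipes)
    containing = {}
    for features in recipes.values():
        for feature in set(features):
            containing[feature] = containing.get(feature, 0) + 1
    return {feature: math.log(total / count)
            for feature, count in containing.items()}


# Hypothetical two-recipe collection.
toy_recipes = {"r1": {"garlic", "beef"}, "r2": {"garlic"}}
irf = inverse_recipe_frequency(toy_recipes)
print(irf)
```

A feature that appears in every recipe, such as garlic in this toy collection, receives a weight of zero, while rarer features receive larger weights.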

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
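The construction of a prototype vector can be sketched as follows, using the fixed-threshold rule on a 1 to 5 scale as the example classifier. All names here are hypothetical, the IRF weights are made-up numbers, and the sketch only covers the add/subtract step with equal weights of 1 described above.

```python
def fixed_threshold_5(rating):
    """Fixed-threshold rule on a 1-5 scale: 3 is neutral and ignored."""
    if rating == 3:
        return None
    return rating > 3


def build_prototype(rated_recipes, irf, classify):
    """Add or subtract each recipe's feature weights from the prototype.

    rated_recipes: list of (features, rating) pairs.
    irf: dict mapping a feature to its IRF weight.
    classify: callable returning True (positive), False (negative),
              or None (neutral, skipped) for a rating.
    """
    prototype = {}
    for features, rating in rated_recipes:
        positive = classify(rating)
        if positive is None:
            continue
        sign = 1.0 if positive else -1.0
        for feature in features:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf.get(feature, 0.0)
    return prototype


# Hypothetical weights and user history.
weights = {"garlic": 0.5, "beef": 2.0, "mint": 1.0}
history = [({"garlic", "beef"}, 5), ({"garlic", "mint"}, 1), ({"mint"}, 3)]
prototype = build_prototype(history, weights, fixed_threshold_5)
print(prototype)
```

The second approach can be plugged in by passing a different classifier, for example `lambda rating: rating >= user_average`, where `user_average` is the user's mean rating in the training set.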

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \ast (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user onto the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
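The per-user Min-Max mapping, including the fallback to the user average, can be sketched as below. Function and parameter names are hypothetical, and treating an empty similarity interval (maximum equal to minimum) as the "not enough ratings" case is an assumption of this sketch.

```python
def min_max(value, source_min, source_max, target_min, target_max):
    """Map value from [source_min, source_max] onto [target_min, target_max]."""
    scaled = (value - source_min) / (source_max - source_min)
    return scaled * (target_max - target_min) + target_min


def similarity_to_rating(similarity, user_similarities, user_ratings, user_average):
    """Translate one similarity value into a rating on the user's own scale.

    user_similarities: similarity values observed for this user.
    user_ratings: the user's ratings in the training set.
    user_average: default prediction when the similarity interval is empty.
    """
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        return user_average
    return min_max(similarity, sim_min, sim_max,
                   min(user_ratings), max(user_ratings))


# Hypothetical user whose similarities span [0, 1] and ratings span [2, 4].
print(similarity_to_rating(0.5, [0.0, 1.0], [2, 3, 4], 3.0))
```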

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
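The three-case rule can be sketched as a small function. The function name is hypothetical, the thresholds are left as explicit arguments, and the choice of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def rating_from_similarity(similarity, ratings, upper, lower):
    """Turn a similarity value into a rating using average and deviation.

    ratings: the training-set ratings the statistics are taken from,
             e.g. all ratings given by the user or received by the recipe.
    upper, lower: the similarity thresholds U and L.
    """
    average = mean(ratings)
    deviation = pstdev(ratings)
    if similarity >= upper:
        return average + deviation
    if similarity >= lower:
        return average
    return average - deviation


# Hypothetical user with ratings 3, 4 and 5; thresholds chosen for the example.
user_ratings = [3, 4, 5]
for sim in (0.9, 0.5, 0.1):
    print(sim, rating_from_similarity(sim, user_ratings, upper=0.75, lower=0.25))
```

Passing the user's ratings gives the user-based variant and passing the recipe's ratings gives the recipe-based variant; how the two are merged in the combined variant is not shown here.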

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments.

                                      Food.com    Epicurious
Number of users                       24741       8117
Number of food items                  226025      14976
Number of rating events               956826      86574
Number of ratings above avg           726467      46588
Number of groups                      108         68
Number of ingredients                 5074        338
Number of categories                  28          14
Sparsity on the ratings matrix        0.02%       0.07%
Avg rating values                     4.68        3.34
Avg number of ratings per user        38.67       10.67
Avg number of ratings per item        4.23        5.78
Avg number of ingredients per item    8.57        3.71
Avg number of categories per item     2.33        0.60
Avg number of food groups per item    0.87        0.61
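A minimal sketch of the three-case rule of Eq. 4.6 is given below, using the initial thresholds U = 0.75 and L = 0.25. The function names are hypothetical, and the way the "combined" variant merges the two standard deviations (a plain mean) is an assumption, since the text only gives the combined average explicitly.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Turn a similarity value into a rating following the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Combined user/item statistics.

    Assumption: both the averages and the standard deviations are
    combined with a plain mean.
    """
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2
```

The sketch does not clamp the result to the rating scale; whether the original component did so is not stated.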


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
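The filtering step can be sketched as below. The text does not say in which order the two filters were applied or whether they were iterated, so this sketch assumes a single pass that filters recipes first and then users; the function name and the tuple layout are hypothetical.

```python
from collections import Counter


def filter_ratings(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop rarely rated recipes, then users with few remaining ratings.

    ratings: list of (user_id, item_id, rating) tuples.
    The defaults keep recipes rated more than 3 times and users who
    rated more than 5 times, matching the thresholds in the text.
    """
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [row for row in ratings if item_counts[row[1]] >= min_item_ratings]
    # Assumption: user counts are taken after the recipe filter.
    user_counts = Counter(user for user, _, _ in kept)
    return [row for row in kept if user_counts[row[0]] >= min_user_ratings]
```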

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values.

Figure 4.4: Distribution of Food.com rating events per rating values.

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, namely the impact of the users' standard deviation on the recommendation error and the learning curve of the algorithm, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use this segment to evaluate the predictions made by the system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the k-fold cross-validation method was used: the observations are partitioned into k folds, one fold is used as the validation set and the remaining k − 1 folds as the training set. To reduce variability, this process is repeated k times, so that each fold is used exactly once as the validation set. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
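The 5-fold procedure used in the experiments (20% validation and 80% training per fold) can be sketched in plain Python as follows. The function name, the shuffling and the fixed seed are illustrative assumptions; the text does not say how observations were assigned to folds.

```python
import random


def k_fold_splits(observations, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation."""
    shuffled = list(observations)
    # Assumption: observations are shuffled once before being split into folds.
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        yield training, validation
```

Each observation ends up in exactly one validation set, and the error metrics are then averaged over the k rounds.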

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
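For reference, the two error measures can be computed as in the short sketch below, which uses the usual definitions of MAE and RMSE; the function names are illustrative.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```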

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

                                Epicurious        Food.com
                                MAE      RMSE     MAE      RMSE
YoLP Content-based component    0.6389   0.8279   0.3590   0.6536
YoLP Collaborative component    0.6454   0.8678   0.3761   0.6834
User Average                    0.6315   0.8338   0.4077   0.6207
Item Average                    0.7701   1.0930   0.4385   0.7043
Combined Average                0.6628   0.8572   0.4180   0.6250

Table 5.2: Test results.

                                                  Epicurious                          Food.com
                                                  Observation       Observation       Observation       Observation
                                                  User Average      Fixed Threshold   User Average      Fixed Threshold
                                                  MAE      RMSE     MAE      RMSE     MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                0.8217   1.0606   0.7759   1.0283   0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                0.8914   1.1550   0.8388   1.1106   0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation  0.8304   1.0296   0.7824   0.9927   0.4390   0.6506   0.4324   0.6449
Min-Max                                           0.8539   1.1533   0.7721   1.0705   0.6648   0.9847   0.6303   0.9384
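The three average-based baselines can be sketched as below. The helper names are hypothetical, and falling back to the global mean for users or recipes absent from the training fold is an assumption, since the text does not describe how such cases were handled.

```python
from collections import defaultdict


def mean_by_key(training, key_index):
    """Average rating grouped by user (key_index=0) or item (key_index=1)."""
    grouped = defaultdict(list)
    for row in training:
        grouped[row[key_index]].append(row[2])
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def predict_baselines(training, user_id, item_id):
    """Return the user, item and combined average predictions.

    training: list of (user_id, item_id, rating) tuples.
    """
    global_avg = sum(rating for _, _, rating in training) / len(training)
    # Assumption: unseen users or items fall back to the global average.
    user_avg = mean_by_key(training, 0).get(user_id, global_avg)
    item_avg = mean_by_key(training, 1).get(item_id, global_avg)
    return {"user": user_avg, "item": item_avg, "combined": (user_avg + item_avg) / 2}
```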

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative ones. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
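The two ways of separating positive and negative observations can be illustrated with a Rocchio-style prototype built from sparse feature vectors. This is only a sketch under stated assumptions: the weights `beta` and `gamma`, the treatment of a rating equal to the threshold as positive, and all function names are hypothetical, and the exact formulation used in Section 4.3 is not reproduced here.

```python
def build_prototype(rated_recipes, threshold=3.0, beta=1.0, gamma=1.0):
    """Rocchio-style prototype: mean of positives minus mean of negatives.

    rated_recipes: list of (feature_vector, rating) pairs, where
    feature_vector is a dict mapping a feature to its weight.
    """
    # Assumption: a rating equal to the threshold counts as positive.
    positive = [vec for vec, rating in rated_recipes if rating >= threshold]
    negative = [vec for vec, rating in rated_recipes if rating < threshold]
    prototype = {}
    for group, coefficient in ((positive, beta), (negative, -gamma)):
        for vector in group:
            for feature, weight in vector.items():
                prototype[feature] = prototype.get(feature, 0.0) + coefficient * weight / len(group)
    return prototype


def user_average_threshold(rated_recipes):
    """Threshold for the 'Observation User Average' variant."""
    return sum(rating for _, rating in rated_recipes) / len(rated_recipes)
```

Passing `threshold=user_average_threshold(rated_recipes)` gives the user-average variant, while the default corresponds to the fixed threshold.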

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations gives the lowest RMSE on both datasets, while its MAE stays close to the lowest values obtained.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

                                    Epicurious        Food.com
                                    MAE      RMSE     MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927   0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012   0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986   0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616   0.4324   0.7087
Ingredients                         0.7932   1.0054   0.4411   0.6537
Cuisine                             0.8553   1.0810   0.5357   0.7431
Dietary                             0.8772   1.0807   0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
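The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The dictionary layout, the group names and the function names are assumptions made for illustration; only the use of cosine similarity comes from the text.

```python
import math


def merge_vectors(stored, feature_groups):
    """Merge the selected per-group vectors into a single sparse vector.

    stored: {"ingredients": {...}, "cuisine": {...}, "dietary": {...}}
    """
    merged = {}
    for group in feature_groups:
        for term, weight in stored[group].items():
            # Key by (group, term) so equal labels in different groups do not collide.
            merged[(group, term)] = weight
    return merged


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Testing a feature combination then only requires calling `merge_vectors` with a different list of groups for both the user and the recipe, without rebuilding any prototype vector.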

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:


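\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\]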

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in the error values seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

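As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.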


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by repeating the same cross-validation test on the experimental recommendation component, adjusting the upper similarity threshold between tests.
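The upper-threshold sweep can be sketched as below, applying the two-case rule of Eq. 5.1 to precomputed similarity values. The sample layout and function names are hypothetical; in the actual experiments each point comes from a full cross-validation run rather than from a fixed list of samples.

```python
import math


def predict(similarity, average, std_dev, upper):
    """Two-case rule of Eq. 5.1."""
    return average + std_dev if similarity >= upper else average


def sweep_upper_threshold(samples, thresholds):
    """Return (U, MAE, RMSE) for each candidate upper threshold.

    samples: (similarity, average, std_dev, actual_rating) tuples,
    assumed to be collected from the validation folds.
    """
    results = []
    for upper in thresholds:
        errors = [predict(s, avg, sd, upper) - actual for s, avg, sd, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results
```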

As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

Compared directly with the YoLP content-based component, which obtained some of the lowest error rates among the baselines (Table 5.1), the experimental recommendation component showed better results when using the Food.com dataset (0.3229 MAE and 0.6230 RMSE, against 0.3590 and 0.6536).
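A small worked example, with illustrative numbers that are not taken from the experiments, shows why MAE and RMSE can move in opposite directions. Suppose four predictions have absolute errors (0.5, 0.5, 0.5, 0.5) in case A and (0, 0, 0, 1.5) in case B:

\[
\begin{aligned}
\text{MAE}_A &= \tfrac{1}{4}(0.5 + 0.5 + 0.5 + 0.5) = 0.5, & \text{RMSE}_A &= \sqrt{\tfrac{1}{4}(4 \times 0.25)} = 0.5, \\
\text{MAE}_B &= \tfrac{1}{4}(0 + 0 + 0 + 1.5) = 0.375, & \text{RMSE}_B &= \sqrt{\tfrac{1}{4}(2.25)} = 0.75.
\end{aligned}
\]

Case B contains more exact predictions and has the lower MAE, but its single large miss raises the RMSE, which is the pattern observed when the upper threshold is lowered.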

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
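The incremental procedure, in which one review at a time moves from the validation set to the training set, can be sketched for a single user as follows. The callables `build_profile` and `predict` are hypothetical stand-ins for the prototype construction and rating prediction steps, and the use of the mean absolute error per round is an assumption.

```python
def learning_curve(reviews, build_profile, predict, max_training):
    """Mean absolute error on the remaining reviews after each training round.

    reviews: ordered list of (item_id, rating) pairs for one user.
    build_profile: hypothetical callable that builds a profile from training reviews.
    predict: hypothetical callable mapping (profile, item_id) to a predicted rating.
    """
    errors = []
    for n in range(1, max_training + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to evaluate on
        profile = build_profile(training)
        abs_errors = [abs(predict(profile, item) - rating) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors
```

Averaging these per-user curves over the selected group of users would give a curve comparable to the ones shown in the figures.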



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). To our knowledge, the Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
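A minimal sketch of this proposal is shown below, assuming one sparse vector per rating value for each user and cosine similarity as the comparison measure. The data layout and function names are hypothetical, since the text describes the idea only as future work.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def predict_rating_by_class(class_vectors, recipe_vector):
    """Return the rating whose class vector is most similar to the recipe.

    class_vectors: {rating_value: {feature: weight}} for a single user.
    """
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe_vector))
```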




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007…

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http…


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link…

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl…



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download…

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1…

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+…


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:…

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http…

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770…


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012…

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A…

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770…

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.




Chapter 4

Architecture


In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is given, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm to personalized food recommendations. These components provide independent recommendations for the same input, so that improvements in prediction accuracy from the algorithms implemented in the experimental component can be evaluated. The evaluation module evaluates each recommendation component independently, measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity between a pair of users is measured from the way both users rate the same set of items, whereas in the item-to-item approach the similarity between a pair of items is measured from the way they are rated by a shared set of users. In other words, in the user-to-user approach two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1}$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$, respectively.
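
As an illustration, Eq. 4.1 can be sketched in a few lines of Python. This is a minimal sketch and not the YoLP implementation: the function name and the dictionary-based inputs are assumptions made for the example, and returning 0 when the denominator vanishes is a choice of this sketch that the text does not specify.

```python
from math import sqrt


def pearson_item_similarity(ratings_a, ratings_b, mean_a, mean_b):
    """Pearson correlation between two recipes (Eq. 4.1).

    ratings_a, ratings_b: dicts mapping user id -> rating (hypothetical layout).
    mean_a, mean_b: average ratings of recipe a and recipe b.
    """
    common_users = set(ratings_a) & set(ratings_b)  # the set P
    numerator = sum(
        (ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common_users
    )
    squares_a = sum((ratings_a[p] - mean_a) ** 2 for p in common_users)
    squares_b = sum((ratings_b[p] - mean_b) ** 2 for p in common_users)
    if squares_a == 0 or squares_b == 0:
        # Assumption of this sketch: an undefined correlation counts as 0.
        return 0.0
    return numerator / (sqrt(squares_a) * sqrt(squares_b))
```

Two recipes rated in the same direction by the same users obtain a similarity close to 1, while recipes with no common raters obtain 0 under the stated assumption.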


After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u,a) = \frac{\sum_{b \in N} \mathrm{sim}(a,b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)} \tag{4.2}$$

In the formula, $\mathrm{pred}(u,a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating from user $u$ to item $b$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
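
The prediction formula above can be sketched as follows. The function and argument names are hypothetical, and the sketch computes exactly the quantity the formula defines, a similarity-weighted average of mean-centred ratings. Whether the target recipe's average is added back afterwards is not stated in the text, so the sketch does not do so.

```python
def predict_rating(user_ratings, item_means, similarity_to_target):
    """Similarity-weighted prediction for one user and one target item.

    user_ratings: dict item b -> rating given by the user (the set N).
    item_means: dict item b -> average rating of item b.
    similarity_to_target: dict item b -> sim(a, b) for the target item a.
    """
    numerator = sum(
        similarity_to_target[b] * (user_ratings[b] - item_means[b])
        for b in user_ratings
    )
    denominator = sum(similarity_to_target[b] for b in user_ratings)
    if denominator == 0:
        # Assumption of this sketch: no usable neighbours means no deviation.
        return 0.0
    return numerator / denominator
```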

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. However, in the content-based method the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} \mathrm{avgTotal} + 0.5 & \text{if } \mathrm{similarity} > 0.8 \\ \mathrm{avgTotal} & \text{otherwise} \end{cases} \tag{4.3}$$


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating likely introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
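
The two steps of this component, cosine similarity over sparse feature vectors followed by the rule of Eq. 4.3, can be sketched as below. This is a minimal sketch under assumptions: the sparse vectors are stored as Python dictionaries, the feature keys are invented for the example, and avgTotal is taken to be (UserAvg + ItemAvg)/2, the combined average defined in Chapter 5.

```python
from math import sqrt


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * vec_b.get(feature, 0.0) for feature, weight in vec_a.items())
    norm_a = sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile or recipe is treated as not similar
    return dot / (norm_a * norm_b)


def similarity_to_rating(similarity, user_avg, item_avg):
    """Eq. 4.3: raise the combined average by 0.5 when similarity exceeds 0.8."""
    avg_total = (user_avg + item_avg) / 2
    return avg_total + 0.5 if similarity > 0.8 else avg_total
```

For example, a binary profile `{"ingredient:tomato": 1, "region:italy": 1}` compared with a recipe holding the same two features gives a similarity of approximately 1, so the predicted rating is the combined average plus 0.5.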

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors. Lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as defined in [3]:

$$\mathrm{IRF}_k = \log\frac{M}{M_k} \tag{4.4}$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
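
A minimal sketch of the IRF computation is given below. The input layout (a dictionary from recipe id to a set of features) and the function name are assumptions made for the example, and the natural logarithm is used because the text does not state the base.

```python
from math import log


def inverse_recipe_frequency(recipes):
    """Return {feature k: log(M / M_k)} for a {recipe id: set of features} mapping."""
    total_recipes = len(recipes)  # M
    recipe_counts = {}  # M_k for every feature k
    for features in recipes.values():
        for feature in features:
            recipe_counts[feature] = recipe_counts.get(feature, 0) + 1
    return {
        feature: log(total_recipes / count)
        for feature, count in recipe_counts.items()
    }
```

A feature that appears in every recipe receives a weight of 0, while rare features receive the highest weights, mirroring the behaviour of IDF for words in documents.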

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
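
The construction described above can be sketched as follows. All names are hypothetical, the prototype is stored as a dictionary from feature to accumulated weight, and both labelling approaches are shown as small helper functions. Positive and negative observations carry an equal weight of 1, as in the experiments.

```python
def label_fixed_threshold(rating, max_rating):
    """First approach: split the rating scale. Returns True, False, or None (neutral)."""
    if max_rating == 4:  # Epicurious: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:  # Food.com (1-5): a rating of 3 is neutral and ignored
        return None
    return rating > 3


def label_user_average(rating, user_avg):
    """Second approach: positive when the rating is at or above the user's average."""
    return rating >= user_avg


def build_prototype(rated_recipes, recipe_features, irf, label):
    """Add IRF weights for positive observations and subtract them for negative ones.

    rated_recipes: list of (recipe id, rating) from the training set.
    recipe_features: dict recipe id -> set of features.
    irf: dict feature -> IRF weight.
    label: function rating -> True / False / None.
    """
    prototype = {}
    for recipe_id, rating in rated_recipes:
        observation = label(rating)
        if observation is None:
            continue  # neutral observations do not change the prototype
        sign = 1.0 if observation else -1.0
        for feature in recipe_features[recipe_id]:
            prototype[feature] = prototype.get(feature, 0.0) + sign * irf.get(feature, 0.0)
    return prototype
```

The `label` argument would be, for instance, `lambda r: label_fixed_threshold(r, 5)` for Food.com or `lambda r: label_user_average(r, 3.4)` for a user whose training average is 3.4.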

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes and with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. Among the many normalization methods available, the technique chosen for this work was Min-Max normalization. Min-Max transforms a value $A$ into a value $B$ that fits in the range $[C, D]$, as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
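
A per-user version of the Min-Max mapping can be sketched as below. The function name and arguments are hypothetical. The sketch assumes that the user's similarity range and rating range have already been computed, and it falls back to the user average when the similarity interval is empty, as described above.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Eq. 4.5 with A = similarity and [C, D] = [rating_min, rating_max] for one user."""
    if sim_max == sim_min:
        # Not enough data to build a similarity interval: use the user average.
        return user_avg
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For a user whose similarities range over [0, 1] and whose ratings range over [1, 5], a similarity of 0.5 maps to a rating of 3.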

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathrm{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathrm{similarity} < L \end{cases} \tag{4.6}$$

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
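
The three-case rule with the initial thresholds can be sketched as follows. The names are hypothetical. Two points are assumptions of this sketch rather than statements from the text: the population standard deviation (`statistics.pstdev`) is used, and the combined variant is read as the mean of the user and recipe averages together with the mean of their standard deviations.

```python
from statistics import mean, pstdev


def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case rule: avg + std, avg, or avg - std depending on the similarity."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg


def combined_stats(user_ratings, recipe_ratings):
    """Assumed reading of the combined variant: average the two means and the two stds."""
    avg = (mean(user_ratings) + mean(recipe_ratings)) / 2
    std = (pstdev(user_ratings) + pstdev(recipe_ratings)) / 2
    return avg, std
```

The user-only and recipe-only variants are obtained by passing the mean and standard deviation of the corresponding training ratings directly to `rating_from_similarity`.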


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds, one fold serves as the validation set, and the remaining folds form the training set. To reduce variability, this process is repeated with a different fold as the validation set each time, until every fold has been used for validation once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
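
The 5-fold procedure (80% training and 20% validation per fold) can be sketched with the standard library alone. The function name, the shuffling step and the fixed seed are assumptions of this sketch, and how the thesis actually partitioned the rating events is not stated.

```python
import random


def k_fold_splits(rating_events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)  # assumed: random assignment to folds
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [
            event
            for j, fold in enumerate(folds)
            if j != i
            for event in fold
        ]
        yield training, validation
```

The error measures computed on each of the k validation sets are then averaged to obtain the reported result.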

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10 Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
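
For reference, the two error measures can be sketched as follows, using their standard definitions. The function names are chosen for the example.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

With predictions `[4, 4, 4, 4]` and actual ratings `[4, 4, 4, 2]`, a single miss of 2 gives an MAE of 0.5 but an RMSE of 1.0, which shows how RMSE emphasizes larger deviations.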

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                      Epicurious           Food.com
                                      MAE      RMSE        MAE      RMSE
YoLP Content-based component          0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component          0.6454   0.8678      0.3761   0.6834
User Average                          0.6315   0.8338      0.4077   0.6207
Item Average                          0.7701   1.0930      0.4385   0.7043
Combined Average                      0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                  0.8217    1.0606            0.7759    1.0283
Item Avg + Item Standard Deviation                  0.8914    1.1550            0.8388    1.1106
User/Item Avg + User and Item Standard Deviation    0.8304    1.0296            0.7824    0.9927
Min-Max                                             0.8539    1.1533            0.7721    1.0705

Food.com
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE       RMSE              MAE       RMSE
User Avg + User Standard Deviation                  0.4448    0.6812            0.4287    0.6624
Item Avg + Item Standard Deviation                  0.4561    0.7251            0.4507    0.7207
User/Item Avg + User and Item Standard Deviation    0.4390    0.6506            0.4324    0.6449
Min-Max                                             0.6648    0.9847            0.6303    0.9384
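
The three average-based baselines of Table 5.1 can be sketched as follows. The names and the input layout (a list of `(user_id, recipe_id, rating)` triples from the training set) are assumptions of this sketch, and users or recipes absent from the training set are not handled.

```python
from collections import defaultdict


def average_baselines(training):
    """Return ({user: average rating}, {recipe: average rating}) from training triples."""
    ratings_by_user = defaultdict(list)
    ratings_by_item = defaultdict(list)
    for user_id, recipe_id, rating in training:
        ratings_by_user[user_id].append(rating)
        ratings_by_item[recipe_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in ratings_by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in ratings_by_item.items()}
    return user_avg, item_avg


def combined_average(user_avg, item_avg, user_id, recipe_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```

Each baseline prediction is then compared with the validation rating through MAE and RMSE, in the same way as the predictions of the recommendation components.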

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, several methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious           Food.com
                                      MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
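
Storing one vector per feature group and merging on demand can be sketched as follows. The dictionary layout, the group names and the function name are assumptions of this sketch. Keys are namespaced by group so that a name shared by two groups cannot collide.

```python
def merge_prototype(stored_vectors, feature_groups):
    """Merge the selected per-group prototype vectors into one sparse vector.

    stored_vectors: dict group name -> {feature: weight}, one entry per group.
    feature_groups: the groups to include in this test.
    """
    merged = {}
    for group in feature_groups:
        for feature, weight in stored_vectors[group].items():
            merged[(group, feature)] = weight
    return merged
```

Testing the combination Ingredients + Dietary, for example, amounts to calling `merge_prototype(stored, ["ingredients", "dietary"])` and building the recipe vector with the same two groups before computing the cosine similarity.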

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \mathrm{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \mathrm{similarity} < L \end{cases}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

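As a result of these tests, Eq. 4.6 was updated to

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \mathrm{similarity} \geq U \\ \text{average rating} & \text{if } \mathrm{similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset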


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
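
The sweep over the upper threshold can be sketched as follows. This is a minimal sketch with hypothetical names: each case is a tuple `(similarity, average rating, standard deviation, actual rating)`, the rating rule is the two-case rule of Eq. 5.1, and the cross-validation loop around it is omitted.

```python
from math import sqrt


def rating_eq_5_1(similarity, avg, std, upper):
    """Two-case rule of Eq. 5.1: add the standard deviation at or above U."""
    return avg + std if similarity >= upper else avg


def sweep_upper_threshold(cases, thresholds):
    """Return {U: (MAE, RMSE)} for every candidate upper threshold U."""
    results = {}
    for upper in thresholds:
        errors = [
            rating_eq_5_1(similarity, avg, std, upper) - actual
            for similarity, avg, std, actual in cases
        ]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```

Plotting the MAE and RMSE returned for each value of U gives curves of the kind shown in the threshold variation figures.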

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as it is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the lowest error rates overall among the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that neighbourhood.
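
The per-user points of such a plot can be computed as sketched below. The names and input layout are hypothetical, and two choices are assumptions of this sketch: the standard deviation is the population standard deviation of the user's training ratings, and the user's absolute error is the mean of the absolute errors over that user's validation predictions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def user_error_vs_deviation(training, predictions):
    """Return {user: (rating standard deviation, mean absolute error)}.

    training: iterable of (user id, rating) pairs.
    predictions: iterable of (user id, predicted rating, actual rating) triples.
    """
    ratings_by_user = defaultdict(list)
    for user_id, rating in training:
        ratings_by_user[user_id].append(rating)
    errors_by_user = defaultdict(list)
    for user_id, predicted, actual in predictions:
        errors_by_user[user_id].append(abs(predicted - actual))
    return {
        user_id: (pstdev(ratings_by_user[user_id]), mean(errors))
        for user_id, errors in errors_by_user.items()
        if user_id in ratings_by_user  # skip users with no training ratings
    }
```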

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig 57 presents good results since it shows that the absolute error of the users starts to stagnate

for users with standard deviations higher than 1 This implies that the algorithm is learning the userrsquos

preferences and returning good recommendations even to users with high standard deviations

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
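The round-by-round procedure described above can be sketched in Python as follows. This is a minimal sketch for a single user, not the code used in the experiments: `train` and `predict` are hypothetical callables standing in for the Rocchio component, and the usage example plugs in a plain mean-rating predictor instead of Rocchio.

```python
def learning_curve(reviews, train, predict, max_rounds):
    """Return the MAE on the held-out reviews after each training round.

    reviews: list of (recipe, rating) pairs for one user, in rating order.
    train / predict: hypothetical stand-ins for the recommender under test.
    """
    curve = []
    # Always keep at least one review in the validation set.
    for k in range(1, min(max_rounds, len(reviews) - 1) + 1):
        training, validation = reviews[:k], reviews[k:]
        model = train(training)
        errors = [abs(predict(model, recipe) - rating)
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Stand-in recommender (assumption): predicts the mean training rating.
def mean_train(training):
    return sum(rating for _, rating in training) / len(training)

def mean_predict(model, recipe):
    return model

toy_reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(toy_reviews, mean_train, mean_predict, max_rounds=3)
```

Averaging such per-user curves over every user above the chosen review threshold would give one curve per dataset, in the spirit of Figures 5.8 to 5.10.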



Chapter 6

Conclusions

In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868.

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188.

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.



Chapter 4

Architecture

In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.

4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)$$

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and lastly $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of recipe $a$ and recipe $b$.

After the similarity is computed, the rating prediction is calculated using the following equation:

$$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (4.2)$$

In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, $N$ is the set of items rated by user $u$, and $r_{b,u}$ is the rating from user $u$ to item $b$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$. The weighted sum is then normalized by the sum of the similarities.
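A minimal Python sketch of Eq. 4.1 and Eq. 4.2 is given below; it is not the YoLP implementation. The nested dictionary layout `ratings[item][user]` and the function names are assumptions made for the example, and item averages are taken over all users who rated the item. Eq. 4.2 as written yields a mean-centred value, so the sketch returns exactly that; whether the target item's average is added back afterwards is left open because the text does not say.

```python
from math import sqrt

def item_mean(ratings, item):
    """Average rating of an item over all users who rated it."""
    values = list(ratings[item].values())
    return sum(values) / len(values)

def pearson_item_similarity(ratings, a, b):
    """Eq. 4.1: Pearson correlation of items a and b over co-rating users."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a, mean_b = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b)
              for p in common)
    den = sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)
               * sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / den if den else 0.0

def predict_eq_4_2(ratings, user, a):
    """Eq. 4.2: similarity-weighted average of the user's centred ratings."""
    rated = [b for b in ratings if b != a and user in ratings[b]]
    sims = {b: pearson_item_similarity(ratings, a, b) for b in rated}
    den = sum(sims.values())
    if den == 0:
        return 0.0
    num = sum(sims[b] * (ratings[b][user] - item_mean(ratings, b))
              for b in rated)
    return num / den

# Toy data, invented for illustration.
toy = {
    "soup": {"u1": 5, "u2": 3, "u3": 1},
    "stew": {"u1": 4, "u2": 3, "u3": 2},
}
similarity = pearson_item_similarity(toy, "soup", "stew")
prediction = predict_eq_4_2(toy, "u1", "soup")
```

Because the restaurant's recipe list is fixed, the item-to-item similarities could in principle be computed once and reused for every user, which matches the efficiency argument made for this component.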

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of treating recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation: temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)$$

where $avgTotal$ represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
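The profile construction, the cosine comparison and the rule of Eq. 4.3 can be sketched in Python as follows. This is an illustrative sketch, not the YoLP code: binary vectors are stored as Python sets of feature names rather than as positional sparse vectors, and all function names are assumptions.

```python
from math import sqrt

def build_profile(rated_recipes, positive_threshold=4):
    """Binary user profile: features of every recipe rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile |= set(features)
    return profile

def cosine_binary(features, profile):
    """Cosine similarity of two binary vectors represented as sets."""
    if not features or not profile:
        return 0.0
    return len(features & profile) / sqrt(len(features) * len(profile))

def rating_eq_4_3(similarity, avg_total):
    """Eq. 4.3: add 0.5 to the combined average when similarity > 0.8."""
    return avg_total + 0.5 if similarity > 0.8 else avg_total

# Toy data, invented for illustration.
profile = build_profile([({"pasta", "tomato"}, 5), ({"beef"}, 2)])
close_match = cosine_binary({"pasta", "tomato"}, profile)
partial_match = cosine_binary({"pasta", "basil"}, profile)
```

For binary vectors the dot product is simply the number of shared features, which is why the set intersection is enough here.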

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period $D$. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

$$IRF_k = \log \frac{M}{M_k} \qquad (4.4)$$

where $M$ is the total number of recipes and $M_k$ is the number of recipes that contain ingredient $k$. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
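As a rough illustration, the IRF weights of Eq. 4.4 could be computed as in the Python sketch below. The dictionary layout and the function name are assumptions, and the natural logarithm is used because the text does not state the base.

```python
from collections import Counter
from math import log

def inverse_recipe_frequency(recipes):
    """Eq. 4.4: IRF_k = log(M / M_k) for every feature k.

    recipes maps a recipe id to its list of features (assumed layout).
    M is the number of recipes, M_k the number containing feature k.
    """
    total = len(recipes)
    # set() ensures a feature is counted once per recipe.
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {feature: log(total / n) for feature, n in counts.items()}

# Toy data, invented for illustration.
toy_recipes = {
    "r1": ["garlic", "beef"],
    "r2": ["garlic"],
    "r3": ["garlic", "basil"],
    "r4": ["garlic", "beef"],
}
irf = inverse_recipe_frequency(toy_recipes)
```

A feature present in every recipe, such as garlic in the toy data, receives a weight of zero, so it contributes nothing to the prototype vectors, while rare features receive the largest weights.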

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
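The two labelling approaches and the add/subtract update can be sketched in Python as below. This is a simplified sketch under stated assumptions: the prototype is a plain dictionary of feature weights, both observation types have weight 1 as in the text, and the helper names are hypothetical.

```python
def fixed_threshold_5(rating):
    """Food.com rule: 1-2 negative, 4-5 positive, 3 neutral (ignored)."""
    if rating == 3:
        return None
    return rating > 3

def fixed_threshold_4(rating):
    """Epicurious rule: 1-2 negative, 3-4 positive."""
    return rating >= 3

def average_threshold(user_average):
    """Second approach: positive if the rating is >= the user's average."""
    return lambda rating: rating >= user_average

def build_prototype(rated_recipes, irf, is_positive):
    """Add IRF weights for positive observations, subtract for negative."""
    prototype = {}
    for features, rating in rated_recipes:
        label = is_positive(rating)
        if label is None:      # neutral observation, skipped
            continue
        sign = 1.0 if label else -1.0
        for feature in features:
            weight = irf.get(feature, 0.0)
            prototype[feature] = prototype.get(feature, 0.0) + sign * weight
    return prototype

# Toy weights and ratings, invented for illustration.
toy_irf = {"garlic": 0.5, "beef": 1.0, "basil": 2.0}
toy_rated = [(["garlic", "beef"], 5), (["garlic", "basil"], 1), (["beef"], 3)]
by_fixed = build_prototype(toy_rated, toy_irf, fixed_threshold_5)
by_average = build_prototype(toy_rated, toy_irf, average_threshold(3.0))
```

With the fixed threshold the recipe rated 3 is ignored, whereas with the user-average rule it counts as a positive observation, so the two prototypes differ in the weight of "beef".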

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value $A$ into $B$, which fits in the range $[C, D]$, as shown in the following formula:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \qquad (4.5)$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval ($\max(A) - \min(A)$), the user average was used as the default for the recommendation.
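The per-user mapping described above can be sketched in Python as follows. The function names are assumptions, and the fallback is triggered here whenever the similarity interval is empty (all similarities equal), which is one possible reading of "not enough user ratings".

```python
def min_max(value, src_min, src_max, dst_min, dst_max):
    """Eq. 4.5: map value from [src_min, src_max] into [dst_min, dst_max]."""
    return (value - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

def min_max_rating(similarity, user_similarities, user_ratings):
    """Translate a similarity into a rating on the user's own scales.

    user_similarities: similarities observed for this user.
    user_ratings: ratings given by this user in the training set.
    """
    sim_min, sim_max = min(user_similarities), max(user_similarities)
    if sim_max == sim_min:
        # Similarity interval cannot be computed: fall back to the user average.
        return sum(user_ratings) / len(user_ratings)
    return min_max(similarity, sim_min, sim_max,
                   min(user_ratings), max(user_ratings))

# Toy values, invented for illustration.
mapped = min_max_rating(0.5, [0.0, 1.0], [2, 4])
fallback = min_max_rating(0.3, [0.3, 0.3], [2, 4, 3])
```

A user who only ever rated between 2 and 4 can therefore never receive a predicted 5, which reflects the idea that the notion of a high rating is specific to each user.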

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \qquad (4.6)$$

Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
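The three-way rule can be written as a short Python function, sketched below. The upper threshold default of 0.75 is the initial value stated in this section; the lower default of 0.25 is only a placeholder assumed for the example, since the initial value of L is not available here. The `average` and `std_dev` arguments can come from the user, from the recipe, or from a combination of both.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity to a rating around an average.

    upper: initial U from the text (0.75).
    lower: placeholder value for L, assumed for illustration only.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Toy values, invented for illustration.
high = rating_from_similarity(0.8, average=4.0, std_dev=0.5)
middle = rating_from_similarity(0.5, average=4.0, std_dev=0.5)
low = rating_from_similarity(0.1, average=4.0, std_dev=0.5)
```

Only three distinct ratings can be produced per average, so the accuracy of this method depends heavily on how well the two thresholds are tuned.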

Table 4.1: Statistical characterization of the datasets used in the experiments

Statistic                               Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02        0.07
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine more accurately the final rating predicted for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
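The sparsity filter can be sketched in Python as below. The tuple layout `(user, recipe, rating)` and the function name are assumptions; the order of the two steps (recipes first, then users) and the single pass are also assumptions, since the text does not say whether the filter was repeated until stable.

```python
from collections import Counter

def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """Drop rarely rated recipes, then users with too few remaining ratings.

    Defaults mirror the text: recipes rated no more than 3 times and
    users who rated no more than 5 times are removed.
    """
    item_counts = Counter(item for _, item, _ in events)
    kept = [e for e in events if item_counts[e[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [e for e in kept if user_counts[e[0]] >= min_user_ratings]

# Toy rating events, invented for illustration; small limits keep it short.
toy_events = [
    ("u1", "r1", 5), ("u2", "r1", 4), ("u1", "r2", 3),
    ("u2", "r2", 4), ("u3", "r3", 2), ("u3", "r1", 5),
]
filtered = filter_sparse(toy_events, min_item_ratings=2, min_user_ratings=2)
```

In the toy run, recipe r3 is removed first, which leaves user u3 with a single rating and therefore removes that user as well.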

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which matches the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3 a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4 a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two aspects of the recommendation process, the impact of the user standard deviation on the recommendation error and the learning curve of the algorithm, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. Its main idea is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k folds of equal size, one fold is used as the validation set and the remaining folds as the training set. To reduce variability, this process is repeated until every fold has been used once as the validation set, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold the validation set represents 20% and the training set the remaining 80% of the data.
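The 80/20 partitioning used for each fold can be sketched as below. The function name, the shuffling step and the fixed seed are assumptions made for illustration; the text does not state how rating events were assigned to folds.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    events: list of rating events, e.g. (user_id, item_id, rating).
    Events are shuffled once and dealt into k folds; each fold is used
    exactly once as the validation set.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5 every validation set holds one fifth of the events, and the error measures are averaged over the five runs.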

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
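For reference, the two error measures can be computed as in the following sketch, which uses the standard definitions of MAE and RMSE. The function names and the list-based inputs are illustrative choices, not part of the evaluation module described in this work.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average of |prediction - actual rating|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: squares each deviation before averaging,
    so larger deviations weigh more than in MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Both functions expect the predictions and the validation ratings in the same order, one pair per validation event.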

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious      Food.com
Method                              MAE     RMSE    MAE     RMSE
YoLP Content-based component        0.6389  0.8279  0.3590  0.6536
YoLP Collaborative component        0.6454  0.8678  0.3761  0.6834
User Average                        0.6315  0.8338  0.4077  0.6207
Item Average                        0.7701  1.0930  0.4385  0.7043
Combined Average                    0.6628  0.8572  0.4180  0.6250

Table 5.2: Test Results

                                                  Epicurious                      Food.com
Observation                                       User Average    Fixed Threshold User Average    Fixed Threshold
Rating conversion method                          MAE     RMSE    MAE     RMSE    MAE     RMSE    MAE     RMSE
User Avg + User Standard Deviation                0.8217  1.0606  0.7759  1.0283  0.4448  0.6812  0.4287  0.6624
Item Avg + Item Standard Deviation                0.8914  1.1550  0.8388  1.1106  0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and Item Standard Deviation  0.8304  1.0296  0.7824  0.9927  0.4390  0.6506  0.4324  0.6449
Min-Max                                           0.8539  1.1533  0.7721  1.0705  0.6648  0.9847  0.6303  0.9384
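The three average-based baselines of Table 5.1 can be sketched as follows. The helper names and the tuple layout are hypothetical; the sketch also assumes that every user and recipe in the validation set appears in the training set, since the handling of unseen users or recipes is not described.

```python
from collections import defaultdict

def mean_by_key(training, key_index):
    """Average rating grouped by user (key_index=0) or recipe (key_index=1).

    training: list of (user_id, recipe_id, rating) tuples.
    """
    groups = defaultdict(list)
    for event in training:
        groups[event[key_index]].append(event[2])
    return {key: sum(values) / len(values) for key, values in groups.items()}

def predict_combined(user_avg, item_avg, user_id, recipe_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```

The User Average and Item Average baselines simply return `user_avg[user_id]` or `item_avg[recipe_id]` as the predicted rating.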

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering as positive observations the highest rating values and as negative the lowest. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
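The two ways of separating positive and negative observations can be sketched as below. The function name is hypothetical, and treating a rating equal to the threshold as a negative observation is an assumption, since the text does not say on which side the threshold value itself falls.

```python
def split_observations(user_ratings, threshold=None):
    """Split one user's rated recipes into positive and negative observations.

    user_ratings: list of (recipe_id, rating) pairs for a single user.
    threshold: a fixed value (e.g. 3), or None to use the user's own
    average rating. Ratings equal to the threshold count as negative
    here, which is an assumption of this sketch.
    """
    if threshold is None:
        threshold = sum(r for _, r in user_ratings) / len(user_ratings)
    positive = [recipe for recipe, r in user_ratings if r > threshold]
    negative = [recipe for recipe, r in user_ratings if r <= threshold]
    return positive, negative
```

A user who only gives high ratings illustrates the difference: with the user average as threshold some of those recipes become negative observations, whereas with a fixed threshold they all remain positive.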

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations gives the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious      Food.com
Feature combination                 MAE     RMSE    MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927  0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012  0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986  0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616  0.4324  0.7087
Ingredients                         0.7932  1.0054  0.4411  0.6537
Cuisine                             0.8553  1.0810  0.5357  0.7431
Dietary                             0.8772  1.0807  0.4579  0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. For that reason, when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
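The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The dictionary-based sparse representation, the feature-type keys and the function names are assumptions made for illustration only.

```python
import math

def merge_features(vectors, selected):
    """Merge the stored per-feature vectors into one sparse vector.

    vectors: {"ingredients": {...}, "cuisine": {...}, "dietary": {...}},
    each mapping a feature name to its weight.
    selected: the feature types to include in this test.
    Keys are prefixed with the feature type so names cannot collide.
    """
    merged = {}
    for name in selected:
        for feature, weight in vectors[name].items():
            merged[(name, feature)] = weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Testing a feature combination then only requires calling `merge_features` with a different `selected` list for both the prototype vector and the recipe vector, without rebuilding anything from the rating events.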

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
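Eq. 4.6 can be read as a small three-way decision, sketched below. The function and parameter names are illustrative; `average` and `std_dev` stand for whichever average rating and standard deviation (user, item or combined) are being used.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case conversion of a similarity value into a rating (Eq. 4.6).

    upper and lower correspond to the thresholds U and L; the defaults
    are the initial values quoted in the text.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```

For example, with an average rating of 4.0 and a standard deviation of 0.5, a similarity of 0.8 yields 4.5, a similarity of 0.5 yields 4.0 and a similarity of 0.1 yields 3.5.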

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
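The threshold sweep described above can be sketched as follows, using the two-case rule of Eq. 5.1. The function names and the layout of `samples` are hypothetical, and a single list of validation samples stands in for the full cross-validation runs.

```python
def two_case_rating(similarity, average, std_dev, upper):
    """Two-case conversion of a similarity value into a rating (Eq. 5.1)."""
    return average + std_dev if similarity >= upper else average

def sweep_upper_threshold(samples, thresholds):
    """Return (U, MAE, RMSE) for each candidate upper threshold U.

    samples: list of (similarity, average, std_dev, actual_rating).
    """
    results = []
    for upper in thresholds:
        errors = [two_case_rating(s, avg, sd, upper) - actual
                  for s, avg, sd, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results.append((upper, mae, rmse))
    return results
```

Plotting the second and third elements of each tuple against the first gives curves of the kind shown in Figures 5.4 and 5.5.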

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
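The opposite movement of MAE and RMSE discussed in this section can be reproduced with two small sets of prediction errors. The numbers below are invented purely for illustration and are not taken from the experiments.

```python
def mean_abs(errors):
    """MAE computed directly from a list of deviations."""
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_sq(errors):
    """RMSE computed directly from a list of deviations."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

# Three exact predictions and one large miss.
many_exact = [0.0, 0.0, 0.0, 2.0]
# No exact prediction, but every miss is moderate.
uniform = [0.6, 0.6, 0.6, 0.6]

lower_mae = mean_abs(many_exact) < mean_abs(uniform)
higher_rmse = root_mean_sq(many_exact) > root_mean_sq(uniform)
```

The first set has the lower MAE (0.5 against about 0.6) but the higher RMSE (1.0 against about 0.6), which is the pattern observed when the upper threshold is lowered.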

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should contribute the least to the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
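The points plotted in these figures can be derived as in the sketch below. It assumes that the user's absolute error is the mean absolute error over that user's validation events and that the standard deviation is the population standard deviation of the same ratings; neither detail is stated in the text, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict

def per_user_points(events):
    """Map each user to (rating standard deviation, mean absolute error).

    events: list of (user_id, actual_rating, predicted_rating).
    """
    by_user = defaultdict(list)
    for user, actual, predicted in events:
        by_user[user].append((actual, predicted))
    points = {}
    for user, pairs in by_user.items():
        actuals = [a for a, _ in pairs]
        std_dev = statistics.pstdev(actuals)
        error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user] = (std_dev, error)
    return points
```

Each value in the returned dictionary corresponds to one point in the scatter plot, with the standard deviation on one axis and the absolute error on the other.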

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while keeping a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
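The incremental procedure of this test can be sketched for a single user as follows. The callables `build_profile` and `predict` are hypothetical placeholders for the prototype-vector construction and the rating prediction, and the use of the mean absolute error per round is an assumption of this sketch.

```python
def learning_curve(user_events, build_profile, predict, max_train):
    """Simulate learning by moving one review at a time into the training set.

    user_events: ordered list of (recipe, rating) pairs for one user.
    build_profile: callable that builds a user profile from training pairs.
    predict: callable (profile, recipe) -> predicted rating.
    Returns the mean absolute error on the remaining reviews per round.
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = user_events[:n], user_events[n:]
        if not validation:
            break
        profile = build_profile(training)
        abs_errors = [abs(predict(profile, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors
```

Averaging the returned lists over all selected users, position by position, gives one curve of the kind shown in the learning curve figures.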



Chapter 6

Conclusions


In this MSc dissertation the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations produced the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²

same group of users

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
\tag{4.1}
\]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b.
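Eq. 4.1 can be sketched in code as follows. The dictionary representation of the ratings and the choice to return 0 when one of the denominators vanishes are assumptions of this sketch; the recipe averages are passed in explicitly because the text defines them as the recipes' average ratings.

```python
import math

def pearson_item_similarity(ratings_a, ratings_b, mean_a, mean_b):
    """Pearson correlation between two recipes over their common raters.

    ratings_a, ratings_b: {user_id: rating} for recipes a and b.
    mean_a, mean_b: average ratings of recipes a and b.
    """
    common = ratings_a.keys() & ratings_b.keys()
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b)
                    for p in common)
    sum_sq_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    sum_sq_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if sum_sq_a == 0 or sum_sq_b == 0:
        return 0.0  # undefined correlation, treated as no similarity
    return numerator / math.sqrt(sum_sq_a * sum_sq_b)
```

Two recipes whose common raters deviate from the averages in the same direction obtain a similarity close to 1, and opposite deviations give a value close to -1.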


After the similarity is computed, the rating prediction is calculated using the following equation:

\[
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
\tag{4.2}
\]

In the formula, pred(u, a) is the prediction value to user u for item a and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The weighted sum is then normalized by the sum of similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

42 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
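The profile construction and ranking described above can be illustrated with a minimal sketch. It assumes that binary sparse vectors are stored as Python sets of feature labels; the function names and the feature labels (such as `cuisine:italian`) are hypothetical and are not taken from the YoLP implementation.

```python
import math


def build_profile(rated_recipes, positive_min=4):
    """Union of the features of every recipe rated at or above positive_min."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_min:
            profile |= set(features)
    return profile


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical data: one positively rated recipe and one negatively rated recipe.
rated = [
    ({"cuisine:italian", "ing:tomato", "ing:basil"}, 5),
    ({"cuisine:asian", "ing:rice"}, 2),
]
profile = build_profile(rated)

candidates = {
    "pasta": {"cuisine:italian", "ing:tomato", "ing:pasta"},
    "sushi": {"cuisine:asian", "ing:rice", "ing:fish"},
}
# Order the restaurant's recipes from most to least similar to the profile.
ranked = sorted(
    candidates,
    key=lambda name: cosine_binary(profile, candidates[name]),
    reverse=True,
)
```

For binary vectors the dot product reduces to the size of the intersection, which is why sets are sufficient in this sketch.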

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$
Rating = \begin{cases}
avgTotal + 0.5 & \text{if } similarity > 0.8 \\
avgTotal & \text{otherwise}
\end{cases} \qquad (4.3)
$$


Here, avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
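As a small illustration of Eq. 4.3, the conversion can be written as a single function. This is only a sketch: the function name is hypothetical, and avgTotal is assumed to be the arithmetic mean of the user and item averages, as in the combined average baseline of Chapter 5.

```python
def yolp_rating(similarity, user_avg, item_avg, bonus=0.5, threshold=0.8):
    """Turn a cosine similarity into a rating following Eq. 4.3."""
    # Assumption: avgTotal is the mean of the user and item average ratings.
    avg_total = (user_avg + item_avg) / 2
    return avg_total + bonus if similarity > threshold else avg_total
```

Note that the comparison is strict, so a similarity of exactly 0.8 yields avgTotal without the bonus.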

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


$$
IRF_k = \log \frac{M}{M_k} \qquad (4.4)
$$


Here, M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
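The IRF weights can be computed in a single pass over the dataset. The sketch below is only illustrative: the function name and the data layout (a dictionary from recipe ID to a set of features) are assumptions, and the natural logarithm is used because the base of the logarithm is not stated here.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """Map each feature k to log(M / M_k).

    recipes: dict of recipe_id -> iterable of features (hypothetical layout).
    M is the number of recipes, M_k the number of recipes containing feature k.
    """
    total = len(recipes)
    # Count each feature at most once per recipe.
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {feature: math.log(total / m_k) for feature, m_k in counts.items()}


# Hypothetical toy dataset with four recipes.
toy = {
    "r1": {"garlic", "tomato"},
    "r2": {"garlic", "saffron"},
    "r3": {"garlic", "tomato"},
    "r4": {"garlic"},
}
irf = inverse_recipe_frequency(toy)
```

A feature present in every recipe (here `garlic`) receives weight 0, while rare features receive the largest weights, which mirrors the role of IDF in text retrieval.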

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
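A minimal sketch of this construction is given below, covering both ways of classifying an observation. All function names and the data layout are hypothetical; the only elements taken from the text are the equal weight of 1 for positive and negative observations, the rating thresholds, and the use of IRF values as feature weights.

```python
from collections import defaultdict


def observation_fixed(rating, negative_max, positive_min):
    """Fixed threshold: +1 positive, -1 negative, 0 neutral (ignored)."""
    if rating >= positive_min:
        return 1
    if rating <= negative_max:
        return -1
    return 0


def observation_user_avg(rating, user_avg):
    """User average: equal or higher is positive, lower is negative."""
    return 1 if rating >= user_avg else -1


def build_prototype(rated, recipe_features, irf, sign_of):
    """Add IRF weights for positive observations, subtract them for negative ones."""
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        sign = sign_of(rating)
        if sign == 0:
            continue  # neutral observation
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf.get(feature, 0.0)
    return dict(prototype)


# Hypothetical example on a 1-5 scale (Food.com style: 3 is neutral).
irf = {"tomato": 1.0, "basil": 2.0, "tofu": 3.0}
recipe_features = {
    "r1": {"tomato", "basil"},
    "r2": {"tomato", "tofu"},
    "r3": {"basil"},
}
rated = [("r1", 5), ("r2", 1), ("r3", 3)]
prototype = build_prototype(
    rated, recipe_features, irf, lambda r: observation_fixed(r, 2, 4)
)
```

For the Epicurious scale (1 to 4) the same helper would be called with `negative_max=2` and `positive_min=3`, leaving no neutral value.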

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:³

$$
B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \qquad (4.5)
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
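The per-user mapping of Eq. 4.5 can be sketched as follows. The function and parameter names are hypothetical; the fallback to the user average when the similarity interval cannot be computed follows the description above, under the assumption that an empty or zero-width interval is the case being handled.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity from [sim_min, sim_max] into [rating_min, rating_max] (Eq. 4.5)."""
    span = sim_max - sim_min
    if span <= 0:
        # Not enough data to define the similarity interval: use the user average.
        return user_avg
    return (similarity - sim_min) / span * (rating_max - rating_min) + rating_min
```

For example, a user whose similarities range over [0.2, 0.6] and whose ratings range over [2, 4] would have a similarity of 0.2 mapped to 2 and a similarity of 0.6 mapped to 4.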

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \qquad (4.6)
$$


Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com   Epicurious
Number of users                            24741         8117
Number of food items                      226025        14976
Number of rating events                   956826        86574
Number of ratings above avg               726467        46588
Number of groups                             108           68
Number of ingredients                       5074          338
Number of categories                          28           14
Sparsity on the ratings matrix             0.02%        0.07%
Avg rating values                           4.68         3.34
Avg number of ratings per user             38.67        10.67
Avg number of ratings per item              4.23         5.78
Avg number of ingredients per item          8.57         3.71
Avg number of categories per item           2.33         0.60
Avg number of food groups per item          0.87         0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
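The three-case rule of Eq. 4.6 can be sketched as a short function. The names are hypothetical, and the helper that combines user and item statistics assumes a simple arithmetic mean of the two averages and of the two standard deviations, which is one possible reading of "combined" and is not confirmed by the text.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average by one standard deviation outside [lower, upper)."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: mean of the averages and mean of the deviations."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


# Hypothetical user and recipe statistics taken from a training set.
avg, std = combined_stats(4.0, 0.5, 3.0, 1.0)
high = rating_from_similarity(0.80, avg, std)
middle = rating_from_similarity(0.50, avg, std)
low = rating_from_similarity(0.10, avg, std)
```

The same function covers the user-only and recipe-only variants by passing the corresponding average and standard deviation directly.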


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large on-line⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data that, instead of being used to train the model, is used to evaluate the predictions made by the system once trained. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the validation set was one fifth of the data, so the process is repeated 5 times; this is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
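The 5-fold procedure can be sketched as a generator of training and validation sets. This is an illustrative sketch only: the function name, the shuffling and the fixed seed are assumptions, since the way the rating events were assigned to folds is not described here.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    # Fold i receives every k-th event starting at position i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


# Hypothetical rating events, represented here by plain integers.
events = list(range(100))
splits = list(k_fold_splits(events))
```

With 100 events and k = 5, every split holds 20 events for validation and 80 for training, and the error measures are averaged over the 5 splits.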

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
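Both measures follow their standard definitions, which can be sketched in a few lines. The function names and the example ratings are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between predictions and ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: squaring gives more weight to large deviations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predictions for three validation events.
predicted = [3.0, 4.0, 5.0]
actual = [4.0, 4.0, 3.0]
```

In this example the absolute errors are 1, 0 and 2, giving an MAE of 1.0, while the RMSE is the square root of 5/3 (about 1.29) because the error of 2 is squared before averaging.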

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
YoLP Content-based component         0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component         0.6454   0.8678      0.3761   0.6834
User Average                         0.6315   0.8338      0.4077   0.6207
Item Average                         0.7701   1.0930      0.4385   0.7043
Combined Average                     0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                          Epicurious                            Food.com
                                               Observation       Observation        Observation       Observation
                                               User Average      Fixed Threshold    User Average      Fixed Threshold
                                               MAE      RMSE     MAE      RMSE      MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation             0.8217   1.0606   0.7759   1.0283    0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation             0.8914   1.1550   0.8388   1.1106    0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Std Deviation    0.8304   1.0296   0.7824   0.9927    0.4390   0.6506   0.4324   0.6449
Min-Max                                        0.8539   1.1533   0.7721   1.0705    0.6648   0.9847   0.6303   0.9384
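The three average-based baselines of Table 5.1 can be sketched as follows. The function names and the tuple layout of the rating events are hypothetical; the combined average follows the (UserAvg + ItemAvg)/2 definition given in the text.

```python
from collections import defaultdict


def average_baselines(training):
    """Compute per-user and per-item average ratings from (user, item, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, item_id, rating in training:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def predict_combined(user_avg, item_avg, user_id, item_id):
    """Combined baseline: mean of the user average and the item average."""
    return (user_avg[user_id] + item_avg[item_id]) / 2


# Hypothetical training events.
training = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5)]
user_avg, item_avg = average_baselines(training)
prediction = predict_combined(user_avg, item_avg, "u1", "a")
```

This sketch does not handle users or items that are absent from the training set; a real evaluation would need a fallback for those cases, such as a global average.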

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
Ingredients                          0.7932   1.0054      0.4411   0.6537
Cuisine                              0.8553   1.0810      0.5357   0.7431
Dietary                              0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
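The idea of storing one vector per feature group and merging them on demand can be sketched as below. The names and data layout are hypothetical, and the sketch assumes that feature keys are unique across groups (here through prefixes such as `ing:` and `cui:`), so that merging is a plain dictionary union.

```python
import math


def merge_groups(stored, groups):
    """Merge the stored per-group sparse vectors for the selected feature groups."""
    merged = {}
    for group in groups:
        merged.update(stored.get(group, {}))
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical stored vectors for one user and one recipe.
user = {
    "ingredients": {"ing:tomato": 1.0},
    "cuisine": {"cui:italian": 2.0},
    "dietary": {"diet:vegan": -1.0},
}
recipe = {
    "ingredients": {"ing:tomato": 1.0},
    "cuisine": {"cui:italian": 2.0},
    "dietary": {"diet:kosher": 1.5},
}

# Test the "Ingredients + Cuisine" combination without rebuilding anything.
selected = ["ingredients", "cuisine"]
similarity = cosine(merge_groups(user, selected), merge_groups(recipe, selected))
```

Each row of Table 5.3 then corresponds to a different `selected` list evaluated over the same stored vectors.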

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases} \qquad (4.6)
$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

As a result of these tests, Eq. 4.6 was updated to:

$$
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } similarity < U
\end{cases} \qquad (5.1)
$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
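The opposite movement of MAE and RMSE observed in the upper threshold test can be reproduced with a small worked example. The ratings below are invented purely for illustration and are not taken from either dataset: a "cautious" predictor is always moderately wrong, while a "bold" predictor is exact three times out of four but misses badly once.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Invented ratings for four validation events.
actual = [4.0, 4.0, 4.0, 2.0]
cautious = [3.5, 3.5, 3.5, 3.5]  # absolute errors: 0.5, 0.5, 0.5, 1.5
bold = [4.0, 4.0, 4.0, 4.0]      # absolute errors: 0, 0, 0, 2

cautious_scores = (mae(cautious, actual), rmse(cautious, actual))
bold_scores = (mae(bold, actual), rmse(bold, actual))
```

The bold predictor lowers the MAE from 0.75 to 0.5 while raising the RMSE from about 0.87 to 1.0, which is the same pattern seen when U is lowered.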

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes



Chapter 6


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that using all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to the observations of one specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
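The per-rating class vectors proposed above could be prototyped along the following lines. This is only a sketch of the idea, not something evaluated in this work: the function names, the dictionary-based sparse vectors and the use of cosine similarity for the comparison are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def build_class_vectors(rated_recipes, weights):
    """Return {rating: {feature: summed weight}} from (feature_set, rating) pairs.

    Hypothetical helper: one sparse vector per rating value the user has given.
    """
    classes = defaultdict(lambda: defaultdict(float))
    for features, rating in rated_recipes:
        for feature in features:
            classes[rating][feature] += weights[feature]
    return {rating: dict(vector) for rating, vector in classes.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict_class(recipe_vector, class_vectors):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(recipe_vector, class_vectors[r]))
```

With this representation the predicted rating is simply the label of the best-matching class, so no similarity-to-rating conversion is required.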

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the Users' Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

In the formula, pred(u, a) is the predicted value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is computed by the sum of similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason for choosing the item-based collaborative approach was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].

4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
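The binary profile and the cosine ranking described above can be illustrated with a short sketch. It is not the YoLP implementation: the function names are hypothetical, features are assumed to be plain strings, and a sparse binary vector is modelled as a Python set in which membership stands for a value of 1.

```python
import math

def build_binary_profile(rated_recipes, positive_ratings=(4, 5)):
    """Collect the features of every positively rated recipe.

    rated_recipes: iterable of (feature_set, rating) pairs.
    Returns a set; membership stands for a 1 in the sparse binary vector.
    """
    profile = set()
    for features, rating in rated_recipes:
        if rating in positive_ratings:
            profile |= set(features)
    return profile

def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def rank_recipes(profile, recipes):
    """Order recipe ids from most to least similar to the profile."""
    scored = [(cosine_binary(profile, feats), rid) for rid, feats in recipes.items()]
    return [rid for _, rid in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

For binary vectors the dot product reduces to the size of the intersection and each norm to the square root of the set size, which is why no explicit weights appear in the sketch.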

YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms a rating value is needed. In collaborative recommendation the list is ordered by the predicted ratings, so the MAE and RMSE measures can be calculated directly. In the content-based method, however, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[
\text{Rating} =
\begin{cases}
\text{avgTotal} + 0.5 & \text{if similarity} > 0.8 \\
\text{avgTotal} & \text{otherwise}
\end{cases}
\tag{4.3}
\]


where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented first. Next, two different approaches to building the users' prototype vectors are introduced. Lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the TF-IDF-inspired approach shown in Eq. 3.1, used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:


\[
IRF_k = \log \frac{M}{M_k}
\tag{4.4}
\]


where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
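As an illustration of the weighting above, the IRF of every feature can be computed in one pass over the recipes. This is a minimal sketch under assumptions: the function name and the dictionary layout are hypothetical, and the natural logarithm is used because the base of the logarithm is not stated here.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for a {recipe_id: set_of_features} mapping."""
    total = len(recipes)              # M: total number of recipes
    counts = Counter()                # M_k: number of recipes containing feature k
    for features in recipes.values():
        counts.update(set(features))  # count each feature once per recipe
    # Natural logarithm assumed; a different base only rescales every weight.
    return {k: math.log(total / m_k) for k, m_k in counts.items()}
```

A feature present in every recipe receives a weight of 0, while rare features receive the largest weights, which mirrors the behaviour of IDF for words in documents.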

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation (positive or negative) and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating it is considered a negative observation, and if it is equal or higher it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
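The construction just described can be sketched as follows. The names and data layout are hypothetical; the default rating sets correspond to the fixed-threshold rule for a 1 to 5 scale (3 is neutral), and passing `user_avg` switches to the user-average rule. Both observation weights are fixed at 1, as in the experiments.

```python
from collections import defaultdict

def observation_sign(rating, user_avg=None, negative=(1, 2), positive=(4, 5)):
    """Return +1 (positive), -1 (negative) or 0 (ignored) for one rating event.

    With user_avg given, the user's average rating is the threshold (second
    approach); otherwise the fixed rating sets are used (first approach).
    """
    if user_avg is not None:
        return 1 if rating >= user_avg else -1
    if rating in positive:
        return 1
    if rating in negative:
        return -1
    return 0

def build_prototype(rated_recipes, weights, user_avg=None, **rating_sets):
    """Add feature weights for positive observations, subtract them for negative ones."""
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = observation_sign(rating, user_avg, **rating_sets)
        if sign == 0:
            continue  # neutral observations do not touch the prototype
        for feature in features:
            prototype[feature] += sign * weights[feature]
    return dict(prototype)
```

For a 1 to 4 scale such as the one used by Epicurious, the same function can be called with `positive=(3, 4)` and `negative=(1, 2)`.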

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translating the similarity value into a rating are presented.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute each user's similarity range from the validation set and compute each user's rating range from the training set. At this point the similarity scale is mapped, for each user, onto the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
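The per-user mapping above can be sketched in a few lines. The function names are hypothetical, and the fallback condition (fewer than two similarity values, or a zero-width interval) is an assumption about what "not enough user ratings" means in practice.

```python
def min_max(value, src_min, src_max, dst_min, dst_max):
    """Map value from [src_min, src_max] onto [dst_min, dst_max] (Eq. 4.5)."""
    return (value - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

def predict_rating(similarity, user_sims, user_ratings):
    """Predict a rating from a similarity using per-user scales.

    user_sims: similarity values observed for this user.
    user_ratings: the user's ratings in the training set.
    Falls back to the user's average rating when the similarity interval
    cannot be computed (assumed: too few values or zero width).
    """
    average = sum(user_ratings) / len(user_ratings)
    if len(user_sims) < 2 or max(user_sims) == min(user_sims):
        return average
    return min_max(similarity, min(user_sims), max(user_sims),
                   min(user_ratings), max(user_ratings))
```

The lowest similarity seen for a user maps to that user's lowest training rating and the highest similarity to the highest rating, with a linear interpolation in between.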

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

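\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]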


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
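A minimal sketch of this three-zone conversion follows. The function names are hypothetical, the thresholds U and L are passed in as parameters, and two details are assumptions: the population standard deviation is used, and the combined statistics are taken as the plain mean of the user and recipe values.

```python
from statistics import mean, pstdev

def rating_from_similarity(similarity, average, std_dev, upper, lower):
    """Three-zone mapping: above, between, or below the similarity thresholds."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

def combined_stats(user_ratings, item_ratings):
    """Average the user and recipe means and standard deviations.

    Assumption: a plain mean of the two values and population standard
    deviation; the exact combination is not spelled out in the text.
    """
    avg = (mean(user_ratings) + mean(item_ratings)) / 2
    std = (pstdev(user_ratings) + pstdev(item_ratings)) / 2
    return avg, std
```

The user-only and recipe-only variants correspond to calling `rating_from_similarity` with the mean and standard deviation of just one of the two rating lists.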

Table 4.1: Statistical characterization of the datasets used in the experiments

Statistic                               Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above avg             726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02        0.07
Avg rating values                       4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating predicted for the recipe more accurately. Later, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
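The sparsity filter described above could look like the sketch below. The function name and tuple layout are hypothetical, and the order of the two steps (recipes first, then users, in a single pass) is an assumption, since the text does not say whether the filter was iterated.

```python
from collections import Counter

def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times, then users with no more than 5 ratings.

    ratings: list of (user_id, recipe_id, rating) tuples.
    Assumption: one pass, recipes filtered before users.
    """
    item_counts = Counter(recipe for _, recipe, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```

Removing users after recipes can push some recipes back below the recipe threshold; repeating the two steps until nothing changes would avoid that, at the cost of discarding more data.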

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead of being used to train the model, this segment is used to evaluate the predictions made by the trained system. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the data was split into p equally sized partitions, using one partition as the validation set and the remaining partitions as the training set. To reduce variability, this process is repeated multiple times, each time using a different partition as the validation set, until every partition has been used. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the chosen value for p was 5, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
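The fold construction can be sketched as below. This is an illustrative helper with hypothetical names, not the evaluation module used in the experiments; the shuffling seed and the round-robin assignment of events to folds are assumptions.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)        # fixed seed for repeatable folds
    folds = [shuffled[i::k] for i in range(k)]   # round-robin split into k folds
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]
```

With k = 5, each validation fold holds roughly 20% of the rating events and each training set the remaining 80%; the error metrics are then averaged over the five runs.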

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Given the userID and itemID as inputs, the recommendation algorithms generate a predicted value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
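For reference, the two error measures can be computed as in the following sketch, which uses the standard definitions of MAE and RMSE over paired lists of predicted and actual ratings (the function names are illustrative).

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Because the errors are squared before averaging, RMSE is always at least as large as MAE on the same predictions, and the gap widens when a few predictions are far off.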

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation          Observation
                                                    User Average         Fixed Threshold
Method                                              MAE      RMSE        MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606      0.7759   1.0283
Item Avg + Item Standard Deviation                  0.8914   1.1550      0.8388   1.1106
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296      0.7824   0.9927
Min-Max                                             0.8539   1.1533      0.7721   1.0705

Food.com
                                                    Observation          Observation
                                                    User Average         Fixed Threshold
Method                                              MAE      RMSE        MAE      RMSE
User Avg + User Standard Deviation                  0.4448   0.6812      0.4287   0.6624
Item Avg + Item Standard Deviation                  0.4561   0.7251      0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.4390   0.6506      0.4324   0.6449
Min-Max                                             0.6648   0.9847      0.6303   0.9384

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations gives the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
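The merging of the stored per-feature vectors can be illustrated with a minimal sketch. It assumes that each user and each recipe is stored as three sparse dictionaries keyed by feature group; the names `merge` and `cosine`, the group labels and the weights are hypothetical and are not taken from the implementation used in this work.

```python
import math

def merge(vectors, groups):
    """Merge the stored per-group vectors into a single sparse vector.

    Keys are namespaced by group so that, for instance, an ingredient and a
    cuisine with the same name never collide.
    """
    return {(g, name): w for g in groups for name, w in vectors.get(g, {}).items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical stored vectors: one per feature group.
user = {
    "ingredients": {"tomato": 1.0, "beef": -1.0},
    "cuisine": {"italian": 2.0},
    "dietary": {"vegan": -1.0},
}
recipe = {
    "ingredients": {"tomato": 1.0},
    "cuisine": {"italian": 1.0},
    "dietary": {},
}

all_groups = ["ingredients", "cuisine", "dietary"]
sim_all = cosine(merge(user, all_groups), merge(recipe, all_groups))
sim_ingredients = cosine(merge(user, ["ingredients"]), merge(recipe, ["ingredients"]))
```

Testing another feature combination only changes the list of groups passed to `merge`, so the prototype vectors never need to be rebuilt.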

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Using Eq. 5.1, the next step is to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
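The opposite movement of MAE and RMSE discussed in this section can be reproduced with a small sketch of the threshold sweep. It applies the two-case rule of Eq. 5.1 to a handful of invented samples; the function names and the sample values are hypothetical and serve only to illustrate the effect, they are not results from the datasets.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def predict(similarity, average, deviation, upper):
    """Two-case rule of Eq. 5.1."""
    return average + deviation if similarity >= upper else average

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for samples of (similarity, avg, std, actual)."""
    actual = [sample[3] for sample in samples]
    results = {}
    for upper in thresholds:
        predicted = [predict(s, avg, std, upper) for s, avg, std, _ in samples]
        results[upper] = (mae(predicted, actual), rmse(predicted, actual))
    return results

# Invented samples: (similarity, average rating, standard deviation, actual rating).
samples = [
    (0.9, 3.0, 1.0, 4),
    (0.5, 3.0, 1.0, 4),
    (0.4, 3.0, 1.0, 4),
    (0.3, 3.0, 1.0, 2),
]
results = sweep_upper_threshold(samples, [0.75, 0.25])
```

With these samples, lowering U from 0.75 to 0.25 turns two near misses into exact predictions but doubles the deviation of the remaining miss, so the MAE falls while the RMSE rises.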

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
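As a rough sketch of how such a plot could be prepared, the code below computes one point per user and then averages the points in fixed-width bins of standard deviation. The use of the population standard deviation, the bin width and the function names are assumptions made for illustration; the smoothing actually used to draw the line in the figures is not specified here.

```python
import statistics
from collections import defaultdict

def user_points(ratings, predictions):
    """Return one (standard deviation, mean absolute error) point per user.

    `ratings` and `predictions` map a user id to aligned lists of values.
    """
    points = []
    for user, actual in ratings.items():
        deviation = statistics.pstdev(actual)
        error = sum(abs(p - a) for p, a in zip(predictions[user], actual)) / len(actual)
        points.append((deviation, error))
    return points

def binned_average(points, width=0.5):
    """Average the absolute error of the points falling in each deviation bin."""
    bins = defaultdict(list)
    for deviation, error in points:
        bins[int(deviation // width)].append(error)
    return {index * width: sum(errors) / len(errors) for index, errors in sorted(bins.items())}

# Invented example: one user who always gives 4, one who alternates 1 and 5.
ratings = {"u1": [4, 4, 4], "u2": [1, 5]}
predictions = {"u1": [4, 4, 4], "u2": [3, 3]}

points = user_points(ratings, predictions)
trend = binned_average(points)
```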

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to keep a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
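The round-by-round procedure described above can be sketched for a single user as follows. The `fit` and `predict` callables stand in for the recommender; here they are replaced by a trivial mean-rating predictor, which is an assumption made only to keep the example self-contained and is not the Rocchio-based component. In the experiments the per-round errors would additionally be averaged over all selected users.

```python
def learning_curve(reviews, fit, predict, max_train):
    """Mean absolute error on the remaining reviews after each training round.

    `reviews` is one user's list of (recipe, rating) pairs. In round n the
    first n reviews form the training set and the rest the validation set.
    """
    curve = []
    for n in range(1, max_train + 1):
        train, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        model = fit(train)
        errors = [abs(predict(model, recipe) - rating) for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Stand-in model (hypothetical): always predict the user's mean training rating.
def fit_mean(train):
    return sum(rating for _, rating in train) / len(train)

def predict_mean(model, recipe):
    return model

reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(reviews, fit_mean, predict_mean, max_train=3)
```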



Chapter 6

Conclusions

In this MSc dissertation the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, together with the major difference in dataset sizes, could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually and the other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-

proved recommendations In Proceedings of the Eighteenth National Conference on Artificial

Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936

[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by

Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings

of the International MultiConference of Engineers and Computer Scientists pages 519ndash523

2014 ISBN 9789881925251

[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-

ents In Proceedings of the 18th International Conference on User Modeling Adaptation

and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi

101007978-3-642-13470-8 36

[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling

and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A


[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-

ommendation volume 8444 2014

[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-

lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN

10450823 doi 101067mod2000109031


• Acknowledgments
• Resumo
• Abstract
• List of Tables
• List of Figures
• Acronyms
• 1 Introduction
  • 1.1 Dissertation Structure
• 2 Fundamental Concepts
  • 2.1 Recommendation Systems
    • 2.1.1 Content-Based Methods
    • 2.1.2 Collaborative Methods
    • 2.1.3 Hybrid Methods
  • 2.2 Evaluation Methods in Recommendation Systems
• 3 Related Work
  • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  • 3.2 Content-Boosted Collaborative Recommendation
  • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
  • 3.4 User Modeling for Adaptive News Access
• 4 Architecture
  • 4.1 YoLP Collaborative Recommendation Component
  • 4.2 YoLP Content-Based Recommendation Component
  • 4.3 Experimental Recommendation Component
    • 4.3.1 Rocchio's Algorithm using FF-IRF
    • 4.3.2 Building the Users' Prototype Vector
    • 4.3.3 Generating a rating value from a similarity value
  • 4.4 Database and Datasets
• 5 Validation
  • 5.1 Evaluation Metrics and Cross Validation
  • 5.2 Baselines and First Results
  • 5.3 Feature Testing
  • 5.4 Similarity Threshold Variation
  • 5.5 Standard Deviation Impact in Recommendation Error
  • 5.6 Rocchio's Learning Curve
• 6 Conclusions
  • 6.1 Future Work
• Bibliography

Where avgTotal represents the combined user and item average for each recommendation. It is therefore important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.

4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.

4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:

\[
IRF_k = \log \frac{M}{M_k}
\tag{4.4}
\]

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined from the complete dataset.
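A minimal sketch of the IRF computation is given below. It assumes that recipes are available as a mapping from recipe identifier to a set of feature names and that the logarithm is the natural logarithm; both are illustrative assumptions, since the base of the logarithm is not stated above. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def inverse_recipe_frequency(recipes):
    """Return {feature: log(M / M_k)} for a mapping {recipe_id: set of features}."""
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes that contain feature k
    counts = Counter(feature for features in recipes.values() for feature in set(features))
    return {feature: math.log(total / count) for feature, count in counts.items()}

# Toy dataset with four recipes.
recipes = {
    "r1": {"tomato", "basil", "italian"},
    "r2": {"tomato", "beef"},
    "r3": {"tomato", "rice", "asian"},
    "r4": {"rice", "beef"},
}
irf = inverse_recipe_frequency(recipes)
```

A feature that appears in most recipes, such as `tomato` in the toy data, receives a weight close to zero, while a rare feature such as `basil` receives a high weight.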

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset ratings range from 1 to 5; the same process is applied to this dataset with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
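The fixed-threshold construction described above could look like the following sketch. The function name, the parameter names `negative_up_to` and `positive_from`, and the example weights are hypothetical; the observation weight of 1 and the rating boundaries for each dataset follow the description in this section.

```python
from collections import defaultdict

def build_prototype(ratings, recipe_features, irf, negative_up_to=2, positive_from=3):
    """Build a user's prototype vector from rated recipes.

    Ratings >= positive_from add the recipe's IRF weights, ratings
    <= negative_up_to subtract them, and anything in between is ignored.
    """
    prototype = defaultdict(float)
    for recipe_id, rating in ratings.items():
        if rating >= positive_from:
            sign = 1.0
        elif rating <= negative_up_to:
            sign = -1.0
        else:
            continue  # neutral observation
        for feature in recipe_features[recipe_id]:
            prototype[feature] += sign * irf[feature]
    return dict(prototype)

# Invented IRF weights and recipes.
irf = {"tomato": 0.5, "beef": 1.0, "rice": 2.0}
recipe_features = {"r1": {"tomato", "beef"}, "r2": {"tomato", "rice"}, "r3": {"rice"}}

# Epicurious-style scale (1 to 4): 1-2 negative, 3-4 positive.
epicurious_user = build_prototype({"r1": 4, "r2": 1}, recipe_features, irf)

# Food.com-style scale (1 to 5): 1-2 negative, 3 neutral, 4-5 positive.
foodcom_user = build_prototype(
    {"r1": 5, "r3": 3}, recipe_features, irf, negative_up_to=2, positive_from=4
)
```

The second approach from this section would only change how the sign is chosen, comparing each rating with the user's average rating instead of fixed boundaries.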

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula:

\[
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C
\tag{4.5}
\]

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
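The per-user mapping can be sketched as follows. The function names are hypothetical, and treating a zero-width similarity interval as the case with "not enough user ratings" is an assumption made for the example.

```python
def min_max(value, src_min, src_max, dst_min, dst_max):
    """Eq. 4.5: map value from [src_min, src_max] into [dst_min, dst_max]."""
    return (value - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min

def similarity_to_rating(similarity, user_similarities, user_ratings):
    """Map a similarity into the user's own rating range.

    `user_similarities` are the user's similarity values and `user_ratings`
    the user's training ratings. Falls back to the user's average rating
    when the similarity interval is empty (assumed fallback condition).
    """
    user_average = sum(user_ratings) / len(user_ratings)
    low, high = min(user_similarities), max(user_similarities)
    if high == low:
        return user_average
    return min_max(similarity, low, high, min(user_ratings), max(user_ratings))

mapped = similarity_to_rating(0.5, [0.0, 1.0], [2, 4])
fallback = similarity_to_rating(0.3, [0.3], [2, 4, 5])
```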

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

Three different approaches were tested: using the user's rating average and the user standard deviation; using the recipe's rating average and the recipe standard deviation; and using the combined average of the user and the recipe averages and standard deviations.

Table 4.1: Statistical characterization for the datasets used in the experiments

Statistic                             Food.com   Epicurious
Number of users                          24741         8117
Number of food items                    226025        14976
Number of rating events                 956826        86574
Number of ratings above avg             726467        46588
Number of groups                           108           68
Number of ingredients                     5074          338
Number of categories                        28           14
Sparsity on the ratings matrix            0.02         0.07
Avg rating values                         4.68         3.34
Avg number of ratings per user           38.67        10.67
Avg number of ratings per item            4.23         5.78
Avg number of ingredients per item        8.57         3.71
Avg number of categories per item         2.33         0.60
Avg number of food groups per item        0.87         0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
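The three-case rule of Eq. 4.6 can be written as a short sketch. The function names are hypothetical, and combining the user and recipe statistics by taking the mean of the two averages and of the two standard deviations is an assumption, since the exact combination is not spelled out in this section.

```python
def predict_rating(similarity, average, deviation, upper=0.75, lower=0.25):
    """Eq. 4.6: turn a similarity value into a rating using two thresholds."""
    if similarity >= upper:
        return average + deviation
    if similarity >= lower:
        return average
    return average - deviation

def combine(user_avg, user_std, item_avg, item_std):
    """Assumed combination: plain mean of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2

average, deviation = combine(user_avg=4.0, user_std=0.5, item_avg=3.0, item_std=1.5)
high = predict_rating(0.8, average, deviation)
middle = predict_rating(0.5, average, deviation)
low = predict_rating(0.1, average, deviation)
```

Passing only the user's statistics, or only the recipe's, to `predict_rating` gives the other two variants tested.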


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
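A possible form of this sparsity filter is sketched below. The single pass, with recipes filtered first and user counts computed afterwards, is an assumption, as the order and number of passes are not stated; the function name and the toy events are hypothetical. The defaults keep recipes rated more than 3 times and users with more than 5 ratings.

```python
from collections import Counter

def filter_sparse(events, min_recipe_ratings=4, min_user_ratings=6):
    """Drop rarely rated recipes, then users with few remaining ratings.

    `events` is a list of (user, recipe, rating) tuples.
    """
    per_recipe = Counter(recipe for _, recipe, _ in events)
    events = [e for e in events if per_recipe[e[1]] >= min_recipe_ratings]
    per_user = Counter(user for user, _, _ in events)
    return [e for e in events if per_user[e[0]] >= min_user_ratings]

# Toy data, filtered with smaller thresholds so the effect is visible.
events = [
    ("u1", "a", 4),
    ("u2", "a", 3),
    ("u1", "b", 5),
    ("u2", "b", 2),
    ("u3", "c", 1),
]
kept = filter_sparse(events, min_recipe_ratings=2, min_user_ratings=2)
```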

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model trained on the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k folds, one fold is used as the validation set and the remaining folds form the training set. To reduce variability, this process is repeated k times, using a different fold as the validation set each time, until every fold has been used for validation once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
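As an illustration of the 5-fold procedure, the sketch below splits a list of rating events into training and validation sets. It is only a minimal example: the function name `k_fold_splits`, the shuffling step and the fixed seed are assumptions, not details taken from the experiments.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    Assumption: events are shuffled once with a fixed seed before folding.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)

    # Fold i takes every k-th event starting at position i.
    folds = [shuffled[i::k] for i in range(k)]

    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds about 20% of the events and each event is used for validation exactly once.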

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
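For reference, the two error measures can be computed as in the short sketch below. The function names `mae` and `rmse` are chosen for this example; the formulas are the standard definitions of mean absolute error and root mean squared error.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; larger deviations weigh more than in MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

For example, predictions [4, 3, 5, 2] against actual ratings [5, 3, 4, 4] give absolute errors of 1, 0, 1 and 2, so the MAE is 1.0, while the RMSE is the square root of 1.5 because the single error of 2 is squared.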

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                   Epicurious           Food.com
                                   MAE      RMSE        MAE      RMSE
YoLP Content-based component       0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component       0.6454   0.8678      0.3761   0.6834
User Average                       0.6315   0.8338      0.4077   0.6207
Item Average                       0.7701   1.0930      0.4385   0.7043
Combined Average                   0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                   Epicurious                                Food.com
                                                   Observation:        Observation:          Observation:        Observation:
                                                   User Average        Fixed Threshold       User Average        Fixed Threshold
                                                   MAE      RMSE       MAE      RMSE         MAE      RMSE       MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606     0.7759   1.0283       0.4448   0.6812     0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550     0.8388   1.1106       0.4561   0.7251     0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296     0.7824   0.9927       0.4390   0.6506     0.4324   0.6449
Min-Max                                            0.8539   1.1533     0.7721   1.0705       0.6648   0.9847     0.6303   0.9384
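The three average-based baselines can be sketched as follows. This is an illustrative example only: the function name `build_average_predictors` and the fallback to the global average for users or recipes that are absent from the training set are assumptions, since the text does not say how such cases were handled.

```python
from collections import defaultdict


def build_average_predictors(training):
    """Build the three baselines from (user_id, recipe_id, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)

    # Assumption: unseen users and recipes fall back to the global average.
    global_avg = sum(r for _, _, r in training) / len(training)
    user_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_avg = {i: sum(v) / len(v) for i, v in by_item.items()}

    def predict_user(user_id, recipe_id):
        return user_avg.get(user_id, global_avg)

    def predict_item(user_id, recipe_id):
        return item_avg.get(recipe_id, global_avg)

    def predict_combined(user_id, recipe_id):
        # (UserAvg + ItemAvg) / 2
        return (predict_user(user_id, recipe_id)
                + predict_item(user_id, recipe_id)) / 2

    return predict_user, predict_item, predict_combined
```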

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented by the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
Ingredients                          0.7932   1.0054      0.4411   0.6537
Cuisine                              0.8553   1.0810      0.5357   0.7431
Dietary                              0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. For this reason, when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
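The idea of storing one vector per feature group and merging them at recommendation time can be sketched as below. The sparse dictionary representation, the group names and the helper names `merge_groups` and `cosine_similarity` are assumptions made for this illustration; the storage format used in the experiments is not described at this level of detail.

```python
import math


def merge_groups(vectors_by_group, selected_groups):
    """Merge the stored per-group vectors into a single sparse vector.

    Keys are prefixed with the group name so that features from
    different groups cannot collide (an assumption of this sketch).
    """
    merged = {}
    for group in selected_groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: one user prototype and one recipe, split by group.
user = {"ingredients": {"garlic": 1.0, "beef": -1.0},
        "cuisines": {"italian": 2.0},
        "dietaries": {}}
recipe = {"ingredients": {"garlic": 1.0},
          "cuisines": {"italian": 1.0},
          "dietaries": {"vegan": 1.0}}

groups = ["ingredients", "cuisines"]
similarity = cosine_similarity(merge_groups(user, groups),
                               merge_groups(recipe, groups))
```

Changing the `groups` list is enough to test another feature combination without rebuilding the stored vectors.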

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\qquad (4.6)
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.
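The mapping of Eq. 4.6 can be written as a small function, shown here as a minimal sketch. The function name `similarity_to_rating` and the option of passing `lower=None` to disable the lower case are conveniences of this example, not part of the original method; the average and standard deviation passed in can be those of the user, the item, or their combination.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Translate a similarity value into a rating following Eq. 4.6.

    Passing lower=None (an assumption of this sketch) removes the
    "average rating minus standard deviation" case.
    """
    if similarity >= upper:
        return average + std_dev
    if lower is not None and similarity < lower:
        return average - std_dev
    return average
```

For a user with an average rating of 3.5 and a standard deviation of 0.5, a similarity of 0.8 maps to 4.0, a similarity of 0.5 maps to 3.5 and a similarity of 0.1 maps to 3.0 with the initial thresholds.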

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

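As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\qquad (5.1)
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset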


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user. The point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
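The per-user points of such a graph can be derived from the validation results, as in the sketch below. The function name `user_error_and_deviation`, the input layout `(user_id, actual, predicted)`, the use of the population standard deviation and the use of the mean absolute error per user are all assumptions of this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def user_error_and_deviation(results):
    """Map each user to (rating standard deviation, mean absolute error).

    `results` is an iterable of (user_id, actual, predicted) tuples.
    """
    actuals, errors = defaultdict(list), defaultdict(list)
    for user_id, actual, predicted in results:
        actuals[user_id].append(actual)
        errors[user_id].append(abs(predicted - actual))

    return {user_id: (pstdev(actuals[user_id]), mean(errors[user_id]))
            for user_id in actuals}
```

Each returned pair gives the horizontal and vertical coordinates of one user in a scatter plot of standard deviation against absolute error.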

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
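The round-by-round procedure can be sketched for a single user as follows. This is only an outline under stated assumptions: `learning_curve` and `fit_mean` are hypothetical names, the error per round is taken to be the mean absolute error on the remaining reviews, and `fit_mean` is a trivial stand-in predictor used so that the example runs on its own; in the experiments the predictor would be the Rocchio-based component.

```python
def learning_curve(reviews, fit, max_training_size):
    """Return the validation error after each additional training review.

    `reviews` is one user's list of (recipe_id, rating) in the order the
    reviews are revealed; `fit` builds a predictor from a training list.
    """
    errors = []
    for n in range(1, max_training_size + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        predict = fit(training)
        abs_errors = [abs(predict(recipe_id) - rating)
                      for recipe_id, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def fit_mean(training):
    """Stand-in predictor (assumption): always predicts the mean rating."""
    average = sum(rating for _, rating in training) / len(training)
    return lambda recipe_id: average
```

Averaging the returned lists over all selected users, position by position, would give one curve per dataset.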



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http


[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



\[
\mathrm{IRF}_k = \log \frac{M}{M_k} \qquad (4.4)
\]

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF determined by the complete dataset.
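As a small illustration of the IRF weighting, the sketch below computes the value for every feature in a collection of recipes. The function name `inverse_recipe_frequency`, the input layout (recipe identifier mapped to a set of features) and the use of the natural logarithm are assumptions, since the base of the logarithm is not stated.

```python
import math
from collections import Counter


def inverse_recipe_frequency(recipes):
    """Return IRF_k = log(M / M_k) for every feature k.

    `recipes` maps a recipe id to the collection of features it contains.
    Assumption: natural logarithm.
    """
    total = len(recipes)  # M: total number of recipes
    # M_k: number of recipes containing feature k (counted once per recipe).
    counts = Counter(f for features in recipes.values() for f in set(features))
    return {feature: math.log(total / count)
            for feature, count in counts.items()}
```

A feature that appears in every recipe receives a weight of 0, while rare features receive the largest weights.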

4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5. The same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value computed from the training set. If a rating is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
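The construction described above can be sketched as follows. This is an illustrative Python outline, not the implementation used in the experiments: the function names, the sparse-dictionary representation and the `scale_max` parameter are hypothetical, while the labelling rules and the equal weight of 1 follow the text.

```python
from collections import defaultdict


def is_positive_fixed(rating, scale_max):
    """Fixed-threshold rule: True (positive), False (negative) or None (neutral)."""
    if scale_max == 4:        # Epicurious: 1-2 negative, 3-4 positive
        return rating >= 3
    if rating == 3:           # Food.com: a rating of 3 is neutral and ignored
        return None
    return rating > 3


def is_positive_user_avg(rating, user_avg):
    """User-average rule: equal to or above the user's average is positive."""
    return rating >= user_avg


def build_prototype(rated, recipe_vectors, label):
    """Add feature weights for positive observations, subtract them for negative ones.

    rated: list of (recipe_id, rating) pairs from the training set.
    recipe_vectors: dict of recipe_id -> {feature: IRF weight}.
    label: function mapping a rating to True, False or None.
    """
    prototype = defaultdict(float)
    for recipe_id, rating in rated:
        observation = label(rating)
        if observation is None:
            continue
        sign = 1.0 if observation else -1.0   # both kinds of observation weigh 1
        for feature, weight in recipe_vectors[recipe_id].items():
            prototype[feature] += sign * weight
    return dict(prototype)


vectors = {"a": {"egg": 1.0, "rice": 2.0}, "b": {"egg": 1.0}, "c": {"basil": 3.0}}
rated = [("a", 5), ("b", 1), ("c", 3)]
proto = build_prototype(rated, vectors, lambda r: is_positive_fixed(r, 5))
print(proto)
```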

4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which are presented in the next section, are food-related datasets with relevant information on the recipes and with rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:

$$
B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \cdot (D - C) + C \qquad (4.5)
$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation (minimum and maximum values) from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale of each user is mapped into that user's rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
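A minimal Python sketch of the per-user Min-Max mapping described above is given below. The function and parameter names are hypothetical, and the exact condition under which the experiments fell back to the user average is an assumption (here: fewer than two similarity values, or a zero-width similarity interval).

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization: map a from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c


def predict_min_max(similarity, user_similarities, user_ratings):
    """Translate a similarity into a rating on the user's own rating scale.

    user_similarities: similarity values observed for this user.
    user_ratings: the user's ratings in the training set (assumed non-empty).
    """
    user_avg = sum(user_ratings) / len(user_ratings)
    if len(user_similarities) < 2:
        return user_avg                      # assumed fallback: not enough data
    s_min, s_max = min(user_similarities), max(user_similarities)
    if s_max == s_min:
        return user_avg                      # similarity interval is empty
    return min_max(similarity, s_min, s_max, min(user_ratings), max(user_ratings))


print(predict_min_max(0.5, [0.0, 1.0], [2, 4]))   # midpoint of the user's range
print(predict_min_max(0.3, [0.3], [2, 4, 3]))     # falls back to the user average
```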

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, give good results and introduce only a small error. To generate a rating value, the following formula was used:

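$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\qquad (4.6)
$$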


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
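The piecewise rule of Eq. 4.6 can be sketched in a few lines of Python. The function names are hypothetical. The default thresholds are the initial values reported in this work (U = 0.75, L = 0.25). Combining the user and recipe statistics with a simple mean is an assumption: the text only states that the averages and standard deviations are combined.

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case rule: shift the average by one standard deviation at the extremes."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumed combination: plain mean of the user and recipe statistics."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


avg, std = combined_stats(4.0, 1.0, 3.0, 0.5)
for sim in (0.9, 0.5, 0.1):
    print(sim, rating_from_similarity(sim, avg, std))
```

Passing the user statistics, the recipe statistics, or the combined pair as `avg` and `std` gives the three variants compared in the experiments.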

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, are optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments.

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02% | 0.07% |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25] and was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values.

Figure 4.4: Distribution of Food.com rating events per rating values.

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system trained on the remaining data. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k folds and, in each round, one fold serves as the validation set while the remaining observations form the training set. To reduce variability, this process is repeated with a different fold as the validation set each time, until every fold has been used once. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
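The 5-fold procedure and the (userID, itemID, rating) format described above can be sketched as follows. This is an illustrative outline only: the function name, the shuffling step and the fixed seed are assumptions, since the text does not say how the folds were drawn.

```python
import random


def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs; each event is validated exactly once.

    events: list of (user_id, item_id, rating) tuples.
    """
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)        # assumed: random assignment to folds
    folds = [shuffled[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation


events = [("u%d" % (n % 10), "r%d" % n, 1 + n % 5) for n in range(100)]
for training, validation in k_fold_splits(events):
    print(len(training), len(validation))        # 80 / 20 split in every round
```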

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
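For reference, the two error measures can be computed as in the short Python sketch below, using their standard definitions. The function names and the sample values are illustrative and are not taken from the experiments.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


predicted = [3, 4, 5]
actual = [3, 5, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```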

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results.

| Rating method | Epicurious, Observation User Average (MAE / RMSE) | Epicurious, Observation Fixed Threshold (MAE / RMSE) | Food.com, Observation User Average (MAE / RMSE) | Food.com, Observation Fixed Threshold (MAE / RMSE) |
|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 / 1.0606 | 0.7759 / 1.0283 | 0.4448 / 0.6812 | 0.4287 / 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 / 1.1550 | 0.8388 / 1.1106 | 0.4561 / 0.7251 | 0.4507 / 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 / 1.0296 | 0.7824 / 0.9927 | 0.4390 / 0.6506 | 0.4324 / 0.6449 |
| Min-Max | 0.8539 / 1.1533 | 0.7721 / 1.0705 | 0.6648 / 0.9847 | 0.6303 / 0.9384 |
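The three average-based baselines can be sketched as below. The function names are hypothetical, and falling back to the global average for a user or recipe that does not appear in the training set is an assumption; the text does not describe how such cases were handled.

```python
from collections import defaultdict


def average_baselines(training):
    """Build the user-average, item-average and combined-average predictors.

    training: iterable of (user_id, item_id, rating) tuples.
    """
    by_user, by_item, all_ratings = defaultdict(list), defaultdict(list), []
    for user_id, item_id, rating in training:
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
        all_ratings.append(rating)
    global_avg = sum(all_ratings) / len(all_ratings)

    def mean(values):
        # Assumed fallback for unseen users or recipes: the global average.
        return sum(values) / len(values) if values else global_avg

    def user_avg(user_id, item_id):
        return mean(by_user.get(user_id, []))

    def item_avg(user_id, item_id):
        return mean(by_item.get(item_id, []))

    def combined_avg(user_id, item_id):
        return (user_avg(user_id, item_id) + item_avg(user_id, item_id)) / 2

    return user_avg, item_avg, combined_avg


training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg, combined_avg = average_baselines(training)
print(user_avg("u1", "r1"), item_avg("u1", "r1"), combined_avg("u1", "r1"))
```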

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented by the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations gives the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
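The idea of storing one vector per feature group and merging them on demand can be sketched as follows. The group names, the dictionary layout and the helper names are hypothetical; only the use of cosine similarity over a selectable subset of the three stored vectors follows the text.

```python
import math


def merge(vectors_by_group, groups):
    """Merge the selected per-group sparse vectors into one vector.

    Keys are (group, feature) pairs so equal names in different groups stay distinct.
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors_by_group.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


user = {"ingredients": {"egg": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}
recipe = {"ingredients": {"egg": 1.0}, "cuisine": {"french": 1.0}, "dietary": {}}

for groups in (["ingredients"], ["ingredients", "cuisine"]):
    print(groups, cosine(merge(user, groups), merge(recipe, groups)))
```

Each row of Table 5.3 then corresponds to a different `groups` selection, with no need to rebuild the stored vectors.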

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


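$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\qquad (4.6)
$$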

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in the error values seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:


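$$
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\qquad (5.1)
$$

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.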


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest error values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
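The opposite movement of MAE and RMSE discussed in this section can be illustrated with a small worked example. The error values below are invented for illustration and are not taken from the experiments. Consider two hypothetical predictors evaluated on four ratings: predictor A is exact three times and misses once by 2, while predictor B is always off by 0.6.

```latex
\begin{align*}
\text{MAE}_A  &= \frac{0 + 0 + 0 + 2}{4} = 0.5,
&\text{RMSE}_A &= \sqrt{\frac{0^2 + 0^2 + 0^2 + 2^2}{4}} = 1.0, \\
\text{MAE}_B  &= \frac{4 \times 0.6}{4} = 0.6,
&\text{RMSE}_B &= \sqrt{\frac{4 \times 0.6^2}{4}} = 0.6.
\end{align*}
```

Predictor A has the lower MAE because it is exact more often, but the higher RMSE because its single miss is large, which is the same pattern observed when the upper threshold is lowered.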

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was observed towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users who rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found who rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
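The round-by-round simulation described above can be outlined for a single user as follows. This is a simplified sketch under stated assumptions: the function names are hypothetical, a plain running-average predictor stands in for the Rocchio-based component, and the error of each round is the MAE over that user's remaining reviews. In the experiments the errors were averaged over all selected users.

```python
def learning_curve(user_events, predict, max_train):
    """Return the validation MAE after training on 1, 2, ..., max_train reviews.

    user_events: list of (item_id, rating) pairs for one user, in review order.
    predict: function (training_events, item_id) -> predicted rating.
    """
    errors = []
    for n in range(1, max_train + 1):
        training, validation = user_events[:n], user_events[n:]
        if not validation:
            break                      # nothing left to evaluate on
        abs_errors = [abs(predict(training, item) - rating)
                      for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def mean_predictor(training, item_id):
    """Stand-in predictor (assumption): the average of the ratings seen so far."""
    return sum(rating for _, rating in training) / len(training)


events = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(events, mean_predictor, max_train=3)
print(curve)
```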



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contained only the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors and, together with the major difference in dataset sizes, could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography

vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.

Min-Max method

The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A to B, which fits in the range [C, D], as shown in the following formula:

$$B = \frac{A - \text{minimum value of } A}{\text{maximum value of } A - \text{minimum value of } A} \times (D - C) + C \tag{4.5}$$

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A – minimum value of A), the user average was used as the default for the recommendation.
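The per-user mapping described above can be sketched in a few lines of Python. This is only an illustration of Eq. 4.5 with the fallback to the user average; the function name, the parameter names and the example values are assumptions made here, not part of the original system.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_average):
    """Map a similarity A in [sim_min, sim_max] to a rating B in [rating_min, rating_max] (Eq. 4.5)."""
    sim_range = sim_max - sim_min
    if sim_range == 0:
        # The similarity interval cannot be computed (too few ratings):
        # fall back to the user's average rating, as described in the text.
        return user_average
    return (similarity - sim_min) / sim_range * (rating_max - rating_min) + rating_min


# Hypothetical user: similarities observed in [0.1, 0.9], ratings given in [2, 5].
predicted = min_max_rating(0.5, 0.1, 0.9, 2.0, 5.0, user_average=3.8)
print(round(predicted, 2))
```

A similarity halfway through the user's similarity interval lands halfway through that user's rating interval, which is the behaviour the per-user scaling is meant to provide.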

Using average and standard deviation values from training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \tag{4.6}$$


Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
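A minimal sketch of this three-case rule follows. The function names are hypothetical, the default thresholds are the initial values of U and L used in this work (0.75 and 0.25), and the way the user and recipe statistics are combined (a plain arithmetic mean of the two) is an assumption, since the text does not spell out the exact combination.

```python
def rating_from_similarity(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Three-case rule: shift the average rating by one standard deviation
    when the similarity is above U or below L."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


def combined_stats(user_avg, user_std, item_avg, item_std):
    """Assumption: the combined variant averages the user and recipe values."""
    return (user_avg + item_avg) / 2, (user_std + item_std) / 2


# Hypothetical user (avg 4.0, std 1.0) and recipe (avg 3.0, std 0.5).
average, std_dev = combined_stats(4.0, 1.0, 3.0, 0.5)
print(rating_from_similarity(0.8, average, std_dev))
```

The same function covers the user-only and recipe-only variants by passing the corresponding average and standard deviation directly.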

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization for the datasets used in the experiments

| Statistic | Food.com | Epicurious |
|---|---|---|
| Number of users | 24741 | 8117 |
| Number of food items | 226025 | 14976 |
| Number of rating events | 956826 | 86574 |
| Number of ratings above avg | 726467 | 46588 |
| Number of groups | 108 | 68 |
| Number of ingredients | 5074 | 338 |
| Number of categories | 28 | 14 |
| Sparsity on the ratings matrix | 0.02% | 0.07% |
| Avg rating values | 4.68 | 3.34 |
| Avg number of ratings per user | 38.67 | 10.67 |
| Avg number of ratings per item | 4.23 | 5.78 |
| Avg number of ingredients per item | 8.57 | 3.71 |
| Avg number of categories per item | 2.33 | 0.60 |
| Avg number of food groups per item | 0.87 | 0.61 |


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
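The sparsity filter can be sketched as follows. This is only an illustration: the text does not state whether recipes or users were filtered first, nor whether the filter was repeated until stable, so the single pass below (recipes first, then users) and the function name are assumptions.

```python
from collections import Counter


def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """ratings: list of (user_id, item_id, rating) tuples.

    Drops recipes rated no more than 3 times, then users who rated
    no more than 5 of the remaining recipes (single pass, assumed order).
    """
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]


# Hypothetical data: 4 users rate the same 6 recipes, a fifth user rates only one.
events = [(f"u{u}", f"i{i}", 4) for u in range(4) for i in range(6)]
events.append(("u9", "i0", 3))
print(len(filter_sparse(events)))
```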

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users when the number of rated items increases is a normal characteristic of rating event datasets.

Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation

This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
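The 80/20 splitting scheme can be illustrated with a short sketch. It is not the evaluation module used in this work: the shuffling, the fixed seed and the function name are assumptions made so that the example is self-contained and reproducible.

```python
import random


def k_fold_splits(observations, k=5, seed=0):
    """Yield (training, validation) pairs; each observation is validated exactly once."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled observations into k folds of (almost) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        yield training, validation


# 100 hypothetical rating events give 5 rounds of 80 training / 20 validation events.
for training, validation in k_fold_splits(range(100)):
    print(len(training), len(validation))
```

The error measures are computed on each validation fold and then averaged over the 5 rounds.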

Figure 5.1: 10 Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
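For reference, the two measures can be written as follows. These are the standard definitions; the symbols N (number of ratings in the validation set), p_i (predicted rating) and r_i (actual rating) are notation assumed here, and Section 2.2 remains the authoritative statement.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert p_i - r_i \rvert
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (p_i - r_i)^2}
```

Because the deviations are squared before averaging, a single large miss weighs more in the RMSE than in the MAE.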

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

| Method | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| YoLP Content-based component | 0.6389 | 0.8279 | 0.3590 | 0.6536 |
| YoLP Collaborative component | 0.6454 | 0.8678 | 0.3761 | 0.6834 |
| User Average | 0.6315 | 0.8338 | 0.4077 | 0.6207 |
| Item Average | 0.7701 | 1.0930 | 0.4385 | 0.7043 |
| Combined Average | 0.6628 | 0.8572 | 0.4180 | 0.6250 |

Table 5.2: Test Results

| Method | Epicurious, Observation User Average, MAE | Epicurious, Observation User Average, RMSE | Epicurious, Observation Fixed Threshold, MAE | Epicurious, Observation Fixed Threshold, RMSE | Food.com, Observation User Average, MAE | Food.com, Observation User Average, RMSE | Food.com, Observation Fixed Threshold, MAE | Food.com, Observation Fixed Threshold, RMSE |
|---|---|---|---|---|---|---|---|---|
| User Avg + User Standard Deviation | 0.8217 | 1.0606 | 0.7759 | 1.0283 | 0.4448 | 0.6812 | 0.4287 | 0.6624 |
| Item Avg + Item Standard Deviation | 0.8914 | 1.1550 | 0.8388 | 1.1106 | 0.4561 | 0.7251 | 0.4507 | 0.7207 |
| User/Item Avg + User and Item Standard Deviation | 0.8304 | 1.0296 | 0.7824 | 0.9927 | 0.4390 | 0.6506 | 0.4324 | 0.6449 |
| Min-Max | 0.8539 | 1.1533 | 0.7721 | 1.0705 | 0.6648 | 0.9847 | 0.6303 | 0.9384 |
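The three average-based baselines are simple enough to sketch directly. The function names and the tuple layout of the training data are assumptions; users or recipes that do not appear in the training set are not handled in this illustration.

```python
from collections import defaultdict


def average_baselines(training):
    """training: list of (user_id, recipe_id, rating). Returns per-user and per-recipe averages."""
    user_ratings, item_ratings = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        user_ratings[user].append(rating)
        item_ratings[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in user_ratings.items()}
    item_avg = {i: sum(r) / len(r) for i, r in item_ratings.items()}
    return user_avg, item_avg


def predict_combined(user_avg, item_avg, user, item):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2


# Hypothetical training events.
train = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = average_baselines(train)
print(user_avg["u1"], item_avg["r1"], predict_combined(user_avg, item_avg, "u1", "r1"))
```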

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
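To make the two observation thresholds concrete, the sketch below builds a Rocchio-style prototype vector from a user's rated recipes. It follows the textbook form of the algorithm (weighted mean of positive vectors minus weighted mean of negative vectors); the weights beta and gamma, the treatment of a rating equal to the threshold as positive, and all names are assumptions, and the exact formulation used in this work is the one given in Section 4.3.

```python
from collections import defaultdict


def build_prototype(rated_recipes, threshold, beta=1.0, gamma=1.0):
    """rated_recipes: list of (feature_weights, rating), feature_weights being a dict.

    Ratings >= threshold count as positive observations, the rest as negative.
    """
    positives = [vec for vec, rating in rated_recipes if rating >= threshold]
    negatives = [vec for vec, rating in rated_recipes if rating < threshold]
    prototype = defaultdict(float)
    for group, signed_weight in ((positives, beta), (negatives, -gamma)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] += signed_weight * weight / len(group)
    return dict(prototype)


# Hypothetical user with three rated recipes.
rated = [({"garlic": 1.0, "beef": 0.5}, 5), ({"garlic": 1.0}, 4), ({"tofu": 2.0}, 1)]
fixed = build_prototype(rated, threshold=3)  # Observation Fixed Threshold
user_average = sum(rating for _, rating in rated) / len(rated)
by_average = build_prototype(rated, threshold=user_average)  # Observation User Average
print(fixed, by_average)
```

The only difference between the two variants is the value passed as the threshold, which is why they can share the rest of the pipeline.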

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

| Features | Epicurious MAE | Epicurious RMSE | Food.com MAE | Food.com RMSE |
|---|---|---|---|---|
| Ingredients + Cuisine + Dietaries | 0.7824 | 0.9927 | 0.4324 | 0.6449 |
| Ingredients + Cuisine | 0.7915 | 1.0012 | 0.4384 | 0.6502 |
| Ingredients + Dietary | 0.7874 | 0.9986 | 0.4342 | 0.6468 |
| Cuisine + Dietary | 0.8266 | 1.0616 | 0.4324 | 0.7087 |
| Ingredients | 0.7932 | 1.0054 | 0.4411 | 0.6537 |
| Cuisine | 0.8553 | 1.0810 | 0.5357 | 0.7431 |
| Dietary | 0.8772 | 1.0807 | 0.4579 | 0.7320 |

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed the best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. So, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
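The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The dictionary layout, the feature-type keys and the function names are assumptions made for this illustration; only the cosine similarity itself is taken from the text.

```python
import math


def merge_vectors(stored, selected):
    """Merge the per-feature-type vectors named in `selected` into one sparse vector."""
    merged = {}
    for feature_type in selected:
        for name, weight in stored[feature_type].items():
            # Prefix with the feature type so names cannot collide across types.
            merged[(feature_type, name)] = weight
    return merged


def cosine_similarity(a, b):
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stored user vectors and recipe features.
user = {"ingredients": {"garlic": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": 1.0}}
recipe = {"ingredients": {"garlic": 1.0}, "cuisine": {"french": 1.0}, "dietary": {}}

for combo in (["ingredients", "cuisine", "dietary"], ["ingredients"]):
    sim = cosine_similarity(merge_vectors(user, combo), merge_vectors(recipe, combo))
    print(combo, round(sim, 4))
```

Testing a new feature combination only changes the list passed to `merge_vectors`, so the stored prototype vectors never need to be rebuilt.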

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \tag{4.6}$$

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

$$\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } \text{similarity} < U \end{cases} \tag{5.1}$$

Figure 5.5: Upper similarity threshold variation test using Food.com dataset


Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
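The threshold sweep can be sketched as a loop over candidate values of U. This is a simplification: the experiments re-run the full cross-validation for every threshold, whereas the illustration below evaluates a fixed list of hypothetical samples, and all names are assumptions.

```python
def rating_two_case(similarity, average, std_dev, upper):
    """Two-case rule of Eq. 5.1."""
    return average + std_dev if similarity >= upper else average


def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, average, std_dev, actual_rating).

    Returns one (U, MAE, RMSE) tuple per candidate threshold.
    """
    results = []
    for upper in thresholds:
        errors = [rating_two_case(s, avg, sd, upper) - actual for s, avg, sd, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results.append((upper, mae, rmse))
    return results


# Two hypothetical validation samples, evaluated for U = 0.75 and U = 0.3.
samples = [(0.8, 4.0, 1.0, 5.0), (0.4, 4.0, 1.0, 4.0)]
for upper, mae, rmse in sweep_upper_threshold(samples, [0.75, 0.3]):
    print(upper, round(mae, 4), round(rmse, 4))
```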

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But, although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
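The opposite movement of MAE and RMSE observed in the upper threshold test can be reproduced with a small made-up example. The ratings below are invented purely for illustration: one set of predictions is always slightly off, the other is exact most of the time but misses badly once.

```python
import math


def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


actual = [5, 5, 5, 1]
cautious = [4, 4, 4, 4]  # never exact, never far off
bold = [5, 5, 5, 5]      # exact three times, off by 4 once

print(mae(cautious, actual), rmse(cautious, actual))
print(mae(bold, actual), rmse(bold, actual))
```

The bold predictions have the lower MAE (1.0 against 1.5) but the higher RMSE (2.0 against roughly 1.73), which is the same trade-off the threshold variation exposes.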

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
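The coordinates of each point can be derived as sketched below. The use of the population standard deviation, the averaging of a user's absolute errors into a single value, and the data layout are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import pstdev


def user_error_profile(training, predictions):
    """training: list of (user, item, rating); predictions: list of (user, predicted, actual).

    Returns {user: (rating standard deviation, mean absolute error)}.
    """
    ratings = defaultdict(list)
    for user, _, rating in training:
        ratings[user].append(rating)
    errors = defaultdict(list)
    for user, predicted, actual in predictions:
        errors[user].append(abs(predicted - actual))
    return {
        user: (pstdev(ratings[user]), sum(errs) / len(errs))
        for user, errs in errors.items()
        if user in ratings
    }


# Hypothetical data: u1 always gives the same rating, u2 uses both ends of the scale.
training = [("u1", "a", 4), ("u1", "b", 4), ("u2", "a", 1), ("u2", "b", 5)]
predictions = [("u1", 4.0, 4), ("u2", 3.0, 5)]
print(user_error_profile(training, predictions))
```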

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
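The round-by-round procedure can be sketched for a single user as follows. The callback `train_and_predict` stands in for the whole recommendation component and is hypothetical, as is the simple mean-rating predictor used to exercise the loop; in the experiments the resulting errors are averaged over all selected users.

```python
def learning_curve(user_ratings, train_and_predict, max_training_size):
    """user_ratings: ordered list of (item, rating) for one user.

    In round n, the first n reviews form the training set and the rest the
    validation set. Returns a list of (n, mean absolute error).
    """
    curve = []
    for n in range(1, max_training_size + 1):
        training, validation = user_ratings[:n], user_ratings[n:]
        if not validation:
            break
        errors = [abs(train_and_predict(training, item) - rating) for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve


def mean_rating_predictor(training, item):
    """Stand-in predictor (assumption): always predicts the mean training rating."""
    return sum(rating for _, rating in training) / len(training)


history = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
print(learning_curve(history, mean_rating_predictor, max_training_size=3))
```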



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results in transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                            24741          8117
Number of food items                      226025         14976
Number of rating events                   956826         86574
Number of ratings above avg               726467         46588
Number of groups                             108            68
Number of ingredients                       5074           338
Number of categories                          28            14
Sparsity on the ratings matrix              0.02          0.07
Avg rating values                           4.68          3.34
Avg number of ratings per user             38.67         10.67
Avg number of ratings per item              4.23          5.78
Avg number of ingredients per item          8.57          3.71
Avg number of categories per item           2.33          0.60
Avg number of food groups per item          0.87          0.61

user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe more accurately. Later, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.


4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, it was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
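The sparsity filter described above can be sketched as follows. This is only a minimal illustration under stated assumptions: rating events are taken to be `(user_id, recipe_id, rating)` tuples, the function name `filter_ratings` is hypothetical, and a single pass (recipes first, then users) is assumed, since the text does not say whether the filter was applied repeatedly.

```python
from collections import Counter

def filter_ratings(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times, then users with no more than 5 ratings.

    ratings: list of (user_id, recipe_id, rating) tuples (assumed layout).
    """
    # Count how many times each recipe was rated.
    item_counts = Counter(recipe for _, recipe, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    # Count the remaining ratings per user (single pass, an assumption).
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```

Applying the user filter after the recipe filter is a design choice of this sketch; the opposite order, or iterating until nothing changes, would give slightly different results.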

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values

A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the system after training. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, in the leave-p-out method, p observations are used as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set, ideally until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 folds and the process was repeated 5 times, once per fold, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% of the data and the training set the remaining 80%.
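The 5-fold procedure can be sketched as below. This is a minimal illustration, assuming rating events are stored in a list and assigned to folds at random; the function name `k_fold_splits` and the fixed seed are hypothetical and not taken from the evaluation module used in this work.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # random fold assignment (assumption)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        # Every other fold is merged into the training set.
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

With k = 5, each validation set holds 20% of the events and each training set the remaining 80%, and every event is used for validation exactly once.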

Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted rating for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
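For reference, the two error measures can be computed as in the sketch below, which uses their standard definitions. The function names are illustrative, and the inputs are assumed to be two equally long lists of predicted and actual ratings.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average of |prediction - rating|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: squaring penalizes large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```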

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, directly using specific dataset averages as the predicted rating. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious           Food.com
                                  MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                     Observation           Observation
                                                     User Average          Fixed Threshold
                                                     MAE      RMSE         MAE      RMSE
User Avg + User Standard Deviation                   0.8217   1.0606       0.7759   1.0283
Item Avg + Item Standard Deviation                   0.8914   1.1550       0.8388   1.1106
User/Item Avg + User and Item Standard Deviation     0.8304   1.0296       0.7824   0.9927
Min-Max                                              0.8539   1.1533       0.7721   1.0705

Food.com
                                                     Observation           Observation
                                                     User Average          Fixed Threshold
                                                     MAE      RMSE         MAE      RMSE
User Avg + User Standard Deviation                   0.4448   0.6812       0.4287   0.6624
Item Avg + Item Standard Deviation                   0.4561   0.7251       0.4507   0.7207
User/Item Avg + User and Item Standard Deviation     0.4390   0.6506       0.4324   0.6449
Min-Max                                              0.6648   0.9847       0.6303   0.9384
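The three average-based baselines reported in Table 5.1 could be sketched as follows. The function name `average_baselines` and the tuple layout are hypothetical, and falling back to the global average for users or recipes that are missing from the training set is an assumption of this sketch, not something stated in the text.

```python
from collections import defaultdict

def average_baselines(training):
    """training: iterable of (user_id, recipe_id, rating). Returns a predict function."""
    user_ratings, item_ratings, all_ratings = defaultdict(list), defaultdict(list), []
    for user_id, recipe_id, rating in training:
        user_ratings[user_id].append(rating)
        item_ratings[recipe_id].append(rating)
        all_ratings.append(rating)
    global_avg = sum(all_ratings) / len(all_ratings)

    def mean(values):
        # Unseen users/recipes fall back to the global average (assumption).
        return sum(values) / len(values) if values else global_avg

    def predict(user_id, recipe_id):
        user_avg = mean(user_ratings.get(user_id, []))
        item_avg = mean(item_ratings.get(recipe_id, []))
        return {"user": user_avg, "item": item_avg,
                "combined": (user_avg + item_avg) / 2}

    return predict
```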

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented by the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
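To make the two observation schemes concrete, the sketch below builds a prototype vector in the textbook Rocchio form: the weighted mean of the positive feature vectors minus the weighted mean of the negative ones. It is an assumption-laden illustration, not the exact formulation of Section 4.3: the weights `beta` and `gamma`, the function name, and the choice to treat ratings strictly above the threshold as positive are all hypothetical.

```python
def build_prototype(rated_recipes, threshold=3.0, beta=1.0, gamma=1.0):
    """rated_recipes: list of (feature_weights: dict, rating) pairs."""
    # Ratings above the threshold count as positive observations (assumption).
    positive = [f for f, r in rated_recipes if r > threshold]
    negative = [f for f, r in rated_recipes if r <= threshold]
    prototype = {}
    for group, signed_weight in ((positive, beta), (negative, -gamma)):
        for features in group:
            for name, weight in features.items():
                # Add this recipe's share of the group mean.
                prototype[name] = (prototype.get(name, 0.0)
                                   + signed_weight * weight / len(group))
    return prototype
```

The Observation Fixed Threshold scheme corresponds to calling the function with `threshold=3.0`, while the Observation User Average scheme would pass the user's own average rating as `threshold`.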

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious           Food.com
                                      MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing a user's prototype vector, the features were kept separate: in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
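A minimal sketch of this idea is shown below, assuming each user and each recipe is stored as a dictionary with one sparse weight vector per feature group. The group names, the function names and the dictionary layout are illustrative assumptions rather than the actual database schema.

```python
import math

def merge_vectors(stored, selected_groups):
    """Merge the per-group vectors chosen for a feature test into one vector."""
    merged = {}
    for group in selected_groups:
        for name, weight in stored.get(group, {}).items():
            merged[(group, name)] = weight  # prefix keys to avoid name clashes
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity(user, recipe, selected_groups):
    return cosine(merge_vectors(user, selected_groups),
                  merge_vectors(recipe, selected_groups))
```

Testing a different feature combination then only requires changing `selected_groups`, for example `["ingredients", "cuisine"]`, without rebuilding any stored vector.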

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, for example the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study their impact on the recommendations and to discover the similarity thresholds that return the lowest error values.
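The three cases of Eq. 4.6 translate directly into code. The sketch below is illustrative only: the function and parameter names are hypothetical, and which average and standard deviation are passed in (user, item, or a combination of both) depends on the conversion method being tested.

```python
def rating_from_similarity(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Map a Rocchio similarity value to a predicted rating (three-case rule)."""
    if similarity >= upper:
        return avg_rating + std_dev   # recipe matches the profile well
    if similarity >= lower:
        return avg_rating             # neutral case
    return avg_rating - std_dev       # recipe matches the profile poorly
```

The default values of `upper` and `lower` are the initial thresholds U = 0.75 and L = 0.25 mentioned above.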

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using Food.com dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy: subtracting the standard deviation does not help. The sharp drop in error seen in the graph in Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

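As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using Food.com dataset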


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.
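A threshold sweep of this kind could be sketched as follows. It is a simplified illustration under stated assumptions: the similarity, average rating and standard deviation of each validation case are taken as precomputed, the tuple layout and function names are hypothetical, and a single validation set stands in for the full 5-fold procedure.

```python
import math

def predict(similarity, avg_rating, std_dev, upper):
    """Two-case rule of Eq. 5.1."""
    return avg_rating + std_dev if similarity >= upper else avg_rating

def sweep_upper_threshold(samples, thresholds):
    """samples: list of (similarity, avg_rating, std_dev, actual_rating).

    Returns {threshold: (MAE, RMSE)} for each candidate upper threshold.
    """
    results = {}
    for upper in thresholds:
        errors = [predict(s, avg, std, upper) - actual
                  for s, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results
```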

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 for MAE and 0.8601 for RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
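A small worked example, with invented error values chosen purely for illustration, shows how the MAE can decrease while the RMSE increases. Consider four predictions whose absolute errors are $(1, 1, 1, 1)$ under one threshold and $(0, 0, 0, 3)$ under another:

\[
\begin{aligned}
\text{MAE}_A &= \tfrac{1+1+1+1}{4} = 1, &
\text{RMSE}_A &= \sqrt{\tfrac{1^2+1^2+1^2+1^2}{4}} = 1, \\
\text{MAE}_B &= \tfrac{0+0+0+3}{4} = 0.75, &
\text{RMSE}_B &= \sqrt{\tfrac{0^2+0^2+0^2+3^2}{4}} = 1.5.
\end{aligned}
\]

The second setting makes more exact predictions, so its MAE is lower, but its single large miss dominates the squared error, so its RMSE is higher.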

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
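The coordinates of each point could be obtained as in the sketch below. It assumes the evaluation produces `(user_id, actual_rating, predicted_rating)` records, a hypothetical layout, and it uses the population standard deviation of each user's actual ratings, since the text does not state whether the population or the sample form was used.

```python
import statistics
from collections import defaultdict

def per_user_points(records):
    """Return {user_id: (standard deviation of ratings, mean absolute error)}."""
    by_user = defaultdict(list)
    for user_id, actual, predicted in records:
        by_user[user_id].append((actual, predicted))
    points = {}
    for user_id, pairs in by_user.items():
        # Population standard deviation of the user's ratings (assumption).
        std_dev = statistics.pstdev([actual for actual, _ in pairs])
        abs_error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user_id] = (std_dev, abs_error)
    return points
```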

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

The training set contains the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
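The round-by-round simulation for a single user could be sketched as below. It is a generic illustration: `train` and `predict` are placeholder callables standing in for the prototype-vector construction and the rating prediction, and the `(recipe, rating)` layout of the user's reviews is an assumption.

```python
def learning_curve(user_events, train, predict, max_rounds):
    """user_events: ordered list of (recipe, rating) pairs for one user.

    In round n the first n reviews form the training set and the rest the
    validation set; returns the mean absolute error of each round.
    """
    curve = []
    for n in range(1, max_rounds + 1):
        training, validation = user_events[:n], user_events[n:]
        if not validation:
            break  # nothing left to evaluate on
        model = train(training)
        errors = [abs(predict(model, recipe) - rating)
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve
```

Averaging the curves of all selected users, round by round, would then give one curve per dataset.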



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, which made it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system that uses the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. The experiment would determine whether a similar decrease in MAE can be achieved in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact of these features on the recommendations is another interesting point to address in the future, when datasets with more information become available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the recipes the user rated, each class would contain the feature weights related to the observations of one specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
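The per-rating class idea above can be sketched as follows. This is a minimal illustration, not an evaluated method: the helper names (`build_class_vectors`, `predict_rating`), the choice of averaging the feature vectors within each class, and the toy recipes are all assumptions made for the example.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def build_class_vectors(rated_recipes):
    """Map each rating value to the mean feature vector of the recipes given that rating."""
    groups = defaultdict(list)
    for features, rating in rated_recipes:
        groups[rating].append(features)
    classes = {}
    for rating, vectors in groups.items():
        total = defaultdict(float)
        for vector in vectors:
            for feature, weight in vector.items():
                total[feature] += weight / len(vectors)
        classes[rating] = dict(total)
    return classes

def predict_rating(classes, recipe):
    """The predicted rating is the label of the most similar class vector."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))

# Hypothetical user history: (feature weights, rating)
rated = [({"tomato": 1.0, "basil": 0.5}, 5), ({"anchovy": 1.0}, 1)]
classes = build_class_vectors(rated)
prediction = predict_rating(classes, {"tomato": 1.0, "garlic": 0.3})
```

With this toy history the new recipe shares features only with the class built from 5-rated recipes, so that class label is returned as the prediction.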

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the target recipe would automatically assign it a predicted rating.




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link


[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 4.3: Distribution of Epicurious rating events per rating value.

Figure 4.4: Distribution of Food.com rating events per rating value.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the web site users when performing a review.

Figures 4.3, 4.4 and 4.5 present some statistics of the datasets in graphical form. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar: a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.


Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured sets of data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the system after training. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is split into k folds of equal size, one fold serves as the validation set and the remaining k − 1 folds as the training set. To reduce variability, the process is repeated k times, so that each fold is used as the validation set exactly once, and the validation results are averaged over the repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5 (5-fold cross-validation), so for each fold the validation set represents 20% of the data and the training set the remaining 80%.
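The 5-fold procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `k_fold_splits`, the random shuffling with a fixed seed, and the synthetic rating events are not taken from the thesis.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (training, validation) lists; each event is validated exactly once."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical rating events in the form (userID, itemID, rating)
events = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
splits = list(k_fold_splits(events, k=5))
```

With 100 events and k = 5, every split holds 80 training and 20 validation events, matching the 80%/20% proportion mentioned in the text.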

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

When the recommendation system is given the userID and itemID as inputs, the algorithms generate a predicted value (rating) for that item. This value is estimated from the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
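For reference, the two error measures can be written as short functions using their standard definitions (mean of the absolute deviations, and square root of the mean squared deviation). The rating lists below are made-up values used only to show the calculation.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical validation ratings and predictions
actual = [4, 3, 5, 2]
predicted = [4, 4, 3, 2]
print(mae(predicted, actual))   # (0 + 1 + 2 + 0) / 4 = 0.75
print(rmse(predicted, actual))  # sqrt((0 + 1 + 4 + 0) / 4), roughly 1.118
```

Two of the four predictions are exact, yet the single deviation of 2 contributes most of the RMSE, which is why the two measures can move in different directions.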

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline predictors were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the user's average rating, the recipe's average rating, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious           Food.com
Method                            MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results (UA = Observation User Average, FT = Observation Fixed Threshold)

                                                    Epicurious                          Food.com
                                                    UA               FT                 UA               FT
Similarity-to-rating method                         MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation                  0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation                  0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item Standard Deviation    0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                                             0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
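The three average-based baselines described in this section can be sketched as follows. The function names and the toy training events are assumptions made for the illustration, and the sketch assumes that every user and recipe requested at prediction time was seen in the training set.

```python
from collections import defaultdict

def build_averages(training):
    """Compute per-user and per-recipe average ratings from (user, recipe, rating) events."""
    by_user, by_recipe = defaultdict(list), defaultdict(list)
    for user, recipe, rating in training:
        by_user[user].append(rating)
        by_recipe[recipe].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    recipe_avg = {i: sum(r) / len(r) for i, r in by_recipe.items()}
    return user_avg, recipe_avg

def predict_combined(user, recipe, user_avg, recipe_avg):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + recipe_avg[recipe]) / 2

# Hypothetical training events
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, recipe_avg = build_averages(training)
combined = predict_combined("u1", "r1", user_avg, recipe_avg)  # (3.0 + 4.5) / 2
```

The user-average and item-average baselines simply return `user_avg[user]` or `recipe_avg[recipe]` on their own.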

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, so that the highest rating values count as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
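The two ways of splitting observations can be illustrated with a textbook Rocchio prototype (weighted mean of positive vectors minus weighted mean of negative vectors). This is a sketch under assumptions, not the exact formulation of Section 4.3: the `beta` and `gamma` weights, the rule that ratings strictly above the threshold are positive, and the toy recipes are all hypothetical.

```python
from collections import defaultdict

def build_prototype(rated_recipes, threshold, beta=1.0, gamma=1.0):
    """rated_recipes: list of (feature_weights dict, rating) pairs for one user."""
    positives = [v for v, r in rated_recipes if r > threshold]   # assumption: > is positive
    negatives = [v for v, r in rated_recipes if r <= threshold]
    prototype = defaultdict(float)
    for group, factor in ((positives, beta), (negatives, -gamma)):
        for vector in group:
            for feature, weight in vector.items():
                prototype[feature] += factor * weight / len(group)
    return dict(prototype)

rated = [
    ({"tomato": 1.0, "basil": 0.5}, 5),
    ({"tomato": 1.0, "anchovy": 2.0}, 1),
    ({"basil": 0.5}, 4),
]
# Observation Fixed Threshold: a fixed value in the middle of the rating range
fixed = build_prototype(rated, threshold=3)
# Observation User Average: the user's own average rating is the threshold
user_average = sum(r for _, r in rated) / len(rated)
by_average = build_prototype(rated, threshold=user_average)
```

For this toy user both thresholds produce the same split; the two variants differ for users whose average rating lies far from the middle of the rating range.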

Table 5.2 contains the first test results of the experiments, obtained using 5-fold cross-validation. The objective was to determine which combination of methods had the best performance, so that it could be further adjusted and improved. The MAE and RMSE values make it clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that the combination of both user and item average ratings and standard deviations has the lowest error values overall.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious           Food.com
Features                              MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With the feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing a user's prototype vector the features were separated, and in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, since the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the corresponding row of Table 5.3.
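A minimal sketch of this storage scheme is shown below, assuming sparse vectors held as dictionaries keyed by feature name. The group names, the key prefixes, the `similarity` helper and the choice to restrict the recipe vector to the same feature groups are assumptions made for the illustration.

```python
import math

def merge(vectors, groups):
    """Merge the stored per-feature vectors of the selected groups into one vector."""
    merged = {}
    for group in groups:
        merged.update(vectors.get(group, {}))
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def similarity(user_vectors, recipe_vectors, groups):
    """Similarity using only the chosen feature groups, without rebuilding anything."""
    return cosine(merge(user_vectors, groups), merge(recipe_vectors, groups))

# Hypothetical stored vectors: one per feature type
user = {"ingredients": {"ing:tomato": 1.0}, "cuisine": {"cui:italian": 1.0}, "dietary": {"diet:vegan": 1.0}}
recipe = {"ingredients": {"ing:tomato": 1.0}, "cuisine": {"cui:italian": 1.0}, "dietary": {}}

all_features = similarity(user, recipe, ("ingredients", "cuisine", "dietary"))
ingredients_only = similarity(user, recipe, ("ingredients",))
```

Switching between the rows of a feature test then only changes the `groups` argument, which is what makes the stored representation convenient.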

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before adding it to the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, this test studies their impact on the recommendations and looks for the similarity thresholds that return the lowest error values.
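The piecewise mapping of Eq. 4.6 translates directly into code. In this sketch the function name and the example values are hypothetical, and the average rating and standard deviation are passed in as plain numbers, since how the user and item statistics are combined is defined in Section 4.3.3 and not repeated here.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating using the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Hypothetical average rating 3.5 and standard deviation 1.0
high = similarity_to_rating(0.80, 3.5, 1.0)   # upper case
middle = similarity_to_rating(0.50, 3.5, 1.0) # middle case
low = similarity_to_rating(0.10, 3.5, 1.0)    # lower case

# Sweeping a threshold only requires calling the function with different limits;
# lower=float("-inf") disables the lower case entirely.
no_lower_case = similarity_to_rating(0.10, 3.5, 1.0, lower=float("-inf"))
```

A threshold variation test can then recompute MAE and RMSE over the validation folds for each candidate value of `upper` or `lower`.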

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values on the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. This test makes it clear that the lower threshold only has a negative effect on recommendation accuracy, and that subtracting the standard deviation does not help. The pronounced drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

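As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.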

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by a point in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as the threshold is lowered, the MAE decreases and the RMSE increases. With a lower similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates among the baselines, the experimental recommendation component showed better results on the Food.com dataset.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
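The points plotted in such a graph can be derived from the validation output as sketched below. The input layout, the function name and the use of the population standard deviation (`statistics.pstdev`) are assumptions; the thesis does not state which estimator was used.

```python
from statistics import pstdev

def user_points(results):
    """results: {user: [(actual, predicted), ...]} -> {user: (std_dev, mean_abs_error)}."""
    points = {}
    for user, pairs in results.items():
        actual = [a for a, _ in pairs]
        errors = [abs(a - p) for a, p in pairs]
        points[user] = (pstdev(actual), sum(errors) / len(errors))
    return points

# Hypothetical validation output for two users
results = {
    "constant_rater": [(4, 4), (4, 4)],   # same rating every time
    "varied_rater": [(1, 3), (5, 3)],     # ratings spread over the scale
}
points = user_points(results)
```

Each resulting pair gives one point of the scatter plot: the horizontal position from the user's standard deviation and the vertical position from the user's mean absolute error.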

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since in a real system the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still maintained a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set contains the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
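The round-by-round simulation can be sketched for a single user as follows. The predictor is passed in as a function so that any recommendation method can be plugged in; the stand-in `mean_rating_predictor`, the function names and the toy reviews are assumptions made for the illustration and are not the Rocchio component itself.

```python
def learning_curve(reviews, fit_predict, max_train):
    """reviews: one user's ordered (recipe, rating) pairs.

    In round n the first n reviews form the training set and the rest the
    validation set; the mean absolute error of each round is returned.
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        errors = [abs(fit_predict(training, recipe) - rating) for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

def mean_rating_predictor(training, recipe):
    """Stand-in predictor: always returns the mean of the training ratings."""
    return sum(rating for _, rating in training) / len(training)

# Hypothetical review history of one user
reviews = [("r1", 4), ("r2", 2), ("r3", 3), ("r4", 3)]
curve = learning_curve(reviews, mean_rating_predictor, max_train=3)
```

Averaging the per-round errors over all selected users yields one point per number of rated recipes, which is the kind of curve shown in the learning curve figures.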




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms

In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403


[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-

mendation algorithms In Proceedings of the 10th International Conference on World Wide

Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http


[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting

to Know You Learning New User Preferences in Recommender Systems In Proceedings


of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN

1581134592 doi 101145502716502737

[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-

orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004

ISSN 10414347 doi 101109TKDE20041264822

[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-

Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012


[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-

proved recommendations In Proceedings of the Eighteenth National Conference on Artificial

Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936

[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by

Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings

of the International MultiConference of Engineers and Computer Scientists pages 519ndash523

2014 ISBN 9789881925251

[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-

ents In Proceedings of the 18th International Conference on User Modeling Adaptation

and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi

101007978-3-642-13470-8 36

[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling

and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A


[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-

actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770


[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-

ommendation volume 8444 2014

[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-

lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN

10450823 doi 101067mod2000109031



Figure 4.5: Epicurious distribution of the number of ratings per number of users.

The database used to store the data is MySQL. Being a relational database, MySQL is well suited to representing and working with structured data, which is adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.




Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, the impact of the users' standard deviation on the recommendation error and the learning curve of the algorithm, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The main idea of cross-validation is to isolate a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions made by the model trained on the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k subsets (folds), one fold is held out as the validation set and the remaining folds form the training set. To reduce variability, this process is repeated k times, each time with a different fold as the validation set, so that every observation is used for validation exactly once. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
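As a rough illustration of the splitting procedure described above, the following Python sketch partitions a list of rating events into five folds and yields one training/validation pair per fold. The function name `k_fold_splits`, the (userID, itemID, rating) tuple layout, the shuffling seed and the toy data are assumptions made for this example; they are not taken from the implementation used in this work.

```python
import random


def k_fold_splits(observations, k=5, seed=0):
    """Yield (training, validation) pairs, one per fold.

    Hypothetical helper: shuffles the observations once, deals them into
    k folds, and holds out each fold in turn.
    """
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        training = [obs
                    for index, fold in enumerate(folds) if index != held_out
                    for obs in fold]
        yield training, validation


# Toy rating events in the (userID, itemID, rating) format (invented data).
ratings = [("u%d" % (n % 4), "r%d" % n, 1 + n % 5) for n in range(20)]
splits = list(k_fold_splits(ratings, k=5))
for training, validation in splits:
    print(len(training), len(validation))  # 16 4 for each of the 5 folds
```

With 20 observations and k = 5, each validation set holds 20% of the data and each training set the remaining 80%, matching the proportions stated above.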

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
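For reference, the two measures can be computed as in the minimal Python sketch below. The function names and the toy ratings are assumptions for illustration; the definitions follow the usual ones (the mean of the absolute deviations for MAE, and the square root of the mean squared deviation for RMSE).

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error; squaring penalises large deviations more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Invented example: ratings from a validation set and the predictions made.
actual = [4, 4, 3, 2]
predicted = [4, 3, 5, 2]
print(mae(predicted, actual))             # 0.75
print(round(rmse(predicted, actual), 4))  # 1.118
```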

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious           Food.com
                                  MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

                                                  Epicurious                              Food.com
                                                  Observation         Observation         Observation         Observation
                                                  User Average        Fixed Threshold     User Average        Fixed Threshold
                                                  MAE      RMSE       MAE      RMSE       MAE      RMSE       MAE      RMSE
User Avg + User Standard Deviation                0.8217   1.0606     0.7759   1.0283     0.4448   0.6812     0.4287   0.6624
Item Avg + Item Standard Deviation                0.8914   1.1550     0.8388   1.1106     0.4561   0.7251     0.4507   0.7207
User/Item Avg + User and Item Standard Deviation  0.8304   1.0296     0.7824   0.9927     0.4390   0.6506     0.4324   0.6449
Min-Max                                           0.8539   1.1533     0.7721   1.0705     0.6648   0.9847     0.6303   0.9384
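The three average-based baselines described above can be sketched in a few lines of Python. The helper names, the (userID, recipeID, rating) tuple layout and the toy data are assumptions for this example. Falling back to the global average for a user or recipe that does not appear in the training set is also an assumption, since the text does not say how such cases were handled.

```python
from collections import defaultdict


def mean_by_key(training, key_index):
    """Average rating grouped by user (key_index=0) or recipe (key_index=1)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in training:
        totals[row[key_index]] += row[2]
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def baseline_predictions(training, user_id, recipe_id):
    """Return the user, item and combined average predictions."""
    user_avg = mean_by_key(training, 0)
    item_avg = mean_by_key(training, 1)
    # Assumption: unseen users or recipes fall back to the global average.
    global_avg = sum(rating for _, _, rating in training) / len(training)
    u = user_avg.get(user_id, global_avg)
    i = item_avg.get(recipe_id, global_avg)
    return {"user": u, "item": i, "combined": (u + i) / 2}


# Invented training data.
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
result = baseline_predictions(training, "u1", "r1")
print(result)  # {'user': 3.0, 'item': 4.5, 'combined': 3.75}
```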

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
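The difference between the two ways of separating observations can be illustrated with a short Python sketch. The function name, the dictionary layout and the toy ratings are assumptions; in particular, treating a rating equal to the threshold as a negative observation is a choice made only for this example, since the text does not state how ties are handled.

```python
def split_observations(user_ratings, mode="fixed", fixed_threshold=3):
    """Split a user's rated recipes into positive and negative observations.

    user_ratings maps recipe identifiers to rating values.
    mode is "fixed" (fixed threshold) or "user_average".
    """
    if mode == "user_average":
        threshold = sum(user_ratings.values()) / len(user_ratings)
    else:
        threshold = fixed_threshold
    # Assumption: ratings equal to the threshold count as negative.
    positive = [r for r, value in user_ratings.items() if value > threshold]
    negative = [r for r, value in user_ratings.items() if value <= threshold]
    return positive, negative


# Invented user who only gives high ratings.
ratings = {"a": 5, "b": 4, "c": 4, "d": 5}
print(split_observations(ratings, mode="fixed"))         # (['a', 'b', 'c', 'd'], [])
print(split_observations(ratings, mode="user_average"))  # (['a', 'd'], ['b', 'c'])
```

For a user who rates everything highly, the fixed threshold marks every recipe as positive, while the user-average threshold treats the user's relatively lower ratings as negative observations.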

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations; this holds for every conversion method and for both datasets. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious           Food.com
                                      MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user, one per feature type. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
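A minimal Python sketch of this idea is given below: the three per-feature vectors are kept separately and merged on demand before the cosine similarity is computed. The function names, the dictionary-based sparse vectors, the group labels and the toy weights are assumptions made for illustration and do not reproduce the stored database format.

```python
import math


def merge_groups(vectors, groups):
    """Merge the selected per-feature vectors into one sparse vector.

    Keys are (group, feature) pairs so that equal names in different
    groups do not collide.
    """
    merged = {}
    for group in groups:
        for feature, weight in vectors.get(group, {}).items():
            merged[(group, feature)] = weight
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented prototype and recipe vectors, one sub-vector per feature type.
user = {"ingredients": {"garlic": 1.0, "tomato": 1.0},
        "cuisine": {"italian": 1.0},
        "dietary": {}}
recipe = {"ingredients": {"garlic": 1.0},
          "cuisine": {"italian": 1.0},
          "dietary": {"vegetarian": 1.0}}

all_groups = ["ingredients", "cuisine", "dietary"]
sim_all = cosine(merge_groups(user, all_groups), merge_groups(recipe, all_groups))
sim_ingredients = cosine(merge_groups(user, ["ingredients"]),
                         merge_groups(recipe, ["ingredients"]))
print(round(sim_all, 4), round(sim_ingredients, 4))  # 0.6667 0.7071
```

Testing a different feature combination only changes the list of groups passed to `merge_groups`; the stored vectors are left untouched.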

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if } \text{similarity} \geq U \\
\text{average rating}, & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation}, & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity thresholds that return the lowest error values.
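The three-case mapping of Eq. 4.6 translates directly into a small function. The sketch below is in Python; the function and parameter names are assumptions, and which average and standard deviation are passed in (user, item, or their combination) depends on the conversion method being tested. No clamping to the rating scale is applied, since the text does not mention one.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a predicted rating, following Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


# Invented values: average rating 3.5, standard deviation 0.5.
print(similarity_to_rating(0.80, 3.5, 0.5))  # 4.0
print(similarity_to_rating(0.50, 3.5, 0.5))  # 3.5
print(similarity_to_rating(0.10, 3.5, 0.5))  # 3.0
```

Varying `upper` and `lower` between runs is enough to reproduce a threshold variation test of the kind described here. Because a cosine similarity is never below -1, passing `lower=-1.0` disables the third case entirely.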

Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy: subtracting the standard deviation does not help. The pronounced drop in the error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

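As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if } \text{similarity} \geq U \\
\text{average rating}, & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.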


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. Each point in the graphs corresponds to one value of the upper similarity threshold: the MAE and RMSE were obtained by repeating the same cross-validation test on the experimental recommendation component, adjusting the upper similarity threshold between tests.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset (0.3229 MAE and 0.6230 RMSE, against 0.3590 MAE and 0.6536 RMSE for the baseline).
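The opposite movement of the two measures can be reproduced with a small constructed example, sketched here in Python. All numbers are invented for illustration and are unrelated to the datasets: one set of predictions is always slightly off, while the other is exact most of the time but far off once.

```python
import math


def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


actual = [4, 4, 4, 4]
cautious = [3.5, 3.5, 3.5, 3.5]  # never exact, always half a point off
bold = [4, 4, 4, 2.4]            # exact three times, 1.6 points off once

print(round(mae(cautious, actual), 4), round(rmse(cautious, actual), 4))  # 0.5 0.5
print(round(mae(bold, actual), 4), round(rmse(bold, actual), 4))          # 0.4 0.8
```

The second set has the lower MAE (0.4 against 0.5) but the higher RMSE (0.8 against 0.5), which is the same pattern observed when the upper threshold is lowered.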

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7 each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
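The coordinates of each point in such a plot can be derived as in the following Python sketch. The function name, the (userID, actual, predicted) row layout and the toy data are assumptions. Using the population standard deviation of the ratings in the evaluated rows and the mean absolute error per user are also assumptions, since the text does not specify exactly how these two values were computed.

```python
import statistics
from collections import defaultdict


def per_user_points(rows):
    """Return {userID: (rating standard deviation, mean absolute error)}."""
    by_user = defaultdict(list)
    for user_id, actual, predicted in rows:
        by_user[user_id].append((actual, predicted))
    points = {}
    for user_id, pairs in by_user.items():
        actual_ratings = [actual for actual, _ in pairs]
        # Assumption: population standard deviation of the user's ratings.
        std_dev = statistics.pstdev(actual_ratings)
        abs_error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user_id] = (std_dev, abs_error)
    return points


# Invented rows: u1 always rates 4, u2 alternates between 1 and 5.
rows = [("u1", 4, 4), ("u1", 4, 4), ("u2", 1, 3), ("u2", 5, 3)]
points = per_user_points(rows)
print(points)  # {'u1': (0.0, 0.0), 'u2': (2.0, 2.0)}
```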

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
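The round-by-round procedure described above can be sketched for a single user as follows, in Python. The function names and the toy reviews are assumptions, and the predictor used here is a simple stand-in that returns the average of the training ratings; it is not the Rocchio component evaluated in this work. In the experiments the resulting errors would additionally be averaged over all selected users.

```python
def learning_curve(reviews, predict, max_rounds):
    """Mean absolute error per round for one user.

    reviews is an ordered list of (recipeID, rating) pairs. In round n the
    first n reviews form the training set and the rest the validation set.
    """
    errors = []
    for n in range(1, max_rounds + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        abs_errors = [abs(predict(training, recipe_id) - rating)
                      for recipe_id, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def average_predictor(training, recipe_id):
    """Stand-in predictor (assumption): the user's average training rating."""
    return sum(rating for _, rating in training) / len(training)


# Invented reviews for one user.
reviews = [("a", 5), ("b", 3), ("c", 4), ("d", 4)]
curve = learning_curve(reviews, average_predictor, max_rounds=3)
print([round(error, 4) for error in curve])  # [1.3333, 0.0, 0.0]
```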



Chapter 6

Conclusions


In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since, to the best of our knowledge, the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that using all features combined gives the lowest error values, matched only by Cuisine + Dietary on the Food.com MAE (Table 5.3).

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact of these features on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
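One possible reading of this proposal is sketched below in Python. Everything in it is hypothetical: the function names, the toy recipes, and in particular the choice of building each class vector by summing the feature weights of the recipes that received a given rating, which is only one way of interpreting the idea and has not been implemented or evaluated in this work.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_class_vectors(rated_recipes):
    """Build one vector per rating value for a single user.

    Assumption: a class vector is the sum of the feature vectors of all
    recipes the user rated with that value.
    """
    classes = defaultdict(lambda: defaultdict(float))
    for features, rating in rated_recipes:
        for feature, weight in features.items():
            classes[rating][feature] += weight
    return {rating: dict(vector) for rating, vector in classes.items()}


def predict_rating(class_vectors, recipe_features):
    """The predicted rating is the class most similar to the recipe."""
    return max(class_vectors,
               key=lambda rating: cosine(class_vectors[rating], recipe_features))


# Invented rated recipes for one user: (feature vector, rating).
rated = [({"garlic": 1.0, "basil": 1.0}, 5),
         ({"garlic": 1.0, "tomato": 1.0}, 5),
         ({"liver": 1.0, "onion": 1.0}, 1)]
class_vectors = build_class_vectors(rated)
predicted = predict_rating(class_vectors, {"garlic": 1.0, "onion": 1.0})
print(predicted)  # 5
```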




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 1.1 Dissertation Structure
  • 2 Fundamental Concepts
    • 2.1 Recommendation Systems
      • 2.1.1 Content-Based Methods
      • 2.1.2 Collaborative Methods
      • 2.1.3 Hybrid Methods
    • 2.2 Evaluation Methods in Recommendation Systems
  • 3 Related Work
    • 3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
    • 3.2 Content-Boosted Collaborative Recommendation
    • 3.3 Recommending Food: Reasoning on Recipes and Ingredients
    • 3.4 User Modeling for Adaptive News Access
  • 4 Architecture
    • 4.1 YoLP Collaborative Recommendation Component
    • 4.2 YoLP Content-Based Recommendation Component
    • 4.3 Experimental Recommendation Component
      • 4.3.1 Rocchio's Algorithm using FF-IRF
      • 4.3.2 Building the User's Prototype Vector
      • 4.3.3 Generating a rating value from a similarity value
    • 4.4 Database and Datasets
  • 5 Validation
    • 5.1 Evaluation Metrics and Cross Validation
    • 5.2 Baselines and First Results
    • 5.3 Feature Testing
    • 5.4 Similarity Threshold Variation
    • 5.5 Standard Deviation Impact in Recommendation Error
    • 5.6 Rocchio's Learning Curve
  • 6 Conclusions
    • 6.1 Future Work
  • Bibliography


Chapter 5

Validation


This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. The idea is to set aside a segment of the known data and, instead of using it to train the model, use it to evaluate the predictions of the model learned from the remaining data. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k folds of equal size, one fold is held out as the validation set, and the remaining k - 1 folds form the training set. To reduce variability, the process is repeated k times, so that every fold serves as the validation set exactly once, and the validation results are averaged over the k repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
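A minimal sketch of how such a 5-fold split could be produced is given here. The function name `k_fold_splits`, the shuffling seed and the toy `ratings` list are illustrative assumptions and are not taken from the evaluation module used in this work.

```python
import random


def k_fold_splits(observations, k=5, seed=0):
    """Yield one (training, validation) pair per fold."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th observation, starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        yield training, validation


# Hypothetical (userID, itemID, rating) observations.
ratings = [(f"user{n % 4}", f"recipe{n}", 1 + n % 5) for n in range(20)]
splits = list(k_fold_splits(ratings, k=5))
for training, validation in splits:
    print(len(training), len(validation))
```

With 20 observations and k = 5, every fold holds 20% of the data for validation and 80% for training, and each observation is validated exactly once.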

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a predicted rating for that item. This value is estimated from the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
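For reference, the two error measures can be sketched as follows, using their standard definitions (mean absolute deviation, and square root of the mean squared deviation). The function names and the toy rating lists are assumptions made for illustration only.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; squaring penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical validation ratings and the corresponding predictions.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 4.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

In this toy case the single deviation of 2 dominates the RMSE, which is why the RMSE is noticeably larger than the MAE.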

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                Epicurious          Food.com
                                MAE       RMSE      MAE       RMSE
YoLP Content-based component    0.6389    0.8279    0.3590    0.6536
YoLP Collaborative component    0.6454    0.8678    0.3761    0.6834
User Average                    0.6315    0.8338    0.4077    0.6207
Item Average                    0.7701    1.0930    0.4385    0.7043
Combined Average                0.6628    0.8572    0.4180    0.6250

Table 5.2: Test Results

                                                  Epicurious                      Food.com
Observation threshold:                            User Average    Fixed Threshold User Average    Fixed Threshold
                                                  MAE     RMSE    MAE     RMSE    MAE     RMSE    MAE     RMSE
User Avg + User Standard Deviation                0.8217  1.0606  0.7759  1.0283  0.4448  0.6812  0.4287  0.6624
Item Avg + Item Standard Deviation                0.8914  1.1550  0.8388  1.1106  0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and Item Standard Deviation  0.8304  1.0296  0.7824  0.9927  0.4390  0.6506  0.4324  0.6449
Min-Max                                           0.8539  1.1533  0.7721  1.0705  0.6648  0.9847  0.6303  0.9384
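The three average-based baselines described in this section can be sketched as follows. The function names and the toy training data are hypothetical, and the sketch assumes that both the user and the recipe appear in the training set (it does not handle unseen users or recipes).

```python
from collections import defaultdict


def average_baselines(training):
    """Compute per-user and per-recipe average ratings.

    training: iterable of (user_id, recipe_id, rating) tuples.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in training:
        by_user[user_id].append(rating)
        by_item[recipe_id].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg


def predict_combined(user_avg, item_avg, user_id, recipe_id):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user_id] + item_avg[recipe_id]) / 2


training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = average_baselines(training)
print(user_avg["u1"], item_avg["r1"], predict_combined(user_avg, item_avg, "u1", "r1"))
```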

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, so that the highest rating values count as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
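The two ways of separating positive and negative observations can be illustrated with the sketch below. It assumes the common Rocchio form, in which the prototype is the weighted mean of the positive vectors minus the weighted mean of the negative vectors; the weights `beta` and `gamma`, the function name, and the choice to treat a rating equal to the threshold as positive are assumptions, not details taken from Section 4.3.

```python
from collections import defaultdict


def build_prototype(rated, threshold=None, beta=1.0, gamma=1.0):
    """Build a user's prototype vector from (feature_vector, rating) pairs.

    threshold=None uses the user's average rating as the threshold;
    a number (e.g. 3) is used as a fixed threshold instead.
    """
    if threshold is None:
        threshold = sum(rating for _, rating in rated) / len(rated)
    positive = [vec for vec, rating in rated if rating >= threshold]  # assumption: ties are positive
    negative = [vec for vec, rating in rated if rating < threshold]
    prototype = defaultdict(float)
    for group, signed_weight in ((positive, beta), (negative, -gamma)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] += signed_weight * weight / len(group)
    return dict(prototype)


# Hypothetical user who rated two recipes with 4 and 5.
rated = [({"garlic": 1.0}, 4), ({"beef": 1.0}, 5)]
fixed = build_prototype(rated, threshold=3)   # both observations are positive
by_average = build_prototype(rated)           # user average is 4.5, so the 4 is negative
print(fixed, by_average)
```

The example shows why the two strategies can differ: for a user who only gives high ratings, the user-average threshold turns a rating of 4 into a negative observation, while the fixed threshold keeps it positive.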

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. The MAE and RMSE values show that using the user average as threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE       RMSE      MAE       RMSE
Ingredients + Cuisine + Dietaries   0.7824    0.9927    0.4324    0.6449
Ingredients + Cuisine               0.7915    1.0012    0.4384    0.6502
Ingredients + Dietary               0.7874    0.9986    0.4342    0.6468
Cuisine + Dietary                   0.8266    1.0616    0.4324    0.7087
Ingredients                         0.7932    1.0054    0.4411    0.6537
Cuisine                             0.8553    1.0810    0.5357    0.7431
Dietary                             0.8772    1.0807    0.4579    0.7320

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
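A minimal sketch of this storage scheme is given here: the three per-group vectors are kept separately and merged on demand before the cosine similarity is computed. The dictionary layout, the function names `merge_groups` and `cosine`, and the toy weights are assumptions made for illustration.

```python
import math


def merge_groups(stored, groups):
    """Merge the selected per-group vectors into a single sparse vector."""
    merged = {}
    for group in groups:
        for feature, weight in stored[group].items():
            # Key by (group, feature) so equal names in different groups never collide.
            merged[(group, feature)] = weight
    return merged


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical stored vectors for one user and one recipe.
user = {
    "ingredients": {"garlic": 0.8, "beef": -0.4},
    "cuisine": {"italian": 0.6},
    "dietary": {"vegetarian": 0.3},
}
recipe = {"ingredients": {"garlic": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}

all_groups = ("ingredients", "cuisine", "dietary")
sim_all = cosine(merge_groups(user, all_groups), merge_groups(recipe, all_groups))
sim_ingredients = cosine(merge_groups(user, ("ingredients",)), merge_groups(recipe, ("ingredients",)))
print(sim_all, sim_ingredients)
```

Because the groups are only merged at similarity time, any of the seven feature combinations of Table 5.3 can be evaluated without rebuilding the stored vectors.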

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.
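The piecewise rule of Eq. 4.6 can be written as a short function. The function and parameter names are illustrative; `average` and `std_dev` stand for whichever average rating and standard deviation the chosen method uses (user, item, or both combined), and the sketch does not clip the result to the rating range.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev


# Hypothetical user with average rating 3.5 and standard deviation 0.5.
high = similarity_to_rating(0.80, average=3.5, std_dev=0.5)
middle = similarity_to_rating(0.50, average=3.5, std_dev=0.5)
low = similarity_to_rating(0.10, average=3.5, std_dev=0.5)
print(high, middle, low)
```

Varying `upper` and `lower` is all that the threshold variation test needs to change between runs.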

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25]. From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

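As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset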

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation test on the experimental recommendation component, adjusting the upper similarity threshold between each test.
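The structure of such a sweep can be sketched as follows. This is a toy version under stated assumptions: it applies the two-case rule (average rating plus standard deviation at or above U, average rating otherwise) to a handful of made-up samples, whereas the actual experiments rerun the full cross-validation for every threshold value. All names and numbers are hypothetical.

```python
import math


def rate(similarity, average, std_dev, upper):
    """Two-case rule: add the standard deviation only at or above the upper threshold."""
    return average + std_dev if similarity >= upper else average


def sweep_upper(samples, thresholds):
    """Return (threshold, MAE, RMSE) for every candidate upper threshold.

    samples: list of (similarity, average, std_dev, actual_rating) tuples.
    """
    results = []
    for upper in thresholds:
        errors = [rate(s, avg, sd, upper) - actual for s, avg, sd, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results


samples = [
    (0.9, 3.0, 1.0, 4),
    (0.6, 3.0, 1.0, 4),
    (0.5, 3.0, 1.0, 4),
    (0.3, 3.0, 1.0, 3),
]
results = sweep_upper(samples, [0.25, 0.75])
for upper, mae, rmse in results:
    print(upper, mae, rmse)
```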

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is lowered within the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
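The opposite movement of MAE and RMSE observed in this section can be reproduced with a small worked example. The numbers are invented for illustration: predictor A is slightly wrong on every rating, while predictor B is exact most of the time but misses badly once.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [4, 4, 4, 4]
predictor_a = [3.5, 3.5, 3.5, 3.5]   # always off by 0.5
predictor_b = [4.0, 4.0, 4.0, 2.5]   # exact three times, off by 1.5 once

print(mae(actual, predictor_a), rmse(actual, predictor_a))   # 0.5 0.5
print(mae(actual, predictor_b), rmse(actual, predictor_b))   # 0.375 0.75
```

Predictor B has the lower MAE (0.375 against 0.5) but the higher RMSE (0.75 against 0.5), which is the same pattern seen when the upper similarity threshold is lowered.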

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should contribute the least to the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
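The per-user points of such a plot could be computed as sketched here. The function name, the use of the population standard deviation (`statistics.pstdev`), and the choice to compute it over the validation ratings are assumptions; the source does not state which variant was used.

```python
import statistics
from collections import defaultdict


def user_error_profile(validation, predictions):
    """Return {user_id: (rating standard deviation, mean absolute error)}.

    validation: list of (user_id, recipe_id, rating) tuples.
    predictions: dict mapping (user_id, recipe_id) to the predicted rating.
    """
    ratings, errors = defaultdict(list), defaultdict(list)
    for user_id, recipe_id, rating in validation:
        ratings[user_id].append(rating)
        errors[user_id].append(abs(predictions[(user_id, recipe_id)] - rating))
    return {
        user_id: (statistics.pstdev(ratings[user_id]), statistics.mean(errors[user_id]))
        for user_id in ratings
    }


validation = [("u1", "r1", 4), ("u1", "r2", 4), ("u2", "r1", 1), ("u2", "r3", 5)]
predictions = {("u1", "r1"): 4.0, ("u1", "r2"): 3.5, ("u2", "r1"): 3.0, ("u2", "r3"): 3.0}
profile = user_error_profile(validation, predictions)
print(profile)
```

Each (standard deviation, absolute error) pair corresponds to one point in the scatter plot; in the toy data, the user with identical ratings has a much lower error than the user whose ratings vary widely.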

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still retained a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
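The round-by-round procedure described in this section can be sketched for a single user as follows. The `fit` and `predict` callables are placeholders for any learner; the trivial mean-rating model used here is only a stand-in for the Rocchio component, and all names and data are hypothetical. In the experiments the resulting errors are additionally averaged over all selected users.

```python
def learning_curve(reviews, fit, predict, max_train):
    """Return the validation MAE after training on the first 1, 2, ... reviews."""
    errors = []
    for n in range(1, max_train + 1):
        # Each round moves one more review from the validation set to the training set.
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        model = fit(training)
        abs_errors = [abs(predict(model, recipe) - rating) for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


def fit_mean(training):
    """Stand-in learner: the model is just the mean training rating."""
    return sum(rating for _, rating in training) / len(training)


def predict_mean(model, recipe):
    return model


reviews = [("r1", 4), ("r2", 2), ("r3", 3), ("r4", 3)]
curve = learning_curve(reviews, fit_mean, predict_mean, max_train=3)
print(curve)
```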



Chapter 6

Conclusions


In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations gave the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, with this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
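One possible reading of this proposal is sketched below: one class vector per rating value is built for a user by summing the feature vectors of the recipes that received that rating, and the predicted rating is the class whose vector is most similar to the recipe. The summing scheme, the function names and the toy data are assumptions, since the proposal is not specified in more detail.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_class_vectors(rated):
    """Build one summed feature vector per rating value from (vector, rating) pairs."""
    classes = defaultdict(lambda: defaultdict(float))
    for vector, rating in rated:
        for feature, weight in vector.items():
            classes[rating][feature] += weight
    return {rating: dict(vector) for rating, vector in classes.items()}


def predict_rating(classes, recipe):
    """The predicted rating is the class with the highest cosine similarity."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))


rated = [
    ({"garlic": 1.0, "pasta": 1.0}, 5),
    ({"beef": 1.0}, 2),
    ({"garlic": 0.5}, 5),
]
classes = build_class_vectors(rated)
print(predict_rating(classes, {"garlic": 1.0}), predict_rating(classes, {"beef": 1.0, "rice": 1.0}))
```

In this form no similarity thresholds are needed, because the class label itself is the rating.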







Chapter 5

Validation

This chapter presents the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the baseline algorithms and the first experimental results. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two further aspects of the recommendation process, using the algorithm that showed the best results.

5.1 Evaluation Metrics and Cross-Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used to estimate how accurately a predictive model will perform in practice [26]. Its main idea is to hold out a segment of the known data: instead of being used to train the model, this segment is used to evaluate the predictions made by the trained system. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the observations are partitioned into k disjoint subsets (folds), one fold serves as the validation set, and the remaining k-1 folds serve as the training set. To reduce variability, the process is repeated k times, each time with a different fold as the validation set, so that every observation is used for validation exactly once. The validation results are then averaged over the k repetitions (see Fig. 5.1). In the experiments performed in this work the chosen value for k was 5, i.e., 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
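The fold construction described above can be sketched as follows. This is only an illustrative sketch: the function name `k_fold_splits`, the fixed random seed and the toy rating triples are assumptions made for the example, not part of the evaluation module used in this work.

```python
import random

def k_fold_splits(observations, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th observation, starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        yield training, validation

# Hypothetical (userID, itemID, rating) triples, for illustration only.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
splits = list(k_fold_splits(ratings, k=5))
```

With 100 observations and k = 5, each validation fold holds 20 observations (20%) and each training set the remaining 80, and every observation appears in exactly one validation fold.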

Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the predicted values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

Given the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated from the user's previously rated items, learned from the training set.

Using the correct rating values from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures quantify the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
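As a reminder of how the two measures behave, the following minimal sketch computes them using their standard definitions (mean of the absolute deviations, and square root of the mean squared deviation). The function names and the four toy ratings are assumptions for illustration and are not taken from the datasets.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; squaring gives large deviations more weight."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 5, 3, 1]
predicted = [4, 4, 3, 3]
print(mae(actual, predicted))   # 0.75
print(rmse(actual, predicted))  # sqrt(1.25), roughly 1.118
```

The single deviation of 2 contributes four times as much to the squared sum as the deviation of 1, which is why the RMSE (about 1.118) is noticeably higher than the MAE (0.75) here.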

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using specific dataset averages directly as the predicted rating. The averages computed were the following: the user average rating, the recipe (item) average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious           Food.com
                                  MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250

Table 5.2: Test Results

Epicurious
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE      RMSE               MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606             0.7759   1.0283
Item Avg + Item Standard Deviation                  0.8914   1.1550             0.8388   1.1106
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296             0.7824   0.9927
Min-Max                                             0.8539   1.1533             0.7721   1.0705

Food.com
                                                    Observation User Average    Observation Fixed Threshold
                                                    MAE      RMSE               MAE      RMSE
User Avg + User Standard Deviation                  0.4448   0.6812             0.4287   0.6624
Item Avg + Item Standard Deviation                  0.4561   0.7251             0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.4390   0.6506             0.4324   0.6449
Min-Max                                             0.6648   0.9847             0.6303   0.9384
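The three average-based baselines reported in Table 5.1 can be sketched as below. The helper names and the three toy training triples are assumptions for illustration; handling of users or recipes that never appear in the training set is deliberately left out of this sketch.

```python
from collections import defaultdict

def build_averages(training):
    """Return per-user and per-item mean ratings from (user, item, rating) triples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def predict_combined(user, item, user_avg, item_avg):
    """Combined baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2

# Hypothetical training triples, for illustration only.
training = [("u1", "r1", 5), ("u1", "r2", 3), ("u2", "r1", 4)]
user_avg, item_avg = build_averages(training)
print(predict_combined("u1", "r2", user_avg, item_avg))  # (4.0 + 3.0) / 2 = 3.5
```

The User Average and Item Average baselines simply return `user_avg[user]` or `item_avg[item]` on their own.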

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
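To make the two thresholding options concrete, the sketch below builds a prototype vector in the usual Rocchio style: the centroid of the positive observations minus the centroid of the negative ones. Several details are assumptions made for the example rather than the definitions of Section 4.3: the weights `beta` and `gamma`, the treatment of a rating equal to the threshold as positive, and the sparse dictionary representation.

```python
from collections import defaultdict

def build_prototype(rated, threshold=3.0, beta=1.0, gamma=1.0):
    """rated: list of (feature_weights dict, rating). Returns a sparse prototype dict."""
    # Assumption: a rating equal to the threshold counts as a positive observation.
    positives = [vec for vec, rating in rated if rating >= threshold]
    negatives = [vec for vec, rating in rated if rating < threshold]
    prototype = defaultdict(float)
    for group, sign, coeff in ((positives, 1.0, beta), (negatives, -1.0, gamma)):
        for vec in group:
            for feature, weight in vec.items():
                # Add (or subtract) this vector's share of the group centroid.
                prototype[feature] += sign * coeff * weight / len(group)
    return dict(prototype)

def user_average_threshold(rated):
    """Alternative threshold: the user's own average rating."""
    return sum(rating for _, rating in rated) / len(rated)

# Hypothetical rated recipes, for illustration only.
rated = [({"tomato": 1.0, "basil": 0.5}, 5), ({"tomato": 1.0, "beef": 2.0}, 1)]
prototype = build_prototype(rated, threshold=3.0)
```

In this toy case "tomato" appears in both a liked and a disliked recipe, so its weight cancels to 0.0, while "basil" ends up positive and "beef" negative. Passing `threshold=user_average_threshold(rated)` gives the Observation User Average variant.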

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which combination of methods performed best, so that it could be further adjusted and improved. The MAE and RMSE values show that using the user average as the threshold to build the prototype vectors consistently results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion is that combining both user and item average ratings and standard deviations gives the lowest RMSE in every configuration, with MAE values close to the best.

Although these first results do not surpass most of the baselines, the best-performing experimental methods were identified and can now be further improved and adjusted to return the best recommendations.

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features help to obtain the best recommendations, so feature testing is crucial.

Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
Ingredients                          0.7932   1.0054      0.4411   0.6537
Cuisine                              0.8553   1.0810      0.5357   0.7431
Dietary                              0.8772   1.0807      0.4579   0.7320

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. To avoid rebuilding the prototype vectors for each feature combination to be tested, the features were separated when computing each user's prototype vector: in practice, 3 vectors (one per feature type) were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
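The idea of storing one vector per feature type and merging them on demand can be sketched as follows. The dictionary layout, the helper names `merge` and `cosine`, and the toy user and recipe are assumptions for illustration; the actual storage format used in this work is not shown here.

```python
import math

def merge(vectors_by_type, selected):
    """Concatenate the stored per-feature-type vectors for the selected types."""
    merged = {}
    for feature_type in selected:
        for feature, weight in vectors_by_type[feature_type].items():
            # Key by (type, feature) so equal names in different types never collide.
            merged[(feature_type, feature)] = weight
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical stored vectors, for illustration only.
user = {"ingredients": {"tomato": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {"vegan": 1.0}}
recipe = {"ingredients": {"tomato": 1.0}, "cuisine": {"thai": 1.0}, "dietary": {}}

only_ingredients = ["ingredients"]
all_features = ["ingredients", "cuisine", "dietary"]
sim_ingredients = cosine(merge(user, only_ingredients), merge(recipe, only_ingredients))
sim_all = cosine(merge(user, all_features), merge(recipe, all_features))
```

Each row of Table 5.3 then corresponds to a different `selected` list, with no need to rebuild the stored vectors. In this toy case the similarity is 1.0 with ingredients only and drops to about 0.41 once the mismatching cuisine and dietary features are included.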

In theory, using more features to describe the items in content-based methods should improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before adding it to the recommendation system.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to find the similarity thresholds that return the lowest error values.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
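Read as code, the piecewise rule of Eq. 4.6 is a three-way branch on the similarity value. The sketch below is a direct transcription; the function and parameter names are assumptions, and which average and standard deviation are passed in (user, item, or the combination of both) depends on the method chosen in Section 4.3.3.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the three cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Hypothetical average of 3.5 and standard deviation of 0.8, for illustration only.
high = similarity_to_rating(0.9, 3.5, 0.8)  # upper case: 3.5 + 0.8
mid = similarity_to_rating(0.5, 3.5, 0.8)   # middle case: 3.5
low = similarity_to_rating(0.1, 3.5, 0.8)   # lower case: 3.5 - 0.8
```

Varying `upper` and `lower` in this function is all that the threshold variation test requires; setting `lower` to negative infinity disables the lower case entirely.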

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy: subtracting the standard deviation does not help. The pronounced drop in the error values seen in Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.

Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. Each point in the graphs corresponds to one value of the upper similarity threshold U: for that value, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, with U adjusted between runs.
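The sweep over U can be pictured with the following sketch, which applies Eq. 5.1 for several thresholds and records the MAE and RMSE for each. The four cases are invented toy numbers chosen to keep the arithmetic easy to follow; they are not taken from either dataset, and the function names are assumptions.

```python
import math

def predict(similarity, average, std_dev, upper):
    """Eq. 5.1: add the standard deviation only at or above the upper threshold."""
    return average + std_dev if similarity >= upper else average

def sweep_upper(cases, thresholds):
    """cases: (similarity, average, std_dev, actual_rating) tuples."""
    results = {}
    for upper in thresholds:
        errors = [predict(s, avg, sd, upper) - actual for s, avg, sd, actual in cases]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[upper] = (mae, rmse)
    return results

# Hypothetical validation cases, for illustration only.
cases = [(0.8, 4.0, 1.0, 5), (0.5, 4.0, 1.0, 5), (0.4, 3.0, 1.0, 2), (0.1, 3.0, 1.0, 3)]
results = sweep_upper(cases, [0.25, 0.5, 0.75])
```

On these toy cases U = 0.25 and U = 0.75 give the same MAE (0.5) but different RMSE values (1.0 versus roughly 0.71), because at U = 0.25 the whole error is concentrated in a single prediction that is off by 2. This is the kind of disagreement between the two metrics that makes the choice of threshold a trade-off.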

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the average absolute deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on larger deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as U is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the lowest Epicurious RMSE and the lowest Food.com MAE among the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.

In Figures 5.6 and 5.7 each point represents a user, positioned according to that user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
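The per-user points plotted in these figures can be derived along the lines of the sketch below. Two choices here are assumptions rather than documented details of this work: the population standard deviation is used for each user's ratings, and a user's absolute error is taken as the mean absolute error over that user's predictions.

```python
import statistics
from collections import defaultdict

def per_user_points(records):
    """records: (user, actual, predicted). Returns {user: (std_dev, mean_abs_error)}."""
    actuals, errors = defaultdict(list), defaultdict(list)
    for user, actual, predicted in records:
        actuals[user].append(actual)
        errors[user].append(abs(actual - predicted))
    return {
        user: (statistics.pstdev(actuals[user]), statistics.fmean(errors[user]))
        for user in actuals
    }

# Hypothetical validation records, for illustration only.
records = [("a", 4, 4.0), ("a", 4, 3.5), ("b", 1, 3.0), ("b", 5, 3.0)]
points = per_user_points(records)
```

User "a" always gives the same rating, so the standard deviation is 0 and the error is small (0.25), while user "b" alternates between extremes and gets a standard deviation of 2.0 with an error of 2.0. Each such pair would be one point in the scatter plot.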

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of standard deviation would be undesirable, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since in a real system the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is moved from the validation set to the training set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
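The round-by-round procedure can be sketched for a single user as below. To keep the example self-contained, the model is a deliberately trivial stand-in (it predicts the mean of the training ratings) rather than the Rocchio component; the function names, the `fit`/`predict` interface and the toy reviews are all assumptions. In the experiments, the per-round errors are additionally averaged over all selected users.

```python
def learning_curve(reviews, fit, predict, max_train):
    """reviews: one user's (item, rating) pairs in the order they are revealed."""
    curve = []
    for n in range(1, max_train + 1):
        # Each round moves one more review from the validation set to the training set.
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        model = fit(training)
        abs_errors = [abs(rating - predict(model, item)) for item, rating in validation]
        curve.append((n, sum(abs_errors) / len(abs_errors)))
    return curve

def fit_mean(training):
    """Stand-in model (an assumption, not Rocchio): the mean training rating."""
    return sum(rating for _, rating in training) / len(training)

def predict_mean(model, item):
    """The stand-in model predicts the same value for every item."""
    return model

# Hypothetical reviews of one user, for illustration only.
reviews = [("r1", 5), ("r2", 3), ("r3", 4), ("r4", 4)]
curve = learning_curve(reviews, fit_mean, predict_mean, max_train=3)
```

Plotting the second element of each pair in `curve` against the first gives a learning curve of the kind shown in Figures 5.8 to 5.10.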



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to measure the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined performs at least as well as every individual feature and every pairwise combination (see Table 5.3).

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, with this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to address in the future, when datasets with more information are available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
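A minimal sketch of this per-rating class idea is given below. It is only one possible reading of the proposal: summing the feature vectors of all recipes that received the same rating, and using cosine similarity to pick the class, are assumptions made for the example, as are the function names and the toy data.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_rating_classes(rated):
    """rated: (feature_weights, rating) pairs. One summed vector per rating value."""
    classes = defaultdict(lambda: defaultdict(float))
    for vec, rating in rated:
        for feature, weight in vec.items():
            classes[rating][feature] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}

def predict_rating(classes, recipe):
    """The predicted rating is the label of the most similar class vector."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))

# Hypothetical rated recipes of one user, for illustration only.
rated = [({"tomato": 1.0, "basil": 1.0}, 5), ({"beef": 1.0}, 2), ({"tomato": 1.0}, 5)]
classes = build_rating_classes(rated)
```

Here a new recipe containing only tomato is closest to the class built from the 5-star recipes, while a recipe with beef and onion is closest to the 2-star class, so the rating comes directly from the class label.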




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 0924-1868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.


Page 52: Personalized Food Recommendations - ULisboa€¦ · Personalized Food Recommendations Exploring Content-Based Methods Jorge Miguel Tavares Soares de Almeida Thesis to obtain the Master

Figure 51 10 Fold Cross-Validation example

in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
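As a rough illustration of how the two measures are obtained from a validation fold, the following minimal Python sketch computes MAE and RMSE over a handful of made-up ratings. The function names and the sample values are assumptions for illustration only and are not taken from the evaluation module described here.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; squaring penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical validation ratings and the corresponding predictions.
actual = [4, 3, 5, 2]
predicted = [4, 4, 3, 2]

fold_mae = mae(actual, predicted)    # (0 + 1 + 2 + 0) / 4 = 0.75
fold_rmse = rmse(actual, predicted)  # sqrt((0 + 1 + 4 + 0) / 4)
```

With cross-validation, these values would be computed per fold and then averaged over the folds.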

5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as


Table 5.1: Baselines

                                  Epicurious          Food.com
Method                            MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                                      Epicurious                            Food.com
                                                      Observation       Observation         Observation       Observation
                                                      User Average      Fixed Threshold     User Average      Fixed Threshold
Rating conversion method                              MAE     RMSE      MAE     RMSE        MAE     RMSE      MAE     RMSE
User Avg + User Standard Deviation                    0.8217  1.0606    0.7759  1.0283      0.4448  0.6812    0.4287  0.6624
Item Avg + Item Standard Deviation                    0.8914  1.1550    0.8388  1.1106      0.4561  0.7251    0.4507  0.7207
User/Item Avg + User and Item Standard Deviation      0.8304  1.0296    0.7824  0.9927      0.4390  0.6506    0.4324  0.6449
Min-Max                                               0.8539  1.1533    0.7721  1.0705      0.6648  0.9847    0.6303  0.9384

inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
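The three average-based baselines can be sketched as follows. This is only an illustrative sketch: the triple layout mirrors the userID, itemID, rating format of the datasets, while the helper names and the sample data are hypothetical, and the handling of users or recipes that are absent from the training set is left out.

```python
def mean_by_key(triples, index):
    """Average rating grouped by user (index 0) or by item (index 1)."""
    totals = {}
    for triple in triples:
        key = triple[index]
        rating_sum, count = totals.get(key, (0.0, 0))
        totals[key] = (rating_sum + triple[2], count + 1)
    return {key: s / n for key, (s, n) in totals.items()}

def combined_average(user_avg, item_avg, user_id, item_id):
    """(UserAvg + ItemAvg) / 2, used directly as the predicted rating."""
    return (user_avg[user_id] + item_avg[item_id]) / 2

# Hypothetical training triples: (userID, recipeID, rating).
training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg = mean_by_key(training, 0)
item_avg = mean_by_key(training, 1)
prediction = combined_average(user_avg, item_avg, "u1", "r1")
```

Each baseline ignores the recipe content entirely, which is what makes it a useful reference point for the content-based methods.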

As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
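To make the difference between the two observation thresholds concrete, the sketch below builds a prototype vector in both ways. It is not the implementation from Section 4.3: it assumes the textbook Rocchio form (mean of the positive vectors minus mean of the negative vectors, weighted by hypothetical parameters beta and gamma) and assumes that a rating equal to the threshold counts as positive.

```python
def build_prototype(rated, threshold, beta=1.0, gamma=1.0):
    """rated: list of (feature_weights, rating); returns a prototype dict.

    Assumption: ratings >= threshold are positive observations.
    """
    positive = [vec for vec, rating in rated if rating >= threshold]
    negative = [vec for vec, rating in rated if rating < threshold]
    prototype = {}
    for group, factor in ((positive, beta), (negative, -gamma)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + factor * weight / len(group)
    return prototype

# Hypothetical user who rates everything 4 or 5.
rated = [({"garlic": 1.0}, 5), ({"basil": 1.0}, 4), ({"tofu": 1.0}, 4)]

fixed = build_prototype(rated, threshold=3)  # every review is positive
user_average = sum(rating for _, rating in rated) / len(rated)
by_average = build_prototype(rated, threshold=user_average)  # the 4s become negative
```

For a user who only gives high ratings, the user-average threshold turns some well-liked recipes into negative observations, whereas the fixed threshold keeps all of them positive.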

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved


Table 5.3: Testing features

                                      Epicurious          Food.com
Features                              MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320

and adjusted to return the best recommendations.

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested. Therefore, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
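A minimal sketch of this idea is shown below, assuming each stored vector is a dictionary of feature weights keyed by feature group. The group names, helper functions and sample weights are hypothetical and only illustrate how a subset of the stored vectors can be merged before computing the cosine similarity.

```python
import math

def merge(groups, selected):
    """Merge the stored per-group vectors for the selected feature groups."""
    merged = {}
    for name in selected:
        for feature, weight in groups[name].items():
            merged[(name, feature)] = weight  # namespaced to avoid key clashes
    return merged

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical stored vectors for one user and one recipe.
user = {"ingredients": {"garlic": 1.0}, "cuisine": {"italian": 1.0}, "dietary": {}}
recipe = {"ingredients": {"garlic": 1.0}, "cuisine": {"thai": 1.0}, "dietary": {}}

only_ingredients = cosine(merge(user, ["ingredients"]), merge(recipe, ["ingredients"]))
with_cuisine = cosine(merge(user, ["ingredients", "cuisine"]),
                      merge(recipe, ["ingredients", "cuisine"]))
```

Because the per-group vectors are stored once, switching feature combinations only changes the `selected` list and requires no rebuild of the prototype.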

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if } \text{similarity} \geq U \\
\text{average rating}, & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation}, & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.
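The three-case rule of Eq. 4.6 translates almost directly into code. The sketch below is illustrative: the function and parameter names are hypothetical, and the average and standard deviation passed in stand for whichever combination (user, item, or both) is being tested.

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the cases of Eq. 4.6."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# Hypothetical user with average rating 3.5 and standard deviation 0.5.
high = similarity_to_rating(0.80, 3.5, 0.5)    # similarity >= U
middle = similarity_to_rating(0.50, 3.5, 0.5)  # L <= similarity < U
low = similarity_to_rating(0.10, 3.5, 0.5)     # similarity < L
```

Keeping the thresholds as parameters makes it straightforward to vary U and L between test runs.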

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The sharp drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if } \text{similarity} \geq U \\
\text{average rating}, & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.
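The threshold sweep can be sketched as a loop over candidate values of U, recomputing both error measures each time. This is a simplified sketch under stated assumptions: the sample tuples are invented, the function names are hypothetical, and a single list of samples stands in for the full cross-validation run.

```python
import math

def predict(similarity, average, std_dev, upper):
    """Two-case rule of Eq. 5.1."""
    return average + std_dev if similarity >= upper else average

def sweep(samples, uppers):
    """samples: list of (similarity, average, std_dev, actual_rating).

    Returns one (upper, MAE, RMSE) tuple per candidate threshold.
    """
    results = []
    for upper in uppers:
        errors = [actual - predict(sim, avg, std, upper)
                  for sim, avg, std, actual in samples]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((upper, mae, rmse))
    return results

# Hypothetical validation samples.
samples = [(0.9, 3.0, 1.0, 4), (0.5, 3.0, 1.0, 4), (0.2, 3.0, 1.0, 2)]
results = sweep(samples, [0.75, 0.4, 0.1])
```

Plotting the MAE and RMSE columns of `results` against the threshold gives curves of the kind shown in Figures 5.4 and 5.5.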

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
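The opposite movement of MAE and RMSE observed in this section can be reproduced with a small numeric example. The two error lists below are invented for illustration: one predictor misses every rating by a small amount, while the other is exact most of the time but misses badly once.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical per-prediction errors for two threshold settings.
uniform_misses = [0.5, 0.5, 0.5, 0.5]  # never exact, never far off
mostly_exact = [0.0, 0.0, 0.0, 1.6]    # exact three times, one large miss

lower_mae = mae(mostly_exact) < mae(uniform_misses)
higher_rmse = rmse(mostly_exact) > rmse(uniform_misses)
```

Here the second predictor has the lower MAE (0.4 against 0.5) but the higher RMSE (about 0.8 against 0.5), which mirrors the trade-off described for lower values of U.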

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
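The data behind such a plot can be sketched as follows. Several details are assumptions rather than the method actually used: the population standard deviation is taken over each user's actual ratings, the per-user error is the mean absolute error over that user's predictions, and the trend line is approximated by averaging the errors inside fixed-width standard deviation bins.

```python
import statistics

def user_points(records):
    """records: {user_id: [(actual, predicted), ...]} -> [(std_dev, abs_error)]."""
    points = []
    for pairs in records.values():
        std_dev = statistics.pstdev([actual for actual, _ in pairs])
        abs_error = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points.append((std_dev, abs_error))
    return points

def binned_average(points, width=0.5):
    """Average absolute error per standard-deviation bin (assumed trend line)."""
    bins = {}
    for std_dev, abs_error in points:
        bins.setdefault(int(std_dev // width), []).append(abs_error)
    return {index * width: sum(errs) / len(errs) for index, errs in sorted(bins.items())}

# Hypothetical users: u1 always rates 4, u2 alternates between 1 and 5.
records = {"u1": [(4, 4), (4, 3)], "u2": [(1, 3), (5, 4)]}
points = user_points(records)
trend = binned_average(points)
```

A flat or slowly rising `trend` towards the higher bins corresponds to the behaviour this test is looking for.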

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
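The round-by-round procedure described above can be sketched for a single user as follows. The `fit` and `predict` callables are placeholders: a trivial mean-rating model is used here only so that the sketch runs, whereas the experiments use the Rocchio-based component. All names and sample reviews are hypothetical.

```python
def learning_curve(reviews, fit, predict, max_train):
    """reviews: ordered list of (item, rating) for one user.

    In round n, the first n reviews train the model and the rest validate it.
    Returns a list of (n, mean absolute error on the remaining reviews).
    """
    curve = []
    for n in range(1, max_train + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        model = fit(training)
        errors = [abs(rating - predict(model, item)) for item, rating in validation]
        curve.append((n, sum(errors) / len(errors)))
    return curve

# Stand-in model (an assumption): predict the mean of the training ratings.
def fit_mean(training):
    return sum(rating for _, rating in training) / len(training)

def predict_mean(model, item):
    return model

reviews = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(reviews, fit_mean, predict_mean, max_train=3)
```

Averaging the per-user curves over all users that meet the minimum number of rated recipes would give one point per round, as in Figures 5.8 to 5.10.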



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors, and together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually and the other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
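One possible reading of this proposal is sketched below. Since the idea is only outlined here, every detail is an assumption: class vectors are built by summing the feature weights of the recipes that received each rating value, cosine similarity is used for the comparison, and all names and sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated):
    """rated: list of (feature_weights, rating) -> {rating: class vector}."""
    classes = {}
    for vec, rating in rated:
        class_vec = classes.setdefault(rating, {})
        for feature, weight in vec.items():
            class_vec[feature] = class_vec.get(feature, 0.0) + weight
    return classes

def predict_rating(classes, recipe):
    """Return the rating whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))

# Hypothetical user history: one recipe rated 5 and one rated 2.
rated = [({"garlic": 1.0}, 5), ({"tofu": 1.0}, 2)]
classes = build_class_vectors(rated)
predicted = predict_rating(classes, {"garlic": 1.0, "basil": 0.2})
```

In this sketch the predicted rating is simply the label of the best-matching class, so no separate similarity-to-rating conversion is involved.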

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Table 51 Baselines

Epicurious FoodcomMAE RMSE MAE RMSE

YoLP Content-basedcomponent

06389 08279 03590 06536

YoLP Collaborativecomponent

06454 08678 03761 06834

User Average 06315 08338 04077 06207Item Average 07701 10930 04385 07043Combined Average 06628 08572 04180 06250

Table 52 Test Results

Epicurious FoodcomObservationUser Average

ObservationFixed Thresh-old

User AverageObservation

ObservationFixed Thresh-old

MAE RMSE MAE RMSE MAE RMSE MAE RMSEUser Avg + User Stan-dard Deviation

08217 10606 07759 10283 04448 06812 04287 06624

Item Avg + Item Stan-dard Deviation

08914 11550 08388 11106 04561 07251 04507 07207

UserItem Avg + Userand Item Standard De-viation

08304 10296 07824 09927 04390 06506 04324 06449

Min-Max 08539 11533 07721 10705 06648 09847 06303 09384

inputs the recommendation system simply returns the userID average or the recipeID average or

the combination of both Table 51 contains the MAE and RMSE values for the baseline methods

As detailed in Section 43 the experimental recommendation component uses the well-known

Rocchiorsquos algorithm and seeks to adapt it to food recommendations Two distinct ways of building

the userrsquos prototype vectors were presented using the user average rating value as threshold for

positive and negative observations or simply using a fixed threshold in the middle of the rating

range considering as positive observations the highest rating values and as negative the lowest

These are referred in Table 52 as Observation User Average and Observation Fixed Threshold

Also detailed in Section 43 a few different methods are used to convert the similarity value returned

from Rocchiorsquos algorithm into a rating value These methods are represented in the line entries of

the Table 52 and are referred to as User Avg + User Standard Deviation Item Avg + Item Standard

Deviation UserItem Avg + User and Item Standard Deviation and Min-Max

Table 52 contains the first test results of the experiments using 5-fold cross-validation The

objective was to determine which method combination had the best performance so it could be

further adjusted and improved When observing the MAE and RMSE values it is clear that using the

user average as threshold to build the prototype vectors results in higher error values than the fixed

threshold of 3 to separate the positive and negative observations The second conclusion that can

be made from these results is that using the combination of both user and item average ratings and

standard deviations has the overall lowest error values

Although the first results do not surpass most of the baselines in terms of performance the

experimental methods with the best performances were identified and can now be further improved


Table 53 Testing features

Epicurious FoodcomMAE RMSE MAE RMSE

Ingredients + Cuisine +Dietaries

07824 09927 04324 06449

Ingredients + Cuisine 07915 10012 04384 06502Ingredients + Dietary 07874 09986 04342 06468Cuisine + Dietary 08266 10616 04324 07087Ingredients 07932 10054 04411 06537Cuisine 08553 10810 05357 07431Dietary 08772 10807 04579 07320

and adjusted to return the best recommendations

53 Feature Testing

As detailed in Section 44 each recipe is characterized by the following features ingredients cuisine

and dietary In content-based methods it is important to determine if all features are helping to obtain

the best recommendations so feature testing is crucial

In the previous Section we concluded that the method combination that performed the best was

the following

bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations

and build the prototype vectors

bull Use the combination of both user and item average ratings and standard deviations to trans-

form the similarity value into a rating value

From this point on all the experiments performed use this method combination

Computing the userrsquos prototype vectors for all 5 folds is a time consuming process especially

for the Foodcom dataset With these feature tests in mind the goal was to avoid rebuilding the

prototype vectors for each feature combination to be tested so when computing the user prototype

vector the features where separated and in practice 3 vectors were created and stored for each

user This representation makes feature testing very easy to perform For each recommendation

when computing the cosine similarity between the userrsquos prototype vector and the recipersquos features

the composition of the prototype vector can be controlled as the 3 stored vectors can be easily

merged In the tests presented in the previous section the prototype vector was built using all

features (Ingredients + Cuisine + Dietaries) so the same results can be observed in the respective

line of Table 53

Using more features to describe the items in content-based methods should in theory improve

the recommendations since we have more information available about them and although this is

confirmed in this test see Table 53 that may not always be the case Some features like for


Figure 52 Lower similarity threshold variation test using Epicurious dataset

example the price of the meal can increase the correlation between the user preferences and items

he dislikes so it is important to test the impact of every new feature before implementing it in the

recommendation system

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if similarity} \geq U \\
\text{average rating}, & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation}, & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if similarity} \geq U \\
\text{average rating}, & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]
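The two mappings from similarity to rating can be expressed as one small function. The sketch below is an assumption-laden illustration: the function name `similarity_to_rating` and its parameters are hypothetical, and `average` and `std_dev` are taken as given inputs (how the user and item statistics are combined is described in Section 4.3.3 and is not reproduced here).

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=None):
    """Map a similarity value to a predicted rating.

    With `lower` set, this follows the three-case rule of Eq. 4.6.
    With `lower=None` (the default), the lower case is dropped, which
    corresponds to the two-case rule of Eq. 5.1.
    """
    if similarity >= upper:
        return average + std_dev
    if lower is not None and similarity < lower:
        return average - std_dev
    return average

# Hypothetical values: average rating 3.5, standard deviation 0.5.
print(similarity_to_rating(0.80, 3.5, 0.5))              # upper case
print(similarity_to_rating(0.10, 3.5, 0.5, lower=0.25))  # lower case (Eq. 4.6)
print(similarity_to_rating(0.10, 3.5, 0.5))              # no lower case (Eq. 5.1)
```

Keeping the thresholds as parameters makes it straightforward to sweep U and L over a range, as done in the threshold variation tests.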


Using Eq. 5.1, the upper similarity threshold was tested next. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
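The opposite movement of MAE and RMSE discussed in this section can be reproduced with a small worked example. The ratings below are invented for illustration only and are not taken from either dataset; the helper names `mae` and `rmse` are likewise assumptions.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; squaring penalises large misses more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Invented example: average rating 4, standard deviation 1.
actual = [5, 5, 5, 3, 3]
high_threshold = [4, 4, 4, 4, 4]  # no similarity reaches U: always the average
low_threshold = [5, 5, 5, 5, 5]   # every similarity reaches U: average + std dev

for name, predicted in (("high U", high_threshold), ("low U", low_threshold)):
    print(name, round(mae(actual, predicted), 3), round(rmse(actual, predicted), 3))
```

With the lower threshold, three predictions become exact and two miss by 2 instead of 1, so in this toy case the MAE falls from 1.0 to 0.8 while the RMSE rises from 1.0 to about 1.265.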

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
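The per-user points plotted in these figures could be computed along the following lines. This is a sketch under assumptions: the input format, the function name `per_user_points`, the use of the population standard deviation, and the reading of "absolute error" as each user's mean absolute error are all choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def per_user_points(observations):
    """Return {user: (rating standard deviation, mean absolute error)}.

    `observations` is an iterable of (user, actual_rating, predicted_rating).
    """
    by_user = defaultdict(list)
    for user, actual, predicted in observations:
        by_user[user].append((actual, predicted))

    points = {}
    for user, pairs in by_user.items():
        actuals = [a for a, _ in pairs]
        errors = [abs(a - p) for a, p in pairs]
        # Population standard deviation (assumption); 0.0 for constant raters.
        points[user] = (pstdev(actuals), mean(errors))
    return points

# Invented observations: u1 always rates 4, u2 alternates between 1 and 5.
observations = [
    ("u1", 4, 4.0), ("u1", 4, 4.5),
    ("u2", 1, 3.0), ("u2", 5, 3.0),
]
for user, (std_dev, abs_error) in per_user_points(observations).items():
    print(user, std_dev, abs_error)
```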

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
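The round-by-round procedure used for the learning curve can be sketched for a single user as follows. This is a simplified illustration and not the thesis code: the prototype is built as the mean of the positive vectors minus the mean of the negative ones (a plain Rocchio variant with unit weights), ratings of 3 or more are treated as positive, and the names `build_prototype`, `learning_curve` and `predict` are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prototype(training, threshold=3):
    """Simplified Rocchio: mean of positive vectors minus mean of negative ones."""
    positives = [vec for vec, rating in training if rating >= threshold]
    negatives = [vec for vec, rating in training if rating < threshold]
    prototype = {}
    for group, sign in ((positives, 1.0), (negatives, -1.0)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + sign * weight / len(group)
    return prototype

def learning_curve(reviews, predict):
    """MAE on the not-yet-seen reviews after each training round.

    Each round moves one more review from the validation set into the
    training set; `predict` maps a similarity value to a rating.
    """
    curve = []
    for n in range(1, len(reviews)):
        training, validation = reviews[:n], reviews[n:]
        prototype = build_prototype(training)
        errors = [abs(rating - predict(cosine(prototype, vec)))
                  for vec, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Invented reviews for one user, in the order they were written.
reviews = [
    ({"tomato": 1.0}, 5),
    ({"liver": 1.0}, 1),
    ({"tomato": 1.0, "basil": 0.5}, 5),
    ({"tomato": 1.0, "garlic": 0.5}, 5),
]
# Two-case mapping with a hypothetical average of 3, deviation of 2, U = 0.5.
predict = lambda s: 5.0 if s >= 0.5 else 3.0
print(learning_curve(reviews, predict))
```

Averaging such curves over all users that meet the minimum number of rated recipes gives one point per round, which is the quantity plotted in the learning curve figures.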



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, making it possible to validate the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually or other pairwise combinations.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e. winter/fall or summer/spring), time of the day (i.e. lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
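The class-vector idea proposed here could look like the sketch below. It is speculative, since the thesis only outlines the approach: averaging the feature vectors of each rating value is an assumption, and the names `build_class_vectors` and `predict_rating` are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_class_vectors(rated_recipes):
    """Build one averaged feature vector per rating value given by the user."""
    sums, counts = {}, {}
    for vec, rating in rated_recipes:
        counts[rating] = counts.get(rating, 0) + 1
        target = sums.setdefault(rating, {})
        for feature, weight in vec.items():
            target[feature] = target.get(feature, 0.0) + weight
    return {rating: {f: w / counts[rating] for f, w in vec.items()}
            for rating, vec in sums.items()}

def predict_rating(class_vectors, recipe):
    """The predicted rating is the class whose vector is most similar."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe))

# Invented example: the user loved a tomato dish and disliked a liver dish.
rated = [({"tomato": 1.0}, 5), ({"liver": 1.0}, 1)]
classes = build_class_vectors(rated)
print(predict_rating(classes, {"tomato": 0.9, "basil": 0.1}))
```

No similarity threshold is involved: the rating comes directly from the best-matching class, which is the property the paragraph above points out.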




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Table 5.3: Testing features

                                     Epicurious          Food.com
Features                             MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320

and adjusted to return the best recommendations

5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. To avoid rebuilding the prototype vectors for each feature combination under test, the features were separated when computing the user prototype vector, so in practice 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.

Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if similarity} \geq U \\
\text{average rating}, & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation}, & \text{if similarity} < L
\end{cases}
\tag{4.6}
\]

The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other values now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation}, & \text{if similarity} \geq U \\
\text{average rating}, & \text{if similarity} < U
\end{cases}
\tag{5.1}
\]


Using Eq. 5.1, the upper similarity threshold was tested next. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity threshold between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, implementing this hybrid approach yielded a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (e.g., winter/fall or summer/spring), time of the day (e.g., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
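The per-rating class idea described above could look roughly like the sketch below. It is a hypothetical illustration, not an implemented component: it assumes each class vector is the plain average of the feature vectors of the recipes that received that rating, and that cosine similarity is used for the comparison.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def class_vectors(rated, vectors):
    """Build one averaged vector per distinct rating value given by the user."""
    groups = defaultdict(list)
    for recipe_id, rating in rated.items():
        groups[rating].append(vectors[recipe_id])
    classes = {}
    for rating, vecs in groups.items():
        acc = defaultdict(float)
        for vec in vecs:
            for f, w in vec.items():
                acc[f] += w / len(vecs)
        classes[rating] = dict(acc)
    return classes

def predict_rating(recipe_vec, classes):
    """The predicted rating is the label of the most similar class vector."""
    return max(classes, key=lambda rating: cosine(recipe_vec, classes[rating]))
```

In this form no similarity-to-rating mapping is needed, since the class label itself is the prediction; ties between classes would still need a rule, which is left open here.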




[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868. doi: 10.1109/DICTAP.2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.



Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.

For example, the price of the meal can increase the correlation between the user's preferences and items that the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating:


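\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}
\tag{4.6}
\]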

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, this test studies the impact on the recommendation and looks for the similarity thresholds that return the lowest error values.
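The three-case mapping of Eq. 4.6 can be written as a small function. This is a minimal sketch: the function and parameter names are illustrative, and the average rating and standard deviation are passed in as plain numbers rather than computed from user and item statistics.

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating following the three cases of Eq. 4.6.

    upper and lower correspond to the thresholds U and L; the defaults are
    the initial values of 0.75 and 0.25.
    """
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```

Varying `upper` and `lower` while keeping everything else fixed is what the threshold tests in this section do.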

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].


Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

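As a result of these tests, Eq. 4.6 was updated to:

\[
\text{Rating} =
\begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } \text{similarity} < U
\end{cases}
\tag{5.1}
\]

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.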


Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests on the experimental recommendation component, adjusting the upper similarity value between each test.
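The threshold sweep described above can be sketched as follows. This is an illustration only: a single list of held-out samples stands in for the cross-validation folds, and the sample layout `(similarity, average rating, standard deviation, actual rating)` is an assumption made for the example.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def predict(similarity, avg_rating, std_dev, upper):
    """Two-case mapping: add the standard deviation only if similarity >= upper."""
    return avg_rating + std_dev if similarity >= upper else avg_rating

def sweep_upper_threshold(samples, thresholds):
    """Return {U: (MAE, RMSE)} for each candidate upper threshold U.

    samples is a list of (similarity, avg_rating, std_dev, actual_rating)
    tuples, a hypothetical layout chosen for this sketch.
    """
    actual = [rating for _, _, _, rating in samples]
    results = {}
    for upper in thresholds:
        predicted = [predict(sim, avg, std, upper) for sim, avg, std, _ in samples]
        results[upper] = (mae(predicted, actual), rmse(predicted, actual))
    return results
```

Plotting the two values of each entry against the threshold gives curves of the kind shown in the figures for this test.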

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: as U is lowered, the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
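The opposite movement of MAE and RMSE discussed above can be checked with a small worked example. The notation (p_i for predicted ratings, r_i for actual ratings, N for the number of predictions) and the error values are illustrative assumptions, not figures from the experiments.

```latex
\[
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert p_i - r_i \rvert ,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (p_i - r_i)^2}
\]

% System A misses every rating by 0.5; system B is exact three times
% and misses once by 1.6.
\[
\text{A: } \lvert p_i - r_i \rvert = (0.5,\ 0.5,\ 0.5,\ 0.5)
\;\Rightarrow\;
\mathrm{MAE} = 0.5, \quad \mathrm{RMSE} = \sqrt{0.25} = 0.5
\]
\[
\text{B: } \lvert p_i - r_i \rvert = (0,\ 0,\ 0,\ 1.6)
\;\Rightarrow\;
\mathrm{MAE} = \frac{1.6}{4} = 0.4, \quad \mathrm{RMSE} = \sqrt{\frac{2.56}{4}} = 0.8
\]
```

System B has more exact predictions and a lower MAE, yet its single large miss gives it a higher RMSE, which is the same pattern observed when the threshold U is lowered.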

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation. The line in these two graphs indicates the average value of the points in that proximity.
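The points and the average line of such a plot could be computed as in the sketch below. Several choices here are assumptions made for illustration: the review layout `(user, actual rating, predicted rating)`, the use of the population standard deviation, and a fixed-width binning to approximate "the average value of the points in that proximity".

```python
import statistics
from collections import defaultdict

def user_points(reviews):
    """Return {user: (std_dev of actual ratings, mean absolute error)}.

    reviews is a list of (user_id, actual_rating, predicted_rating) tuples,
    a hypothetical layout chosen for this sketch.
    """
    by_user = defaultdict(list)
    for user, actual, predicted in reviews:
        by_user[user].append((actual, predicted))
    points = {}
    for user, pairs in by_user.items():
        std_dev = statistics.pstdev([actual for actual, _ in pairs])
        abs_err = sum(abs(a - p) for a, p in pairs) / len(pairs)
        points[user] = (std_dev, abs_err)
    return points

def binned_average(points, bin_width=0.25):
    """Average absolute error per standard-deviation bin (the trend line)."""
    bins = defaultdict(list)
    for std_dev, abs_err in points.values():
        bins[int(std_dev // bin_width)].append(abs_err)
    return {b * bin_width: sum(errs) / len(errs) for b, errs in sorted(bins.items())}
```

A trend line that flattens as the standard deviation grows corresponds to the stagnation of the absolute error discussed in this section.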

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This suggests that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold that could be chosen for this dataset while maintaining a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means that there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
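The round-by-round procedure described above can be sketched as a loop that moves one review per round from the validation set to the training set. The names are hypothetical, and `mean_predictor` is only a stand-in for the Rocchio-based predictor so that the example stays self-contained.

```python
def learning_curve(user_reviews, train_and_predict, max_rounds):
    """Return the MAE measured after each round of simulated learning.

    user_reviews maps a user to an ordered list of (recipe_id, rating).
    In round k the first k reviews of each user form the training set and
    the remaining reviews form the validation set.
    """
    curve = []
    for k in range(1, max_rounds + 1):
        errors = []
        for reviews in user_reviews.values():
            train, validation = reviews[:k], reviews[k:]
            for recipe_id, rating in validation:
                errors.append(abs(train_and_predict(train, recipe_id) - rating))
        # None marks a round in which no validation reviews were left.
        curve.append(sum(errors) / len(errors) if errors else None)
    return curve

def mean_predictor(train, recipe_id):
    """Stand-in predictor: the mean of the ratings seen so far."""
    return sum(rating for _, rating in train) / len(train)
```

Replacing `mean_predictor` with a function that builds the user's prototype vector from `train` and maps the resulting similarity to a rating would reproduce the structure of the experiment.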




Figure 53 Lower similarity threshold variation test using Foodcom dataset

Figure 54 Upper similarity threshold variation test using Epicurious dataset

From this test it is clear that the lower threshold only has a negative effect in the recommendation

accuracy and subtracting the standard deviation does not help The accentuated drop in error value

seen in the graph from Fig 52 occurs when the lower case (average rating minus standard deviation)

is completely removed

As a result of these tests Eq 46 was updated to


Figure 55 Upper similarity threshold variation test using Foodcom dataset

Rating =

average rating + standard deviation if similarity gt= U

average rating if similarity lt U


Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test

results for theEpicurious and Foodcom datasets respectively For each similarity value represented

by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests

multiple times on the experimental recommendation component adjusting the upper similarity value

between each test

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings. RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When the similarity threshold is lowered, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.

In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
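The opposite movement of MAE and RMSE can be seen in a small worked example. The definitions below are the standard ones over $N$ predictions; the symbols $p_i$ (predicted rating) and $r_i$ (actual rating) are notation assumed here, and the error values are invented for illustration rather than taken from the experiments.

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert p_i - r_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (p_i - r_i)^2}
$$

For four predictions with errors $(0, 0, 0, 2)$, i.e. three exact hits and one large miss:

$$
\mathrm{MAE} = \frac{0 + 0 + 0 + 2}{4} = 0.5,
\qquad
\mathrm{RMSE} = \sqrt{\frac{0 + 0 + 0 + 4}{4}} = 1.0
$$

For four predictions that are all slightly off, with errors $(0.6, 0.6, 0.6, 0.6)$:

$$
\mathrm{MAE} = \frac{4 \times 0.6}{4} = 0.6,
\qquad
\mathrm{RMSE} = \sqrt{\frac{4 \times 0.36}{4}} = 0.6
$$

The first predictor has the lower MAE (0.5 against 0.6) but the higher RMSE (1.0 against 0.6), which is the same pattern observed when the threshold is lowered: more exact predictions, but larger misses.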

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
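The per-user points plotted in these graphs could be computed along the following lines. This is a sketch under assumptions: the `(user_id, actual_rating, predicted_rating)` layout is hypothetical, the population standard deviation is used, and the averaging line is approximated by fixed-width bins, which may differ from the smoothing used to draw the line in the figures.

```python
import statistics
from collections import defaultdict


def per_user_points(reviews):
    """Return {user_id: (std_dev_of_actual_ratings, mean_absolute_error)}.

    reviews: iterable of (user_id, actual_rating, predicted_rating) tuples
             (an assumed layout for this sketch).
    """
    by_user = defaultdict(list)
    for user_id, actual, predicted in reviews:
        by_user[user_id].append((actual, predicted))

    points = {}
    for user_id, pairs in by_user.items():
        actual_ratings = [actual for actual, _ in pairs]
        std_dev = statistics.pstdev(actual_ratings)  # population standard deviation
        abs_error = sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)
        points[user_id] = (std_dev, abs_error)
    return points


def binned_average(points, bin_width=0.25):
    """Average the absolute error of users whose standard deviation falls in the same bin."""
    bins = defaultdict(list)
    for std_dev, abs_error in points.values():
        bins[int(std_dev // bin_width)].append(abs_error)
    return {index * bin_width: sum(values) / len(values) for index, values in sorted(bins.items())}
```

A user who always gives the same rating lands at a standard deviation of zero, the leftmost position on the horizontal axis.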

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset that still kept a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
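The round-by-round procedure described above can be sketched for a single user as follows. Several simplifications are assumptions of this sketch rather than details confirmed by the text: the prototype is the mean of the positive vectors minus the mean of the negative vectors with hypothetical weights `beta` and `gamma`, cosine similarity is used, and the rating is derived from the user's own average and standard deviation only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prototype(training, positive_threshold, beta=1.0, gamma=1.0):
    """Rocchio-style prototype: weighted mean of positives minus weighted mean of negatives.

    beta and gamma are hypothetical weights chosen for this sketch.
    """
    positives = [vector for vector, rating in training if rating >= positive_threshold]
    negatives = [vector for vector, rating in training if rating < positive_threshold]
    prototype = {}
    for group, weight in ((positives, beta), (negatives, -gamma)):
        for vector in group:
            for feature, value in vector.items():
                prototype[feature] = prototype.get(feature, 0.0) + weight * value / len(group)
    return prototype


def learning_curve(reviews, positive_threshold, upper_threshold):
    """Return [(training_size, MAE on the remaining reviews), ...] for one user.

    reviews: the user's (feature_vector, rating) pairs in the order they were made.
    Each round moves one more review from the validation set to the training set.
    """
    curve = []
    for k in range(1, len(reviews)):
        training, validation = reviews[:k], reviews[k:]
        ratings = [rating for _, rating in training]
        average = sum(ratings) / len(ratings)
        deviation = math.sqrt(sum((r - average) ** 2 for r in ratings) / len(ratings))
        prototype = build_prototype(training, positive_threshold)
        errors = []
        for vector, actual in validation:
            similarity = cosine(prototype, vector)
            predicted = average + deviation if similarity >= upper_threshold else average
            errors.append(abs(predicted - actual))
        curve.append((k, sum(errors) / len(errors)))
    return curve
```

Averaging the per-user curves over all users that meet the minimum number of rated recipes would give one curve per dataset, comparable in shape to the learning-curve figures.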



Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
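As a rough illustration of the recipe representation, the sketch below weights each feature by how rare it is across recipes. Eq. 4.4 is not reproduced in this section, so the exact form used here, feature count multiplied by log(N / n_f) by analogy with TF-IDF, is an assumption, as are the function name and the input layout.

```python
import math
from collections import Counter


def inverse_recipe_frequency_vectors(recipes):
    """Build one weighted feature vector per recipe.

    recipes: {recipe_id: list of feature names}, e.g. ingredients, cuisines, dietaries.
    Weight (assumed form): count of the feature in the recipe * log(N / n_f), where N is
    the number of recipes and n_f the number of recipes that contain the feature.
    """
    recipe_count = len(recipes)
    recipe_frequency = Counter(
        feature for features in recipes.values() for feature in set(features)
    )
    vectors = {}
    for recipe_id, features in recipes.items():
        counts = Counter(features)
        vectors[recipe_id] = {
            feature: count * math.log(recipe_count / recipe_frequency[feature])
            for feature, count in counts.items()
        }
    return vectors
```

Under this weighting, a feature that appears in every recipe gets a weight of zero, so it contributes nothing to the similarity between a recipe and a user's prototype vector.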

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), the total meal cost and the total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. Built from the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
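The idea of one class vector per rating value can be sketched as below. This is only an illustration of the proposal, which the text leaves as future work: summing the feature vectors of the recipes that received each rating, using cosine similarity, and the function names are all assumptions of the sketch.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_class_vectors(rated_recipes):
    """Return {rating_value: summed feature vector} for one user.

    rated_recipes: list of (feature_vector, rating) pairs rated by that user.
    """
    classes = defaultdict(dict)
    for vector, rating in rated_recipes:
        for feature, weight in vector.items():
            classes[rating][feature] = classes[rating].get(feature, 0.0) + weight
    return dict(classes)


def closest_rating_class(class_vectors, recipe_vector):
    """Pick the rating whose class vector is most similar to the recipe."""
    return max(class_vectors, key=lambda rating: cosine(class_vectors[rating], recipe_vector))
```

For example, a user who gave 5 to two chocolate recipes and 1 to a liver recipe would have two class vectors, and a new recipe would be assigned whichever of those two ratings it resembles more.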

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL httplinkspringercomchapter101007

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL httplink

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL httpdl



[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi 1011461529 URL httpciteseerxistpsueduviewdocdownload

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL httpciteseerxistpsueduviewdocdownloaddoi=1011

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi 101109DICTAP2012

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi 1011164936

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi 101023A

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi 101145963770

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444, 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


Page 57: Personalized Food Recommendations - ULisboa€¦ · Personalized Food Recommendations Exploring Content-Based Methods Jorge Miguel Tavares Soares de Almeida Thesis to obtain the Master

Figure 55 Upper similarity threshold variation test using Foodcom dataset

Rating =

average rating + standard deviation if similarity gt= U

average rating if similarity lt U


Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test

results for theEpicurious and Foodcom datasets respectively For each similarity value represented

by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests

multiple times on the experimental recommendation component adjusting the upper similarity value

between each test

The results obtained were interesting As mentioned in Section 22 MAE computes the devia-

tion between predicted ratings and actual ratings RMSE is very similar to MAE but places more

emphasis on higher deviations These definitions help to understand the results of this test Both

datasets react in a very similar way when the threshold U is varied in the interval [0 075] The MAE

decreases and the RMSE increases When lowering the similarity threshold the recommendation

system predicts the correct rating value more times which results in a lower average error so the

MAE is lower But although it is predicting the exact rating value more times in the cases where it

misses the deviation between the predicted rating and the actual rating is higher and since RMSE

places more emphasis on higher deviations the RMSE values increase The best similarity threshold

is subjective some systems may benefit more from a higher rate of exact predictions while in others

a lower deviation between the predicted ratings and the actual ratings is more suitable In this test


Figure 56 Mapping of the userrsquos absolute error and standard deviation from the Epicurious dataset

the lowest results registered using the Epicurious dataset were 06544 MAE and 08601 RMSE

With the Foodcom dataset the lowest MAE registered was 03229 and the lowest RMSE 06230

Compared directly with the YoLP Content-based component which obtained the overall lowest

error rates from all the baselines the experimental recommendation component showed better re-

sults when using the Foodcom dataset

55 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings the user standard deviation plays an important

role in the recommendation error Users with the standard deviation equal to zero ie users that

attributed the same rating to all their reviews should have the lowest impact in the recommendation

error The objective of this test is to discover how significant is the impact of this variable and if the

absolute error does not spike for users with higher standard deviations

In the Figures 56 and 57 each point represents a user the point on the graph is positioned

according to the userrsquos absolute error and standard deviation values The line in these two graphs

indicates the average value of the points in that proximity

Fig 56 represents the data from the Epicurious dataset The result for this dataset was expected

since it is normal for the absolute error to slowly increase for users with higher standard deviations

It would not be good if a spike in the absolute error was noted towards the higher values of the

standard deviation which would imply that the recommendation algorithm was having a very small

impact in the predicted ratings Having in consideration the small dimensionality of this dataset and

the lighter density of points in the graph towards the higher values of standard deviation probably


Figure 57 Mapping of the userrsquos absolute error and standard deviation from the Foodcom dataset

there was not enough data on users with high deviation for the absolute error to stagnate

Fig 57 presents good results since it shows that the absolute error of the users starts to stagnate

for users with standard deviations higher than 1 This implies that the algorithm is learning the userrsquos

preferences and returning good recommendations even to users with high standard deviations

56 Rocchiorsquos Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the userrsquos preferences

and the recipe features Since the userrsquos preferences in a real system are built over time the objec-

tive of this test is to simulate the continuous learning of the algorithm using the datasets studied in

this work and analyse if the recommendation error starts to converge after a determined amount of

reviews are made In order to perform this test first the datasets were analysed to find a group of

users with enough recipes rated to study the improvements in the recommendation The Epicurious

dataset contains 71 users that rated over 40 recipes This number of recipes rated was the highest

chosen threshold for this dataset in order to maintain a considerable amount of users to average the

recommendation errors from see Fig 58 In Foodcom 1571 users were found that rated over 100

recipes and since the results of this experiment showed a consistent drop in the errors measured

as seen in Fig 59 another test was made using the 269 users that rated over 500 recipes as seen

in Fig 510

The training set represents the recipes that are used to build the usersrsquo prototype vectors so for

each round an additional review is added to the training set and removed from the validation set

in order to simulate the learning process In Fig 58 it is possible to see a steady decrease in


Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes

error and after 25 recipes rated the error fluctuates around the same values Although it would be

interesting to perform this experiment with a higher number of rated recipes too see the progress

of the recommendation error due to the small dimension of the dataset there are not enough users

with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant

improvement in the recommendation although there is not a clear number of rated recipes that marks


Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes

a threshold where the recommendation error stagnates



Chapter 6


In this MSc dissertation the applicability of content-based methods in personalized food recom-

mendation was explored Using the well-known Rocchio algorithm several approaches were tested

to further explore the breaking of recipes down into ingredients presented in [22] and use more

variables related to personalized food recommendation

Recipes were represented as vectors of features weights determined by their Inverse Recipe

Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-

ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value

returned by the algorithm into a rating value needed to compute the performance of the recommen-

dation system When building the prototype vectors the approach that returned the best results

used a fixed threshold to differentiate positive and negative observations The combination of both

user and item average ratings and standard deviations demonstrated the best results to transform

the similarity value into a rating value These approaches combined returned the best performance

values of the experimental recommendation component

After determining the best approach to adapt the Rocchio algorithm to food recommendations

the similarity threshold test was performed to adjust the algorithm and seek improvements in the

recommendation results The final results of the experimental component showed improvements in

the recommendation performance when using the Foodcom dataset With the Epicurious dataset

some baselines like the content-based method implemented in YoLP registered lower error values

Being two datasets with very different characteristics not improving the baseline results in both

was not completely unexpected In the Epicurious dataset the recipe ingredient information only

contained its main ingredients which were chosen by the user in the moment of the review opposed

to the full ingredient information that recipes have in the Foodcom dataset This removes a lot of

detail both in the recipes and in the prototype vectors and adding the major difference in the dataset

sizes these could be some of the reasons why the difference in performance was observed

The datasets used in this work were the only ones found that better suited the objective of the


experiments ie that contained user reviews allowing to validate the studied approaches Since

there are very few studies related to food recommendations the features that better describe the

recipes are still undefined The feature study performed in this work which explored all the features

available in both datasets ingredients cuisines and dietaries shows that the use of all features

combined outperforms every feature individually or other pairwise combinations

61 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method ex-

plored in this work would be an interesting experiment for a future work As mentioned in Section

32 by implementing this hybrid approach a performance increase of 92 as measured with the

MAE metric was obtained when compared to a pure content-based method [20] This experiment

would determine if a similar decrease in the MAE could be achieved by implementing this hybrid

approach in the food recommendation domain

The experimental component can be configured to include more variables in the recommendation

process for example season of the year (ie winterfall or summerspring) time of the day (ie

lunch or dinner) total meal cost total calories amongst others The study of the impact that these

features have on the recommendation is also another interesting point to approach in the future

when datasets with more information are available

Instead of representing users as classes in Rocchio a set of class vectors created for each

user could represent their preferences From the user rated recipes each class would contain the

features weights related with a specific rating value observation When recommending a recipe its

feature vector is compared with the userrsquos set of vectors so according to the userrsquos preferences the

vector with the highest similarity represents the class where the recipe fits the best

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.

the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
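As a reminder of how the two error metrics quoted above are computed, the following is a minimal sketch using their standard definitions. The rating values are hypothetical and serve only as an illustration; they are not taken from either dataset.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between two equal-length lists of ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical ratings, for illustration only.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]

print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # square root of 1.5
```

Because RMSE squares each error before averaging, it penalises large individual errors more heavily than MAE, which is why the two values reported for the same dataset differ.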

5.5 Standard Deviation Impact in Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to check that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
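The points and the average line described above could be derived along the following lines. This is only a sketch under stated assumptions: the population standard deviation is used, the line is approximated by averaging the errors inside fixed-width bins of standard deviation, and the names `user_points` and `binned_average` as well as the sample data are hypothetical.

```python
from statistics import mean, pstdev

def user_points(reviews):
    """Map {user: [(actual, predicted), ...]} to a list of
    (standard deviation of actual ratings, mean absolute error) pairs."""
    points = []
    for pairs in reviews.values():
        actual = [a for a, _ in pairs]
        errors = [abs(a - p) for a, p in pairs]
        points.append((pstdev(actual), mean(errors)))
    return points

def binned_average(points, width=0.5):
    """Average the error of the points in each standard-deviation bin.
    Fixed-width binning is an assumption about how the line is drawn."""
    bins = {}
    for sd, err in points:
        bins.setdefault(int(sd // width), []).append(err)
    return {index * width: mean(errs) for index, errs in sorted(bins.items())}

# Hypothetical reviews: one user who always gives 4, one with spread ratings.
reviews = {
    "u1": [(4, 4.5), (4, 3.5)],
    "u2": [(1, 3.0), (5, 4.0)],
}
points = user_points(reviews)
print(points)
print(binned_average(points))
```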

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, since it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.

5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes. Since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small size of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
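The round-by-round procedure used for the learning curves can be sketched as follows for a single user. The function `learning_curve` and its `fit`/`predict` parameters are hypothetical names, and the model plugged in here is a plain mean-rating predictor that stands in for the Rocchio component purely to keep the example self-contained.

```python
def learning_curve(reviews, fit, predict, max_train):
    """reviews: one user's [(recipe, rating), ...] in the order they were made.
    Returns the MAE on the remaining reviews after each training-set size."""
    errors = []
    for n in range(1, max_train + 1):
        train, validation = reviews[:n], reviews[n:]
        if not validation:
            break  # nothing left to evaluate on
        model = fit(train)
        abs_errors = [abs(rating - predict(model, recipe))
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

# Stand-in model (NOT Rocchio): always predicts the mean training rating.
def fit_mean(train):
    return sum(rating for _, rating in train) / len(train)

def predict_mean(model, recipe):
    return model

# Hypothetical review history, for illustration only.
history = [("a", 4), ("b", 2), ("c", 3), ("d", 3)]
curve = learning_curve(history, fit_mean, predict_mean, max_train=3)
print(curve)
```

Averaging such per-user curves over all selected users would give one curve per dataset, which is the kind of plot shown in the figures above.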



Chapter 6

Conclusions


In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
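The pipeline summarised above can be illustrated with a small sketch. Several assumptions are made here: the Inverse Recipe Frequency is taken to be log(N / n_f) by analogy with inverse document frequency (Eq. 4.4 itself is not reproduced in this chapter), the prototype follows the textbook Rocchio form (mean of positive vectors minus mean of negative vectors, with hypothetical weights `beta` and `gamma`), the fixed threshold of 3 is an arbitrary example value, and the similarity is the cosine. The recipes and ratings are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def irf_weights(recipes):
    """recipes: {recipe_id: set of features}.
    Assumed weight: log(N / number of recipes containing the feature)."""
    n = len(recipes)
    counts = Counter(f for feats in recipes.values() for f in feats)
    return {f: math.log(n / c) for f, c in counts.items()}

def recipe_vector(features, weights):
    return {f: weights[f] for f in features}

def prototype(rated, recipes, weights, threshold=3, beta=1.0, gamma=1.0):
    """rated: [(recipe_id, rating)]. Ratings above the fixed threshold are
    positive observations, the rest negative (threshold value is assumed)."""
    pos = [recipe_vector(recipes[r], weights) for r, s in rated if s > threshold]
    neg = [recipe_vector(recipes[r], weights) for r, s in rated if s <= threshold]
    proto = defaultdict(float)
    for group, sign, coef in ((pos, 1.0, beta), (neg, -1.0, gamma)):
        for vec in group:
            for f, w in vec.items():
                proto[f] += sign * coef * w / len(group)
    return dict(proto)

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical data, for illustration only.
recipes = {
    "r1": {"tomato", "basil"},
    "r2": {"beef", "onion"},
    "r3": {"tomato", "onion"},
}
weights = irf_weights(recipes)
proto = prototype([("r1", 5), ("r2", 1)], recipes, weights)
for rid in recipes:
    print(rid, cosine(proto, recipe_vector(recipes[rid], weights)))
```

In this toy case the liked recipe scores positively, the disliked one negatively, and the recipe that mixes one liked and one disliked ingredient lands at zero. Turning such a similarity into a rating would still require the transformation step described above, which is not sketched here.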

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that using all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
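A minimal sketch of this idea is given below, assuming that each class vector is the centroid of the feature vectors of the recipes that received a given rating and that cosine similarity is used for the comparison. The names `class_vectors` and `predict_rating` and the feature weights are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def class_vectors(rated, vectors):
    """rated: [(recipe_id, rating)]; vectors: {recipe_id: {feature: weight}}.
    Builds one centroid per rating value observed for the user."""
    groups = defaultdict(list)
    for recipe_id, rating in rated:
        groups[rating].append(vectors[recipe_id])
    classes = {}
    for rating, vecs in groups.items():
        centroid = defaultdict(float)
        for vec in vecs:
            for f, w in vec.items():
                centroid[f] += w / len(vecs)
        classes[rating] = dict(centroid)
    return classes

def predict_rating(classes, recipe_vec):
    """The rating whose class vector is most similar to the recipe."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe_vec))

# Hypothetical feature weights and ratings, for illustration only.
vectors = {
    "r1": {"tomato": 1.0, "basil": 1.0},
    "r2": {"beef": 1.0, "onion": 1.0},
}
classes = class_vectors([("r1", 5), ("r2", 2)], vectors)
print(predict_rating(classes, {"tomato": 1.0, "onion": 0.2}))  # 5
```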

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868.

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188.

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823. doi: 10.1067/mod.2000.109031.


  • Acknowledgments
  • Resumo
  • Abstract
  • List of Tables
  • List of Figures
  • Acronyms
  • 1 Introduction
    • 11 Dissertation Structure
      • 2 Fundamental Concepts
        • 21 Recommendation Systems
          • 211 Content-Based Methods
          • 212 Collaborative Methods
          • 213 Hybrid Methods
            • 22 Evaluation Methods in Recommendation Systems
              • 3 Related Work
                • 31 Food Preference Extraction for Personalized Cooking Recipe Recommendation
                • 32 Content-Boosted Collaborative Recommendation
                • 33 Recommending Food Reasoning on Recipes and Ingredients
                • 34 User Modeling for Adaptive News Access
                  • 4 Architecture
                    • 41 YoLP Collaborative Recommendation Component
                    • 42 YoLP Content-Based Recommendation Component
                    • 43 Experimental Recommendation Component
                      • 431 Rocchios Algorithm using FF-IRF
                      • 432 Building the Users Prototype Vector
                      • 433 Generating a rating value from a similarity value
                        • 44 Database and Datasets
                          • 5 Validation
                            • 51 Evaluation Metrics and Cross Validation
                            • 52 Baselines and First Results
                            • 53 Feature Testing
                            • 54 Similarity Threshold Variation
                            • 55 Standard Deviation Impact in Recommendation Error
                            • 56 Rocchios Learning Curve
                              • 6 Conclusions
                                • 61 Future Work
                                  • Bibliography
Page 59: Personalized Food Recommendations - ULisboa€¦ · Personalized Food Recommendations Exploring Content-Based Methods Jorge Miguel Tavares Soares de Almeida Thesis to obtain the Master

Figure 57 Mapping of the userrsquos absolute error and standard deviation from the Foodcom dataset

there was not enough data on users with high deviation for the absolute error to stagnate

Fig 57 presents good results since it shows that the absolute error of the users starts to stagnate

for users with standard deviations higher than 1 This implies that the algorithm is learning the userrsquos

preferences and returning good recommendations even to users with high standard deviations

56 Rocchiorsquos Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the userrsquos preferences

and the recipe features Since the userrsquos preferences in a real system are built over time the objec-

tive of this test is to simulate the continuous learning of the algorithm using the datasets studied in

this work and analyse if the recommendation error starts to converge after a determined amount of

reviews are made In order to perform this test first the datasets were analysed to find a group of

users with enough recipes rated to study the improvements in the recommendation The Epicurious

dataset contains 71 users that rated over 40 recipes This number of recipes rated was the highest

chosen threshold for this dataset in order to maintain a considerable amount of users to average the

recommendation errors from see Fig 58 In Foodcom 1571 users were found that rated over 100

recipes and since the results of this experiment showed a consistent drop in the errors measured

as seen in Fig 59 another test was made using the 269 users that rated over 500 recipes as seen

in Fig 510

The training set represents the recipes that are used to build the usersrsquo prototype vectors so for

each round an additional review is added to the training set and removed from the validation set

in order to simulate the learning process In Fig 58 it is possible to see a steady decrease in


Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes

error and after 25 recipes rated the error fluctuates around the same values Although it would be

interesting to perform this experiment with a higher number of rated recipes too see the progress

of the recommendation error due to the small dimension of the dataset there are not enough users

with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant

improvement in the recommendation although there is not a clear number of rated recipes that marks


Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes

a threshold where the recommendation error stagnates



Chapter 6


In this MSc dissertation the applicability of content-based methods in personalized food recom-

mendation was explored Using the well-known Rocchio algorithm several approaches were tested

to further explore the breaking of recipes down into ingredients presented in [22] and use more

variables related to personalized food recommendation

Recipes were represented as vectors of features weights determined by their Inverse Recipe

Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-

ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value

returned by the algorithm into a rating value needed to compute the performance of the recommen-

dation system When building the prototype vectors the approach that returned the best results

used a fixed threshold to differentiate positive and negative observations The combination of both

user and item average ratings and standard deviations demonstrated the best results to transform

the similarity value into a rating value These approaches combined returned the best performance

values of the experimental recommendation component

After determining the best approach to adapt the Rocchio algorithm to food recommendations

the similarity threshold test was performed to adjust the algorithm and seek improvements in the

recommendation results The final results of the experimental component showed improvements in

the recommendation performance when using the Foodcom dataset With the Epicurious dataset

some baselines like the content-based method implemented in YoLP registered lower error values

Being two datasets with very different characteristics not improving the baseline results in both

was not completely unexpected In the Epicurious dataset the recipe ingredient information only

contained its main ingredients which were chosen by the user in the moment of the review opposed

to the full ingredient information that recipes have in the Foodcom dataset This removes a lot of

detail both in the recipes and in the prototype vectors and adding the major difference in the dataset

sizes these could be some of the reasons why the difference in performance was observed

The datasets used in this work were the only ones found that better suited the objective of the


experiments ie that contained user reviews allowing to validate the studied approaches Since

there are very few studies related to food recommendations the features that better describe the

recipes are still undefined The feature study performed in this work which explored all the features

available in both datasets ingredients cuisines and dietaries shows that the use of all features

combined outperforms every feature individually or other pairwise combinations

61 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method ex-

plored in this work would be an interesting experiment for a future work As mentioned in Section

32 by implementing this hybrid approach a performance increase of 92 as measured with the

MAE metric was obtained when compared to a pure content-based method [20] This experiment

would determine if a similar decrease in the MAE could be achieved by implementing this hybrid

approach in the food recommendation domain

The experimental component can be configured to include more variables in the recommendation

process for example season of the year (ie winterfall or summerspring) time of the day (ie

lunch or dinner) total meal cost total calories amongst others The study of the impact that these

features have on the recommendation is also another interesting point to approach in the future

when datasets with more information are available

Instead of representing users as classes in Rocchio a set of class vectors created for each

user could represent their preferences From the user rated recipes each class would contain the

features weights related with a specific rating value observation When recommending a recipe its

feature vector is compared with the userrsquos set of vectors so according to the userrsquos preferences the

vector with the highest similarity represents the class where the recipe fits the best

Using this method removes the need to transform the similarity measure into a rating since the

class with the highest similarity to the targeted recipe would automatically attribute it a predicted




[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A

Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and

Data Engineering 17(6)734ndash749 2005

[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender

Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-

uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle


[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms




Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the dataset is small and does not contain enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.



Chapter 6

Conclusions

In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had never been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations produced the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
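The representation summarized above can be illustrated with a short Python sketch. It is only a minimal sketch under stated assumptions: Eq. 4.4 is assumed to have the usual IDF-like form log(N / n_f), cosine similarity is assumed as the similarity measure, and the threshold of 3, the weights `beta` and `gamma`, and all function and variable names are hypothetical rather than taken from the experimental component.

```python
import math
from collections import Counter, defaultdict


def irf_weights(recipes):
    """Inverse Recipe Frequency per feature, assumed here to be log(N / n_f)."""
    n_recipes = len(recipes)
    counts = Counter(f for feats in recipes.values() for f in set(feats))
    return {f: math.log(n_recipes / c) for f, c in counts.items()}


def recipe_vector(features, irf):
    """Sparse recipe vector: each present feature is weighted by its IRF."""
    return {f: irf.get(f, 0.0) for f in set(features)}


def prototype(ratings, recipes, irf, threshold=3.0, beta=1.0, gamma=1.0):
    """Rocchio-style prototype: mean positive vector minus mean negative vector.

    A rating at or above the fixed threshold counts as a positive observation
    (the threshold value itself is an assumption made for this sketch).
    """
    pos = [rid for rid, value in ratings.items() if value >= threshold]
    neg = [rid for rid, value in ratings.items() if value < threshold]
    proto = defaultdict(float)
    for group, weight in ((pos, beta), (neg, -gamma)):
        for rid in group:
            for f, w in recipe_vector(recipes[rid], irf).items():
                proto[f] += weight * w / len(group)
    return dict(proto)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data, invented purely for illustration.
recipes = {
    "r1": ["tomato", "basil"],
    "r2": ["beef", "onion"],
    "r3": ["tomato", "onion"],
}
ratings = {"r1": 5, "r2": 1}

irf = irf_weights(recipes)
user_proto = prototype(ratings, recipes, irf)
sim_liked = cosine(user_proto, recipe_vector(recipes["r1"], irf))
sim_disliked = cosine(user_proto, recipe_vector(recipes["r2"], irf))
```

In this toy example the liked recipe obtains a positive similarity to the prototype and the disliked one a negative similarity; the resulting value would still have to be mapped to a rating, for instance using the user and item averages and standard deviations mentioned above.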

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons why the difference in performance was observed.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
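For reference, the comparison described above relies on the mean absolute error and on a relative decrease between two MAE values. The sketch below shows both computations in Python; the function names and the numbers are hypothetical and serve only to illustrate the arithmetic, they are not results from [20] or from this work.

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_mae, new_mae):
    """Fraction by which new_mae is lower than baseline_mae."""
    return (baseline_mae - new_mae) / baseline_mae


# Invented ratings on a 1-5 scale, for illustration only.
example_mae = mean_absolute_error([4, 3, 5], [5, 3, 3])

# A hypothetical drop in MAE from 1.000 to 0.908 is a 9.2% relative improvement.
example_gain = relative_improvement(1.0, 0.908)
```

Because MAE is an error measure, a "performance increase" corresponds to a lower value, which is why the improvement is computed as a relative decrease.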

The experimental component can be configured to include more variables in the recommendation process, for example season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
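The idea of one class vector per rating value can be sketched as follows. This is a hedged illustration rather than a tested design: each class vector is assumed to be the centroid of the recipe vectors that received that rating, cosine similarity is assumed as the comparison measure, and every name in the code is hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_vectors(rated):
    """Build one centroid per rating value from (feature_vector, rating) pairs."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vector, rating in rated:
        counts[rating] += 1
        centroid = sums[rating]  # creates the class even if the vector is empty
        for feature, weight in vector.items():
            centroid[feature] += weight
    return {
        rating: {f: w / counts[rating] for f, w in feats.items()}
        for rating, feats in sums.items()
    }


def predict_rating(vector, classes):
    """Predicted rating: the rating value whose class vector is most similar."""
    return max(classes, key=lambda rating: cosine(vector, classes[rating]))


# Toy user history, invented for illustration.
history = [
    ({"tomato": 1.0, "basil": 1.0}, 5),
    ({"tomato": 1.0, "garlic": 1.0}, 5),
    ({"beef": 1.0, "onion": 1.0}, 1),
]
user_classes = class_vectors(history)
predicted_like = predict_rating({"tomato": 1.0, "garlic": 1.0}, user_classes)
predicted_dislike = predict_rating({"beef": 1.0}, user_classes)
```

A practical consequence of this design is that predictions are restricted to rating values the user has already given, so a user who has only ever assigned two distinct ratings can only receive those two values as predictions.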




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007/...

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http...

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link...

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl...

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download...

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1...

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+...

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:...

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403...

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http...

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770...

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012...

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:...

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770...

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.















Chapter 6

Conclusions

In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the idea, presented in [22], of breaking recipes down into ingredients, and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations gave the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
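As an illustration of the pipeline summarized above, the following minimal Python sketch weights recipe features by an inverse recipe frequency, builds a Rocchio prototype vector using a fixed rating threshold, and scores a recipe by cosine similarity. The IDF-style formula standing in for Eq. 4.4, the threshold value of 3, the `beta` and `gamma` weights, and all function names are assumptions made for illustration; they are not taken from the implementation described in this work.

```python
import math
from collections import Counter, defaultdict


def irf_weights(recipes):
    """Weight each feature by log(N / n_f), n_f = number of recipes containing it.

    Assumption: a plain IDF-style formula stands in for Eq. 4.4.
    """
    n = len(recipes)
    counts = Counter(f for feats in recipes.values() for f in set(feats))
    return {f: math.log(n / c) for f, c in counts.items()}


def recipe_vector(features, weights):
    """Sparse vector (dict) of feature weights for one recipe."""
    return {f: weights.get(f, 0.0) for f in set(features)}


def rocchio_prototype(ratings, recipes, weights, threshold=3, beta=1.0, gamma=0.5):
    """Weighted mean of positive recipe vectors minus weighted mean of negative ones.

    A rating strictly above `threshold` counts as a positive observation
    (hypothetical convention chosen for this sketch).
    """
    pos, neg = defaultdict(float), defaultdict(float)
    n_pos = n_neg = 0
    for recipe_id, rating in ratings.items():
        vec = recipe_vector(recipes[recipe_id], weights)
        if rating > threshold:
            n_pos += 1
            for f, w in vec.items():
                pos[f] += w
        else:
            n_neg += 1
            for f, w in vec.items():
                neg[f] += w
    proto = defaultdict(float)
    for f, w in pos.items():
        proto[f] += beta * w / n_pos
    for f, w in neg.items():
        proto[f] -= gamma * w / n_neg
    return dict(proto)


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In this sketch, a recipe that shares features with the user's highly rated recipes receives a positive similarity, while one that shares features only with poorly rated recipes receives a negative one. Turning that similarity into a rating would still require a separate step, such as the one based on user and item averages and standard deviations mentioned above.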

After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance on the Food.com dataset. On the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail from both the recipes and the prototype vectors. Together with the major difference in dataset sizes, this could be one of the reasons for the observed difference in performance.

The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every individual feature and every pairwise combination.

6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach obtained a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
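For reference, the mean absolute error (MAE) and the relative improvement between two systems can be computed as in the sketch below. The function names are hypothetical and the ratings in the usage note are made-up illustrative values, not results from [20] or from this work.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_mae, new_mae):
    """Percentage decrease in MAE relative to the baseline (positive = better)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae
```

For example, `mae([4, 3, 5], [5, 3, 3])` averages the absolute errors 1, 0, and 2, and `relative_improvement` expresses how much lower the MAE of a hybrid system would be compared to a content-based baseline.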

The experimental component can be configured to include more variables in the recommendation process, for example, season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting point to address in the future, when datasets with more information are available.

Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the recipes rated by the user, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
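The idea above can be sketched as follows: one prototype vector is built per rating value from the recipes the user rated with that value, and the predicted rating for a new recipe is the rating whose prototype is most similar. Averaging the recipe vectors within each class, using cosine similarity, and breaking ties toward the higher rating are assumptions of this sketch, since the text does not fix these details; all names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_class_vectors(rated):
    """Build one centroid per rating value for a single user.

    `rated` is a list of (feature_vector, rating) pairs, where each
    feature vector is a dict mapping feature name to weight.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, rating in rated:
        counts[rating] += 1
        acc = sums[rating]  # creates the class even for an empty vector
        for f, w in vec.items():
            acc[f] += w
    return {
        rating: {f: w / counts[rating] for f, w in feats.items()}
        for rating, feats in sums.items()
    }


def predict_rating(recipe_vec, class_vectors):
    """Return the rating whose class vector is most similar to the recipe.

    Ties are broken toward the higher rating (an arbitrary choice).
    """
    return max(
        class_vectors,
        key=lambda rating: (cosine(recipe_vec, class_vectors[rating]), rating),
    )
```

With this design the predicted value is always one of the ratings the user has already given, so no separate similarity-to-rating transformation is needed. The trade-off is that rating values with few observations yield noisy class vectors.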




Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007...

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http...

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8. URL http://www.cs.huji.ac.il/~dbi/lectures/ir/Introduction-IR.pdf.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9. URL http://link...

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl...

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240. doi: 10.1.1.46.1529. URL http://citeseerx.ist.psu.edu/viewdoc/download...

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863. doi: 10.1093/bioinformatics/bth267. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1...

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X. doi: 10.1111/j.1553-2712.2011.01172.x. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Empirical+...

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle...

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http...

[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770...

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012...

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A...

[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770...

[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.


[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and

systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868

doi 101023A1011196000674

[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction

volume 40 Cambridge University Press 2010 ISBN 9780521493369

[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized

cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash

105 2011

[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer

Science and Information Systems - A Landscape of Research In E-Commerce and Web

Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3

doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007


[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US

Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http


[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8

URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf

[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web

4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink


[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive

Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl



[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification

In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN

0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload


[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-

egorization In Proceedings of the Fourteenth International Conference on Machine

Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics

bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011



[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for

collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-

telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172

x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+


[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999. URL http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle


[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403


[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http


[16] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770


[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2012. ISSN 09241868. doi: 10.1109/DICTAP.2012


[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290. doi: 10.1.1.16.4936.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A


[24] M. Deshpande and G. Karypis. Item Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770


[25] C.-j. Lin, T.-t. Kuo, and S.-d. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823. doi: 10.1067/mod.2000.109031.

