SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
Matthew Rowe, School of Computing and Communications
@mrowebot | [email protected]
International Conference on Web Intelligence 2014, Warsaw, Poland

Upload: matthew-rowe

Post on 29-Nov-2014

DESCRIPTION

Presentation slides from the International Conference on Web Intelligence 2014.

TRANSCRIPT

Page 1: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings


Page 2: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Predicting Ratings

Induce a model from a user-item ratings matrix with a missing entry, and predict the missing rating:

      i1  i2  i3
  u1  4*  4*  2*
  u2  5*  ?   1*
  u3  5*  4*  1*

  → induce model and predict ratings →

      i1  i2  i3
  u1  4*  4*  2*
  u2  5*  4*  1*
  u3  5*  4*  1*

Page 3: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Latent Factor Models: Factor Consistency Problem

The ratings matrix is factorised into user and item latent factor matrices, with F = #factors fixed a priori. When the model is re-trained at different points in time:

•  Cannot 'accurately' align latent factors between the models
•  Cannot tell how users' tastes have evolved

Page 4: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Solution: Semantic Categories

Map each item i to its semantic web URI and that URI's set of SKOS categories: i → <URI> → {<SKOS_CATEGORY>}. Categories remain consistent over time, so we can model a user's preference for category c at time s and track how it evolves.

Page 5: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Semantic Alignment of Datasets

For each movie item: SPARQL query for candidate URIs from the movie's title → get the semantic categories of each candidate → disambiguate based on the movie's year → output {(ItemID, <URI>)}.

explicit factors. The web of linked data provided a resource for such information, where movies appear within the linked data cloud as Uniform Resource Identifiers (URIs) which, upon dereferencing, return information about the movie: director, year of release, actors, and the semantic categories in which the film has been placed. For instance, for the movie 'Alien' released in 1979, which we shall now use as a running example, the following categories are found:

<http://dbpedia.org/resource/Alien_(film)>
    dcterms:subject category:Alien_(franchise)_films ;
    dcterms:subject category:1979_horror_films .

In this work we use DBPedia URIs, given their relation to semantic (SKOS) categories. In order to provide such information, however, we require a link between a given item within one of our recommendation datasets and the URI that denotes that movie item, prompting the question: how can items be aligned to their semantic web URIs? Our method for semantic URI alignment functioned as follows: first, we used the SPARQL query from [8] to extract all films (instances of dbpedia-owl:Film) from DBPedia which contained a year within one of their categories: [4]

SELECT DISTINCT ?movie ?title WHERE {
  ?movie rdf:type dbpedia-owl:Film ;
         rdfs:label ?title ;
         dcterms:subject ?cat .
  ?cat rdfs:label ?year .
  FILTER langMatches(lang(?title), "EN") .
  FILTER regex(?year, "^[0-9]{4} film", "i")
}

Using the extracted mapping between the movie URI (?movie) and title (?title) we then identified the set of candidate URIs (C) by performing fuzzy matches between a given item's title and the extracted title from DBPedia. Fuzzy matches were performed using the Levenshtein similarity metric (derived from the normalised reciprocal Levenshtein distance), with the similarity threshold set to 0.9. We use fuzzy matches here due to the different forms of the movie titles and abbreviations within the datasets and linked data labels. After deriving the set of candidate URIs, we then dereferenced each URI and looked up its year to see if it appears within an associated category (i.e. ?movie dcterms:subject ?category). If the year of the movie item appears within a mapped category (?category) then we identified the given semantic URI as denoting the item. This disambiguation was needed because multiple films can share the same title - this often happens with film remakes. This approach achieved coverage (i.e. proportion of items mapped) of 83% and 69% for MovieLens and MovieTweetings respectively; the reduced coverage for MovieTweetings is explained by the recency of the movies being reviewed and their limited coverage on DBPedia at present.
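The matching and disambiguation steps above can be sketched as follows; all function names and data shapes are illustrative assumptions, not the authors' implementation:

```python
# Sketch of the alignment step: fuzzy title matching followed by
# year-based disambiguation against the candidates' categories.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalised Levenshtein similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def align(item_title, item_year, dbpedia, threshold=0.9):
    """dbpedia: list of (uri, title, categories) triples extracted via SPARQL.
    Returns the URI whose title fuzzily matches the item's title and whose
    categories mention the item's release year, or None."""
    candidates = [(uri, cats) for uri, title, cats in dbpedia
                  if similarity(item_title.lower(), title.lower()) >= threshold]
    for uri, cats in candidates:
        if any(str(item_year) in c for c in cats):  # year-based disambiguation
            return uri
    return None
```

On the running example, 'Alien' (1979) matches the `Alien_(film)` candidate rather than the 1986 sequel, since the year appears in the candidate's categories.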

B. Reduced Datasets and the Hipster Dilemma

Based on our alignment of the movie items within each respective dataset with their semantic URIs within the web of linked data, we derived new reduced datasets, as not all items can be mapped to URIs. We then further reduced the datasets based on two conditions, only considering users who: (i) have posted ratings within the training and test sets; and (ii) have posted at least 10 ratings within the training and validation segment. The reason for the first condition is that in this work we do not consider cold-start situations; in the discussion section of the paper we explain, however, how the work can be extended to consider such a scenario. The reason for the second condition is that we need sufficient information to understand how a given user's tastes have evolved semantically; with only a few posts to go on we are limited in doing this - we expand on this reasoning in the following section. Table I demonstrates the extent to which the items, users and ratings have been reduced. There is a large reduction for MovieTweetings, where the nature of our approach means that we only consider 11% of users within the dataset. We also note that the reduction in the number of ratings is not as great; this suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure, rarely rated items are present within the data.

[4] We used a local copy of DBPedia 3.9 for our experiments: http://wiki.dbpedia.org/Downloads39
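A minimal sketch of the two filtering conditions above, assuming ratings are held as (user, item, rating, timestamp) quadruples already split into train/validation/test; function and variable names are hypothetical:

```python
# Keep users that (i) appear in both train and test, and
# (ii) have >= min_ratings ratings in train + validation.

def filter_users(train, valid, test, min_ratings=10):
    train_users = {u for u, _, _, _ in train}
    test_users = {u for u, _, _, _ in test}
    counts = {}
    for u, _, _, _ in train + valid:
        counts[u] = counts.get(u, 0) + 1
    keep = {u for u in train_users & test_users if counts.get(u, 0) >= min_ratings}
    # Restrict all three segments to the kept users.
    return ([q for q in train if q[0] in keep],
            [q for q in valid if q[0] in keep],
            [q for q in test if q[0] in keep])
```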

TABLE I. Statistics of the revised review datasets used for our analysis and experiments. Reduction over the original datasets is shown in parentheses.

Dataset          #Users         #Items           #Ratings
MovieLens        5,390 (-11%)   3,231 (-12.1%)   841,602 (-6.7%)
MovieTweetings   2,357 (-89%)   7,913 (-30.8%)   73,397 (-38.2%)
Total            7,747          11,144           914,999

As Table I suggests, certain more 'obscure' movies do not have DBPedia URIs; despite our use of the most recent DBPedia dataset (i.e. version 3.9), coverage is still limited in certain places. The reason for this lack of coverage is largely the obscurity of a film meaning it has no Wikipedia page. For instance, for the MovieLens dataset we fail to map the three movies 'Never Met Picasso', 'Diebinnen' and 'Follow the Bitch': despite these films having IMDB pages, they have no Wikipedia page, and hence no DBPedia entry. For the MovieTweetings dataset we fail to map 'Summer Coda' and 'The Potted Psalm', both of which, again, have IMDB pages but no Wikipedia page. We define the challenge of addressing the obscurity of movies, and thus their lack of a DBPedia entry, as the hipster dilemma. In the discussion section of the paper we outline several strategies for dealing with this in the future, given its presence not only in this work but also in the prior work of Ostuni et al. [8], and its potential effect on future work within this domain.

IV. SEMANTIC TASTE EVOLUTION

Now that we have a mapping between the items within our movie recommendation datasets and their URIs on the web of linked data, we can examine how users have rated semantic categories in the past and how their tastes have evolved, at this semantic level, over time. From this point onwards we reserve the following special characters for set notations, as follows:

• u, v denote users, and i, j denote items.

• r denotes a known rating value (where r ∈ [1, 5] or r ∈ [1, 10]), and r̂ denotes a predicted rating value.

• Datasets are provided as quadruples of the form (u, i, r, t) ∈ D, where t denotes the time of the rating, and are segmented into training (D_train), validation (D_valid) and test (D_test) sets by the above-mentioned cutoff points.

Page 6: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Semantic alignment = fewer elements

Time-ordered datasets split for experiments: •  80%/10%/10% for training/validation/testing


Reduced Recommendation Datasets

[Figure 1: number of reviews per day over time; (a) MovieLens (May-Jan, y-axis 0-70,000); (b) MovieTweetings (Mar-Sep, y-axis 0-1,200)]

Fig. 1. Distribution of reviews per day across the MovieLens and MovieTweetings datasets. The first dashed blue line indicates the cutoff point for the training set, and the dashed red line indicates the cutoff point for the test set - i.e. every rating after that point is placed in the test set. The validation set contains the ratings between the blue and red dashed lines.


Hipster Dilemma: Occurs when obscure movie items cannot be aligned to semantic web URIs!

Page 7: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings


Forming Semantic Taste Profiles

Split the user's training ratings into 5 stages; then, for each stage: derive the user's average rating per semantic category, and calculate the probability P_u^s of the user rating the category highly.


• c denotes a semantic category that an item has been mapped to, and cats(i) is a convenience function that returns the set of semantic categories of item i.

A. Semantic Taste Profiles

Semantic taste profiles describe the preferences that a user has at a given point in time for given semantic categories. We are interested in understanding how a profile at one point in time compares to a profile at an earlier point in time, in essence observing whether taste evolution has taken place. In recent work by McAuley and Leskovec [5], the assessment of user-specific evolution in the context of review platforms (e.g. BeerAdvocate and RateBeer) demonstrated the propensity of users to evolve based on their own 'personal clock'. This means that if we are to segment a user's lifetime (i.e. the time between their first and last rating) in the recommendation dataset into discrete lifecycle periods where each period is the same width in time, then we will have certain periods with no activity in them: the user may go away from the system during the mid-point of their lifetime, and then return later. To counter this we divided each user's lifecycle into 5 stages where each stage contains the same number of reviews; we denote this as 'activity-based lifecycle slicing'. Prior work has used 20 lifecycle stages [10], [5] to model user development; however, we found this number to be too high, as it dramatically reduced the number of users for whom we could mine taste evolution information - i.e. a greater number of stages requires more ratings.
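Activity-based lifecycle slicing can be sketched as follows; the names are illustrative, and the handling of remainders when a user's rating count is not divisible by 5 is an assumption:

```python
# Split a user's time-ordered ratings into n_stages stages holding
# (roughly) equal numbers of ratings, rather than equal spans of time.

def lifecycle_stages(ratings, n_stages=5):
    """ratings: list of (user, item, rating, timestamp) quadruples.
    Returns a list of n_stages lists, ordered by time, near-equal counts."""
    ordered = sorted(ratings, key=lambda q: q[3])  # order by timestamp
    size, rem = divmod(len(ordered), n_stages)
    stages, start = [], 0
    for k in range(n_stages):
        end = start + size + (1 if k < rem else 0)  # spread the remainder
        stages.append(ordered[start:end])
        start = end
    return stages
```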

To form a semantic taste profile for a given user we used the user's ratings distribution per semantic category within the allotted time window (provided by the lifecycle stage of the user, as this denotes a closed interval - i.e. s = [t, t'], t < t'). We formed a discrete probability distribution for category c at time period s ∈ S (where S is the set of 5 lifecycle stages) by interpolating the user's ratings within the distribution. We first defined two sets, the former (D^{u,s,c}_train) corresponding to the ratings by u during period/stage s for items from category c, and the latter (D^{u,s}_train) corresponding to the ratings by u during s, hence D^{u,s,c}_train ⊆ D^{u,s}_train. These sets are formed as follows:

D^{u,s,c}_train = {(u, i, r, t) : (u, i, r, t) ∈ D_train, t ∈ s, c ∈ cats(i)}   (1)

D^{u,s}_train = {(u, i, r, t) : (u, i, r, t) ∈ D_train, t ∈ s}   (2)

We then defined the function avrating to derive the average rating value from all rating quadruples in a given set:

avrating(D^{u,s}_train) = (1 / |D^{u,s}_train|) Σ_{(u,i,r,t) ∈ D^{u,s}_train} r   (3)

From these definitions we then derived the discrete probability distribution of the user rating the category favourably, defining the set C^{u,s}_train as containing all unique categories of items rated by u in stage s:

Pr(c | D^{u,s}_train) = avrating(D^{u,s,c}_train) / Σ_{c' ∈ C^{u,s}_train} avrating(D^{u,s,c'}_train)   (4)

When implementing this approach, we only consider the categories that item URIs are directly mapped to; that is, only those categories that are connected to the URI by the dcterms:subject predicate. Prior work by Ostuni et al. [8] performed a mapping where grandparent categories were mapped to URIs; however, we chose the parent categories in this instance to open up the possibility of other mappings in the future - i.e. via linked data node vertex kernels.
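Eqs. 1-4 above amount to the following sketch, assuming a `cats` mapping from items to their category sets and ratings as (user, item, rating, timestamp) quadruples; the names are illustrative:

```python
# Form a semantic taste profile for one lifecycle stage: average rating per
# category (Eq. 3), normalised into a discrete distribution (Eq. 4).

def taste_profile(stage_ratings, cats):
    """Returns Pr(c | D^{u,s}) for every category c rated in this stage."""
    totals, counts = {}, {}
    for _, item, rating, _ in stage_ratings:
        for c in cats.get(item, set()):
            totals[c] = totals.get(c, 0.0) + rating  # sum of ratings per category
            counts[c] = counts.get(c, 0) + 1
    av = {c: totals[c] / counts[c] for c in totals}  # avrating per category (Eq. 3)
    norm = sum(av.values())                          # denominator of Eq. 4
    return {c: av[c] / norm for c in av} if norm else {}
```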

B. User Taste Evolution: From Prior Taste Profiles

We now turn to looking at the evolution of users' tastes over time in order to understand how their preferences change. Given our use of probability distributions to model the lifecycle-stage-specific taste profile of each user, we can apply information-theoretic measures based on information entropy. One such measure is conditional entropy: it enables one to assess the information needed to describe the taste profile of a user at one time step (Q) using his taste profile from the previous stage (P). A reduction in conditional entropy indicates that the user's taste profile is similar to that of his previous stage's profile, while an increase indicates the converse:

H(Q | P) = Σ_{x ∈ P, y ∈ Q} p(x, y) log( p(x) / p(x, y) )   (5)

We derived the conditional entropy over the 5 lifecycle periods in a pairwise fashion, i.e. H(P2|P1), . . . , H(P5|P4), for each user, and plotted the curve of the mean conditional entropy in Figure 2 over each dataset's training split, including the 95% confidence intervals to show the variation in the conditional entropies. Figure 2 indicates that MovieLens users tend to diverge in their ratings and categories over time, given the increase in the mean curve towards later portions of the users' lifecycles; the same is also evident for MovieTweetings, although the increase is more gradual there. This means that the semantic taste profiles of users evolve away from their previous preferences, suggesting that they are less likely to follow their prior tastes and instead branch out to new semantic categories.
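Eq. 5 can be sketched as below; how the joint distribution p(x, y) between consecutive stage profiles is estimated is not fully specified in this excerpt, so the sketch takes the joint table as given:

```python
import math

# Conditional entropy H(Q|P) from a joint distribution over
# (previous-stage category x, current-stage category y).

def conditional_entropy(joint):
    """joint: dict mapping (x, y) -> p(x, y), summing to 1.
    Returns H(Q|P) = sum_{x,y} p(x,y) * log( p(x) / p(x,y) )."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal p(x)
    return sum(p * math.log(px[x] / p)
               for (x, _y), p in joint.items() if p > 0)
```

For instance, a joint where the current stage is fully determined by the previous one gives zero conditional entropy, while an independent uniform joint gives the maximum.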

C. User Taste Evolution: Susceptibility to Global Influence

Our second analysis looks at the influence that users in general have on the taste profiles of individual users - modelling user-specific (local) taste development and global development as two different processes. We used transfer entropy to assess how the taste profile (P_s) of a user at one time step (s) has been influenced by (his own) local profile (P_{s−1}) and the global taste profile (Q_{s−1}) at the previous lifecycle stage (s−1). For the latter taste profile (Q_{s−1}), we formed a global probability distribution (as above for a single user) using all


Page 8: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings


Taste Evolution from Taste Profiles

[Figure 2: mean conditional entropy vs. lifecycle stages 1-5; (a) MovieLens (y-axis 0.225-0.245); (b) MovieTweetings (y-axis 0.275-0.290)]

Fig. 2. Conditional entropy between consecutive lifecycle stages (e.g. H(P2|P1)) across the datasets, together with the bounds of the 95% confidence interval for the derived means.

users who posted ratings within the time interval of stage s. Now, assume that we have a random variable that describes the local categories that have been reviewed at the current stage (Y_s), a random variable of local categories at the previous stage (Y_{s−1}), and a third random variable of global categories at the previous stage (X_{s−1}); we then define the transfer entropy of one lifecycle stage to another as follows [11]:

T_{X→Y} = H(Y_s | Y_{s−1}) − H(Y_s | Y_{s−1}, X_{s−1})   (6)

Using the above probability distributions we can calculate the transfer entropy based on the joint and conditional probability distributions given the values of the random variables from Y_s, Y_{s−1} and X_{s−1}:

T_{X→Y} = Σ_{y ∈ Y_s, y' ∈ Y_{s−1}, x ∈ X_{s−1}} p(y, y', x) log( p(y | y', x) / p(y | y') )   (7)

We derived the transfer entropy between consecutive lifecycle periods, as with the conditional entropy above, to examine how the influence of global and local dynamics on users' taste profiles developed over time. Figure 3 plots the means of these values across the lifecycle periods together with the 95% confidence intervals. We find that for users of MovieLens transfer entropy decreases over time, indicating that global dynamics have a stronger influence on users' taste profiles towards later lifecycle stages. Such an effect is characteristic of users becoming more involved and familiar with the review system, and as a consequence paying more attention to others' ratings. With MovieTweetings we find a different effect: users' transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore by the ratings of other users.
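Eq. 7 can be sketched in the same spirit as the conditional-entropy computation; again, estimation of the joint distribution is assumed to happen elsewhere, and the names are illustrative:

```python
import math

# Transfer entropy T_{X->Y} from the joint distribution over
# (current local category y, previous local y', previous global x).

def transfer_entropy(joint):
    """joint: dict mapping (y, y_prev, x) -> p(y, y', x), summing to 1.
    Returns sum p(y,y',x) * log( p(y|y',x) / p(y|y') )."""
    p_yx, p_yy, p_yp = {}, {}, {}
    for (y, yp, x), p in joint.items():
        p_yx[(yp, x)] = p_yx.get((yp, x), 0.0) + p  # p(y', x)
        p_yy[(y, yp)] = p_yy.get((y, yp), 0.0) + p  # p(y, y')
        p_yp[yp] = p_yp.get(yp, 0.0) + p            # p(y')
    t = 0.0
    for (y, yp, x), p in joint.items():
        if p <= 0:
            continue
        cond_full = p / p_yx[(yp, x)]           # p(y | y', x)
        cond_local = p_yy[(y, yp)] / p_yp[yp]   # p(y | y')
        t += p * math.log(cond_full / cond_local)
    return t
```

When the global variable X carries no extra information about Y_s beyond Y_{s−1}, the transfer entropy is zero.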

V. SEMANTICSVD++

Examining the semantic taste evolution of users indicates two aspects: firstly, users tend to diverge away from their past rating behaviour; and secondly, users are susceptible to global influence in different ways - i.e. MovieLens users are more susceptible than MovieTweetings users. This leads to one of our research questions: how can semantic taste evolution be incorporated within a recommender system? In this section we address this by incorporating semantic taste information into a matrix factorisation model that we have

[Figure 3: mean transfer entropy vs. lifecycle stages 1-5; (a) MovieLens (y-axis 0.120-0.124); (b) MovieTweetings (y-axis 0.112-0.116)]

Fig. 3. Transfer entropy between consecutive lifecycle stages across the datasets, together with the bounds of the 95% confidence interval for the derived means.

named SemanticSVD++, an extension of Koren et al.'s earlier SVD++ model [2]. The predictive function of the model is shown in full in Eq. 8; we now explain each component in greater detail.

r̂_ui = μ + b_i + b_u                                               (static biases)
       + α_i b_{i,cats(i)} + α_u b_{u,cats(i)}                     (category biases)
       + q_i^T ( p_u + |R(u)|^{−1/2} Σ_{j ∈ R(u)} y_j
                + |cats(R(u))|^{−1/2} Σ_{c ∈ cats(R(u))} z_c )     (personalisation component)   (8)
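The predictive function of Eq. 8 can be sketched as follows; parameter learning is described later in the paper, so here the parameters are simply taken as inputs, and all names are illustrative:

```python
import numpy as np

# Sketch of the SemanticSVD++ predictive function (Eq. 8). Parameter
# names mirror the equation's symbols.

def predict(mu, b_i, b_u, alpha_i, alpha_u, b_i_cats, b_u_cats,
            q_i, p_u, y, z, rated_items, rated_cats):
    """rated_items = R(u); rated_cats = cats(R(u));
    y[j], z[c] are the item and category implicit-feedback factor vectors."""
    static = mu + b_i + b_u                              # static biases
    category = alpha_i * b_i_cats + alpha_u * b_u_cats   # category biases
    implicit_items = sum(y[j] for j in rated_items) / np.sqrt(len(rated_items))
    implicit_cats = sum(z[c] for c in rated_cats) / np.sqrt(len(rated_cats))
    # personalisation component: q_i^T (p_u + implicit terms)
    return static + category + q_i @ (p_u + implicit_items + implicit_cats)
```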

A. Static Biases

The static biases include the general bias of the givendataset (µ), which is the mean rating score across all ratingswithin the training segment; the item bias (b

i

), and the user bias(b

u

). The item bias is the average deviation from the mean biasfor the item i within the training segment, while the user biasis the average deviation from the mean bias from the trainingsegment’s ratings by user u.

B. Category Biases

The examination of user taste profiles demonstrated the evolution of users' tastes for different semantic categories over time. We can encapsulate such information within our recommendation model by including: (i) biases towards categories given general rating behaviour, and (ii) biases towards categories by a specific user. We begin with the former.

1) Item Biases Towards Categories: We model the biases that an item may have given the categories it has been linked to by capturing the proportional change in category ratings across the entire dataset - i.e. in general over the provided training portion. To do this we derived the development of all users' preferences for a given category c throughout the training segment, where Q_s is the global taste profile (discrete probability distribution of all categories) in stage s, and k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of

Prior Tastes Comparison Increase = divergence from prior tastes


Global Influence Decrease = global tastes influence > prior tastes

Page 9: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings


Putting it all together: SemanticSVD++!

0.22

50.

235

0.24

5

Lifecycle Stages

Con

ditio

nal E

ntro

py

1 2 3 4 5

(a) MovieLens

0.27

50.

280

0.28

50.

290

Lifecycle Stages

Con

ditio

nal E

ntro

py●

1 2 3 4 5

(b) MovieTweetings

Fig. 2. Conditional entropy between consecutive lifecycle stages (e.g.H(P2|P3)) across the datasets, together with the bounds of the 95% con-fidence interval for the derived means.

users who posted ratings within the time interval of stage s. Now, assume that we have a random variable that describes the local categories that have been reviewed at the current stage (Y_s), a random variable of local categories at the previous stage (Y_{s-1}), and a third random variable of global categories at the previous stage (X_{s-1}); we then define the transfer entropy of one lifecycle stage to another as follows [11]:

T_{X \to Y} = H(Y_s \mid Y_{s-1}) - H(Y_s \mid Y_{s-1}, X_{s-1})   (6)

Using the above probability distributions we can calculate the transfer entropy based on the joint and conditional probability distributions given the values of the random variables from Y_s, Y_{s-1} and X_{s-1}:

T_{X \to Y} = \sum_{y \in Y_s,\; y' \in Y_{s-1},\; x \in X_{s-1}} p(y, y', x) \log \frac{p(y \mid y', x)}{p(y \mid y')}   (7)
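The transfer entropy of Eq. 7 can be estimated from empirical counts over three aligned category sequences. A minimal sketch under our own naming (the function and its inputs are illustrative, not from the paper):

```python
import math
from collections import Counter

def transfer_entropy(y_curr, y_prev, x_prev):
    """Estimate T_{X->Y} = sum p(y, y', x) * log[p(y|y', x) / p(y|y')]
    (Eq. 7) from three aligned category sequences."""
    n = len(y_curr)
    joint3 = Counter(zip(y_curr, y_prev, x_prev))  # counts for p(y, y', x)
    joint2 = Counter(zip(y_curr, y_prev))          # counts for p(y, y')
    prev2 = Counter(zip(y_prev, x_prev))           # counts for p(y', x)
    prev1 = Counter(y_prev)                        # counts for p(y')
    te = 0.0
    for (y, yp, x), c in joint3.items():
        p_joint = c / n                            # p(y, y', x)
        p_cond_full = c / prev2[(yp, x)]           # p(y | y', x)
        p_cond = joint2[(y, yp)] / prev1[yp]       # p(y | y')
        te += p_joint * math.log(p_cond_full / p_cond)
    return te
```

When the global sequence X carries no information beyond the local history, the two conditionals coincide and the estimate is zero, as expected.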

We derived the transfer entropy between consecutive lifecycle periods, as with the conditional entropy above, to examine how the influence of global and local dynamics on users' taste profiles developed over time. Figure 3 plots the means of these values across the lifecycle periods together with the 95% confidence intervals. We find that for users of MovieLens transfer entropy decreases over time, indicating that global dynamics have a stronger influence on users' taste profiles towards later lifecycle stages. Such an effect is characteristic of users becoming more involved and familiar with the review system, and as a consequence paying more attention to others' ratings. With MovieTweetings we find a different effect: users' transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore by the ratings of other users.

V. SEMANTICSVD++

Examining the semantic taste evolution of users indicates two aspects: firstly, users tend to diverge from their past rating behaviour; and secondly, users are susceptible to global influence in differing ways - i.e. MovieLens users are more susceptible than MovieTweetings users. This leads to one of our research questions: how can semantic taste evolution be incorporated within a recommender system? In this section we address this by incorporating semantic taste information into a matrix factorisation model that we have

[Figure] Fig. 3. Transfer entropy between consecutive lifecycle stages across the datasets: (a) MovieLens, (b) MovieTweetings; together with the bounds of the 95% confidence interval for the derived means.

named SemanticSVD++, an extension of Koren et al.'s earlier SVD++ model [2]. The predictive function of the model is shown in full in Eq. 8; we now explain each component in greater detail.

\hat{r}_{ui} = \overbrace{\mu + b_i + b_u}^{\text{Static Biases}} + \overbrace{\alpha_i b_{i,cats(i)} + \alpha_u b_{u,cats(i)}}^{\text{Category Biases}} + \overbrace{q_i^{\top} \Big( p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j + |cats(R(u))|^{-\frac{1}{2}} \sum_{c \in cats(R(u))} z_c \Big)}^{\text{Personalisation Component}}   (8)

A. Static Biases

The static biases include the general bias of the given dataset (\mu), which is the mean rating score across all ratings within the training segment; the item bias (b_i); and the user bias (b_u). The item bias is the average deviation from the mean bias for the item i within the training segment, while the user bias is the average deviation from the mean bias for the training segment's ratings by user u.
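These static biases can be computed in a single pass over the training segment. A minimal sketch under our own naming, assuming ratings arrive as (user, item, rating) triples:

```python
from collections import defaultdict

def static_biases(ratings):
    """Compute the global mean mu, item biases b_i and user biases b_u
    as average deviations from mu over the training segment."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_dev, user_dev = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        item_dev[i].append(r - mu)  # deviation of this rating from the mean
        user_dev[u].append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}
    b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
    return mu, b_i, b_u
```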

B. Category Biases

The examination of user taste profiles demonstrated the evolution of users' tastes for different semantic categories over time. We can encapsulate such information within our recommendation model by including: (i) biases towards categories given general rating behaviour, and (ii) biases towards categories by a specific user. We begin with the former.

1) Item Biases Towards Categories: We model the biases that an item may have, given the categories it has been linked to, by capturing the proportional change in category ratings across the entire dataset - i.e. in general over the provided training portion. To do this we derived the development of all users' preferences for a given category c throughout the training segment, where Q_s is the global taste profile (a discrete probability distribution over all categories) in stage s, and k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of

Modified version of SVD++ with: •  User taste evolution captured in semantic category biases •  Semantic personalisation component

z_c latent factor vectors for each of the categories rated by the user

Page 10: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

rating category c began from:

\delta_c = \frac{1}{4-k} \sum_{s=k}^{4} \frac{Q_{s+1}(c) - Q_s(c)}{Q_s(c)}   (9)

From this we then calculated the conditional probability of a given category being rated highly by accounting for the change rate of rating preference for the category as follows:

Pr(+|c) = \overbrace{Q_5(c)}^{\text{Prior Rating}} + \overbrace{\delta_c Q_5(c)}^{\text{Change Rate}}   (10)

By averaging this over all categories for the item i we can calculate the evolving item bias from the provided training segment:

b_{i,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} Pr(+|c)   (11)
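Eqs. 9-11 can be sketched as follows. This is our own illustrative naming, not the paper's code: `Q` maps stages 1..5 to category probability distributions, and `k < 4` is assumed to have been found already as the start of the monotonic run (matching the 1/(4-k) prefactor of Eq. 9):

```python
def delta_c(Q, c, k):
    """Average proportional change in the global preference for category c
    from stage k onwards (Eq. 9); Q[s] is the stage-s profile, s = 1..5."""
    return sum((Q[s + 1][c] - Q[s][c]) / Q[s][c] for s in range(k, 5)) / (4 - k)

def item_category_bias(Q, cats_i, k):
    """b_{i,cats(i)} (Eqs. 10-11): the mean over the item's categories of
    Pr(+|c) = Q_5(c) + delta_c * Q_5(c)."""
    probs = [Q[5][c] + delta_c(Q, c, k) * Q[5][c] for c in cats_i]
    return sum(probs) / len(probs)
```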

2) User Biases Towards Categories: In the previous section, we induced per-user discrete probability distributions that captured the probability of the user u rating a given category c highly during lifecycle stage s: P^u_s(c). Given that users' tastes evolve, our goal is to estimate the probability of the user rating an item highly given its categories, by capturing how the user's preferences for each category have changed in the past (decaying or growing). To capture the development of a user's preference for a category we derived the average change rate (\delta^u_c) over the k lifecycle periods coming before the final lifecycle stage in the training set. The parameter k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of rating category c began. We define the change rate (\delta^u_c) as follows:

\delta^u_c = \frac{1}{4-k} \sum_{s=k}^{4} \frac{P^u_{s+1}(c) - P^u_s(c)}{P^u_s(c)}   (12)

In a similar vein, we also capture the influence of global dynamics on the user's taste profile. We found in the previous section that the transfer entropy of the users across the platforms either reduced or increased as users progressed throughout their lifecycles, indicating an increase or decrease of influence by global taste dynamics respectively. We can capture such signals on a per-user basis by assessing the change in transfer entropy for each user over time and modelling this as a global influence factor \gamma^u. We derive this as follows, based on measuring the proportional change in transfer entropy starting from the lifecycle period k that produced a monotonic increase or decrease in transfer entropy:

\gamma^u = \frac{1}{4-k} \sum_{s=k}^{4} \frac{T^{s+1|s}_{Q \to P} - T^{s|s-1}_{Q \to P}}{T^{s|s-1}_{Q \to P}}   (13)

By combining the average change rate (\delta^u_c) of the user highly rating a given category c with the global influence factor (\gamma^u), we then derived the conditional probability of a user rating a given category highly as follows, where P^u_5 denotes the taste profile of the user observed for the final lifecycle stage (5):

Pr(+|c, u) = \overbrace{P^u_5(c)}^{\text{Prior Rating}} + \overbrace{\delta^u_c P^u_5(c)}^{\text{Change Rate}} + \overbrace{\gamma^u Q_5(c)}^{\text{Global Influence}}   (14)

Given that a single item can be linked to many categories on the web of linked data, we take the average across all categories as the bias of the user given the categories of the item:

b_{u,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} Pr(+|c, u)   (15)
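Eqs. 14-15 then combine the user's final-stage profile, per-category change rates and the global influence factor. A sketch under our own naming (all inputs are assumed to have been derived as above):

```python
def user_category_bias(P_u5, Q5, cats_i, delta_u, gamma_u):
    """b_{u,cats(i)} (Eqs. 14-15): the mean over the item's categories of
    Pr(+|c,u) = P^u_5(c) + delta^u_c * P^u_5(c) + gamma^u * Q_5(c).
    P_u5 / Q5 are the user's and global final-stage profiles;
    delta_u maps each category c to its change rate delta^u_c."""
    probs = [P_u5.get(c, 0.0) * (1.0 + delta_u.get(c, 0.0))
             + gamma_u * Q5.get(c, 0.0)
             for c in cats_i]
    return sum(probs) / len(probs)
```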

Other schemes for calculating the biases towards categories (both item and user) could be used, e.g. choosing the maximum bias; however, we use the average as an initial scheme.

3) Weighting Category Biases: The above category biases are derived as static features within the recommendation model (Eq. 8), mined from the provided training portion; however, each user may be influenced by these factors in different ways when performing their ratings. To this end we included two weights, one for each category bias, defined as \alpha_i and \alpha_u for the item biases towards categories and the user biases towards categories respectively. As we will explain below, these weights are then learnt during the training phase of inducing the model.

C. Personalisation Component

The personalisation component of the SemanticSVD++ model builds on the existing SVD++ model by Koren et al. [2]. The modified model has four latent factor vectors: q_i \in R^f denotes the f latent factors associated with the item i; p_u \in R^f denotes the f latent factors associated with the user u; y_j \in R^f denotes the f latent factors for item j from the set of items rated by user u: R(u); and we have defined a new vector z_c \in R^f which captures the latent factor vector, of f dimensions, for a given semantic category c. We denote this latter, additional component as the category factors component. Its inclusion is based on the notion that semantic categories have a stronger affinity with certain factors - for instance, the DBPedia category category:1970s_science_fiction_films will have a strong positive affinity with the latent factor corresponding to science fiction films - and that this can be tailored to the user, given that we derive a single vector for each semantic category of their rated items. By including this information we anticipated that additional cues for user preferences, functioning between the semantic categories and latent factors, would be captured. The latent factors are derived during learning and the number of factors to use is set as a hyperparameter in the model - as we shall explain below.
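Putting the components together, the predictive function of Eq. 8 can be sketched in NumPy as follows (our own naming; all parameters are assumed to have been learnt already, with `y_R` stacking the y_j vectors of the user's rated items R(u) and `z_C` stacking the z_c vectors of the categories of those items):

```python
import numpy as np

def predict(mu, b_i, b_u, alpha_i, alpha_u, b_icats, b_ucats,
            q_i, p_u, y_R, z_C):
    """SemanticSVD++ prediction r_hat_ui (Eq. 8) for one (user, item) pair."""
    # user representation: explicit factors plus normalised implicit feedback
    # from rated items and from the semantic categories of those items
    implicit = (p_u
                + y_R.sum(axis=0) / np.sqrt(len(y_R))
                + z_C.sum(axis=0) / np.sqrt(len(z_C)))
    return (mu + b_i + b_u
            + alpha_i * b_icats + alpha_u * b_ucats
            + q_i @ implicit)
```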

D. Model Learning and Hyperparameter Optimisation

To learn the parameters in our recommendation model (item and user biases, category bias weights, latent factor vectors) our goal was to minimise the following objective function, regularising model parameters with L2-regularisation to control for overfitting:

\min_{b_*, \alpha_*, p_*, q_*} \sum_{(u,i,t,r) \in D} (r_{ui} - \hat{r}_{ui})^2 + \lambda \big( b_i^2 + b_u^2 + \alpha_i^2 + \alpha_u^2 + ||q_i||_2^2 + ||p_u||_2^2 \big)   (16)

To learn the parameters we used Stochastic Gradient Descent (SGD) [12], using the standard process of first shuffling the order of the ratings within the training set, and then running through the set of ratings one at a time. For each rating we
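For each rating, the parameters can be nudged along the negative gradient of Eq. 16. A sketch of one such update for the scalar parameters only (our own naming; the learning rate `eta` and regulariser `lam` values are illustrative, and the latent vectors q_i, p_u, y_j, z_c are updated analogously):

```python
def sgd_step(params, u, i, err, b_icats_i, b_ucats_ui, eta=0.005, lam=0.02):
    """One SGD update of b_i, b_u, alpha_i, alpha_u given the prediction
    error err = r_ui - r_hat_ui on a single rating (u, i)."""
    params["b_i"][i] += eta * (err - lam * params["b_i"][i])
    params["b_u"][u] += eta * (err - lam * params["b_u"][u])
    # the category-bias weights are scaled by their static bias features,
    # since d(r_hat)/d(alpha_i) = b_{i,cats(i)} and similarly for alpha_u
    params["alpha_i"] += eta * (err * b_icats_i - lam * params["alpha_i"])
    params["alpha_u"] += eta * (err * b_ucats_ui - lam * params["alpha_u"])
    return params
```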


Of global category rating probability

Average change in Transfer Entropy of the User

Incorporating Taste Evolution with Biases

rating category c began from:

�c

=

1

4� k

4X

s=k

Qs+1(c)�Q

s

(c)

Qs

(c)(9)

From this we then calculated the conditional probabilityof a given category being rated highly by accounting for thechange rate of rating preference for the category as follows:

Pr(+|c) =Prior Ratingz }| {Q5(c) +

Change Ratez }| {�c

Q5(c) (10)

By averaging this over all categories for the item i we cancalculate the evolving item bias from the provided trainingsegment:

bi,cats(i) =

1

|cats(i)|X

c2cats(i)

Pr(+|c) (11)

2) User Biases Towards Categories: In the previous sec-tion, we induced per-user discrete probability distributions thatcaptured the probability of the user u rating a given category chighly during lifecycle stage s: Pu

s

(c). Given that users’ tasteevolve, our goal is to estimate the probability of the user ratingan item highly given its categories by capturing how the user’spreferences for each category have changed in past (decayingor growing). To capture the development of a user’s preferencefor a category we derived the average change rate (�u

c

) overthe k lifecycle periods coming before the final lifecycle stagein the training set. The parameter k is the number of stagesback in the training segment from which either a monotonicincrease or decrease in the probability of rating category cbegan from. We define the change rate (�u

c

) as follows:

�uc

=

1

4� k

4X

s=k

Pu

s+1(c)� Pu

s

(c)

Pu

s

(c)(12)

In a similar vein, we also capture the influence of globaldynamics on the user’s taste profile. We found in the previoussection that the transfer entropy of the users across all threeplatforms either reduced or increased as users’ progressedthroughout their lifecycles, indicating an increase or decreaseof influence by global taste dynamics respectively. We can cap-ture such signals on a per-user basis by assessing the change intransfer entropy for each user over time and modelling this as aglobal influence factor �u. We derive this as follows, based onmeasuring the proportional change in transfer entropy startingfrom lifecycle period k that produced a monotonic increase ordecrease in transfer entropy:

�u

=

1

4� k

4X

s=k

Ts+1|sQ!P

� Ts|s�1Q!P

Ts|s�1Q!P

(13)

By combining the average change rate (�u

c

) of the userhighly rating a given category c with the global influence factor(�u), we then derived the conditional probability of a userrating a given category highly as follows, where Pu

5 denotesthe taste profile of the user observed for the final lifecyclestage (5):

Pr(+|c, u) =Prior Ratingz }| {Pu

5 (c) +

Change Ratez }| {�uc

Pu

5 (c) +

Global Influencez }| {�uQ5(c) (14)

Given that a single item can be linked to many categorieson the web of linked data, we take the average across allcategories as the bias of the user given the categories of theitem:

bu,cats(i) =

1

|cats(i)|X

c2cats(i)

Pr(+|c, u) (15)

Other schemes for calculating the biases towards categories(both item and user) could be used, e.g. choosing the maximumbias, however we use the average as an initial scheme.

3) Weighting Category Biases: The above category biasesare derived as static features within the recommendation model(Eq. 8) mined from the provided training portion, howevereach user may be influenced by these factors in different wayswhen performing their ratings. To this end we included twoweights, one for each category bias, defined as ↵

i

and ↵u

forthe item biases to categories and the user biases to categoriesrespectively. As we will explain below, these weights are thenlearnt during the training phase of inducing the model.

C. Personalisation Component

The personalisation component of the SemanticSV D++

model builds on the existing SV D++ model by Koren et al.[2]. The modified model has four latent factor vectors: q

i

2 Rf

denotes the f latent factors associated with the item i; pu

2 Rf

denotes the f latent factors associated with the user u; yj

2 Rf

denotes the f latent factors for item j from the set of rateditems by user u: R(u); and we have defined a new vector z

c

2Rf which captures the latent factor vector, of f -dimensions,for a given semantic category c. We denote this latter, addi-tional component as the category factors component, and itsinclusion is based on the notion that semantic categories have astronger affinity with certain factors, for instance the DBPediacategory category:1970s_science_fiction_filmswill have a strong positive affinity with the latent factorcorresponding to Science Fiction films, and that this can betailored to the user given that we derive a single vector foreach semantic category of their rated items. By includingthis information we anticipated that additional cues for userpreferences, functioning between the semantic categories andlatent factors, would be captured. The latent factors are derivedduring learning and the number of factors to use is set as ahyperparameter in the model - as we shall explain below.

D. Model Learning and Hyperparameter Optimisation

To learn the parameters in our recommendation model(item and user biases, category bias weights, latent factorvectors) our goal was to minimise the following objectivefunction, regularising model parameters to control for overfitting using L2-regularisation:

\min_{b_*, \alpha_*, p_*, q_*} \sum_{(u,i,t,r) \in D} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( b_i^2 + b_u^2 + \alpha_i^2 + \alpha_u^2 + \|q_i\|_2^2 + \|p_u\|_2^2 \right) \quad (16)

To learn the parameters we used Stochastic Gradient Descent (SGD) [12], following the standard process of first shuffling the order of the ratings within the training set and then running through the set of ratings one at a time; for each rating, the model parameters are updated in the direction that reduces that rating's squared prediction error.
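
One SGD epoch over the ratings - shuffle, then step through one rating at a time - can be sketched as below. Only the bias updates are shown (factor-vector updates are analogous), and the learning rate and regularisation values are illustrative defaults, not those used in the paper:

```python
import random

def sgd_epoch(ratings, params, predict, lr=0.005, lam=0.02):
    """One SGD epoch: shuffle the training ratings, then take a
    gradient step on each rating's squared error (sketch).

    ratings: list of (user, item, rating) tuples.
    params: dict holding 'b_u' and 'b_i' bias dicts.
    predict: callable (user, item) -> predicted rating.
    """
    random.shuffle(ratings)  # the standard shuffling step described above
    for u, i, r in ratings:
        err = r - predict(u, i)
        # L2-regularised bias updates; latent-factor updates are analogous
        params['b_u'][u] += lr * (err - lam * params['b_u'][u])
        params['b_i'][i] += lr * (err - lam * params['b_i'][i])
```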


Page 11: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Evaluation Setup


- Tested three models (trained using Stochastic Gradient Descent):
  - SVD++ (baseline)
  - SB-SVD++: SVD++ with semantic category biases
  - S-SVD++: SB-SVD++ with the personalisation component

- Tuned hyperparameters over the validation splits
- Model testing:
  - Trained models with tuned hyperparameters using both training and validation splits
  - Applied to the held-out final 10% of reviews
- Evaluation measure: Root Mean Square Error (RMSE)
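
The evaluation measure named above can be computed as:

```python
import math

def rmse(predictions, actuals):
    """Root Mean Square Error over paired predicted and actual ratings."""
    assert len(predictions) == len(actuals)
    se = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(se / len(predictions))
```

For example, predicting 4* for a 5* item and 3* for a 3* item gives an RMSE of sqrt(0.5), roughly 0.707.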

Page 12: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Evaluation Results


- Significantly outperformed the SVD++ baseline
- MovieLens:
  - The full model (S-SVD++) produces significantly superior performance
- MovieTweetings:
  - Marginal difference between SB-SVD++ and S-SVD++

TABLE III. Root Mean Square Error (RMSE) of the three models across the two datasets. Each dataset's best model is shown with the p-value from the Mann-Whitney test against the next best model in parentheses.

Model      MovieLens          MovieTweetings
SVD++      1.520              0.969
SB-SVD++   1.517              0.963
S-SVD++    1.513 (p < 0.001)  0.963 (p < 0.1)
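
The reported p-values come from a Mann-Whitney test between models. How the test inputs were formed is not specified in this excerpt, so the sketch below assumes per-rating error samples and uses the standard normal approximation to the U statistic:

```python
import math

def mann_whitney_p(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation
    (a sketch of the significance test reported in Table III)."""
    n1, n2 = len(xs), len(ys)
    # U counts, over all pairs, how often x exceeds y (ties count half)
    u = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    # two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2))))
```

A small p-value, e.g. below 0.001 as on MovieLens, indicates the two models' error distributions differ significantly.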

without its own limitations, the first of which we identified as the hipster dilemma. A key challenge in using semantic categories is the mapping of items from recommendation datasets to Uniform Resource Identifiers (URIs) on the web of linked data; techniques are therefore required that can accurately disambiguate the correct URI amongst a collection of candidate URIs for a given item. The second challenge that arises from the use of linked data is its recency. In this paper we have concentrated on the use of DBPedia as a sole linked data dataset; however, the extensibility of our approach means that other, alternative linked data sources could be considered, such as Freebase and Yago. This may go some way to overcoming the lack of URIs for the MovieTweetings dataset, where more recent, and obscure, movies failed to be mapped.

The work presented within this paper is currently being extended to address the cold-start problem, as in this work we only considered users who had rated items within both the training and test segments. The advantage of modelling the semantic evolution of users' tastes, and capturing this information as a global signal (cf. taste evolution and its susceptibility to global influence), is that we can interpolate this information into cold-start situations. For instance, when we are presented with a new user and a new item, we can examine globally how users have rated the same categories as the new item in the past, or categories lateral to them, and how these tastes have evolved. The identification and transfer of lateral categories, from which rating information can be gleaned, is possible due to the graph structure of linked data, where semantic categories can be identified through graph vertex kernels, i.e. computing vertex (semantic category node) similarity based on the linked data graph's topology. This will also enable us to relax the restriction on the number of ratings users must have made, given that we only considered users who had posted ≥ 10 ratings within the training segment: for users who have only limited prior ratings, we can interpolate the global taste evolution of other users to anticipate their future preferences.
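
The vertex-similarity idea can be illustrated with a simple neighbourhood-overlap (Jaccard) measure over the category graph. This is an illustrative stand-in, not the specific vertex kernel used in the follow-up work:

```python
def jaccard_vertex_similarity(graph, c1, c2):
    """Similarity of two category vertices as the Jaccard overlap of
    their neighbourhoods in a linked-data category graph (illustrative).

    graph: dict mapping category URI -> set of adjacent category URIs.
    """
    n1, n2 = graph.get(c1, set()), graph.get(c2, set())
    if not n1 and not n2:
        return 0.0
    return len(n1 & n2) / len(n1 | n2)
```

Categories scoring highly against a rated category would be candidates for transferring rating information in cold-start situations.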

Our presented approach for choosing which hyperparameters to use for which model is based on a brute-force grid search for hyperparameter optimisation, where the same regularisation weight and learning rate are used for all components of the model. Our future work will therefore examine the setting of component-specific hyperparameters (i.e. learning rate and regularisation weight) for each component. However, in doing so we open up problems of scalable optimisation. One direction that will be explored to counteract this is the use of Gaussian Processes as a means to model the region of the input space, in this context the n-dimensional hyperparameter vector (θ ∈ R^n), that will produce the greatest expected reduction in error. In doing so, we anticipate that we can provide component-specific tuning while reducing the computation time needed for hyperparameter optimisation.
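
The brute-force grid search described here, with a single shared learning rate and regularisation weight, can be sketched as follows; the candidate value grids and the fit/evaluate callables are assumptions for illustration:

```python
from itertools import product

def grid_search(train, valid, fit, evaluate,
                lrs=(0.002, 0.005, 0.01), lams=(0.01, 0.02, 0.1)):
    """Exhaustively try every (learning rate, regularisation weight)
    pair and keep the one with the lowest validation RMSE (sketch).

    fit: callable (train, lr, lam) -> model.
    evaluate: callable (model, valid) -> RMSE.
    """
    best = None
    for lr, lam in product(lrs, lams):
        score = evaluate(fit(train, lr, lam), valid)
        if best is None or score < best[0]:
            best = (score, lr, lam)
    return best  # (best RMSE, learning rate, regularisation weight)
```

The cost grows multiplicatively with each component-specific hyperparameter added, which is exactly the scalability concern motivating the Gaussian Process direction above.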

VIII. CONCLUSIONS

Within this paper we have presented an approach to predict the ratings of recommendation items by capturing the semantic taste evolution of users. We presented a means to align recommendation items with their URIs on the web of linked data and found that for more obscure movies we could not perform such a mapping, thereby identifying the hipster dilemma. We modelled users' rating affinity to the semantic categories of movie items, and examined how such tastes changed over time: (i) relative to their prior tastes, and (ii) relative to global tastes; finding that users of the two datasets (MovieLens and MovieTweetings) differed in their evolution. We then proposed a matrix factorisation model known as SemanticSVD++, building on prior work [2], that incorporates the semantic taste evolution of users, and empirically demonstrated its improved performance (in reducing prediction error) against the existing SVD++ model. We believe that this work demonstrates that linked data has the potential to overcome the factor consistency problem that latent factor models have encountered in the past, and therefore lays the ground for future work in this area.

REFERENCES

[1] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data: the story so far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1-22, 2009.

[2] Y. Koren, "Collaborative filtering with temporal dynamics," Communications of the ACM, vol. 53, no. 4, pp. 89-97, 2010.

[3] G. Miritello, R. Lara, M. Cebrian, and E. Moro, "Limited communication capacity unveils strategies for human interaction," Apr. 2013. [Online]. Available: http://arxiv.org/abs/1304.1979

[4] C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts, "No country for old members: User lifecycle and linguistic change in online communities," in Proceedings of the World Wide Web Conference, 2013.

[5] J. McAuley and J. Leskovec, "From amateurs to connoisseurs: Modeling the evolution of user expertise through online reviews," in Proceedings of the World Wide Web Conference, 2013.

[6] A. Passant, "dbrec: music recommendations using DBpedia," in The Semantic Web - ISWC 2010. Springer, 2010, pp. 209-224.

[7] T. Di Noia, R. Mirizzi, V. C. Ostuni, D. Romito, and M. Zanker, "Linked open data to support content-based recommender systems," in Proceedings of the 8th International Conference on Semantic Systems. ACM, 2012, pp. 1-8.

[8] V. C. Ostuni, T. Di Noia, E. Di Sciascio, and R. Mirizzi, "Top-n recommendations from implicit feedback leveraging linked open data," in 7th ACM Conference on Recommender Systems (RecSys 2013). ACM Press, 2013.

[9] S. Dooms, T. De Pessemier, and L. Martens, "MovieTweetings: a movie rating dataset collected from Twitter," in Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys, vol. 13, 2013.

[10] M. Rowe, "Mining user lifecycles from online community platforms and their application to churn prediction," in International Conference on Data Mining, 2013.

[11] T. Schreiber, "Measuring information transfer," Physical Review Letters, vol. 85, no. 2, p. 461, 2000.

[12] L. Bottou and O. Bousquet, "The tradeoffs of large scale learning," in NIPS, vol. 4, 2007, p. 2.

Page 13: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

Conclusions


- Semantic taste profiles can track users' tastes:
  - Overcomes the factor consistency problem
  - Enables modelling of global taste influence
  - SemanticSVD++ boosts recommendation performance
- Semantic categories are limited, however:
  - The hipster dilemma
  - Cold-start categories

Page 14: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings


Cold-start Categories

[Figure: a linked-data graph of DBpedia categories dbpedia:c1 to dbpedia:c5, connected via dcterms:subject to movie items rated 5* and 4* and to one unrated item (?), illustrating unrated ("cold-start") categories.]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++. M. Rowe. To appear in the proceedings of the International Semantic Web Conference. Trentino, Italy. (2014)

Page 15: SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings

@mrowebot [email protected] http://www.lancaster.ac.uk/staff/rowem/

Questions?