
Deviation-Based Contextual SLIM Recommenders

Yong Zheng, Bamshad Mobasher, Robin Burke

DePaul University, Chicago, IL, USA

@CIKM 2014, Shanghai, China, Nov 4, 2014



Outline of the Talk

• Context-aware Recommender Systems (CARS)

• Collaborative Filtering and SLIM Recommenders

• CSLIM: Contextualizing SLIM Recommenders

• Experimental Evaluations

• Conclusions and Future Work



Traditional Recommender Systems (RS)

T1 T2 T3 T4 T5

U1 3 2

U2 3 3 4

U3 4 2 1

U4 2 5 5

U5 3 2 4 2

Example: User-Item 2D-Rating Matrix

Traditional Recommender: Users × Items → Ratings



Context-aware RS (CARS)

Motivation: recommendations cannot be made in isolation from context, because users' preferences change from one context to another.





Example: User-Item Contextual Rating Matrix

In CARS: Users × Items × Contexts → Ratings




Example: User-Item Contextual Rating Matrix

Terminology:

Context dimension: time, location, companion

Context condition: a value in a specific dimension, e.g., weekend and weekday are two conditions in the context dimension “Time”




Representational CARS (R-CARS):

Assuming that influential contextual variables are known and available (e.g., location, time, mood), how can we build CARS algorithms that adapt to users' preferences in different contextual situations?




Most research in R-CARS focuses on the development of context-aware collaborative filtering (CACF).

CF + Contexts → CACF











Collaborative Filtering (CF)

CF is one of the most popular families of recommendation algorithms.

1). Memory-based CF

Such as user-based CF and item-based CF

Pros: good for explanation; Cons: sparsity problems

2). Model-based CF

Such as matrix factorization, etc.

Pros: good performance; Cons: cold-start, explanation

3). Hybrid CF Recommendation Algorithms

Such as content-based hybrid CF, etc.

Pros: further improvement; Cons: running costs



Item-based CF (ItemKNN, Sarwar et al., 2001)

T1 T2 T3 T4 T5

U1 3 2

U2 3 3 ??? 4

U3 4 2 1

U4 2 5 5

U5 3 2 4 2

Rating Prediction:

$$P_{u,i} = \frac{\sum_{j \in N_i} R_{u,j} \times sim(i,j)}{\sum_{j \in N_i} sim(i,j)}$$

Cons: item-item similarity calculations and neighborhood selections rely on co-ratings.

What if the # of co-ratings is limited?
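A minimal sketch of the ItemKNN rating prediction described on this slide may help make the dependence on co-ratings concrete. The cosine similarity, the convention that 0 marks a missing rating, the neighborhood size `k`, and the function names are all assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def cosine_item_sim(R):
    """Cosine similarity between item columns (assumed: 0 means 'not rated')."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero for unrated items
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)           # an item is not its own neighbor
    return sim

def predict_itemknn(R, sim, u, i, k=2):
    """P[u,i] = sum_j R[u,j]*sim(i,j) / sum_j sim(i,j), with j in N_i."""
    rated = np.flatnonzero(R[u])         # items user u has rated
    rated = rated[rated != i]
    # N_i: the k rated items most similar to item i
    neighbors = rated[np.argsort(sim[i, rated])[::-1][:k]]
    denom = sim[i, neighbors].sum()
    if denom == 0:
        return 0.0                       # no usable co-ratings, no prediction
    return float(R[u, neighbors] @ sim[i, neighbors] / denom)

# Hypothetical toy data: 3 users x 3 items
R = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 3.0],
              [1.0, 0.0, 5.0]])
print(predict_itemknn(R, cosine_item_sim(R), u=0, i=2))
```

When two items share few co-ratings, the similarity entries become unreliable or zero, which is exactly the weakness raised above.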



SLIM (Ning et al., 2011)

Sparse Linear Methods (SLIM) can be viewed as another form of collaborative filtering.

Ranking Score Prediction:

$$S_{i,j} = R_{i,:} \cdot W_{:,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$$

Matrix R = User-Item rating matrix; Matrix W = Item-Item coefficient matrix ≈ similarity matrix

We call this approach SLIM-I, since W represents item-item coefficients.
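The SLIM-I ranking score is a matrix product with the diagonal of W forced to zero. The sketch below shows only the scoring step under that reading; the values in `W` are hypothetical placeholders (in SLIM the matrix is learned), and the helper names are illustrative.

```python
import numpy as np

def slim_scores(R, W):
    """S = R @ W with diag(W) = 0, i.e. S[i,j] = sum over h != j of R[i,h] * W[h,j]."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)          # enforce the h != j constraint
    return R @ W

def top_n(R, W, user, n=2):
    """Rank the items the user has not rated by their SLIM score."""
    scores = slim_scores(R, W)[user]
    scores[R[user] > 0] = -np.inf     # do not recommend already-rated items
    return np.argsort(scores)[::-1][:n]

# Hypothetical data: 2 users x 3 items, and a made-up coefficient matrix
R = np.array([[3.0, 0.0, 2.0],
              [0.0, 4.0, 0.0]])
W = np.array([[9.0, 0.5, 0.1],
              [0.2, 9.0, 0.3],
              [0.4, 0.6, 9.0]])
print(slim_scores(R, W)[0, 1])        # 3*0.5 + 2*0.6
print(top_n(R, W, user=0, n=1))
```

Zeroing the diagonal keeps an item from being explained by its own rating, which is what the h ≠ j condition in the sum expresses.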



Comparison Between ItemKNN & SLIM-I

Pros of SLIM-I: Matrix W is learned directly by minimizing prediction/ranking error. In other words, the item-item coefficients (similarities) are no longer calculated from co-ratings, which makes them more reliable and lets them be optimized for ranking directly.

SLIM-I has been demonstrated to outperform UserKNN, ItemKNN, matrix factorization and other traditional RS algorithms.

Rating Prediction in ItemKNN:

$$P_{u,i} = \frac{\sum_{j \in N_i} R_{u,j} \times sim(i,j)}{\sum_{j \in N_i} sim(i,j)}$$

Ranking Score Prediction in SLIM-I:

$$S_{i,j} = R_{i,:} \cdot W_{:,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$$



SLIM-I and SLIM-U

SLIM-I is another form of ItemKNN; W = item-item coefficient matrix.

SLIM-U is another form of UserKNN; W = user-user coefficient matrix.











CSLIM: Contextual SLIM Recommenders

We use SLIM-I as an example to show how CSLIM-I approaches are built; contexts can be incorporated into SLIM-U in the same way to formulate CSLIM-U models.

Ranking Prediction in SLIM-I:

$$S_{i,j} = R_{i,:} \cdot W_{:,j} = \sum_{h=1, h \neq j}^{N} R_{i,h} W_{h,j}$$

Incorporating contexts, CSLIM has a uniform ranking prediction:

$$S_{i,j,c} = \sum_{h=1, h \neq j}^{N} R_{i,h,c} W_{h,j}$$

CSLIM aggregates contextual ratings with item-item coefficients.

There are two key points:
1). The ratings to be aggregated should be placed under the same context c;
2). Accordingly, W indicates coefficients under the same contexts.




The challenge is how to estimate $R_{i,h,c}$, since contextual ratings are usually sparse: it is not guaranteed that the same user has already rated other items in the same context c.

Ranking Prediction in CSLIM-I:

$$S_{i,j,c} = \sum_{h=1, h \neq j}^{N} R_{i,h,c} W_{h,j}$$

We used a deviation-based approach to estimate $R_{i,h,c}$.

Matrix R: user-item 2D rating matrix (non-contextual ratings)
Matrix W: item-item coefficient matrix
Matrix D: a matrix estimating rating deviations in contexts

Here, D is a CI matrix (rows are items, columns are contexts). This approach is named CSLIM-I-CI.




We used a deviation-based approach to estimate it.

Example: CSLIM-I-CI

R = non-contextual Rating Matrix
D = Contextual Rating Deviation Matrix
W = Item-item Coefficient Matrix
C = a binary context vector, as below

$$R_{i,j,c} = R_{i,j} + \sum_{l=1}^{L} D_{j,l} c_l$$

Weekend  Weekday  At Home  At Park
   1        0        0        1

We use this estimation even if we already know a real contextual rating in situation c, since we’d like to learn as many cells in D as possible.
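As a rough sketch of how the two formulas combine in CSLIM-I-CI, the code below first estimates the contextual ratings from R, D and the binary context vector, then aggregates them with W. All numeric values are invented for illustration, and the function name is hypothetical. The sketch applies the formula literally to every item h; treating unrated items differently would be a variation the slides do not cover.

```python
import numpy as np

def cslim_i_ci_score(R, W, D, c, i, j):
    """S[i,j,c] = sum over h != j of (R[i,h] + sum_l D[h,l]*c[l]) * W[h,j]."""
    r_ctx = R[i] + D @ c      # estimated contextual ratings of user i, one per item
    w = W[:, j].copy()
    w[j] = 0.0                # exclude h == j
    return float(r_ctx @ w)

# Hypothetical data: 1 user, 3 items, 4 contextual conditions
R = np.array([[4.0, 0.0, 2.0]])
D = np.array([[0.5, 0.0, 0.0, -1.0],    # rows: items, columns: conditions
              [0.0, 0.0, 0.0,  0.0],
              [0.2, 0.0, 0.0,  0.3]])
W = np.array([[0.0, 0.4, 0.1],
              [0.3, 7.0, 0.2],
              [0.5, 0.2, 0.0]])
c = np.array([1, 0, 0, 1])              # Weekend, At Park
print(cslim_i_ci_score(R, W, D, c, i=0, j=1))
```

With this context vector the deviations are -0.5 for the first item and +0.5 for the third, so the estimated contextual ratings become 3.5 and 2.5 before they are weighted by the coefficients in column j of W.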




There are three ways to model contextual rating deviation (CRD) in D:

1). D is a CI matrix, assuming there is a CRD for each <item, context> pair
2). D is a CU matrix, assuming there is a CRD for each <user, context> pair
3). D is a vector, assuming CRD depends only on the context

Incorporating contexts into SLIM-I: CSLIM-I-CI, CSLIM-I-CU, CSLIM-I-C
Incorporating contexts into SLIM-U: CSLIM-U-CI, CSLIM-U-CU, CSLIM-U-C

We have built six deviation-based CSLIM models!
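The three layouts of D differ only in which row, if any, is looked up before the deviation is summed over the active conditions. A small sketch, assuming D is stored as a NumPy array with conditions along the last axis (the function name and the `kind` argument are illustrative, not from the paper):

```python
import numpy as np

def contextual_rating(R, u, j, c, D, kind):
    """Estimate R[u,j,c] = R[u,j] + deviation for the three CRD layouts."""
    if kind == "CI":        # D has shape (items, conditions)
        dev = D[j] @ c
    elif kind == "CU":      # D has shape (users, conditions)
        dev = D[u] @ c
    elif kind == "C":       # D has shape (conditions,)
        dev = D @ c
    else:
        raise ValueError(f"unknown deviation layout: {kind}")
    return float(R[u, j] + dev)

# Hypothetical data: 1 user, 2 items, 4 conditions
R = np.array([[3.0, 4.0]])
c = np.array([1, 0, 0, 1])
D_ci = np.array([[0.1, 0.0, 0.0, 0.2],
                 [0.5, 0.0, 0.0, -1.0]])
D_c = np.array([0.25, 0.0, 0.0, 0.25])
print(contextual_rating(R, 0, 1, c, D_ci, "CI"))   # 4 + (0.5 - 1.0)
print(contextual_rating(R, 0, 0, c, D_c, "C"))     # 3 + (0.25 + 0.25)
```

The vector layout has the fewest parameters to learn, while the CI and CU layouts let the deviation vary per item or per user at the cost of a larger D.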



Further Step: General CSLIM Approaches

Cons: CSLIM requires users’ non-contextual ratings on items. If no such ratings exist, we propose using the average of the user’s contextual ratings on the item as a stand-in, which proved feasible in our experiments.

However, we’d like to build more general CSLIM (GCSLIM) models that do not require non-contextual ratings.

Simply, we model matrix D as a CC matrix, where each cell in D represents the CRD between two contextual conditions. GCSLIM-I-CC can then estimate the rating deviation from one contextual rating to another (same item, different contexts).




For example, we want to estimate R<u1, t1, {Weekday, At home}>, and we already know the rating R<u1, t1, {Weekend, At cinema}>. Matrix D helps us learn and estimate CRD(Weekday, Weekend) and CRD(At home, At cinema).

Therefore, R<u1, t1, {Weekday, At home}> = R<u1, t1, {Weekend, At cinema}> + CRD(Weekday, Weekend) + CRD(At home, At cinema)

Similarly, matrix D can be paired with users or items; e.g., we can assume that the CRD between contexts differs from user to user.
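The worked example above can be sketched in a few lines. The dictionary `crd` stands in for the CC matrix D and is keyed as (target condition, known condition) to mirror the CRD(Weekday, Weekend) notation; the deviation values, the dimension names, and the choice to add nothing when a condition is unchanged are assumptions made for illustration.

```python
def estimate_from_known(rating, known_ctx, target_ctx, crd):
    """Shift a known contextual rating to a target context, one dimension at a time."""
    estimate = rating
    for dim, target_cond in target_ctx.items():
        known_cond = known_ctx[dim]
        if known_cond != target_cond:          # assumed: no deviation if unchanged
            estimate += crd[(target_cond, known_cond)]
    return estimate

# Hypothetical learned deviations between pairs of conditions
crd = {
    ("Weekday", "Weekend"): -0.5,
    ("At home", "At cinema"): -1.0,
}
known_ctx = {"Time": "Weekend", "Location": "At cinema"}
target_ctx = {"Time": "Weekday", "Location": "At home"}

# R<u1, t1, {Weekend, At cinema}> = 4.0 is an invented known rating
print(estimate_from_known(4.0, known_ctx, target_ctx, crd))
```

Because the estimate starts from another contextual rating rather than from a non-contextual one, no 2D rating matrix is needed, which is the point of the general model.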




Two challenges in GCSLIM approaches:

1). For each <user, item> pair, there could be several ratings in different contexts. Which contextual rating should be applied?

If we use all of those ratings, computational costs increase. If we select just one of them, there are three strategies: MostSimilar, LeastSimilar and Random. Our experiments showed that picking one at random works. See our paper for more details.
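To illustrate the three selection strategies named above, here is a small sketch. The slides do not define how similarity between contexts is measured, so the overlap count used here (number of dimensions whose condition matches the target context) is purely an assumption, as are the function and parameter names.

```python
import random

def pick_rating(candidates, target_ctx, strategy="Random", rng=random):
    """candidates: list of (context_dict, rating) for one <user, item> pair."""
    def overlap(ctx):
        # Assumed similarity: how many dimensions match the target context
        return sum(ctx.get(dim) == cond for dim, cond in target_ctx.items())

    if strategy == "MostSimilar":
        return max(candidates, key=lambda cr: overlap(cr[0]))
    if strategy == "LeastSimilar":
        return min(candidates, key=lambda cr: overlap(cr[0]))
    if strategy == "Random":
        return rng.choice(candidates)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical ratings by the same user on the same item in two contexts
candidates = [
    ({"Time": "Weekend", "Location": "At cinema"}, 4.0),
    ({"Time": "Weekday", "Location": "At cinema"}, 3.0),
]
target_ctx = {"Time": "Weekday", "Location": "At home"}
print(pick_rating(candidates, target_ctx, "MostSimilar"))
```

Using a single rating keeps the cost of each prediction constant regardless of how many contexts the pair was rated in.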




Two challenges in GCSLIM approaches:

2). How should matrix D be coupled with the user or item dimension?

If we assign a separate D to each user/item, computational costs increase.

Solution: we can cluster users/items into small groups and assume that the users/items in the same group share the same matrix D.

We will explore this in our future work.











Data Sets

The current situation in the CARS research domain:
1). The number of data sets is limited;
2). The data is either small or sparse;
3). There are no large data sets, or the larger ones are not publicly accessible. Most data were collected from surveys.

All the data sets used can be found here: http://tiny.cc/contextdata

Due to limited time, we present only the results on the restaurant and music data in these slides. See more results in our CIKM paper.




Baseline Approaches

We chose state-of-the-art CACF algorithms as baselines:

1). Differential Context Modeling (DCM): DCM incorporates contexts into UserKNN/ItemKNN, but it suffers from the sparsity problem and performs the worst in terms of precision, recall and MAP.

2). Context-aware Splitting Approaches (CASA): CASA is a contextual transformation approach, where contextual data are converted to a 2D user-item rating matrix, and then a traditional approach (MF in this case) can be applied to the transformed data.

3). Context-aware Matrix Factorization (CAMF): CAMF incorporates contexts into MF, where CRD is modeled in a similar way to CSLIM.

4). Tensor Factorization (TF): TF is an independent context-aware algorithm, since contexts are assumed to be independent of the user and item dimensions. Its computational cost grows as the number of contexts increases.



Evaluation Protocols

1). 5-fold Cross-validation
All algorithms were run on the same 5 folds of the data.

2). Top-N Recommendation Evaluations
Metrics: Precision, Recall and MAP (Mean Average Precision)
Precision and Recall are used to measure accuracy;
MAP is used to measure the position in the rankings.
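For readers unfamiliar with the top-N metrics named above, the sketch below computes them for a single ranked list and averages AP over users. Normalizing average precision by min(|relevant|, N) is one common convention and is an assumption here; the talk does not state which variant was used, and the function names are illustrative.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top-n."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)

def average_precision_at_n(ranked, relevant, n):
    """Rewards relevant items that are ranked near the top of the list."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            total += hits / k          # precision at this cut-off
    denom = min(len(relevant), n)      # assumed normalization
    return total / denom if denom else 0.0

def mean_average_precision(rankings, relevants, n):
    """Mean of the per-user average precision values."""
    aps = [average_precision_at_n(r, rel, n) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical ranked list and held-out relevant items for one user
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_n(ranked, relevant, 3), recall_at_n(ranked, relevant, 3))
print(average_precision_at_n(ranked, relevant, 3))
```

Precision and recall ignore the order within the top N, whereas average precision drops when relevant items sit lower in the list, which is why MAP is described as measuring position in the rankings.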

Research Questions:
1). Does CSLIM outperform the state-of-the-art CARS algorithms?
2). How about GCSLIM? Is it better than CSLIM?
3). There are many CSLIM algorithms; are there any guidelines for pre-selecting the appropriate one?







Evaluation Results


The name of a CSLIM algorithm has two parts. For example, in CSLIM-I-CI:
1). CSLIM-I indicates that we perform an ItemKNN-style CF approach;
2). -CI indicates that we model CRD as a CI matrix.

Questions:
1). Should CSLIM-I/ItemKNN or CSLIM-U/UserKNN be used?
Answer: it depends on the average number of ratings per item versus the average number of ratings per user.
2). Should -CI, -CU or -C be applied?
Answer: it depends on whether contexts are more dependent on users or on items.

For more details, see our CIKM paper.



Evaluation Results

How about the running efficiency?
In CSLIM and GCSLIM, the matrices D and W must both be learned. This raises two challenges:

1). Large number of users/items/ratings
In this case, the non-contextual rating matrix R or the rating space P will be very large, as will the matrix W.
Solution: adopt a KNN strategy. We do not use all the ratings, but only select the top-N neighbors (items or users).

2). Large number of contexts
What if there are very many contextual conditions? In the CARS domain, there are usually at most 10 contextual dimensions and at most 100 contextual conditions.
Solution: there are many ways to pre-select influential contexts, which helps reduce the number of contexts.











Conclusions

1). CSLIM has been demonstrated to outperform the state-of-the-art CARS algorithms;
2). GCSLIM sometimes contributes further improvements, but it is not guaranteed to beat the CSLIM algorithms every time; that depends on how sparse the contextual ratings are;
3). We identified some influential factors and discovered latent rules for selecting the appropriate CSLIM algorithm in advance.

Future Work

1). Examine CSLIM and GCSLIM on larger data sets;
2). Compare against more models, e.g., factorization machines;
3). Couple the CC matrix with users/items in the GCSLIM approach;
4). Incorporate contexts into matrix W instead of adding the matrix D.




Thanks!

Questions?

