personalized paper recommendation based on user historical behavior


Personalized Paper Recommendation Based on User Historical Behavior

Yuan Wang

College of Information Technology Science, Nankai University, Tianjin, China

kdd.nankai.edu.cn/wangy

Major contributors: Jie Liu, XingLiang Dong, Tianbi Liu, and YaLou Huang

Nankai University IIP Lab

Outline

Why do we need to provide personalized paper recommendation?
  Background and our motivation
  Related approaches

Personalized Paper Recommendation model
  Overview
  Estimation for each part
  The optimization of the model

Results
  Data collection
  Evaluation

Conclusion

Background

The rapid development of Digital Libraries (DL) for information sharing and search

More new ideas are first posted on DLs

Information overload and information loss

Motivation

How is the problem tackled now?
  Sending an email or offering an RSS subscription
  Both require users to state their interests explicitly

An effective technique
  Users' historical browsing behaviors on the site (publishing, marking a paper as favorite, rating, making a comment, and tagging) reveal what users have in mind

Motivation

Focus on providing more relevant papers to researchers

Learn their personalized preference from their historical behaviors

Related approaches

Personalized recommendation

Collaborative filtering
  Users tend to prefer the things their friends prefer
  PHOAKS, Referral Web, and CiteSeer search
  Amazon, eBay, Douban: mainly in commercial recommendation systems
  Drawback: cold start (remedies: pseudo users, adding new scores, clustering)

Content-based filtering
  Based on content and user similarity
  WebWatcher, LIRA, Letizia

Related approaches

Personalized recommendation for scientific papers

Collaborative filtering methods
  Citation relationships, similar to PageRank: little user preference
  Learning from an existing recommendation system, then filtering: the input is difficult to get
  Getting preferences from logs: more noise
  Drawback: content information should be taken into consideration

Content-based filtering
  Takes into account the information carried by the papers themselves
  No cold-start or data-sparsity problem
  Statistical models are effective for paper recommendation


Personalized Paper Recommendation model

A triple relationship for PPR: (Di, Dx, U)

Description
  Di : the document set to be recommended to the user
  Dx : the document set the user is viewing
  U : the current user

Similarity between the recommended resources and users

Personalized Paper Recommendation model

Assumption
  Users and documents are drawn i.i.d.

Description
  p(di) : prior probability of the paper
  p(dx|di) : similarity between papers
  p(uk|di) : similarity between user and paper
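
The formula on this slide did not survive extraction, so the following is only a minimal sketch of how the three quantities could be combined, assuming candidates are ranked by the product p(di) · p(dx|di) · p(uk|di). The function name `rank_candidates` and the toy probabilities are hypothetical.

```python
import math

def rank_candidates(prior, paper_sim, user_sim):
    """Rank candidate papers by log p(di) + log p(dx|di) + log p(uk|di).

    Each argument maps a candidate paper id to a probability in (0, 1].
    Multiplying the three factors is an assumption, not the slide's formula.
    """
    scores = {
        d: math.log(prior[d]) + math.log(paper_sim[d]) + math.log(user_sim[d])
        for d in prior
    }
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy values, made up for illustration only.
prior = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
paper_sim = {"d1": 0.01, "d2": 0.04, "d3": 0.02}
user_sim = {"d1": 0.02, "d2": 0.03, "d3": 0.10}
ranking = rank_candidates(prior, paper_sim, user_sim)
```

Working in log space avoids numerical underflow when the per-word probabilities behind p(dx|di) and p(uk|di) are multiplied over many words.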

Prior Probability of Paper

Evaluate the probability that a document will be selected across the whole website, based on users' historical behaviors

The more valuable a paper is, the more operations it receives

A = {down, keep, visit, tag, score, comment, collect, ……}

Absolute discount smoothing
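
The slide names absolute discount smoothing but its formula is not preserved. As a rough illustration only, the sketch below assumes the prior is estimated from the number of actions recorded per paper, with a fixed discount subtracted from every non-zero count and the freed mass spread uniformly over all papers. The function name, the `delta` value, and the uniform background are all assumptions.

```python
def prior_with_absolute_discount(action_counts, delta=0.5):
    """Estimate p(di) from per-paper action counts with absolute discounting.

    action_counts maps paper id -> number of recorded user actions.
    delta is subtracted from every non-zero count; the freed probability
    mass is spread uniformly over all papers (an assumed background).
    """
    total = sum(action_counts.values())
    if total == 0:
        raise ValueError("at least one recorded action is required")
    n_papers = len(action_counts)
    n_seen = sum(1 for c in action_counts.values() if c > 0)
    reserved = delta * n_seen / total  # mass freed by discounting
    return {
        d: max(c - delta, 0.0) / total + reserved / n_papers
        for d, c in action_counts.items()
    }

# Toy counts, made up for illustration only.
counts = {"d1": 6, "d2": 3, "d3": 1, "d4": 0}
prior = prior_with_absolute_discount(counts)
```

With this choice a paper that has received no actions yet (`d4` here) still gets a small non-zero prior, so it can still be recommended.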

Similarity between Papers

Title, abstract, keywords, and subject domain serve as the documents' features

Similarity is calculated after word segmentation
  w : each word in a document
  tf(w, di) : the frequency with which w appears in di
  tf(di) : the total frequency of all words in di
  tf(w, D) : the frequency with which w appears in all documents
  tf(D) : the total frequency of all words in all documents
  a : a parameter used for smoothing (0.1 here)
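
The similarity formula itself is missing from the extracted slide. The symbols defined for it are consistent with a smoothed unigram language model, so the sketch below assumes p(w|di) = (1 - a) · tf(w, di)/tf(di) + a · tf(w, D)/tf(D) and scores p(dx|di) as the likelihood of the words of dx under that model. The interpolation form, the function names, and the whitespace split standing in for word segmentation are assumptions.

```python
import math
from collections import Counter

def word_prob(w, doc_tf, coll_tf, a=0.1):
    """Assumed form of p(w|di): document frequency mixed with collection frequency."""
    p_doc = doc_tf[w] / sum(doc_tf.values())     # tf(w, di) / tf(di)
    p_coll = coll_tf[w] / sum(coll_tf.values())  # tf(w, D) / tf(D)
    return (1 - a) * p_doc + a * p_coll

def log_paper_similarity(dx_words, di_words, coll_tf, a=0.1):
    """log p(dx|di): log-likelihood of the words of dx under the model of di."""
    di_tf = Counter(di_words)
    return sum(math.log(word_prob(w, di_tf, coll_tf, a)) for w in dx_words)

# Toy documents, made up for illustration only.
docs = {
    "d1": "topic model for paper recommendation".split(),
    "d2": "random walk on user graph".split(),
    "d3": "paper recommendation with topic model".split(),
}
coll_tf = Counter(w for words in docs.values() for w in words)
s13 = log_paper_similarity(docs["d1"], docs["d3"], coll_tf)
s12 = log_paper_similarity(docs["d1"], docs["d2"], coll_tf)
```

Because every word of the collection has a non-zero collection frequency, the smoothed probability is never zero and the logarithm is always defined.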

Similarity between User and Paper

Actions on papers reveal users' preferences

Words in users and documents are drawn i.i.d.
  Wk : the set of words in the user's profile
  tf(w, uk) : the frequency with which w appears in the profile of uk
  p(w|di) : calculated as in the similarity between papers
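
The slide's formula for p(uk|di) is not preserved. Given the i.i.d. assumption on words, one plausible reading is log p(uk|di) = Σ over w in Wk of tf(w, uk) · log p(w|di). The sketch below follows that reading; the profile construction (words of the papers the user acted on), the function name, and the smoothing form are assumptions.

```python
import math
from collections import Counter

def log_user_paper_similarity(profile_tf, di_tf, coll_tf, a=0.1):
    """Assumed form: log p(uk|di) = sum over w in Wk of tf(w, uk) * log p(w|di)."""
    di_len = sum(di_tf.values())
    coll_len = sum(coll_tf.values())
    total = 0.0
    for w, tf_w_uk in profile_tf.items():
        # Smoothed p(w|di), assumed to be a linear interpolation with weight a.
        p_w = (1 - a) * di_tf[w] / di_len + a * coll_tf[w] / coll_len
        total += tf_w_uk * math.log(p_w)
    return total

# Hypothetical profile: words of the papers the user has acted on.
acted_on = ["topic model for paper recommendation", "topic model inference"]
profile_tf = Counter(w for text in acted_on for w in text.split())
d_a = Counter("a topic model for recommendation".split())
d_b = Counter("random walk on a graph".split())
# The toy collection includes the profile words so that no probability is zero.
coll_tf = profile_tf + d_a + d_b
score_a = log_user_paper_similarity(profile_tf, d_a, coll_tf)
score_b = log_user_paper_similarity(profile_tf, d_b, coll_tf)
```

In this toy case the paper sharing more words with the user's profile (`d_a`) receives the higher score.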

The optimization of the model

The original model didn't consider the transitive relation among users' preferences

Optimize the model by matrix transition

User A User B User C

The optimization of the model

Random Walk on graph

Normalize by column, normalize by row
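
The slides do not say which matrix the walk runs on. As a sketch under that caveat, the code below assumes a user-by-paper action matrix: normalizing by row gives user-to-paper transitions, normalizing by column gives paper-to-user transitions, and their product is a user-to-user transition matrix used to propagate preferences for a chosen number of iterations. All function names and the toy matrix are hypothetical.

```python
def normalize_rows(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def propagate(user_paper, steps=1):
    """Spread preferences through a user -> paper -> user random walk."""
    user_to_paper = normalize_rows(user_paper)             # normalize by row
    paper_to_user = normalize_rows(transpose(user_paper))  # normalize by column
    user_to_user = matmul(user_to_paper, paper_to_user)
    pref = user_to_paper
    for _ in range(steps):
        pref = matmul(user_to_user, pref)
    return pref

# Toy actions of users A, B, C on three papers, made up for illustration only.
actions = [
    [1, 1, 0],  # User A
    [0, 1, 1],  # User B
    [0, 0, 1],  # User C
]
after_one_step = propagate(actions, steps=1)
```

After one step User A obtains a non-zero preference for the third paper through User B, which is the kind of transitive relation the original model lacks.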


Data Collection

Train data
  Website: www.paper.edu.cn
  User behaviors: publish, keep, download, visit, tag, score, and make comments about papers
  Papers: first published from October 1, 2010 to March 1, 2011

Test data
  Format: (U, Dx, Di, L)
  638 data samples: 339 labeled with 1 and 299 labeled with 0
  26 users and 93 papers

Evaluation Metrics

MAP (Mean Average Precision)
  Shows the overall accuracy of the models

NDCG (Normalized Discounted Cumulative Gain)
  Evaluates the accuracy of the ranked list, from NDCG@1 to NDCG@6

Results

Personalized or not
  Five percent increase
  The average improvement in NDCG is 10.2%

Results

User preference iteration
  The more iterations, the more sharply MAP declines
  The original information is lost as the number of iterations increases

Results

Optimization of the model
  "ori" means the original model; "n" means the number of iterations
  ori+1 gets the highest score


Conclusion

Proposed a personalized recommendation model based on users’ historical behavior.

Users' preference profiles are extracted from historical behavior, with the help of content from the user model and paper information.

The recommendation service is provided with a generative model.

A random walk model is added to the original model to help propagate correlations between users.

Appendix

MAP

M is the size of the recommended paper set, p(j) is the accuracy of the first j recommended papers, l(j) is the label information, and C(di) is the total number of papers related to the viewed one di.
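
The MAP formula is missing from the extracted slide. The symbols above match the usual definition AP = Σ over j = 1..M of p(j) · l(j) / C(di), averaged over all test cases, and the sketch below assumes that form. The function names are hypothetical, and C(di) defaults to the number of positive labels in the list when it is not supplied.

```python
def average_precision(labels, num_relevant=None):
    """AP = sum_j p(j) * l(j) / C, with p(j) the precision of the first j items.

    labels is the ranked list of 0/1 labels l(1..M).
    num_relevant plays the role of C(di).
    """
    if num_relevant is None:
        num_relevant = sum(labels)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for j, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / j  # p(j) counted only where l(j) = 1
    return total / num_relevant

def mean_average_precision(label_lists):
    """Mean of the per-list average precision values."""
    return sum(average_precision(ls) for ls in label_lists) / len(label_lists)

# Toy rankings, made up for illustration only.
ap = average_precision([1, 0, 1, 0])
map_score = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```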

NDCG

r(i) refers to the relevance grade of the i-th paper, and Zn is a normalization parameter, which ensures that the NDCG@n of a perfect ranking equals 1.
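
The NDCG formula is also missing from the extracted slide. The sketch below assumes the common form NDCG@n = Zn · Σ over i = 1..n of (2^r(i) - 1) / log2(1 + i), with Zn taken as the reciprocal of the same sum computed on the ideally ordered list. The gain function, the logarithm base, and the function names are assumptions.

```python
import math

def dcg_at_n(grades, n):
    """Discounted cumulative gain over the first n relevance grades r(i)."""
    return sum(
        (2 ** r - 1) / math.log2(i + 1)
        for i, r in enumerate(grades[:n], start=1)
    )

def ndcg_at_n(grades, n):
    """DCG@n divided by the DCG@n of the ideal ordering (the role of Zn)."""
    ideal = dcg_at_n(sorted(grades, reverse=True), n)
    return dcg_at_n(grades, n) / ideal if ideal > 0 else 0.0

# Toy relevance grades, made up for illustration only.
perfect = ndcg_at_n([1, 1, 0], 3)
swapped = ndcg_at_n([0, 1], 2)
```

With binary labels such as those in the test data (1 or 0), the gain 2^r(i) - 1 reduces to the label itself.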
