
A Sentiment-Enhanced Personalized Location Recommendation System

Dingqi Yang, 24th ACM Conference on Hypertext and Social Media, 2013

2016.08.01

KAIST iDBLab

윤상훈


Contents

1. Abstract
2. Introduction
3. User Preference Model
4. Location Based Social Matrix Factorization Model
5. Experimental Analysis


Abstract

• In location-based social networks, users can check in at a venue or leave tips about it.
• Research so far has focused on users' check-ins; tips have hardly been studied.
• Existing work mainly considers social influence, but the paper argues that venue similarity can also be used to improve recommendation performance.
• Proposal
  – A user-location preference model that combines sentiment-analyzed tips with check-in data
  – Location recommendation through a matrix factorization algorithm that considers user social influence and venue similarity


User Preference Model

Tips data processing flow
• Input: raw tips
• Output: noun phrases with sentiment scores

1. Detect the language (English tips only)
2. Split each tip into sentences and tag every word with its part of speech
3. Look up each word in SentiWordNet to obtain its sentiment score
4. Noun phrase chunking (e.g. good + place = good place)

• The sentiment score of a tip is the sum of its phrases' sentiment scores, normalized to [-1, 1]

• The implementation is based on NLTK and SentiWordNet 3.0

http://www.nltk.org/

http://sentiwordnet.isti.cnr.it/
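The scoring step of this flow can be sketched as follows. This is a minimal illustration, not the paper's implementation: `TOY_LEXICON` is a hypothetical stand-in for SentiWordNet with made-up values, the noun phrases are assumed to be already chunked, and the normalization (mean over phrases, then clipping) is an assumption, since the slide only says the sum is normalized to [-1, 1].

```python
# Hypothetical stand-in for SentiWordNet: word -> (positive, negative) score.
# The values are illustrative only and are not taken from SentiWordNet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "slow": (0.0, 0.25),
}


def word_score(word):
    """Positive minus negative score; words missing from the lexicon score 0."""
    pos, neg = TOY_LEXICON.get(word.lower(), (0.0, 0.0))
    return pos - neg


def phrase_score(phrase):
    """Score of a noun phrase: the sum of the scores of its words."""
    return sum(word_score(w) for w in phrase.split())


def tip_score(phrases):
    """Sum the phrase scores, then normalize into [-1, 1].

    Assumption: normalization divides by the number of phrases and clips.
    """
    if not phrases:
        return 0.0
    total = sum(phrase_score(p) for p in phrases)
    return max(-1.0, min(1.0, total / len(phrases)))


if __name__ == "__main__":
    print(tip_score(["good place", "slow service"]))
```

In a real pipeline the phrases would come from NLTK's tokenizer, POS tagger, and a chunker, and the lexicon lookup would query SentiWordNet instead of the toy dictionary.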


User Preference Model

Preference extraction
• Because check-in counts follow a power-law distribution, they are mapped to preference values as in the check-in table below
• Sentiment scores are mapped to preference values as in the sentiment table below, taking their distribution into account

Fusion
• A single check-in can hardly be taken as sufficient information about the user's feelings, so the sentiment preference is used
• H(x): Heaviside step function (unit step function)

# of check-ins    Check-in preference matrix element
1                 2
2                 3
3                 4
4+                5

Sentiment score    Preference measure
[-1, -0.05]        1
(-0.05, -0.01]     2
(-0.01, 0.01)      3
[0.01, 0.05)       4
[0.05, 1]          5
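The two mapping tables translate directly into code. The sketch below implements only the tables as shown; the function names are hypothetical, and the fusion step is left out because its formula is not given in the text.

```python
def checkin_preference(count):
    """Map a check-in count (>= 1) to the check-in preference value 2..5."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return min(count, 4) + 1  # 1 -> 2, 2 -> 3, 3 -> 4, 4+ -> 5


def sentiment_preference(score):
    """Map a normalized sentiment score in [-1, 1] to a preference measure 1..5."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score <= -0.05:   # [-1, -0.05]
        return 1
    if score <= -0.01:   # (-0.05, -0.01]
        return 2
    if score < 0.01:     # (-0.01, 0.01)
        return 3
    if score < 0.05:     # [0.01, 0.05)
        return 4
    return 5             # [0.05, 1]
```

Note that the interval boundaries are asymmetric around zero: -0.01 falls into level 2 while 0.01 falls into level 4, so only scores strictly between them count as neutral.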


Location Based Social Matrix Factorization Model

Matrix Factorization
• Probabilistic matrix factorization (PMF)
  – Factorizes the user-item rating matrix into (user-latent space matrix) * (item-latent space matrix).
  – Maximizing the PMF objective yields U and V, from which the predicted matrix R used for recommendation is built.
  – I_ij: an indicator function that restricts the objective to the cases where user i has rated item j
• N(x | μ, σ²) is the normal distribution with mean μ and variance σ²
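For reference, the standard PMF formulation that these bullets appear to describe can be written as below. This is the conventional textbook form, assumed here rather than copied from the slide: N and M (numbers of users and items), the priors' regularization weights λ_U and λ_V, and the Frobenius-norm notation are assumptions.

```latex
% Conditional likelihood of the observed ratings
p(R \mid U, V, \sigma^2)
  = \prod_{i=1}^{N} \prod_{j=1}^{M}
    \left[ \mathcal{N}\!\left( R_{ij} \mid U_i^{\top} V_j,\ \sigma^2 \right) \right]^{I_{ij}}

% With zero-mean Gaussian priors on U and V, maximizing the log-posterior
% is equivalent to minimizing a regularized squared error
E = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij} \left( R_{ij} - U_i^{\top} V_j \right)^2
    + \frac{\lambda_U}{2} \lVert U \rVert_F^2
    + \frac{\lambda_V}{2} \lVert V \rVert_F^2
```

The exponent I_ij removes unobserved user-item pairs from the product, which is why only rated entries contribute to the squared-error sum.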


Location Based Social Matrix Factorization Model

Location Based Social MF
• Probabilistic matrix factorization (PMF)
• Gradient descent
• See the paper for details
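To make the gradient-descent step concrete, here is a minimal pure-Python sketch of plain PMF trained with stochastic gradient descent on a regularized squared error. It is not LBSMF: the social-influence and venue-similarity terms of the paper are omitted, and the function names, learning rate, regularization weight, and epoch count are all assumptions chosen for illustration.

```python
import random


def predict(U, V, i, j):
    """Predicted preference of user i for venue j: dot product of latent vectors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def train_pmf(ratings, n_users, n_items, dim=10, lr=0.02, reg=0.02,
              epochs=300, seed=0):
    """Fit latent matrices U and V by SGD over the observed ratings only.

    ratings: list of (user_index, item_index, value) triples.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - predict(U, V, i, j)
            for k in range(dim):
                u_ik, v_jk = U[i][k], V[j][k]
                # Gradient step on 0.5 * err^2 + 0.5 * reg * (u^2 + v^2)
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V


if __name__ == "__main__":
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    U, V = train_pmf(data, n_users=3, n_items=3)
    print(round(predict(U, V, 0, 0), 2))
```

Iterating only over observed triples plays the role of the indicator function: unrated user-venue pairs never contribute a gradient.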


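Experimental Analysis

Dataset Description
• Four months of Foursquare check-in data (October 24, 2011 to February 20, 2012)
• Noisy and invalid check-ins are filtered out
  – Only users with at least one check-in per week are kept (regarded as active users)
  – Sudden moves (consecutive check-ins implying a speed above 1200 km/h) are excluded
  – Venues whose category information is unavailable are excluded
• 762,315 users, 31,820,144 check-ins
• After filtering: 311,475 users, 21,920,144 check-ins
• New York and London only (because English is the main language there)
• Two users are regarded as friends if they follow each other on Twitter
• 9 parent categories; 400 sub-categories merged into 274 sub-categories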
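The sudden-move rule can be sketched as a speed check between consecutive check-ins. This is an illustrative sketch under assumptions: distances use the haversine great-circle formula, timestamps are given in hours, and the later of two offending check-ins is the one dropped. The slide does not say which check-in the authors discard, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def drop_sudden_moves(checkins, max_speed_kmh=1200.0):
    """Filter one user's check-ins, given as time-sorted (hours, lat, lon) tuples.

    A check-in is dropped when reaching it from the last kept check-in
    would require travelling faster than max_speed_kmh.
    """
    kept = []
    for t, lat, lon in checkins:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dist = haversine_km(lat0, lon0, lat, lon)
            elapsed = t - t0
            if elapsed > 0:
                speed = dist / elapsed
            else:
                speed = float("inf") if dist > 0 else 0.0
            if speed > max_speed_kmh:
                continue  # implausible jump: skip this check-in
        kept.append((t, lat, lon))
    return kept
```

For example, a New York check-in followed one hour later by a London check-in implies a speed of several thousand km/h, so the London check-in would be dropped under this rule.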



Social and Inter-venue Influence Modeling
• Social influence
  – Similarity is computed from the users' preference vectors (Pearson Correlation Coefficient)
• Inter-venue influence
  – A 0/1-based venue similarity network is built from the venues' category information
  – Two venues that share a sub-category get a similarity score of 1
  – The density of the venue similarity network for New York restaurants is 0.0353
  – For London it is 0.0339
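Both similarity measures are simple enough to sketch. The code below is an illustration under assumptions, not the authors' code: the Pearson coefficient is computed over venues that both users have a preference for, and density is taken as the number of similarity-1 pairs divided by the number of possible venue pairs. The slide states neither detail, and the function names are hypothetical.

```python
import math


def pearson(pref_a, pref_b):
    """Pearson correlation between two users' preferences over shared venues.

    pref_a, pref_b: dicts mapping venue id -> preference value.
    Returns 0.0 when fewer than two venues are shared or a variance is zero.
    """
    common = sorted(set(pref_a) & set(pref_b))
    if len(common) < 2:
        return 0.0
    xs = [pref_a[v] for v in common]
    ys = [pref_b[v] for v in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def venue_similarity(cats_a, cats_b):
    """0/1 similarity: 1 if the two venues share at least one sub-category."""
    return 1 if set(cats_a) & set(cats_b) else 0


def network_density(venue_cats):
    """Fraction of venue pairs whose similarity is 1.

    venue_cats: dict mapping venue id -> collection of sub-category names.
    """
    ids = list(venue_cats)
    n = len(ids)
    if n < 2:
        return 0.0
    edges = sum(
        venue_similarity(venue_cats[a], venue_cats[b])
        for idx, a in enumerate(ids)
        for b in ids[idx + 1:]
    )
    return edges / (n * (n - 1) / 2)
```

Under this definition, the reported densities of about 0.035 would mean that roughly 3.5% of restaurant pairs share a sub-category.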

Metrics
• Mean Absolute Error (MAE)
• Root Mean Square Error (RMSE)
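Both metrics follow their standard definitions; a minimal sketch with hypothetical function names is given here for completeness.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the average squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.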



Hybrid Preference Model Evaluation

The following three models are compared:
• Basic model (BM): uses only the check-in preference matrix
• Tip null model (TNM)
  – Randomly shuffles the sentiment preference matrix and fuses it with the check-in preference matrix
  – Preserves the distribution of the preference model
• Hybrid preference model (HPM): uses the hybrid preference matrix

• Variance and learning rate are fixed
• Tested with 80% and 90% of the data used for training
• Latent space dimension is 10
• Results are averaged over 5 repetitions
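The tip null model's shuffling step could look like the sketch below. It is an assumption about the mechanics: values are permuted among the observed (user, venue) entries, which keeps the overall distribution of preference values intact while breaking their link to specific pairs. The slide does not specify how the authors shuffle, and the function name is hypothetical.

```python
import random


def shuffle_sentiment_preferences(sentiment_pref, seed=None):
    """Return a copy with the same keys and the same multiset of values,
    but with values randomly reassigned among the keys.

    sentiment_pref: dict mapping (user, venue) -> preference measure (1..5).
    """
    rng = random.Random(seed)
    keys = list(sentiment_pref)
    values = [sentiment_pref[k] for k in keys]
    rng.shuffle(values)
    return dict(zip(keys, values))
```

If HPM beats a null model built this way, the gain can be attributed to the tips' actual content rather than to the mere presence of extra values with the same distribution.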

Dataset              Training  Metric  BM      TNM     HPM
New York Restaurant  90%       RMSE    1.0137  0.8887  0.8524
New York Restaurant  90%       MAE     0.8072  0.7032  0.6204
New York Restaurant  80%       RMSE    1.0386  1.0506  0.9580
New York Restaurant  80%       MAE     0.8103  0.8306  0.7345
London               90%       RMSE    1.1045  0.9864  0.8929
London               90%       MAE     0.9031  0.7889  0.7022
London               80%       RMSE    1.1245  1.0895  1.0119
London               80%       MAE     0.9147  0.8828  0.8075



Location Recommendation Evaluation

LBSMF is compared with the following four models:
• Collaborative filtering (CF)
• Probabilistic matrix factorization (PMF)
• SocialMF
  – Considers social network influence
  – Treats the impact of all friends equally
• Social Regularized MF (SRMF)
  – Considers social network influence
  – Also considers the similarity measure

• Latent space dimension is 5 and 10
• All other parameters are the same as in the previous experiment



Location Recommendation Evaluation

Each baseline cell shows the error followed by the "Improve" value in parentheses.

Dimension = 5

Dataset              Training  Metric  CF               PMF              SocialMF         SRMF             LBSMF
New York Restaurant  90%       RMSE    1.2463 (26.31%)  0.9440 (2.71%)   0.9364 (1.92%)   0.9342 (1.69%)   0.9184
New York Restaurant  90%       MAE     0.7190 (3.35%)   0.7182 (3.24%)   0.7074 (1.77%)   0.7034 (1.21%)   0.6949
New York Restaurant  80%       RMSE    1.4887 (32.56%)  1.0209 (1.66%)   1.0279 (2.33%)   1.0206 (1.63%)   1.0040
New York Restaurant  80%       MAE     0.8435 (6.15%)   0.8262 (4.19%)   0.8204 (3.51%)   0.7959 (0.54%)   0.7916
London               90%       RMSE    1.3787 (32.34%)  0.9758 (4.41%)   0.9651 (3.35%)   0.9519 (2.01%)   0.9328
London               90%       MAE     0.8687 (15.79%)  0.7719 (5.23%)   0.7682 (4.78%)   0.7568 (3.34%)   0.7315
London               80%       RMSE    1.6222 (36.67%)  1.0733 (4.29%)   1.0497 (2.13%)   1.0547 (2.60%)   1.0273
London               80%       MAE     1.0441 (20.83%)  0.8682 (4.79%)   0.8539 (3.20%)   0.8520 (2.98%)   0.8266

Dimension = 10

Dataset              Training  Metric  CF               PMF              SocialMF         SRMF             LBSMF
New York Restaurant  90%       RMSE    1.2463 (31.61%)  0.9136 (6.70%)   0.8889 (4.11%)   0.8755 (2.64%)   0.8524
New York Restaurant  90%       MAE     0.7190 (13.71%)  0.7047 (11.96%)  0.6429 (3.50%)   0.6238 (0.55%)   0.6204
New York Restaurant  80%       RMSE    1.4887 (35.65%)  0.9942 (3.64%)   0.9748 (1.72%)   0.9713 (1.37%)   0.9580
New York Restaurant  80%       MAE     0.8435 (12.92%)  0.8101 (9.33%)   0.7585 (3.16%)   0.7425 (1.08%)   0.7345
London               90%       RMSE    1.3787 (35.24%)  0.9763 (8.54%)   0.9125 (2.15%)   0.9382 (4.83%)   0.8929
London               90%       MAE     0.8687 (19.17%)  0.7882 (10.91%)  0.7203 (2.51%)   0.7379 (4.84%)   0.7022
London               80%       RMSE    1.6222 (37.62%)  1.0496 (3.59%)   1.0358 (2.31%)   1.0440 (3.07%)   1.0119
London               80%       MAE     1.0441 (22.66%)  0.8508 (5.09%)   0.8246 (2.07%)   0.8441 (4.34%)   0.8075
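The "Improve" figures appear to be LBSMF's relative error reduction over each baseline. That reading is an inference checked against a few table entries, not something stated on the slide, and the function name below is hypothetical.

```python
def improvement_percent(baseline_error, lbsmf_error):
    """Relative error reduction of LBSMF over a baseline, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - lbsmf_error) / baseline_error * 100.0


if __name__ == "__main__":
    # New York Restaurant, 90% training, RMSE, dimension 5: CF vs. LBSMF
    print(round(improvement_percent(1.2463, 0.9184), 2))
```

Because the measure is relative, the large percentages against CF mostly reflect how weak that baseline is; the gaps to SocialMF and SRMF are the more informative comparison.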


Comments

• I am curious about the future work on finding venue semantic similarity from tips
• No cross-validation was performed to choose the latent space dimension
• Do the five repeated trials refer to different training/test splits?
• Why was the Pearson Correlation Coefficient used?

Q&A



Thank you


