Page 1

Matrix Factorization and Factorization Machines for Recommender Systems

Chih-Jen Lin
Department of Computer Science
National Taiwan University

Talk at SDM workshop on Machine Learning Methods on Recommender Systems, May 2, 2015

Page 2

Outline

1 Matrix factorization

2 Factorization machines

3 Conclusions

Page 3

In this talk I will briefly discuss two related topics

Fast matrix factorization (MF) in shared-memory systems

Factorization machines (FM) for recommender systems and classification/regression

Note that MF is a special case of FM

Page 4

Outline

1 Matrix factorization
   Introduction and issues for parallelization
   Our approach in the package LIBMF

2 Factorization machines

3 Conclusions

Page 5

Outline

1 Matrix factorization
   Introduction and issues for parallelization
   Our approach in the package LIBMF

2 Factorization machines

3 Conclusions

Page 6

Matrix Factorization

Matrix Factorization is an effective method for recommender systems (e.g., Netflix Prize and KDD Cup 2011)

But training is slow.

We developed a parallel MF package LIBMF for shared-memory systems

http://www.csie.ntu.edu.tw/~cjlin/libmf

Best paper award at ACM RecSys 2013

Page 7

Matrix Factorization (Cont’d)

For recommender systems: a group of users give ratings to some items

User   Item   Rating
1      5      100
1      10     80
1      13     30
...    ...    ...
u      v      r
...    ...    ...

The information can be represented by a rating matrix R

Page 8

Matrix Factorization (Cont’d)

[Figure: the m × n rating matrix R; entry (u, v) holds the rating r_{u,v}, and unobserved entries such as ?_{2,2} are unknown]

m, n: numbers of users and items

u, v: indices of the uth user and vth item

r_{u,v}: the rating the uth user gives to the vth item

Page 9

Matrix Factorization (Cont’d)

[Figure: R (m × n) ≈ P^T (m × k) × Q (k × n); row u of P^T is p_u^T and column v of Q is q_v]

k: number of latent dimensions

r_{u,v} = p_u^T q_v

?_{2,2} = p_2^T q_2

Page 10

Matrix Factorization (Cont’d)

A non-convex optimization problem:

min_{P,Q} ∑_{(u,v)∈R} ( (r_{u,v} − p_u^T q_v)² + λ_P ‖p_u‖² + λ_Q ‖q_v‖² )

λ_P and λ_Q are regularization parameters

SG (Stochastic Gradient) is now a popular optimization method for MF

It loops over ratings in the training set.

Page 11

Matrix Factorization (Cont’d)

SG update rule:

p_u ← p_u + γ (e_{u,v} q_v − λ_P p_u),
q_v ← q_v + γ (e_{u,v} p_u − λ_Q q_v)

where

e_{u,v} ≡ r_{u,v} − p_u^T q_v
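To make the updates concrete, here is a minimal single-threaded sketch in Python (an illustration only, not LIBMF's C++ implementation; the function name and parameter values are hypothetical):

```python
import random
import numpy as np

def sg_mf(ratings, m, n, k=8, lam_p=0.05, lam_q=0.05, gamma=0.01, epochs=20):
    """Plain SG for MF; ratings are 0-based (u, v, r) triples."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((k, m))
    Q = 0.1 * rng.standard_normal((k, n))
    for _ in range(epochs):
        random.shuffle(ratings)          # loop over ratings in random order
        for u, v, r in ratings:
            e = r - P[:, u] @ Q[:, v]    # e_{u,v} = r_{u,v} - p_u^T q_v
            pu = P[:, u].copy()          # keep the old p_u for the q_v update
            P[:, u] += gamma * (e * Q[:, v] - lam_p * pu)
            Q[:, v] += gamma * (e * pu - lam_q * Q[:, v])
    return P, Q

# Tiny usage example: two users, two items, three observed ratings
P, Q = sg_mf([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], m=2, n=2)
print(P[:, 0] @ Q[:, 0])   # prediction for the (0, 0) rating
```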

SG is inherently sequential

Page 12

SG for Parallel MF

After r_{3,3} is selected, ratings in the gray blocks cannot be updated

[Figure: a 6 × 6 rating matrix with row 3 and column 3 shown in gray and r_{6,6} highlighted]

But r_{6,6} can be used:

r_{3,1} = p_3^T q_1
r_{3,2} = p_3^T q_2
...
r_{3,6} = p_3^T q_6

all share p_3 with r_{3,3} = p_3^T q_3, whereas r_{6,6} = p_6^T q_6 shares no variable with it

Page 13

SG for Parallel MF (Cont’d)

We can split the matrix into blocks and use threads to update blocks whose ratings do not share any p or q

[Figure: the 6 × 6 matrix split into blocks; blocks sharing no row or column can be updated concurrently]

Page 14

SG for Parallel MF (Cont’d)

This concept of splitting data into independent blocks seems to work

However, many issues must be addressed to get the implementation right on a given architecture

Page 15

Outline

1 Matrix factorization
   Introduction and issues for parallelization
   Our approach in the package LIBMF

2 Factorization machines

3 Conclusions

Page 16

Our approach in the package LIBMF

Parallelization (Zhuang et al., 2013; Chin et al., 2015a)
   Effective block splitting to avoid synchronization time
   Partial random method for the order of SG updates

Adaptive learning rate for SG updates (Chin et al., 2015b)

Details omitted due to time constraints

Page 17

Block Splitting and Synchronization

A naive way for T nodes is to split the matrix into T × T blocks

This is used in DSGD (Gemulla et al., 2011) for distributed systems. The setting is reasonable because communication cost is the main concern

In distributed systems, it is difficult to move data or the model

Page 18

Block Splitting and Synchronization (Cont'd)

However, for shared-memory systems, synchronization is a concern

[Figure: a 3 × 3 grid of blocks; three threads each take one diagonal block]

Block 1: 20s
Block 2: 10s
Block 3: 20s

We have 3 threads

Thread   0→10s   10→20s
1        Busy    Busy
2        Busy    Idle
3        Busy    Busy

10s wasted!!

Page 19

Lock-Free Scheduling

We split the matrix into enough blocks. For example, with two threads, we split the matrix into 4 × 4 blocks

[Figure: a 4 × 4 grid of blocks, each labeled with the counter 0]

0 is the update counter recording the number of times each block has been updated

Page 20

Lock-Free Scheduling (Cont’d)

First, T1 selects a block at random

[Figure: T1 occupies one block of the 4 × 4 grid; all counters are 0]

Page 21

Lock-Free Scheduling (Cont’d)

Then T2 selects, at random, a block that is neither green (T1's block) nor gray (sharing T1's row or column)

[Figure: T1 and T2 occupy two blocks sharing no row or column; all counters are 0]

Page 22

Lock-Free Scheduling (Cont’d)

After T1 finishes, the counter of the corresponding block is increased by one

[Figure: T2 still occupies its block; the block T1 finished now shows counter 1, all others 0]

Page 23

Lock-Free Scheduling (Cont’d)

T1 can then select any available block to update

Rule: select the block that has been updated least

[Figure: T2 still occupies its block; T1 chooses among the free blocks with the smallest counter]
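A toy single-process sketch of this scheduling rule (free block, least updated, random tie-breaking). In LIBMF the selection sits in a short critical section while the SG updates themselves run without locks; the class below only illustrates the rule:

```python
import random

class BlockScheduler:
    """Pick a free block (row and column unused) that is least updated."""

    def __init__(self, nr_bins):
        self.nr_bins = nr_bins
        self.counts = [[0] * nr_bins for _ in range(nr_bins)]
        self.busy_rows, self.busy_cols = set(), set()

    def get_job(self):
        free = [(i, j) for i in range(self.nr_bins) for j in range(self.nr_bins)
                if i not in self.busy_rows and j not in self.busy_cols]
        least = min(self.counts[i][j] for i, j in free)
        i, j = random.choice([b for b in free if self.counts[b[0]][b[1]] == least])
        self.busy_rows.add(i)
        self.busy_cols.add(j)
        return i, j

    def put_job(self, i, j):
        self.counts[i][j] += 1      # one more update for this block
        self.busy_rows.discard(i)
        self.busy_cols.discard(j)

s = BlockScheduler(4)                 # 4 x 4 blocks for two threads
b1, b2 = s.get_job(), s.get_job()     # non-conflicting blocks for T1 and T2
s.put_job(*b1)                        # T1 finishes; its block's counter becomes 1
```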

Page 24

Lock-Free Scheduling (Cont’d)

SG: applying Lock-Free Scheduling
SG**: applying DSGD-like Scheduling

[Figure: RMSE vs. time for SG and SG** on MovieLens 10M (left) and Yahoo!Music (right)]

Time to reach the RMSE in parentheses (SG** → SG):

MovieLens 10M: 18.71s → 9.72s (RMSE: 0.835)

Yahoo!Music: 728.23s → 462.55s (RMSE: 21.985)

Page 25

Memory Discontinuity

Discontinuous memory access can dramatically increase the training time. For SG, two possible update orders are

Update order   Advantages          Disadvantages
Random         Faster and stable   Memory discontinuity
Sequential     Memory continuity   Not stable

[Figure: random vs. sequential access patterns over the rating matrix R]

Our lock-free scheduling gives randomness, but the resulting code may not be cache friendly

Page 26

Partial Random Method

Our solution: within each block, access both R̂ and P̂ contiguously

[Figure: one block R̂ of the rating matrix equals P̂^T × Q̂; blocks are numbered 1–6 and visited in random order]

Partial: sequential access within each block
Random: random selection of the next block

Page 27

Partial Random Method (Cont’d)

[Figure: RMSE vs. time for Random and Partial Random on MovieLens 10M (left) and Yahoo!Music (right)]

The partial random method performs better than the random method

Page 28

Experiments

State-of-the-art methods compared

LIBPMF: a parallel coordinate descent method (Yu et al., 2012)

NOMAD: an asynchronous SG method (Yun et al., 2014)

LIBMF: earlier version of LIBMF (Zhuang et al., 2013; Chin et al., 2015a)

LIBMF++: with adaptive learning rates for SG (Chin et al., 2015c)

Page 29

Experiments (Cont’d)

Data set      m           n            #ratings
Netflix       2,649,429   17,770       99,072,112
Yahoo!Music   1,000,990   624,961      252,800,275
Webscope-R1   1,948,883   1,101,750    104,215,016
Hugewiki      39,706      25,000,000   1,703,429,136

• Due to machine capacity, Hugewiki here is about half of the original

• k = 100

Page 30

Experiments (Cont’d)

[Figure: RMSE vs. time on Netflix, Yahoo!Music, and Webscope-R1 comparing NOMAD, LIBPMF, LIBMF, and LIBMF++, and on Hugewiki comparing CCD++, FPSG, and FPSG++]

Page 31

Non-negative Matrix Factorization (NMF)

Our method has been extended to solve NMF

min_{P,Q} ∑_{(u,v)∈R} ( (r_{u,v} − p_u^T q_v)² + λ_P ‖p_u‖² + λ_Q ‖q_v‖² )

subject to P_{i,u} ≥ 0, Q_{i,v} ≥ 0, ∀ i, u, v
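The slides do not spell out how the constraints are handled; one standard option in an SG setting is projected SG, i.e., clipping negative entries after each update. A sketch under that assumption (not necessarily what LIBMF does), reusing the update rule from the earlier MF sketch:

```python
import numpy as np

def project_nonneg(x):
    """Projection onto the non-negative orthant."""
    return np.maximum(x, 0.0)

def nmf_step(P, Q, u, v, r, gamma=0.01, lam_p=0.05, lam_q=0.05):
    """One projected-SG step for a single rating (u, v, r)."""
    e = r - P[:, u] @ Q[:, v]
    pu = P[:, u].copy()
    P[:, u] = project_nonneg(P[:, u] + gamma * (e * Q[:, v] - lam_p * pu))
    Q[:, v] = project_nonneg(Q[:, v] + gamma * (e * pu - lam_q * Q[:, v]))
```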

Page 32

Outline

1 Matrix factorization

2 Factorization machines

3 Conclusions

Page 33

MF and Classification/Regression

MF solves

min_{P,Q} ∑_{(u,v)∈R} ( r_{u,v} − p_u^T q_v )²

Note that I omit the regularization term

Ratings are the only given information

This doesn't sound like a classification or regression problem

In the second part of this talk we will make a connection and introduce FM (Factorization Machines)

Page 34

Handling User/Item Features

What if instead of user/item IDs we are given user and item features?

Assume user u and item v have feature vectors

f_u and g_v

How to use these features to build a model?

Page 35

Handling User/Item Features (Cont’d)

We can consider a regression problem where data instances are

value     features
...       ...
r_{u,v}   [f_u^T, g_v^T]
...       ...

and solve

min_w ∑_{(u,v)∈R} ( r_{u,v} − w^T [f_u; g_v] )²
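A small sketch of this regression with hypothetical dense features, solved by ordinary least squares:

```python
import numpy as np

# Hypothetical dense features: 3 users with U = 2, 2 items with V = 2
f = np.array([[1.0, 0.2], [0.5, 1.0], [0.1, 0.9]])   # f_u for u = 0, 1, 2
g = np.array([[0.3, 1.0], [1.0, 0.0]])               # g_v for v = 0, 1
R = [(0, 0, 4.0), (1, 0, 3.0), (2, 1, 5.0)]          # observed (u, v, r)

# One row [f_u, g_v] per observed rating; least squares for the objective above
X = np.array([np.concatenate([f[u], g[v]]) for u, v, _ in R])
y = np.array([r for _, _, r in R])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X @ w)   # fitted ratings
```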

Page 36

Feature Combinations

However, this does not take the interaction between users and items into account

Note that we are approximating the rating r_{u,v} of user u and item v

Let

U ≡ number of user features
V ≡ number of item features

Then

f_u ∈ R^U, u = 1, …, m,

g_v ∈ R^V, v = 1, …, n

Page 37

Feature Combinations (Cont’d)

Following the concept of degree-2 polynomial mappings in SVM, we can generate new features

(f_u)_t (g_v)_s,  t = 1, …, U,  s = 1, …, V

and solve

min_{w_{t,s} ∀t,s} ∑_{(u,v)∈R} ( r_{u,v} − ∑_{t=1}^{U} ∑_{s=1}^{V} w_{t,s} (f_u)_t (g_v)_s )²

Page 38

Feature Combinations (Cont’d)

This is equivalent to

min_W ∑_{(u,v)∈R} ( r_{u,v} − f_u^T W g_v )²,

where W ∈ R^{U×V} is a matrix

If we form vec(W) by concatenating W's columns, another form is

min_W ∑_{(u,v)∈R} ( r_{u,v} − vec(W)^T [ … (f_u)_t (g_v)_s … ]^T )²
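A quick numerical check of the two equivalent forms (random toy data; vec(W) taken column-major, as stated above):

```python
import numpy as np

rng = np.random.default_rng(0)
U, V = 3, 4
fu, gv = rng.standard_normal(U), rng.standard_normal(V)
W = rng.standard_normal((U, V))

s1 = sum(W[t, s] * fu[t] * gv[s] for t in range(U) for s in range(V))
s2 = fu @ W @ gv                              # f_u^T W g_v
s3 = W.flatten(order="F") @ np.kron(gv, fu)   # vec(W)^T [... (f_u)_t (g_v)_s ...]
print(np.allclose(s1, s2), np.allclose(s1, s3))   # True True
```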

Page 39

Feature Combinations (Cont’d)

However, this setting fails for extremely sparse features

Consider the most extreme situation. Assume we have

user ID and item ID

as features

Then

U = m, V = n,

f_u = [0, …, 0, 1, 0, …, 0]^T, with the single 1 in position u

Page 40

Feature Combinations (Cont’d)

The optimal solution is

W_{u,v} = r_{u,v}, if (u, v) ∈ R
W_{u,v} = 0,      if (u, v) ∉ R

We can never predict

r_{u,v}, (u, v) ∉ R

Page 41

Factorization Machines

The reason we cannot predict unseen data is that in the optimization problem

# variables = mn ≫ # instances = |R|

Overfitting occurs

Remedy: we can let

W ≈ P^T Q,

where P and Q are low-rank matrices. This becomes matrix factorization

Page 42

Factorization Machines (Cont’d)

This can be generalized to sparse user and item features

min_{P,Q} ∑_{(u,v)∈R} ( r_{u,v} − f_u^T P^T Q g_v )²

That is, we regard

P f_u and Q g_v

as latent representations of user u and item v, respectively

This becomes factorization machines (Rendle, 2010)
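A sketch of this prediction function; with one-hot ID features it reduces exactly to MF, matching the earlier remark that MF is a special case of FM:

```python
import numpy as np

rng = np.random.default_rng(0)
U, V, k = 5, 7, 3
P = 0.1 * rng.standard_normal((k, U))   # latent matrix for user features
Q = 0.1 * rng.standard_normal((k, V))   # latent matrix for item features

def predict(fu, gv):
    return (P @ fu) @ (Q @ gv)          # f_u^T P^T Q g_v

# One-hot ID features: P @ e_u = p_u and Q @ e_v = q_v, so this is MF
fu, gv = np.eye(U)[2], np.eye(V)[4]
print(np.isclose(predict(fu, gv), P[:, 2] @ Q[:, 4]))   # True
```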

Page 43

Factorization Machines (Cont’d)

Similar ideas have been used in other places, such as Stern, Herbrich, and Graepel (2009)

In summary, we connect MF and classification/regression through the following settings

We need combinations of different feature types (e.g., user, item, etc.)
However, overfitting occurs if features are very sparse
We use a product of low-rank matrices to avoid overfitting

Page 44

Factorization Machines (Cont’d)

We see that such ideas can be used not only for recommender systems.

They may be useful for any classification problem with very sparse features

Page 45

Field-aware Factorization Machines

We have seen that FM is useful for handling highly sparse features such as user IDs

What if we have more than two ID fields?

For example, in CTR prediction for computational advertising, we may have

value   features
...     ...
CTR     user ID, Ad ID, site ID
...     ...

Page 46

Field-aware Factorization Machines (Cont'd)

FM can be generalized to handle different interactions between fields

Two latent matrices for user ID and Ad ID
Two latent matrices for user ID and site ID
...

This becomes FFM: field-aware factorization machines (Rendle and Schmidt-Thieme, 2010)
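A toy sketch of the field-aware idea with one active ID per field (field names and sizes are hypothetical; LIBFFM handles general sparse features and, for CTR, the logistic loss):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
sizes = {"user": 10, "ad": 20, "site": 5}   # hypothetical field sizes
fields = list(sizes)

# W[f1][f2][:, j]: latent vector that ID j of field f1 uses against field f2
W = {f1: {f2: 0.1 * rng.standard_normal((k, sizes[f1]))
          for f2 in fields if f2 != f1} for f1 in fields}

def ffm_score(ids):
    """Sum of pairwise interactions with field-aware latent vectors."""
    z = 0.0
    for i, f1 in enumerate(fields):
        for f2 in fields[i + 1:]:
            z += W[f1][f2][:, ids[f1]] @ W[f2][f1][:, ids[f2]]
    return z

print(ffm_score({"user": 3, "ad": 7, "site": 0}))
```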

Page 47

FFM for CTR Prediction

It was used by Jahrer et al. (2012) to win the 2nd prize of KDD Cup 2012

Recently my students used FFM to win two Kaggle competitions

After we used FFM to win the first, in the second competition all top teams used FFM

Note that for CTR prediction, the logistic loss rather than the squared loss is used

Page 48

Discussion

How to decide which field interactions to use?

If features are not extremely sparse, can the result still be better than degree-2 polynomial mappings?

Note that we lose the convexity here

We have a software LIBFFM for public use

http://www.csie.ntu.edu.tw/~cjlin/libffm

Page 49

Experiments

We see that

W ⇒ P^T Q

reduces the number of variables

What if we map

[ … (f_u)_t (g_v)_s … ]^T ⇒ a shorter vector

to reduce the number of features/variables?

Page 50

Experiments (Cont’d)

However, we may have something like

(r_{1,2} − W_{1,2})² ⇒ (r_{1,2} − w̄_1)²   (1)
(r_{1,4} − W_{1,4})² ⇒ (r_{1,4} − w̄_2)²
(r_{2,1} − W_{2,1})² ⇒ (r_{2,1} − w̄_3)²
(r_{2,3} − W_{2,3})² ⇒ (r_{2,3} − w̄_1)²   (2)

Clearly, there is no reason why (1) and (2) should share the same variable w̄_1

In contrast, in MF, we connect r_{1,2} and r_{1,3} through p_1
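A toy illustration of why hashing forces such arbitrary sharing (the bin count and hash function are illustrative, not the scheme used in the experiments below):

```python
# Hash the index pair (u, v) into B bins: with more pairs than bins,
# unrelated pairs must end up sharing one weight w̄ (pigeonhole).
B = 3
pairs = [(1, 2), (1, 4), (2, 1), (2, 3)]
for u, v in pairs:
    print((u, v), "-> bin", hash((u, v)) % B)
```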

Page 51

Experiments (Cont’d)

A simple comparison on MovieLens

# training: 9,301,274, # test: 698,780, # users: 71,567, # items: 65,133

Results of MF: RMSE = 0.836

Results of Poly-2 + Hashing:

RMSE = 1.14568 (10^6 bins), 3.62299 (10^8 bins), 3.76699 (all pairs)

We can clearly see that MF is much better

Page 52

Outline

1 Matrix factorization

2 Factorization machines

3 Conclusions

Page 53

Conclusions

In this talk we have discussed MF and FFM

MF is a mature technique, so we investigate its fast training

FFM is relatively new. We introduce its basic concepts and practical use

Page 54

Acknowledgments

The following students have contributed to the works mentioned in this talk

Wei-Sheng Chin

Yu-Chin Juan

Bo-Wen Yuan

Yong Zhuang
