Restricted Boltzmann Machines for CollaborativeFiltering
Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton
November 29, 2016
Binglin Chen RBM for Collaborative Filtering November 29, 2016 1 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similar
If two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated item
Properties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)
NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)Netflix
Spotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
Collaborative Filtering
Wikipedia:
In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).
Fundamental ideas:
If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown
Applications:
Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify
Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22
The Task: Netflix Prize
Basic statistics:
480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>
Training set:
100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5
Qualifying set:
2,817,131 ratings, <CustomerID, MovieID, Date>
Sparsity of ratings:100, 480, 507 + 2, 817, 131
480, 189× 17, 770= 0.0121 = 1.21%
Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22
The Task: Netflix Prize
Basic statistics:
480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>
Training set:
100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5
Qualifying set:
2,817,131 ratings, <CustomerID, MovieID, Date>
Sparsity of ratings:100, 480, 507 + 2, 817, 131
480, 189× 17, 770= 0.0121 = 1.21%
Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22
The Task: Netflix Prize
Basic statistics:
480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>
Training set:
100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5
Qualifying set:
2,817,131 ratings, <CustomerID, MovieID, Date>
Sparsity of ratings:100, 480, 507 + 2, 817, 131
480, 189× 17, 770= 0.0121 = 1.21%
Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22
The Task: Netflix Prize
Basic statistics:
480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>
Training set:
100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5
Qualifying set:
2,817,131 ratings, <CustomerID, MovieID, Date>
Sparsity of ratings:100, 480, 507 + 2, 817, 131
480, 189× 17, 770= 0.0121 = 1.21%
Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM
v1 v2 v3 vN
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}N
h ∈ {0, 1}M
W1,1 WM,N W ∈ RM×N
b1 b2 b3 bN
a1 a2 a3 aM
b ∈ RN
a ∈ RM
E(h,v) = −aTh− bTv − hTWv
p(h,v) =1
Zexp(−E(h,v))
Z =∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22
Binary RBM, continued
p(h|v)
=p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = p(h,v)
p(v)
=1
p(v)
1
Zexp(aTh+ bTv + hTWv)
=1
p(v)
1
Zexp(bTv) exp(aTh+ hTWv)
=1
Z ′exp(aTh+ hTWv)
=1
Z ′exp
(M∑i=1
aihi +
M∑i=1
hiW i,:v
)
=1
Z ′exp
(M∑i=1
hi(ai +W i,:v)
)
=1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM, continued
p(h|v) = 1
Z ′
M∏i=1
exp (hi(ai +W i,:v))
p(hi|v) ∝ exp (hi(ai +W i,:v))
p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)
=exp (ai +W i,:v)
1 + exp (ai +W i,:v)
= sigmoid (ai +W i,:v)
p(vj = 1|h) = sigmoid(bj + hTW :,j
)Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22
Binary RBM Learning
Binary RBM’s parameters: θ = {W ,a, b}
Maximum likelihood principle: argmaxθ p(v(1),v(2), . . . ,v(T )|θ)
No closed-form solution
Resort to gradient ascent
Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22
Binary RBM Learning
Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v
(1),v(2), . . . ,v(T )|θ)
No closed-form solution
Resort to gradient ascent
Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22
Binary RBM Learning
Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v
(1),v(2), . . . ,v(T )|θ)No closed-form solution
Resort to gradient ascent
Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22
Binary RBM Learning
Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v
(1),v(2), . . . ,v(T )|θ)No closed-form solution
Resort to gradient ascent
Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22
Binary RBM Learning, continued
`(θ)
=
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=
T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)
=T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=
T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=
T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log p(v(t))
=
T∑t=1
log∑h
p(h,v(t))
=
T∑t=1
log∑h
1
Zexp
(−E(h,v(t))
)=
T∑t=1
log1
Z
∑h
exp(−E(h,v(t))
)=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T logZ
=
T∑t=1
log∑h
exp(−E(h,v(t))
)− T log
∑h,v
exp(−E(h,v))
Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β
=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]
Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Learning, continued
`(θ) =
T∑t=1
log∑h
exp(−E(h,v
(t)))− T log
∑h,v
exp(−E(h,v))
∂`(θ)
∂β=
T∑t=1
∂ log∑
h exp(−E(h,v(t))
)∂β
− T∂ log
∑h,v exp(−E(h,v))
∂β
=
T∑t=1
∂∑
h exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T ∂∑
h,v exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
∂ exp(−E(h,v(t))
)∂β∑
h exp(−E(h,v(t))
) − T∑h,v∂ exp(−E(h,v))
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h exp
(−E(h,v(t))
)∂−E(h,v(t))
∂β∑h exp
(−E(h,v(t))
) − T∑
h,v exp(−E(h,v))∂−E(h,v)
∂β∑h,v exp(−E(h,v))
=
T∑t=1
∑h
exp(−E(h,v(t))
)∑
h exp(−E(h,v(t))
) ∂ − E(h,v(t))
∂β− T
∑h,v
exp(−E(h,v))∑h,v exp(−E(h,v))
∂ − E(h,v)
∂β
=
T∑t=1
∑h
p(h|v(t))∂ − E(h,v(t))
∂β− T
∑h,v
p(h,v)∂ − E(h,v)
∂β
=
T∑t=1
Ep(h|v(t))
[∂ − E(h,v(t))
∂β
]− T Ep(h,v)
[∂ − E(h,v)
∂β
]Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22
Binary RBM Summary
A two layer RBM can be fully characterized by the following:
E(h,v)p(h,v)Zp(hi = s|v)p(vj = t|h)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 10 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary
v1,1 v1,2 v1,3 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
W1,1,1 WM,N,K W ∈ RM×NK
b1,1 b1,2 b1,3 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
vi,: Valid Assignment?00001
√
01000√
11010 ×00000 ×
Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary
v1,1 v1,2 v1,3 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
W1,1,1 WM,N,K W ∈ RM×NK
b1,1 b1,2 b1,3 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
vi,: Valid Assignment?00001
√
01000√
11010 ×00000 ×
Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary
v1,1 v1,2 v1,3 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
W1,1,1 WM,N,K W ∈ RM×NK
b1,1 b1,2 b1,3 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
vi,: Valid Assignment?00001
√
01000√
11010 ×00000 ×
Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary
v1,1 v1,2 v1,3 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
W1,1,1 WM,N,K W ∈ RM×NK
b1,1 b1,2 b1,3 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
vi,: Valid Assignment?00001
√
01000√
11010 ×00000 ×
Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, First Step: Make Visible Units K-nary, continued
Binary K-nary
E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v)) K-nary requires valid v.
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v)) K-nary requires valid v.
p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.
p(vj = 1|h) sigmoid(bj + hTW :,j
)Undefined for K-nary.
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) Undefined for Binary.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22
Towards RBM for CF, Second Step: Handle Missing Values
How to handle missing values?
Imputation
Fill in the blanks with some estimation based on available information.
Parameter Sharing
Use RBM with different numbers of visible units for differenttraining/test cases.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22
Towards RBM for CF, Second Step: Handle Missing Values
How to handle missing values?
Imputation
Fill in the blanks with some estimation based on available information.
Parameter Sharing
Use RBM with different numbers of visible units for differenttraining/test cases.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22
Towards RBM for CF, Second Step: Handle Missing Values
How to handle missing values?
Imputation
Fill in the blanks with some estimation based on available information.
Parameter Sharing
Use RBM with different numbers of visible units for differenttraining/test cases.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22
Towards RBM for CF, Second Step: Handle Missing Values, continued
Suppose K = 3 for the following case:
m1 m2 m3u1 1 ? ?u2 2 ? 2u3 1 3 ?
v v Visible Unitsu1 100 ??? ??? 100 v1,1, v1,2, v1,3u2 010 ??? 010 010 010 v1,1, v1,2, v1,3, v3,1, v3,2, v3,3u3 100 001 ??? 100 001 v1,1, v1,2, v1,3, v2,1, v2,2, v2,3
Binglin Chen RBM for Collaborative Filtering November 29, 2016 14 / 22
Towards RBM for CF, Second Step: Handle Missing Values, continued
Suppose K = 3 for the following case:
m1 m2 m3u1 1 ? ?u2 2 ? 2u3 1 3 ?
v v Visible Unitsu1 100 ??? ??? 100 v1,1, v1,2, v1,3u2 010 ??? 010 010 010 v1,1, v1,2, v1,3, v3,1, v3,2, v3,3u3 100 001 ??? 100 001 v1,1, v1,2, v1,3, v2,1, v2,2, v2,3
Binglin Chen RBM for Collaborative Filtering November 29, 2016 14 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v)
=∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v) =∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v) =∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v) =∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v) =∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions
p(vq,k = 1|v) =∑h
p(h, v, vq,k)
∝∑h
exp(−E(h, v, vq,k))
Can be computed in polynomial time.
p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)
However, query of S movies on a user requires KS evaluations.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22
Towards RBM for CF, Third Step: Make Predictions, continued
hi = p(hi = 1|v) = sigmoid(ai + W i,:v
)
p(vq,k = 1|h) =exp
(bq,k + h
TW :,q,k
)∑K
l=1 exp(bq,l + h
TW :,q,l
)
A lot faster but slightly less accurate.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22
Towards RBM for CF, Third Step: Make Predictions, continued
hi = p(hi = 1|v) = sigmoid(ai + W i,:v
)
p(vq,k = 1|h) =exp
(bq,k + h
TW :,q,k
)∑K
l=1 exp(bq,l + h
TW :,q,l
)
A lot faster but slightly less accurate.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22
Towards RBM for CF, Third Step: Make Predictions, continued
hi = p(hi = 1|v) = sigmoid(ai + W i,:v
)
p(vq,k = 1|h) =exp
(bq,k + h
TW :,q,k
)∑K
l=1 exp(bq,l + h
TW :,q,l
)
A lot faster but slightly less accurate.
Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22
Extension 1: Gaussian Hidden Units
K-nary K-nary with Gaussian
E(h,v) −aTh− bTv − hTWvM∑i=1
(hi − ai)2
2σi2− bTv − hTWv
p(h,v)1
Zexp(−E(h,v))
1
Zexp(−E(h,v))
Z∑
h,v exp(−E(h,v))∑
h,v exp(−E(h,v))
p(vj,k = 1|h)exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
) exp(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
)p(hi = 1|v) sigmoid (ai +W i,:v)
p(hi = h|v) 1√2πσi
exp
(−(h− ai − σiW i,:v)
2
2σ2i
)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 17 / 22
Extension 2: Condition On What Users Have Watched
v1,1 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
WM,N,K W ∈ RM×NK
b1,1 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
r ∈ {0, 1}Nr1 r2 rN
U ∈ RM×NU1,1
E(h,v, r) = −aTh− bTv − hTWv − hTWr
p(vj,k = 1|h) =exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22
Extension 2: Condition On What Users Have Watched
v1,1 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
WM,N,K W ∈ RM×NK
b1,1 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
r ∈ {0, 1}Nr1 r2 rN
U ∈ RM×NU1,1
E(h,v, r) = −aTh− bTv − hTWv − hTWr
p(vj,k = 1|h) =exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22
Extension 2: Condition On What Users Have Watched
v1,1 v1,K v2,1 vN,K
h1 h2 h3 hM
visible layer
hidden layer
v ∈ {0, 1}NK
h ∈ {0, 1}M
WM,N,K W ∈ RM×NK
b1,1 b1,K b2,1 bN,K
a1 a2 a3 aM
b ∈ RNK
a ∈ RM
r ∈ {0, 1}Nr1 r2 rN
U ∈ RM×NU1,1
E(h,v, r) = −aTh− bTv − hTWv − hTWr
p(vj,k = 1|h) =exp
(bj,k + hTW :,j,k
)∑Kl=1 exp
(bj,l + hTW :,j,l
)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)
Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22
Extension 3: Factorize W
W ∈ RM×NK
100× 17, 770× 5 = 8, 885, 000
Factorize W into PQ
P ∈ RM×C , Q ∈ RC×NK
C �M and C � NK
100× 30 + 30× 17, 770× 5 = 2, 668, 500
Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22
Extension 3: Factorize W
W ∈ RM×NK
100× 17, 770× 5 = 8, 885, 000
Factorize W into PQ
P ∈ RM×C , Q ∈ RC×NK
C �M and C � NK
100× 30 + 30× 17, 770× 5 = 2, 668, 500
Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22
Extension 3: Factorize W
W ∈ RM×NK
100× 17, 770× 5 = 8, 885, 000
Factorize W into PQ
P ∈ RM×C , Q ∈ RC×NK
C �M and C � NK
100× 30 + 30× 17, 770× 5 = 2, 668, 500
Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22
Extension 3: Factorize W
W ∈ RM×NK
100× 17, 770× 5 = 8, 885, 000
Factorize W into PQ
P ∈ RM×C , Q ∈ RC×NK
C �M and C � NK
100× 30 + 30× 17, 770× 5 = 2, 668, 500
Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22