
StreamSVM

Linear SVMs and Logistic Regression When Data Does Not Fit In Memory

S.V.N. (vishy) Vishwanathan

Purdue University and Microsoft

October 9, 2012

S.V. N. Vishwanathan (Purdue, Microsoft) StreamSVM 1 / 41


Linear Support Vector Machines

Outline

1 Linear Support Vector Machines

2 StreamSVM

3 Experiments - SVM

4 Logistic Regression

5 Experiments - Logistic Regression

6 Conclusion

Binary Classification

Figure: data points labeled $y_i = -1$ and $y_i = +1$, together with the hyperplane $\{x \mid \langle w, x \rangle = 0\}$.

Linear Support Vector Machines

Figure: data points labeled $y_i = -1$ and $y_i = +1$, the hyperplane $\{x \mid \langle w, x \rangle = 0\}$, and the two margin hyperplanes $\{x \mid \langle w, x \rangle = -1\}$ and $\{x \mid \langle w, x \rangle = 1\}$, with the points $x_1$ and $x_2$ marked.

$$\langle w, x_1 - x_2 \rangle = 2 \quad\Longrightarrow\quad \left\langle \frac{w}{\|w\|}, x_1 - x_2 \right\rangle = \frac{2}{\|w\|}$$

Linear Support Vector Machines

Optimization Problem

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\left(0, 1 - y_i \langle w, x_i \rangle\right)$$

Figure: the hinge loss plotted as a function of the margin (horizontal axis: margin, vertical axis: loss).
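As a concrete reading of the primal objective, the following Python sketch evaluates the regularizer plus the summed hinge loss on a toy data set. The helper names (`dot`, `hinge_loss`, `primal_objective`) and the data are illustrative assumptions, not part of the talk.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v> of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - margin), where margin = y * <w, x>."""
    return max(0.0, 1.0 - margin)


def primal_objective(w: Sequence[float], xs: List[List[float]],
                     ys: List[int], C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)."""
    regularizer = 0.5 * dot(w, w)
    total_loss = sum(hinge_loss(y * dot(w, x)) for x, y in zip(xs, ys))
    return regularizer + C * total_loss


# Toy data (hypothetical): the second point lies inside the margin.
xs = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
ys = [1, 1, -1]
w = [1.0, 0.0]
objective = primal_objective(w, xs, ys, C=1.0)
```

Only the second point contributes a non-zero loss here, since its margin is 0.5; the other two have margin at least 1.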

The Dual

Objective

$$\min_{\alpha} \; D(\alpha) := \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_{i} \alpha_i \quad \text{s.t.} \quad 0 \le \alpha_i \le C$$

Convex and quadratic objective
Linear (box) constraints
$w = \sum_i \alpha_i y_i x_i$
Each $\alpha_i$ corresponds to an $x_i$
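The relation $w = \sum_i \alpha_i y_i x_i$ means the quadratic term of the dual equals $\frac{1}{2}\|w\|^2$, so the dual objective can be evaluated without forming the full matrix of inner products. The sketch below illustrates this; the function names and toy data are assumptions made for the example.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v> of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def weights_from_alpha(alpha: List[float], xs: List[List[float]],
                       ys: List[int]) -> List[float]:
    """Recover w = sum_i alpha_i * y_i * x_i."""
    w = [0.0] * len(xs[0])
    for a, x, y in zip(alpha, xs, ys):
        for j, xj in enumerate(x):
            w[j] += a * y * xj
    return w


def dual_objective(alpha: List[float], xs: List[List[float]],
                   ys: List[int]) -> float:
    """D(alpha) = 0.5 * ||w||^2 - sum_i alpha_i, with w as above."""
    w = weights_from_alpha(alpha, xs, ys)
    return 0.5 * dot(w, w) - sum(alpha)


def in_box(alpha: List[float], C: float) -> bool:
    """Check the box constraints 0 <= alpha_i <= C."""
    return all(0.0 <= a <= C for a in alpha)


# Toy data (hypothetical): one positive and one negative example.
xs = [[1.0, 0.0], [-1.0, 0.0]]
ys = [1, -1]
alpha = [0.5, 0.5]
w = weights_from_alpha(alpha, xs, ys)
d_value = dual_objective(alpha, xs, ys)
```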

Coordinate Descent

Figure: a sequence of plots illustrating coordinate descent on a function of two variables $x$ and $y$, alternating between a surface plot over $(x, y)$ and one-dimensional slices along $x$ and along $y$.

Coordinate Descent in the Dual

One dimensional function

$$D(\alpha_t) = \frac{\alpha_t^2}{2} \langle x_t, x_t \rangle + \sum_{i \neq t} \alpha_t \alpha_i y_i y_t \langle x_i, x_t \rangle - \alpha_t + \text{const.} \quad \text{s.t.} \quad 0 \le \alpha_t \le C$$

Solve

$$\nabla D(\alpha_t) = \alpha_t \langle x_t, x_t \rangle + \sum_{i \neq t} \alpha_i y_i y_t \langle x_i, x_t \rangle - 1 = 0$$

$$\alpha_t = \operatorname{median}\left(0,\; C,\; \frac{1 - \sum_{i \neq t} \alpha_i y_i y_t \langle x_i, x_t \rangle}{\langle x_t, x_t \rangle}\right)$$
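The closed-form update can be sketched in a few lines of Python. Because $w = \sum_i \alpha_i y_i x_i$ is maintained alongside $\alpha$, the sum over $i \neq t$ never has to be computed explicitly: the median expression equals $\operatorname{median}(0, C, \alpha_t - (y_t \langle w, x_t \rangle - 1) / \langle x_t, x_t \rangle)$. This is a minimal in-memory sketch under that assumption; the function names, the random sweep order, and the toy data are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v> of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def median3(a: float, b: float, c: float) -> float:
    """Median of three numbers; clips c to [a, b] when a <= b."""
    return sorted((a, b, c))[1]


def dual_coordinate_descent(xs: List[List[float]], ys: List[int], C: float,
                            epochs: int = 50, seed: int = 0
                            ) -> Tuple[List[float], List[float]]:
    """Sweep over the coordinates in random order, solving each 1-D problem exactly."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    alpha = [0.0] * n
    w = [0.0] * d
    q = [dot(x, x) for x in xs]  # Q_tt = <x_t, x_t>
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for t in order:
            if q[t] == 0.0:
                continue  # a zero vector cannot be updated
            grad = ys[t] * dot(w, xs[t]) - 1.0
            new_alpha = median3(0.0, C, alpha[t] - grad / q[t])
            delta = new_alpha - alpha[t]
            for j in range(d):
                w[j] += delta * ys[t] * xs[t][j]  # keep w in sync with alpha
            alpha[t] = new_alpha
    return alpha, w


# Toy, linearly separable data (hypothetical).
xs = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]
alpha, w = dual_coordinate_descent(xs, ys, C=1.0)
```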


StreamSVM

We are Collecting Lots of Data . . .

Figure: data points labeled $y_i = -1$ and $y_i = +1$.

What if Data Does not Fit in Memory?

Idea 1: Block Minimization [Yu et al., KDD 2010]
  Split data into blocks B1, B2, . . . such that each Bj fits in memory
  Compress and store each block separately
  Load one block of data at a time and optimize only those αi's

Idea 2: Selective Block Minimization [Chang and Roth, KDD 2011]
  Split data into blocks B1, B2, . . . such that each Bj fits in memory
  Compress and store each block separately
  Load one block of data at a time and optimize only those αi's
  Retain informative samples from each block in main memory


What are Informative Samples?


Some Observations

SBM and BM are wasteful
  Both split data into blocks and compress the blocks
    This requires reading the entire data at least once (expensive)
  Both pause optimization while a block is loaded into memory

Hardware 101
  Disk I/O is slower than CPU (sometimes by a factor of 100)
  Random access on HDD is terrible
    Sequential access is reasonably fast (factor of 10)
  Multi-core processors are becoming commonplace
    How can we exploit this?

Our Architecture

Figure: architecture diagram with two components, Reader and Trainer. The Data resides on the HDD; the Working Set and the Weight Vec(tor) are held in RAM.

Our Philosophy

Iterate over the data in main memory while streaming data from disk. Evict primarily examples from main memory that are “uninformative”.

Reader

for k = 1, . . . , max_iter do
    for i = 1, . . . , n do
        if |A| = Ω then
            randomly select i′ ∈ A
            A = A \ {i′}
            delete y_i′, Q_i′i′, x_i′ from RAM
        end if
        read y_i, x_i from Disk
        calculate Q_ii = ⟨x_i, x_i⟩
        store y_i, Q_ii, x_i in RAM
        A = A ∪ {i}
    end for
    if stopping criterion is met then
        exit
    end if
end for
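The Reader loop can be sketched in Python as a single sequential pass over the data that keeps at most Ω examples cached. This is a single-threaded illustration only: in the architecture described in the talk the reader runs concurrently with the trainer, which is omitted here. The names `reader_pass`, `cache`, and `capacity` (standing in for A and Ω) are assumptions, and an in-memory list stands in for the disk.

```python
import random
from typing import Dict, Iterable, List, Tuple

Example = Tuple[int, List[float]]        # (y_i, x_i) as read from disk
Cached = Tuple[int, float, List[float]]  # (y_i, Q_ii, x_i) as stored in RAM


def reader_pass(stream: Iterable[Example], cache: Dict[int, Cached],
                capacity: int, rng: random.Random) -> None:
    """One sequential pass over the data, evicting a random example when full."""
    for i, (y, x) in enumerate(stream):
        if len(cache) >= capacity:
            # Random eviction; list(cache) is O(|A|), acceptable for a sketch.
            victim = rng.choice(list(cache))
            del cache[victim]
        q_ii = sum(v * v for v in x)  # Q_ii = <x_i, x_i>
        cache[i] = (y, q_ii, x)


# Hypothetical "disk": ten two-dimensional examples with alternating labels.
disk = [(1 if i % 2 == 0 else -1, [float(i), 1.0]) for i in range(10)]
cache: Dict[int, Cached] = {}
rng = random.Random(0)
reader_pass(disk, cache, capacity=4, rng=rng)
size_after_first_pass = len(cache)
reader_pass(disk, cache, capacity=4, rng=rng)  # second outer iteration (k = 2)
```

After the first pass the cache holds exactly `capacity` examples, and the most recently read example is always among them.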

Trainer

α^1 = 0, w^1 = 0, ε = 9, ε_new = 0, β = 0.9
while stopping criterion is not met do
    for t = 1, . . . , n do
        if |A| > 0.9 × Ω then ε = βε
        randomly select i ∈ A and read y_i, Q_ii, x_i from RAM
        compute ∇_i D := y_i ⟨w^t, x_i⟩ − 1
        if (α_i^t = 0 and ∇_i D > ε) or (α_i^t = C and ∇_i D < −ε) then
            A = A \ {i} and delete y_i, Q_ii, x_i from RAM
            continue
        end if
        α_i^{t+1} = median(0, C, α_i^t − ∇_i D / Q_ii),  w^{t+1} = w^t + (α_i^{t+1} − α_i^t) y_i x_i
        ε_new = max(ε_new, |∇_i D|)
    end for
    Update stopping criterion
    ε = ε_new
end while
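One inner sweep of the Trainer can be sketched as follows. The sketch operates on a dictionary `cache` mapping an example index to `(y_i, Q_ii, x_i)`, which is an assumed stand-in for the working set A; `capacity` stands in for Ω. It is single-threaded and leaves out the stopping criterion, so it illustrates the update and eviction rules rather than the full system.

```python
import random
from typing import Dict, List, Tuple

Cached = Tuple[int, float, List[float]]  # (y_i, Q_ii, x_i)


def median3(a: float, b: float, c: float) -> float:
    """Median of three numbers; clips c to [a, b] when a <= b."""
    return sorted((a, b, c))[1]


def trainer_sweep(cache: Dict[int, Cached], alpha: Dict[int, float],
                  w: List[float], C: float, eps: float, steps: int,
                  capacity: int, rng: random.Random, beta: float = 0.9) -> float:
    """Run `steps` coordinate updates on cached examples; return eps_new."""
    eps_new = 0.0
    for _ in range(steps):
        if not cache:
            break  # nothing left in RAM to train on
        if len(cache) > 0.9 * capacity:
            eps *= beta  # cache nearly full: evict more aggressively
        i = rng.choice(list(cache))
        y, q_ii, x = cache[i]
        grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
        a = alpha.get(i, 0.0)
        if (a == 0.0 and grad > eps) or (a == C and grad < -eps):
            del cache[i]  # "uninformative": evict without updating
            continue
        new_a = median3(0.0, C, a - grad / q_ii)
        for j, xj in enumerate(x):
            w[j] += (new_a - a) * y * xj
        alpha[i] = new_a
        eps_new = max(eps_new, abs(grad))
    return eps_new


# Toy, linearly separable data (hypothetical), fully cached.
data = [(1, [2.0, 1.0]), (1, [1.0, 2.0]), (-1, [-1.0, -2.0]), (-1, [-2.0, -1.0])]
cache = {i: (y, sum(v * v for v in x), x) for i, (y, x) in enumerate(data)}
alpha: Dict[int, float] = {}
w = [0.0, 0.0]
eps = 9.0  # initial value as given on the slide
rng = random.Random(0)
for _ in range(20):
    eps = trainer_sweep(cache, alpha, w, C=1.0, eps=eps, steps=len(data),
                        capacity=8, rng=rng)
```

Evicted examples keep their last $\alpha_i$ in `alpha`, so $w = \sum_i \alpha_i y_i x_i$ continues to hold after eviction.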

Proof of Convergence (Sketch)

Definition (Luo and Tseng Problem)

$$\min_{\alpha} \; g(E\alpha) + b^\top \alpha \quad \text{subject to} \quad L_i \le \alpha_i \le U_i$$

where $\alpha, b \in \mathbb{R}^n$, $E \in \mathbb{R}^{d \times n}$, $L_i \in [-\infty, \infty)$, $U_i \in (-\infty, \infty]$.

The above optimization problem is a Luo and Tseng problem if the following conditions hold:

1 E has no zero columns
2 the set of optimal solutions A is non-empty
3 the function g is strictly convex and twice continuously differentiable
4 for all optimal solutions $\alpha^* \in A$, $\nabla^2 g(E\alpha^*)$ is positive definite.

Lemma (see Theorem 1 of Hsieh et al., ICML 2008)
If no data point $x_i = 0$, then the SVM dual is a Luo and Tseng problem.

Definition (Almost cyclic rule)
There exists an integer $B \ge n$ such that every coordinate is iterated upon at least once every $B$ successive iterations.

Theorem (Theorem 2.1 of Luo and Tseng, JOTA 1992)
Let $\{\alpha^t\}$ be a sequence of iterates generated by a coordinate descent method using the almost cyclic rule. Then $\alpha^t$ converges linearly to an element of A.
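To make the almost cyclic rule concrete, the sketch below checks a finite log of updated coordinate indices against a given B: every window of B successive iterations must touch all n coordinates. The function name and the example sequences are assumptions for illustration; a finite log can only refute the rule or be consistent with it, not prove it for an infinite run.

```python
from collections import Counter
from typing import Sequence


def satisfies_almost_cyclic(indices: Sequence[int], n: int, B: int) -> bool:
    """True if every full window of B successive iterations covers 0..n-1."""
    if B < n:
        raise ValueError("B must be at least n")
    if any(not 0 <= idx < n for idx in indices):
        raise ValueError("coordinate index out of range")
    counts: Counter = Counter()  # coordinate -> occurrences in current window
    for pos, idx in enumerate(indices):
        counts[idx] += 1
        if pos >= B:
            old = indices[pos - B]  # index leaving the window
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        if pos >= B - 1 and len(counts) < n:
            return False  # some coordinate is missing from this window
    return True


cyclic = [0, 1, 2] * 4
skewed = [0, 0, 1, 2] * 2
cyclic_ok = satisfies_almost_cyclic(cyclic, n=3, B=3)
skewed_b3 = satisfies_almost_cyclic(skewed, n=3, B=3)
skewed_b4 = satisfies_almost_cyclic(skewed, n=3, B=4)
```

A plain cyclic sweep satisfies the rule with B = n, while the skewed order needs a larger B.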

Lemma
If the trainer accesses the cached training data sequentially, and the following conditions hold:

The trainer is at most κ ≥ 1 times faster than the reading thread, that is, the trainer performs at most κ coordinate updates in the time that it takes the reader to read one training example from disk.
A point is never evicted from RAM unless the α_i corresponding to that point has been updated.

Then the resulting sequence of updates conforms to the almost cyclic rule.


Experiments - SVM

Experiments

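dataset     n         d         s(%)    n+:n−   Datasize
ocr         3.5 M     1156      100     0.96    45.28 GB
dna         50 M      800       25      3e−3    63.04 GB
webspam-t   0.35 M    16.61 M   0.022   1.54    20.03 GB
kddb        20.01 M   29.89 M   1e−4    6.18    4.75 GB

n: number of examples; d: number of features; s(%): feature density; n+:n−: ratio of positive to negative examples; Datasize: size of the data on disk.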

Does Active Eviction Work?

Figure: Linear SVM on webspam-t, C = 1.0. Relative function value difference (log scale, 10^−7 to 10^0) against wall clock time (sec), comparing Random and Active eviction.

Comparison with Block Minimization

Figure: four panels (ocr, webspam-t, kddb, and dna, each with C = 1.0), plotting relative function value difference (log scale) against wall clock time (sec) for StreamSVM, SBM, and BM.

Effect of Varying C

Figure: three panels for dna with C = 10.0, C = 100.0, and C = 1000.0, plotting relative function value difference (log scale) against wall clock time (sec) for StreamSVM, SBM, and BM.

Varying Cache Size

Figure: kddb, C = 1.0. Relative function value difference (log scale, 10^−6 to 10^0) against wall clock time (sec) for cache sizes of 256 MB, 1 GB, 4 GB, and 16 GB.

Expanding Features

Figure: dna expanded, C = 1.0. Relative function value difference (log scale, 10^−10 to 10^0) against wall clock time (sec) for 16 GB and 32 GB.


Logistic Regression

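Optimization Problem

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i \langle w, x_i \rangle\right)\right)$$

Figure: the hinge loss and the logistic loss plotted as functions of the margin (horizontal axis: margin, vertical axis: loss).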
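The two losses can be compared numerically with a short Python sketch. Computing $\log(1 + \exp(-m))$ naively overflows for very negative margins, so the sketch uses the standard rewrite $-m + \log(1 + \exp(m))$ for $m < 0$. The function names are illustrative assumptions.

```python
import math


def logistic_loss(margin: float) -> float:
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    # For negative margins, factor out exp(-margin) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


# Unlike the hinge loss, the logistic loss is positive for every finite
# margin up to floating-point underflow, including margins above 1.
margins = [-2.0, 0.0, 1.0, 2.0]
table = [(m, hinge_loss(m), logistic_loss(m)) for m in margins]
```

The fact that the logistic loss never reaches exactly zero in exact arithmetic is why the importance test for logistic regression later in the talk is stated with approximate rather than exact equalities.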

The Dual

Objective

$$\min_{\alpha} \; D(\alpha) := \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_{i} \left[ \alpha_i \log \alpha_i + (C - \alpha_i) \log (C - \alpha_i) \right] \quad \text{s.t.} \quad 0 \le \alpha_i \le C$$

$w = \sum_i \alpha_i y_i x_i$
Convex objective (but not quadratic)
Box constraints

Coordinate Descent

One dimensional function

$$D(\alpha_t) := \frac{\alpha_t^2}{2} \langle x_t, x_t \rangle + \sum_{i \neq t} \alpha_t \alpha_i y_i y_t \langle x_i, x_t \rangle + \alpha_t \log \alpha_t + (C - \alpha_t) \log (C - \alpha_t) \quad \text{s.t.} \quad 0 \le \alpha_t \le C$$

Solve
No closed form solution
A modified Newton method does the job!
See Yu, Huang, Lin 2011 for details.
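To illustrate what solving the one-dimensional problem involves, the sketch below minimizes $\frac{Q}{2}a^2 + ba + a \log a + (C - a)\log(C - a)$ over $0 < a < C$, where $Q$ stands for $\langle x_t, x_t \rangle$ and $b$ for $\sum_{i \neq t} \alpha_i y_i y_t \langle x_i, x_t \rangle$. It uses a plain Newton iteration safeguarded by bisection, which is a generic substitute chosen for this example and not the modified Newton method of Yu, Huang, and Lin; all names are assumptions.

```python
import math


def subproblem_gradient(a: float, Q: float, b: float, C: float) -> float:
    """Derivative of the 1-D objective: Q*a + b + log(a / (C - a))."""
    return Q * a + b + math.log(a / (C - a))


def solve_lr_subproblem(Q: float, b: float, C: float,
                        tol: float = 1e-10, max_iter: int = 100) -> float:
    """Find the root of the gradient on (0, C) by Newton with a bisection fallback."""
    lo, hi = 0.0, C  # the gradient tends to -inf at 0 and +inf at C
    a = 0.5 * C
    for _ in range(max_iter):
        g = subproblem_gradient(a, Q, b, C)
        if abs(g) < tol:
            break
        # The gradient is strictly increasing, so its sign brackets the root.
        if g > 0.0:
            hi = a
        else:
            lo = a
        hessian = Q + C / (a * (C - a))  # second derivative, always positive
        candidate = a - g / hessian
        # Accept the Newton step only if it stays strictly inside the bracket.
        a = candidate if lo < candidate < hi else 0.5 * (lo + hi)
    return a


root = solve_lr_subproblem(Q=1.0, b=0.5, C=2.0)
residual = subproblem_gradient(root, 1.0, 0.5, 2.0)
```

Because the logarithmic terms push the gradient to minus infinity at 0 and plus infinity at C, the minimizer always lies strictly inside the box, so no clipping step is needed here, in contrast to the SVM update.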

Determining Importance: SVM case

Shrinking

$\alpha_i^t = 0$ and $\nabla_i D(\alpha) > \epsilon$, or
$\alpha_i^t = C$ and $\nabla_i D(\alpha) < -\epsilon$

Figure: the loss gradient plotted against the margin (horizontal axis: margin, vertical axis: loss gradient).

$\nabla_w \operatorname{loss}(w, x_i, y_i) = \alpha_i$ (KKT condition)

$\nabla^{\pi}_i D(\alpha) = 0$
$|\langle w, x_i \rangle - 1| \ge \epsilon$

Computing ε

$$\epsilon = \max_{i \in A} \left| \nabla^{\pi}_i D(\alpha) \right|$$

Determining Importance: Logistic Regression

Figure: the loss gradient plotted against the margin (horizontal axis: margin, vertical axis: loss gradient, ranging from 0 to 1).

$\nabla_w \operatorname{loss}(w, x_i, y_i) \approx \alpha_i$ (KKT condition)

$\nabla_i D(\alpha) \approx 0$
$|\langle w, x_i \rangle| \ge \epsilon$

Computing ε

$$\epsilon = \max_{i \in A} \left| \nabla_i D(\alpha) \right|$$


Experiments - Logistic Regression

Does Active Eviction Work?

Figure: Logistic Regression on webspam-t, C = 1.0. Relative function value difference (log scale, 10^−10 to 10^0) against wall clock time (sec), comparing Random and Active eviction.

Comparison with Block Minimization

Figure: webspam-t, C = 1.0. Relative function value difference (log scale, 10^−10 to 10^0) against wall clock time (sec) for StreamSVM and BM.


Conclusion

Things I did not talk about/Work in Progress

StreamSVM
  Amortizing the cost of training different models
  Other storage hierarchies (e.g. Solid State Drives)
  Extensions to other problems
    Multiclass problems
    Structured Output Prediction
  Distributed version of the algorithm

References

Shin Matsushima, S. V. N. Vishwanathan, and Alexander J. Smola. Linear Support Vector Machines via Dual Cached Loops. KDD 2012.

Code for StreamSVM available from http://www.r.dl.itc.u-tokyo.ac.jp/~masin/streamsvm.html

Joint work with

Figure: architecture diagram. A Reading Thread reads the Dataset from Disk (sequential access) and loads it into the Cached Data (Working Set) in RAM (random access). A Training Thread reads the Cached Data (random access) and updates the Weight Vector in RAM.