jacob gardner · 2015. 3. 20. · quan zhou, wenlin chen, shiji song, jacob r. gardner, kilian q....
TRANSCRIPT
Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen
Support Vector Elastic Network
“Sven the Terrible”
Traditional Computer Science vs. Machine Learning
Traditional CS: Data, Program → Computer → Output
Machine Learning: Data, Output → Computer → Program
Support Vector Machines
$w^\top x$
$$\min_{w}\ \underbrace{\tfrac{1}{2}\|w\|_2^2}_{\text{L2 regularization}} + C \underbrace{\sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i (w^\top x_i)\bigr)^2}_{\text{squared hinge loss}}$$
14644 Citations
Published in ML journals
Usable means MATLAB
Fast means parallel
Many GPU Implementations
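As a small illustration of the objective on this slide, the sketch below evaluates the L2-regularized squared hinge loss for a given weight vector. It is written in plain Python for readability; the function name `svm_objective` and the list-of-rows data layout are assumptions made for this example, not part of the talk.

```python
def svm_objective(w, X, y, C):
    """Evaluate 0.5 * ||w||_2^2 + C * sum_i max(0, 1 - y_i * (w . x_i))^2.

    w: list of weights; X: list of sample rows; y: labels in {+1, -1}.
    """
    # L2 regularization term.
    reg = 0.5 * sum(wj * wj for wj in w)
    # Squared hinge loss, summed over all samples.
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += max(0.0, 1.0 - margin) ** 2
    return reg + C * loss
```

A sample with margin at least 1 contributes nothing to the loss; a sample inside the margin or misclassified is penalized quadratically.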
Elastic Net/Lasso
$$\min_{\beta}\ \|X\beta - y\|_2^2 + \lambda_2 \|\beta\|_2^2 \quad \text{such that } |\beta|_1 \le t$$
13856 Citations
Published in stats journals
Usable means R
Fast means Fortran
Zero GPU Implementations
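To make the constrained form on this slide concrete, the following plain-Python sketch evaluates the Elastic Net objective and checks the L1 budget for a candidate coefficient vector. The function names `elastic_net_objective` and `within_l1_budget`, and the tolerance argument, are hypothetical helpers introduced for this example.

```python
def elastic_net_objective(beta, X, y, lambda2):
    """Evaluate ||X beta - y||_2^2 + lambda2 * ||beta||_2^2.

    X: list of n rows with p features each; y: list of n targets.
    """
    residuals = [sum(xj * bj for xj, bj in zip(row, beta)) - yi
                 for row, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    ridge = lambda2 * sum(bj * bj for bj in beta)
    return squared_error + ridge


def within_l1_budget(beta, t, tol=1e-12):
    """Check the constraint |beta|_1 <= t (up to a small tolerance)."""
    return sum(abs(bj) for bj in beta) <= t + tol
```

Shrinking the budget t forces more coefficients to zero, which is what the regularization path plots trace out.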
Elastic Net/Lasso
$$\min_{\beta}\ \|X\beta - y\|_2^2 + \lambda_2 \|\beta\|_2^2 \quad \text{such that } |\beta|_1 \le t$$
L1 budget: $t$
[Figure: Equivalence of regularization path. Panels: Glmnet and SVEN (GPU). x-axis: L1 budget t; y-axis: coefficients β_i.]
Elastic Net vs. SVM
Elastic Net: + interpretable; - slow; - does not scale
SVM: + parallel; + scales to large data; + multi-platform; - not interpretable
Reductions
Problem A → Problem B
Solution A ← Solution B
Elastic Net → SVM
Input X, Y → Input Xnew, Ynew
Output β ← Output α
function beta = SVEN(X,Y,t,lambda)
% Reduce the Elastic Net problem (X,Y) to an SVM problem (Xnew,Ynew).
[n,p] = size(X);
Xnew = [bsxfun(@minus,X,Y./t) bsxfun(@plus,X,Y./t)]';
Ynew = [ones(p,1); -ones(p,1)];
C = 1/(2*lambda);
% Solve the SVM.
model = trainsvmGPU(Ynew,sparse(Xnew),['-q -s 1 -c ' num2str(C)]);
% Map the SVM solution alpha back to the Elastic Net solution beta.
alpha = C * max(1 - Ynew.*(Xnew*model.w),0);
beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);
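The data transformation in the MATLAB function above can be traced step by step in a short plain-Python sketch. It mirrors the construction of Xnew, Ynew and C and the mapping from alpha back to beta, but deliberately leaves out the SVM solve itself, which any squared hinge loss SVM solver could provide. The helper names (`sven_inputs`, `alpha_from_primal`, `sven_recover_beta`) and the list-of-rows layout are assumptions made for this example.

```python
def sven_inputs(X, y, t, lambda2):
    """Build the SVM problem (Xnew, Ynew, C) from Elastic Net inputs.

    X is a list of n rows with p features each; y has n targets.
    """
    n, p = len(X), len(X[0])
    shift = [yi / t for yi in y]  # Y./t, one entry per original sample
    # Each feature column of X becomes one SVM sample of dimension n.
    minus = [[X[i][j] - shift[i] for i in range(n)] for j in range(p)]
    plus = [[X[i][j] + shift[i] for i in range(n)] for j in range(p)]
    Xnew = minus + plus               # 2p rows, n columns
    Ynew = [1.0] * p + [-1.0] * p     # first p samples positive, last p negative
    C = 1.0 / (2.0 * lambda2)
    return Xnew, Ynew, C


def alpha_from_primal(w, Xnew, Ynew, C):
    """alpha_i = C * max(1 - y_i * (x_i . w), 0), as in the slide code."""
    return [C * max(1.0 - yi * sum(a * b for a, b in zip(xi, w)), 0.0)
            for xi, yi in zip(Xnew, Ynew)]


def sven_recover_beta(alpha, t):
    """beta = t * (alpha[0:p] - alpha[p:2p]) / sum(alpha).

    Assumes sum(alpha) > 0, i.e. at least one margin violation.
    """
    p = len(alpha) // 2
    total = sum(alpha)
    return [t * (alpha[j] - alpha[p + j]) / total for j in range(p)]
```

Note the change of roles: the SVM sees 2p samples in n dimensions, so the features of the Elastic Net problem become the samples of the SVM problem. When each feature has at most one of its two alpha entries nonzero, the recovered beta uses the L1 budget exactly, that is, its absolute values sum to t.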
Results
[Figure: Equivalence of regularization path. Panels: Glmnet and SVEN (GPU). x-axis: L1 budget t; y-axis: coefficients β_i.]
Results
n>>d datasets
[Figure: runtime comparison on log-log axes. Panels: MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900]. x-axis: SVEN (GPU) runtime (sec); y-axis: other algorithms' runtime (sec). Legend: glmnet, SVEN (CPU), Shotgun, L1_Ls. Regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower".]
Running time: $O(d^2)$
Or…
Results
d>>n datasets
[Figure: runtime comparison on log-log axes. Panels: GLI85 [n=85, p=22283], arcene [n=900, p=10000], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812]. x-axis: SVEN (GPU) runtime (sec); y-axis: other algorithms' runtime (sec). Legend: glmnet, SVEN (CPU), Shotgun, L1_Ls. Regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower".]
Running time: $O(n^2)$
Conclusion
Elastic Net and SVM are equivalent problems.
Many optimizations developed only for SVMs now apply to the Elastic Net.
This leads to the fastest Elastic Net solver we are aware of.
Questions?
“Sven the Nice?”