
Page 1: Introduction of Online Machine Learning Algorithms

Paper Report for SDM course in 2016

Ad Click Prediction: a View from the Trenches (Online Machine Learning)

Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈. Date: 2016/12/22

Page 3: Introduction of Online Machine Learning Algorithms


Page 4: Introduction of Online Machine Learning Algorithms


READ DATA          Time            Memory
read.csv           264.5 (secs)    8.73 (GB)
fread              33.18 (secs)    2.98 (GB)
read.big.matrix    205.03 (secs)   0.2 (MB)

(2 GB of data: 4 million records, 200 variables)

lm (after each read method)    Time           Memory
read.csv                       X              X
fread                          X              X
read.big.matrix                2.72 (mins)    83.6 (MB)

(X: did not complete)
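To make the comparison concrete, here is a minimal R sketch of the three read strategies and the chunked linear-model fit; the file name and model formula are illustrative assumptions, not taken from the slides.

# Base R: parses the whole file into an in-memory data.frame
df <- read.csv("train.csv")

# data.table: much faster parser, data still fully in RAM
library(data.table)
dt <- fread("train.csv")

# bigmemory: file-backed matrix; only a small descriptor stays in RAM
library(bigmemory)
bm <- read.big.matrix("train.csv", header = TRUE, type = "double",
                      backingfile = "train.bin", descriptorfile = "train.desc")

# lm() cannot use a big.matrix directly; biganalytics::biglm.big.matrix
# fits the model in chunks, which is presumably how the 2.72-minute
# figure in the table above was obtained
library(biganalytics)
fit <- biglm.big.matrix(y ~ x1 + x2, data = bm)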

Page 5: Introduction of Online Machine Learning Algorithms


Page 6: Introduction of Online Machine Learning Algorithms


Page 7: Introduction of Online Machine Learning Algorithms


Page 8: Introduction of Online Machine Learning Algorithms


Problem: training a model on Big Data (TB, PB, ZB)

• Memory
• Time / Accuracy

Solutions:

• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)
• R packages: read.table, bigmemory, ff (GB)
• Online learning algorithms

Page 9: Introduction of Online Machine Learning Algorithms


Online learning algorithms (applied to Logistic Regression):

• AOGD (2007, IBM): Adaptive Online Gradient Descent
• TG (2009, Microsoft): Truncated Gradient
• FOBOS (2009, Google): Forward-Backward Splitting
• RDA (2010, Microsoft): Regularized Dual Averaging
• FTRL-Proximal (2011, Google): Follow-The-Regularized-Leader Proximal

Page 10: Introduction of Online Machine Learning Algorithms


Online learning loop: Big Data (TB, PB, ZB) → Train → Model; new data → renew weights

Problem:

• Memory → Sparsity (LASSO)
• Time / Accuracy → SGD/OGD (NN/GBM)

Page 11: Introduction of Online Machine Learning Algorithms


Online learning algorithms (applied to Logistic Regression):

• AOGD (2007, IBM): Adaptive Online Gradient Descent
• TG (2009, Microsoft): Truncated Gradient
• FOBOS (2009, Google): Forward-Backward Splitting
• RDA (2010, Microsoft): Regularized Dual Averaging
• FTRL-Proximal (2011, Google): Follow-The-Regularized-Leader Proximal

Page 12: Introduction of Online Machine Learning Algorithms

Online Gradient Descent (OGD)

A family of algorithms used for online convex optimization, which can be formulated as a repeated game between a player and an adversary:

• At round t, the player chooses an action w_t from some convex subset F ⊆ R^n, and then the adversary chooses a convex loss function f_t.
• A central question is how the regret grows with the number of rounds T of the game.
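The regret referred to above is the standard notion from [7]: the gap between the player's accumulated loss and that of the best fixed action in hindsight,

\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) - \min_{w \in F} \sum_{t=1}^{T} f_t(w)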


Page 13: Introduction of Online Machine Learning Algorithms

Online Gradient Descent (OGD)

Zinkevich [7] considered the following gradient descent algorithm, with step size \eta_t = t^{-1/2}:

w_{t+1} = \Pi_F\big( w_t - \eta_t \nabla f_t(w_t) \big)

Here, \Pi_F denotes the Euclidean projection onto the convex set F. This simple scheme already guarantees O(\sqrt{T}) regret.
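A minimal R sketch of this update for the logistic loss; the simulated data and the unconstrained setting (F = R^d, so the projection is the identity) are illustrative assumptions:

# Online gradient descent for logistic regression, one example per round.
set.seed(1)
n <- 1000; d <- 10
X <- matrix(rnorm(n * d), n, d)
w_true <- rnorm(d)
y <- rbinom(n, 1, 1 / (1 + exp(-X %*% w_true)))  # labels in {0, 1}

w <- rep(0, d)
for (t in 1:n) {
  p   <- 1 / (1 + exp(-sum(w * X[t, ])))  # predicted probability
  g   <- (p - y[t]) * X[t, ]              # gradient of the log loss at round t
  eta <- 1 / sqrt(t)                      # Zinkevich's step size eta_t = t^(-1/2)
  w   <- w - eta * g                      # gradient step (no projection needed)
}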


Page 14: Introduction of Online Machine Learning Algorithms

Forward-Backward Splitting (FOBOS)

(1) Loss function of logistic regression (labels y ∈ {−1, +1}):

\ell(W, X, y) = \log\big( 1 + \exp(-y\, W^{\top} X) \big)

Batch gradient descent formula:

W_{t+1} = W_t - \eta \cdot \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \ell(W_t, X_i)}{\partial W_t}

Online gradient descent formula:

W_{t+1} = W_t - \eta_t\, \frac{\partial \ell(W_t, X_t)}{\partial W_t}


Page 15: Introduction of Online Machine Learning Algorithms

Forward-Backward Splitting (FOBOS)

(2) The FOBOS update can be split into two parts: the first part makes a small adjustment near the result of the plain gradient step (W_{t+1/2}); the second part handles regularization and produces sparsity:

W_{t+1/2} = W_t - \eta_t\, g_t, \qquad g_t = \frac{\partial \ell(W_t, X_t)}{\partial W_t}

W_{t+1} = \arg\min_{W} \Big\{ \tfrac{1}{2} \| W - W_{t+1/2} \|_2^2 + \eta_{t+1/2}\, r(W) \Big\}

where r(W) is the regularization function (e.g., r(W) = \lambda \|W\|_1).
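When r(W) = \lambda \|W\|_1, the second step has a closed-form solution, coordinate-wise soft-thresholding, so one FOBOS round fits in a few lines of R (a sketch reusing the setup of the OGD example above; lambda is an illustrative value):

fobos_step <- function(w, x, y01, eta, lambda) {
  p <- 1 / (1 + exp(-sum(w * x)))
  g <- (p - y01) * x
  w_half <- w - eta * g                      # forward step: plain gradient descent
  # backward step: soft-thresholding solves
  #   argmin_w 0.5 * ||w - w_half||^2 + eta * lambda * ||w||_1
  sign(w_half) * pmax(abs(w_half) - eta * lambda, 0)
}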

Page 16: Introduction of Online Machine Learning Algorithms


Forward-Backward Splitting (FOBOS)

(3) A sufficient condition for W_{t+1} to be the optimum of (2): 0 belongs to the subgradient set of the objective at W_{t+1}:

0 \in W_{t+1} - W_{t+1/2} + \eta_{t+1/2}\, \partial r(W_{t+1})

(4) Because W_{t+1/2} = W_t - \eta_t g_t, (3) can be rewritten as:

0 \in W_{t+1} - W_t + \eta_t\, g_t + \eta_{t+1/2}\, \partial r(W_{t+1})

(5) In other words, rearranging (4):

W_{t+1} = W_t - \eta_t\, g_t - \eta_{t+1/2}\, \partial r(W_{t+1})

① the state and gradient from before the iteration (the explicit, forward part)
② the regularization information of the current iteration (the implicit, backward part)

Page 17: Introduction of Online Machine Learning Algorithms

FOBOS, RDA, FTRL-Proximal

All three updates share the same three ingredients:

(A) the accumulated past gradients;
(B) the regularization function r(W);
(C) a proximal term with learning rate η, which guarantees that the adjustment does not move too far from 0 or from the solutions already reached.

Here r is a non-smooth convex function, and g_t is a certain subgradient of \ell_t.
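For concreteness, the FTRL-Proximal instance of this template, as stated in [5]:

w_{t+1} = \arg\min_{w} \Big( g_{1:t} \cdot w + \tfrac{1}{2} \sum_{s=1}^{t} \sigma_s \| w - w_s \|_2^2 + \lambda_1 \| w \|_1 \Big), \qquad g_{1:t} = \sum_{s=1}^{t} g_s

where the \sigma_s are chosen so that \sum_{s=1}^{t} \sigma_s = 1/\eta_t.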

Page 18: Introduction of Online Machine Learning Algorithms

FOBOS, RDA, FTRL-Proximal

• OGD: not sparse enough.
• FOBOS: produces better sparse features; as a gradient-descent-type method, its accuracy is comparatively good.
• RDA: strikes a better balance between accuracy and sparsity, and its sparsity is superior; the key difference is how the accumulated L1 penalty is handled.
• FTRL-Proximal: combines the accuracy of FOBOS with the sparsity of RDA (a sketch of its per-coordinate update follows below).
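A compact R sketch of that per-coordinate update for logistic regression, following Algorithm 1 of [5]; alpha, beta, lambda1, lambda2 are the paper's hyperparameters, and the default values below are illustrative:

# Per-coordinate FTRL-Proximal for logistic regression (Algorithm 1 in [5]).
ftrl_train <- function(X, y01, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 1) {
  d <- ncol(X)
  z <- rep(0, d)   # accumulated "shifted" gradients
  n <- rep(0, d)   # accumulated squared gradients, per coordinate
  w <- rep(0, d)
  for (t in 1:nrow(X)) {
    x <- X[t, ]
    i <- which(x != 0)                        # update only the active coordinates
    # closed-form (lazy) solution for w_i given z_i and n_i
    w[i] <- ifelse(abs(z[i]) <= lambda1, 0,
                   -(z[i] - sign(z[i]) * lambda1) /
                     ((beta + sqrt(n[i])) / alpha + lambda2))
    p <- 1 / (1 + exp(-sum(w[i] * x[i])))     # predict
    g <- (p - y01[t]) * x[i]                  # gradient of the log loss
    sigma <- (sqrt(n[i] + g^2) - sqrt(n[i])) / alpha
    z[i] <- z[i] + g - sigma * w[i]
    n[i] <- n[i] + g^2
  }
  w
}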

Page 19: Introduction of Online Machine Learning Algorithms


Page 20: Introduction of Online Machine Learning Algorithms


f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F

Per-Coordinate

Page 21: Introduction of Online Machine Learning Algorithms


f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F

Per-Coordinate

Page 22: Introduction of Online Machine Learning Algorithms


f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F

Per-Coordinate
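The point of these three slides is that each coefficient moves at its own speed as new data arrives. In [5] the learning rate is per-coordinate, shrinking with the accumulated squared gradient of that coordinate alone, so rarely-seen features keep a large step size while frequent ones cool down. A short R illustration (the per-feature totals are made up for illustration):

# per-coordinate learning rate from [5]: eta_i = alpha / (beta + sqrt(n_i)),
# where n_i is the accumulated squared gradient of coordinate i
alpha <- 0.1; beta <- 1
n_i <- c(A = 64, B = 25, C = 49, D = 9, E = 1, F = 0)  # illustrative totals
eta_i <- alpha / (beta + sqrt(n_i))
round(eta_i, 4)  # frequent features (large n_i) get the smallest steps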

Page 23: Introduction of Online Machine Learning Algorithms


Putting it together: Big Data (TB, PB, ZB) → Train → Model; new data → renew weights (per-coordinate)

Problem:

• Memory → Sparsity (LASSO)
• Time / Accuracy → SGD/OGD (NN/GBM)

Online learning algorithms (applied to Logistic Regression):

• FOBOS (2009, Google)
• RDA (2010, Microsoft)
• FTRL-Proximal (2011, Google)

Page 24: Introduction of Online Machine Learning Algorithms


Page 25: Introduction of Online Machine Learning Algorithms


R package: FTRLProximal

Page 26: Introduction of Online Machine Learning Algorithms


Page 27: Introduction of Online Machine Learning Algorithms


Page 28: Introduction of Online Machine Learning Algorithms


Page 29: Introduction of Online Machine Learning Algorithms


Page 30: Introduction of Online Machine Learning Algorithms


Page 31: Introduction of Online Machine Learning Algorithms


Page 32: Introduction of Online Machine Learning Algorithms


Page 33: Introduction of Online Machine Learning Algorithms


Page 35: Introduction of Online Machine Learning Algorithms


Prediction result (on a 5.87 GB dataset)

Page 36: Introduction of Online Machine Learning Algorithms


Page 37: Introduction of Online Machine Learning Algorithms

Reference

[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.

[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning using Forward Backward Splitting. Journal of Machine Learning Research, 2009.

[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.

[4] H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In AISTATS, 2011.

[5] H. Brendan McMahan, Gary Holt, D. Sculley, et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.

[6] Peter Bartlett, Elad Hazan, and Alexander Rakhlin. Adaptive online gradient descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, Jun 2007.

[7] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.
