Logistic Regression
Machine learning workshop [email protected]
Machine learning introduction; Logistic regression
Feature selection; Boosting, tree boosting
See more machine learning posts: http://dongguo.me
Overview of machine learning

Machine Learning
• Unsupervised Learning
• Semi-supervised Learning
• Supervised Learning
  – Classification (e.g. logistic regression)
  – Regression
How to choose a suitable model?

Characteristic                        | Naïve Bayes | Trees | K Nearest Neighbor | Logistic regression | Neural Networks | SVM
Computational scalability             | 3           | 3     | 1                  | 3                   | 1               | 1
Interpretability                      | 2           | 2     | 1                  | 2                   | 1               | 1
Predictive power                      | 1           | 1     | 3                  | 2                   | 3               | 3
Natural handling of "mixed"-type data | 1           | 3     | 1                  | 1                   | 1               | 1
Robustness to outliers in input space | 3           | 3     | 3                  | 3                   | 1               | 1

(3 = good, 2 = fair, 1 = poor; <Elements of Statistical Learning> 2nd ed., p. 351)
Why a model can't perform perfectly on unseen data
• Expected risk
• Empirical risk
• Choosing a function family for prediction functions
• Error
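The two risks above can be written out as follows (a standard formulation, not taken verbatim from the slides); the gap between them is why training error underestimates error on unseen data:

```latex
% Expected risk: average loss over the true, unknown data distribution P
R(f) = \int L\bigl(y, f(x)\bigr)\,\mathrm{d}P(x, y)

% Empirical risk: average loss over the n training samples
R_{\mathrm{emp}}(f) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
```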
Logistic regression
Outline
• Introduction
• Inference
• Regularization
• Experiments
• More
  – Multinomial LR
  – Generalized linear model
• Application
Logit function and logistic function
• Logit function
• Logistic function: the inverse of the logit
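Written out (standard definitions, restored here since the slide formulas did not survive extraction):

```latex
\mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad p \in (0, 1)

\sigma(z) = \frac{1}{1 + e^{-z}} = \mathrm{logit}^{-1}(z), \qquad z \in \mathbb{R}
```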
Logistic regression
• Prediction function
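The prediction function applies the logistic function to a linear score (standard form; the slide's own formula was lost in extraction):

```latex
P(y = 1 \mid x; w) = \sigma(w^{\top}x) = \frac{1}{1 + e^{-w^{\top}x}},
\qquad
P(y = 0 \mid x; w) = 1 - \sigma(w^{\top}x)
```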
Inference with maximum likelihood (1)
• Likelihood
• Inference
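With the shorthand p_i = σ(wᵀx_i), the (log-)likelihood and its gradient are (a standard derivation, matching the per-sample `error = y - p` update used in the implementation slide later):

```latex
L(w) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{\,1 - y_i}

\ell(w) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr]

\nabla_w\, \ell(w) = \sum_{i=1}^{n} (y_i - p_i)\, x_i
```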
Inference with maximum likelihood (2)
• Inference (cont.)
• Use gradient descent
• Stochastic gradient descent
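The stochastic gradient procedure can be sketched as below (a minimal unregularized version; the function names and the sparse-dict sample representation are mine, not from the workshop code):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(samples, n_features, step=0.1, epochs=50, seed=0):
    """Stochastic gradient ascent on the logistic-regression log-likelihood.

    samples: list of (feature_dict, label) with label in {0, 1};
    feature_dict maps feature index -> value (sparse, mirroring the
    per-feature updates on the implementation slide).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(epochs):
        rng.shuffle(samples)               # visit samples in random order
        for features, y in samples:
            z = sum(w[f] * v for f, v in features.items())
            error = y - sigmoid(z)         # y_i - p_i
            for f, v in features.items():
                w[f] += step * v * error   # move along the gradient
    return w

def predict(w, features):
    return sigmoid(sum(w[f] * v for f, v in features.items()))
```

On a toy one-feature dataset where positive values mean label 1, the learned weight becomes positive and the model separates the classes.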
Regularization
• Penalize large weights to avoid overfitting
  – L2 regularization
  – L1 regularization
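In penalized form (standard notation; λ here roughly corresponds to the `reguParam` in the later implementation slide), maximizing:

```latex
\ell_{L2}(w) = \ell(w) - \frac{\lambda}{2}\,\lVert w \rVert_2^2
\qquad
\ell_{L1}(w) = \ell(w) - \lambda\,\lVert w \rVert_1
```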
Regularization: maximum a posteriori
• MAP
L2 regularization: Gaussian prior
• Gaussian prior
• MAP
• Gradient descent step
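The three bullets above, written out (a standard derivation; the per-sample step agrees term-by-term with the L2 update in the implementation slide, with λ = 1/σ²):

```latex
% Zero-mean Gaussian prior on each weight
p(w_j) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Bigl(-\frac{w_j^2}{2\sigma^2}\Bigr)

% MAP = penalized maximum likelihood
\hat{w} = \arg\max_w \; \ell(w) - \frac{\lambda}{2}\lVert w \rVert_2^2,
\qquad \lambda = \frac{1}{\sigma^2}

% Per-sample gradient ascent step with learning rate \eta
w \leftarrow w + \eta\,\bigl((y_i - p_i)\,x_i - \lambda w\bigr)
```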
L1 regularization: Laplace prior
• Laplace prior
• MAP
• Gradient descent step
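Written out (standard derivation; since |w| is not differentiable at zero, the step uses a subgradient, and the implementation slide clips the weight at zero when the penalty would flip its sign):

```latex
% Zero-mean Laplace prior on each weight, scale b
p(w_j) = \frac{1}{2b}\exp\!\Bigl(-\frac{|w_j|}{b}\Bigr)

% MAP = L1-penalized maximum likelihood
\hat{w} = \arg\max_w \; \ell(w) - \lambda\,\lVert w \rVert_1,
\qquad \lambda = \frac{1}{b}

% Per-sample (sub)gradient ascent step
w \leftarrow w + \eta\,\bigl((y_i - p_i)\,x_i - \lambda\,\mathrm{sign}(w)\bigr)
```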
Implementation
• L2 LR

_weightOfFeatures[fea] += step * (feaValue * error - reguParam * _weightOfFeatures[fea]);

• L1 LR

if (_weightOfFeatures[fea] > 0) {
    _weightOfFeatures[fea] += step * (feaValue * error) - step * reguParam;
    if (_weightOfFeatures[fea] < 0)
        _weightOfFeatures[fea] = 0;
} else if (_weightOfFeatures[fea] < 0) {
    _weightOfFeatures[fea] += step * (feaValue * error) + step * reguParam;
    if (_weightOfFeatures[fea] > 0)
        _weightOfFeatures[fea] = 0;
} else {
    _weightOfFeatures[fea] += step * (feaValue * error);
}
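The L1 branch above applies the penalty as a constant-magnitude pull toward zero and clips the weight if the penalty alone would flip its sign, which is what produces exact zeros. A Python rendering of that single-weight update (names are mine):

```python
def l1_update(w, grad, step, regu):
    """One clipped-L1 stochastic update for a single weight.

    w: current weight value
    grad: the data gradient term (feaValue * error in the slide's code)
    step: learning rate
    regu: L1 regularization strength
    """
    if w > 0:
        w += step * grad - step * regu   # penalty pulls a positive weight down
        if w < 0:
            w = 0.0                      # penalty overshot: clip, don't flip sign
    elif w < 0:
        w += step * grad + step * regu   # penalty pulls a negative weight up
        if w > 0:
            w = 0.0
    else:
        w += step * grad                 # at exactly zero, only the data term moves it
    return w
```

The clipping is why L1 produces sparse weight vectors: a small weight whose data gradient cannot outweigh the penalty gets pinned at exactly zero.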
L2 vs. L1
• L2 regularization
  – Almost all weights end up non-zero
  – Less suitable when training samples are scarce
• L1 regularization
  – Produces sparse parameter vectors
  – More suitable when most features are irrelevant
  – Handles scarce training samples better
Experiments
• Dataset
  – Goal: gender prediction
  – Dataset: training samples (431k), test samples (167k)
• Compared algorithms
  – A: gradient descent with L1 regularization
  – B: gradient descent with L2 regularization
  – C: OWL-QN (L-BFGS-based optimization with L1 regularization)
• Parameter choices
  – Regularization value
  – Step (learning rate)
  – Decay ratio
  – Stopping condition: max iterations (50) or AUC change <= 0.0005
Experiments (cont.)
• Experiment results

Parameters and metrics     | Gradient descent with L1 | Gradient descent with L2 | OWL-QN
'Best' regularization term | 0.001~0.005              | 0.0002~0.001             | 1
Best step                  | 0.05                     | 0.02~0.05                | -
Best decay ratio           | 0.85                     | 0.85                     | -
Iteration times            | 26                       | 20~26                    | 48
Non-zero / all features    | 10492/10938              | 10938/10938              | 6629/10938
AUC                        | 0.8470                   | 0.8463                   | 0.8467
Multinomial logistic regression
• Prediction function
• Inference with maximum likelihood
• Gradient descent step (L2)
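The multinomial prediction function is the softmax, with one weight vector per class; with an L2 penalty, the per-class gradient step follows the same pattern as the binomial case (standard forms, notation mine):

```latex
P(y = k \mid x; W) = \frac{\exp(w_k^{\top} x)}{\sum_{j=1}^{K} \exp(w_j^{\top} x)},
\qquad k = 1, \dots, K

% Per-sample L2-regularized step for class k ([\cdot] is the indicator)
w_k \leftarrow w_k + \eta\,\Bigl(\bigl([y_i = k] - P(y = k \mid x_i; W)\bigr)\,x_i - \lambda\,w_k\Bigr)
```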
More link functions
• Inference with maximum likelihood
• Link function
• Link functions for the binomial distribution
  – Logit function
  – Probit function
  – Log-log function
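The three binomial links, written out (standard forms; note that in the GLM literature the last is more often stated as the complementary log-log, ln(-ln(1-p))):

```latex
\text{logit:}\quad  g(p) = \ln\frac{p}{1-p}

\text{probit:}\quad g(p) = \Phi^{-1}(p)
\quad (\Phi = \text{standard normal CDF})

\text{log-log:}\quad g(p) = -\ln\bigl(-\ln p\bigr)
```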
Generalized linear model
• What is a GLM
  – A generalization of linear regression
  – Connects the linear model to the response variable through a link function
  – Allows more distributions for the response variable
• Typical GLMs
  – Linear regression, logistic regression, Poisson regression
• Overview
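The common structure behind these models can be summarized as (standard GLM formulation, not verbatim from the slides):

```latex
% The link g connects the conditional mean of y to the linear predictor
g\bigl(\mathbb{E}[\,y \mid x\,]\bigr) = w^{\top} x

% Typical instances (link, response distribution):
%   identity, Gaussian   ->  linear regression
%   logit,    Bernoulli  ->  logistic regression
%   log,      Poisson    ->  Poisson regression
```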
Application
• Yahoo
  – <Personalized Click Prediction in Sponsored Search> WSDM'10
• Microsoft
  – <Scalable Training of L1-Regularized Log-Linear Models> ICML'07
• Baidu
  – Contextual ads CTR prediction
  – http://www.docin.com/p-376254439.html
• Hulu
  – Demographic targeting
  – Other ad-targeting projects
  – Customer churn prediction
  – More...
Reference
• 'Scalable Training of L1-Regularized Log-Linear Models', ICML'07
  – http://www.docin.com/p-376254439.html#
• 'Generative and Discriminative Classifiers: Naïve Bayes and Logistic Regression', by Mitchell
• 'Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance', ICML'04
Recommended resources
• Machine Learning open class by Andrew Ng
  – //10.20.0.130/TempShare/Machine-Learning Open Class
• http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
• Logistic regression implementation [link]
  – //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
  – Supports binomial and multinomial LR with L1 and L2 regularization
• OWL-QN
  – //10.20.0.130/TempShare/guodong/OWL-QN/
Thanks