Midterm exam: 03/01 (Thursday), take-home; turn in by noon on 03/02 (Friday)
TRANSCRIPT
Logistic Regression
• Generative models often lead to a linear decision boundary
• Linear discriminative model
• Directly models the linear decision boundary
• w is the parameter to be learned
Logistic Regression
• Convex objective function, global optimum
• Gradient descent
• Classification error
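Because the negative log-likelihood is convex, plain gradient descent reaches the global optimum. A minimal sketch of this (function names are my own; labels are assumed to be in {-1, +1} as on the later slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Gradient descent on the convex objective
    (1/n) * sum_i log(1 + exp(-y_i * w.x_i)), with y_i in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        margins = y * (X @ w)
        # gradient of the average logistic loss
        grad = -(X.T @ (y * sigmoid(-margins))) / len(y)
        w -= lr * grad
    return w
```

On separable data the learned w classifies every training point correctly, though its norm keeps growing without regularization (an issue the later slides address).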
Example: Heart Disease
• Input feature x: age group id
• Output y: whether the patient has heart disease
• y = 1: has heart disease
• y = -1: no heart disease
1: 25-29
2: 30-34
3: 35-39
4: 40-44
5: 45-49
6: 50-54
7: 55-59
8: 60-64
[Figure: bar chart of number of people per age group (1-8), split into "No heart disease" vs. "Heart disease"]
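The single-feature model from this example can be fit directly; the per-group counts below are made up for illustration (the lecture's actual numbers live in the figure, not the transcript), but they follow the same pattern of disease frequency rising with age:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative (invented) counts: (age group id, n with disease, n without)
counts = [(1, 1, 9), (2, 1, 8), (3, 2, 8), (4, 3, 7),
          (5, 4, 6), (6, 5, 5), (7, 6, 4), (8, 7, 3)]

X, y = [], []
for g, n_pos, n_neg in counts:
    X += [[g]] * (n_pos + n_neg)
    y += [1] * n_pos + [-1] * n_neg

clf = LogisticRegression().fit(np.array(X), np.array(y))
# P(heart disease | age group) for groups 1..8
probs = clf.predict_proba(np.arange(1, 9).reshape(-1, 1))[:, 1]
```

With a single feature and a positive coefficient, the predicted disease probability increases monotonically with the age group id.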
Example: Text Categorization
Learn to classify text into two categories
• Input d: a document, represented by a word histogram
• Output y: +1 for a political document, -1 for a non-political document
Example 2: Text Classification
• Dataset: Reuters-21578
• Classification accuracy
• Naïve Bayes: 77%
• Logistic regression: 88%
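The comparison can be sketched with off-the-shelf classifiers on word-histogram features. The toy corpus below is a stand-in of my own (Reuters-21578 is not bundled with scikit-learn), so the accuracies will not match the slide's 77% vs. 88%:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

# Invented toy corpus: +1 political, -1 non-political
docs = ["election vote senate policy", "senate passes policy bill",
        "vote on new election law", "stock market rises today",
        "team wins the game", "market prices fall sharply"]
labels = [1, 1, 1, -1, -1, -1]

vec = CountVectorizer()
X = vec.fit_transform(docs)          # word-histogram features
nb = MultinomialNB().fit(X, labels)
lr = LogisticRegression().fit(X, labels)

X_test = vec.transform(["senate election policy", "market game today"])
```

Both models produce a linear decision boundary over the word counts, which is exactly the point of the next slide.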
Logistic Regression vs. Naïve Bayes
• Both produce linear decision boundaries
• Naïve Bayes: weights come from the estimated class-conditional word probabilities
• Logistic regression: learns weights by MLE
• Both can be viewed as modeling p(d|y)
• Naïve Bayes: independence assumption
• Logistic regression: assumes an exponential-family distribution for p(d|y) (a broad assumption)
Discriminative vs. Generative
Discriminative models: model P(y|x)
• Pros: usually good performance
• Cons: slow convergence; expensive computation; sensitive to noisy data

Generative models: model P(x|y)
• Pros: usually fast convergence; cheap computation; robust to noisy data
• Cons: usually worse performance
Overfitting Problem
Consider text categorization
• What is the weight for a word j that appears in only one training document d_k?
[Figure: training curves over iterations, with vs. without regularization]
Overfitting Problem
Classification accuracy on the test data decreases.
Solution: Regularization
Regularized log-likelihood
The effects of the regularizer:
• Favors small weights
• Guarantees a bounded norm of w
• Guarantees a unique solution
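The shrinkage effect can be seen directly by adding an L2 penalty to the gradient-descent sketch (the λ values here are arbitrary, chosen only to contrast weak vs. strong regularization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l2_logistic(X, y, lam, lr=0.1, n_iters=2000):
    """Gradient descent on
    (1/n) * sum_i log(1 + exp(-y_i * w.x_i)) + (lam/2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = -(X.T @ (y * sigmoid(-y * (X @ w)))) / len(y) + lam * w
        w -= lr * grad
    return w

X = np.array([[1., 2.], [2., 3.], [-1., -2.], [-2., -1.]])
y = np.array([1., 1., -1., -1.])
w_weak = fit_l2_logistic(X, y, lam=0.01)
w_strong = fit_l2_logistic(X, y, lam=5.0)
```

A larger λ forces a smaller ‖w‖, which is what bounds the norm and makes the regularized objective strictly convex (hence the unique solution).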
Regularized Logistic Regression
[Figure: classification performance over iterations, with vs. without regularization]
Classification performance by regularization
Regularization as Robust Optimization
• Assume each data point is unknown but bounded in a sphere of radius r centered at x_i
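The connection to regularization can be made explicit (a standard derivation; the slide only states the setup). Optimizing against the worst-case perturbation in each radius-r ball, and noting that the logistic loss is decreasing in the margin so the adversary picks δ_i = -r·y_i·w/‖w‖₂:

```latex
\min_w \sum_i \max_{\|\delta_i\|_2 \le r}
  \log\bigl(1 + e^{-y_i\, w^\top (x_i + \delta_i)}\bigr)
\;=\;
\min_w \sum_i \log\bigl(1 + e^{-y_i\, w^\top x_i + r\|w\|_2}\bigr)
```

So bounding each data point in a radius-r sphere acts exactly like an ‖w‖₂ penalty inside each loss term: robustness to input perturbations and norm regularization are two views of the same objective.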
Sparse Solution by Lasso Regularization
How to solve the optimization problem?
• Subgradient descent
• Minimax
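A minimal sketch of the subgradient-descent option (names and data are my own): the L1 norm is not differentiable at 0, but sign(w) is a valid subgradient, and a decaying step size is the standard choice for subgradient methods.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lasso_logistic_subgrad(X, y, lam=0.5, lr=0.5, n_iters=3000):
    """Subgradient descent on
    (1/n) * sum_i log(1 + exp(-y_i * w.x_i)) + lam * ||w||_1.
    np.sign(w) is a valid subgradient of the L1 norm (it is 0 at w = 0)."""
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        g = -(X.T @ (y * sigmoid(-y * (X @ w)))) / len(y) + lam * np.sign(w)
        w -= (lr / np.sqrt(t)) * g   # decaying step size
    return w

y = np.array([1., 1., 1., -1., -1., -1.])
# feature 0 predicts the label; feature 1 is noise orthogonal to the labels
X = np.column_stack([2.0 * y, 0.3 * np.array([1., -1., 0., 1., -1., 0.])])
w = lasso_logistic_subgrad(X, y)
```

The L1 penalty keeps the uninformative feature's weight at (near) zero while the informative one stays active; in practice, proximal methods with soft-thresholding reach exact zeros faster than plain subgradient descent.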
Multi-class Logistic Regression
• How can the logistic regression model be extended to multi-class classification?
Conditional Exponential Model
• Let the classes be y ∈ {1, 2, …, K}
• Need to learn one weight vector w_k per class:
  p(y = k | x) = exp(w_k · x) / Σ_{l=1}^{K} exp(w_l · x)
• The denominator is the normalization factor (partition function)
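The conditional exponential (softmax) model is a few lines of code; subtracting the maximum score before exponentiating is the usual trick to avoid overflow, and the weights W here are arbitrary example values:

```python
import numpy as np

def softmax_probs(W, x):
    """Conditional exponential model: p(y = k | x) = exp(w_k.x) / Z(x),
    where Z(x) = sum_l exp(w_l.x) is the partition function.
    W: (K, d), one weight vector per class; x: (d,)."""
    scores = W @ x
    scores = scores - scores.max()   # stabilize before exponentiating
    e = np.exp(scores)
    return e / e.sum()

# example weights for K = 3 classes over d = 2 features
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
p = softmax_probs(W, np.array([2.0, 0.5]))
```

The partition function is exactly what makes the K per-class scores into a probability distribution; for K = 2 this reduces to ordinary logistic regression.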