Logistic Regression and POS Tagging
CSE392 - Spring 2019, Special Topic in CS
Task
● Parts-of-Speech Tagging
● Machine learning: how?
  ○ Logistic regression
Parts-of-Speech
Open Class:
Nouns, Verbs, Adjectives, Adverbs
Function words:
Determiners, conjunctions, pronouns, prepositions
Parts-of-Speech: The Penn Treebank Tagset
Parts-of-Speech: Social Media Tagset (Gimpel et al., 2010)
POS Tagging: Applications
● Resolving ambiguity (speech: “lead”)
● Shallow searching: find noun phrases
● Speed up parsing
● Use as feature (or in place of word)
For this course:
● An introduction to language-based classification (logistic regression)
● Understand what modern deep learning methods are dealing with implicitly.
Logistic Regression
Binary classification goal: Build a “model” that can estimate P(Y=1|X=?)
i.e. given X, yield (or “predict”) the probability that Y=1
In machine learning, tradition is to use Y for the variable being predicted and X for the features used to make the prediction.
Example: Y: 1 if the target is a verb, 0 otherwise; X: 1 if “was” occurs before the target, 0 otherwise.
I was reading for NLP. We were fine. I am good.
The cat was very happy. We enjoyed the reading material. I was good.
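As a concrete sketch, this X feature can be computed in a few lines of Python (whitespace tokenization and the `was_before` helper name are illustrative assumptions, not from the slides):

```python
# Sketch of the X feature: 1 if "was" occurs immediately before the target
# word, 0 otherwise. Whitespace tokenization is a simplifying assumption.
def was_before(tokens, i):
    """X = 1 if the token before position i is 'was', else 0."""
    return 1 if i > 0 and tokens[i - 1].lower() == "was" else 0

tokens = "I was reading for NLP .".split()
print(was_before(tokens, 2))  # target "reading" -> 1
print(was_before(tokens, 4))  # target "NLP" -> 0
```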
Logistic Regression
Example: Y: 1 if target is a part of a proper noun, 0 otherwise; X: number of capital letters in target and surrounding words.
They attend Stony Brook University. Next to the brook Gandalf lay thinking.
The trail was very stony. Her degree is from SUNY Stony Brook.
The Taylor Series was first described by Brook Taylor, the mathematician.
| x | y |
|---|---|
| 2 | 1 |
| 1 | 0 |
| 0 | 0 |
| 6 | 1 |
| 2 | 1 |
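A sketch of this capital-counting feature (the +/-1 word window is an assumption; the slides do not fix the window size):

```python
# Sketch of the slide's X feature: number of capital letters in the target
# word and the words around it. The +/-1 window is an assumption.
def count_capitals(tokens, i, window=1):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return sum(ch.isupper() for j in range(lo, hi) for ch in tokens[j])

tokens = "They attend Stony Brook University .".split()
print(count_capitals(tokens, 2))   # target "Stony": S + B = 2 capitals

tokens2 = "Her degree is from SUNY Stony Brook .".split()
print(count_capitals(tokens2, 5))  # target "Stony": SUNY + S + B = 6 capitals
```

These two calls reproduce the x = 2 and x = 6 rows of the table.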
Adding a new training example: They attend Binghamton. (target “Binghamton”: x = 1, y = 1)

| x | y |
|---|---|
| 2 | 1 |
| 1 | 0 |
| 0 | 0 |
| 6 | 1 |
| 2 | 1 |
| 1 | 1 |
With the new example, the optimal b_0 and b_1 changed!
Logistic Regression on a single feature (x)
Yi ∊ {0, 1}; X is a single value and can be anything numeric.
Yi ∊ {0, 1}; X can be anything numeric.

The goal of this function is to take in the value x and return the probability that Y is 1:

P(Yi = 1 | Xi) = e^(β0 + β1·Xi) / (1 + e^(β0 + β1·Xi))

Note that there are only three variables on the right: Xi, β0, β1.

X is given. β0 and β1 must be learned.

HOW? Essentially, try different β0 and β1 values until “best fit” to the training data (example X and Y).

“best fit”: whatever maximizes the likelihood function:

L(β0, β1) = ∏i p(xi)^(yi) · (1 - p(xi))^(1 - yi), where p(xi) = P(Yi = 1 | xi)

To estimate β0 and β1, one can use reweighted least squares (Wasserman, 2005; Li, 2010).
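The “try different β0 and β1 values until best fit” idea can be sketched as plain gradient ascent on the log-likelihood (a simpler stand-in for reweighted least squares with the same objective; real libraries use faster methods):

```python
import math

# Gradient ascent on the log-likelihood for a single-feature logistic model,
# using the x/y table from the slides.
def p(x, b0, b1):
    """P(Y=1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(xs, ys, b0, b1):
    eps = 1e-12  # clamp to avoid log(0) when probabilities saturate
    total = 0.0
    for x, y in zip(xs, ys):
        pi = min(max(p(x, b0, b1), eps), 1 - eps)
        total += y * math.log(pi) + (1 - y) * math.log(1 - pi)
    return total

xs, ys = [2, 1, 0, 6, 2], [1, 0, 0, 1, 1]   # the x/y table from the slides
b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    # Gradient of the log-likelihood with respect to b0 and b1.
    g0 = sum(y - p(x, b0, b1) for x, y in zip(xs, ys))
    g1 = sum((y - p(x, b0, b1)) * x for x, y in zip(xs, ys))
    b0, b1 = b0 + lr * g0, b1 + lr * g1

# The fitted curve gives low P(Y=1) at x=0 and high P(Y=1) at x=6.
print(round(p(0, b0, b1), 3), round(p(6, b0, b1), 3))
```

Note that this toy data is linearly separable, so without regularization (introduced later) the betas would keep growing; here we simply stop after a fixed number of steps.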
X can be multiple features
Often we want to make a classification based on multiple features:
● Number of capital letters surrounding: integer
● Begins with capital letter: {0, 1}
● Preceded by “the”? {0, 1}
We’re learning a linear (i.e. flat) separating hyperplane, but fitting it to a logit outcome.
(https://www.linkedin.com/pulse/predicting-outcomes-probabilities-logistic-regression-konstantinidis/)
This is just one way of finding the betas that maximize the likelihood function. In practice, we will use existing libraries that are fast and support additional useful steps like regularization.
Logistic Regression
Yi ∊ {0, 1}; X can be anything numeric.
We’re still learning a linear separating hyperplane, but fitting it to a logit outcome: the decision boundary is the set of points where β0 + β1x1 + ... + βnxn = 0.
(https://www.linkedin.com/pulse/predicting-outcomes-probabilities-logistic-regression-konstantinidis/)
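A quick numerical check of the hyperplane picture: any point exactly on the hyperplane gets probability 0.5 (the betas below are arbitrary illustrative values, not fit to data):

```python
import math

# On the separating hyperplane b0 + b1*x1 + b2*x2 = 0, the logistic model
# is exactly undecided: P(Y=1) = sigmoid(0) = 0.5.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1, b2 = -2.0, 1.0, 3.0
x1, x2 = 2.0, 0.0                # chosen so that b0 + b1*x1 + b2*x2 = 0
z = b0 + b1 * x1 + b2 * x2
print(sigmoid(z))                # -> 0.5 exactly on the hyperplane
```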
Let’s add a feature! X2: does the target word start with a capital letter?

Example: Y: 1 if target is a part of a proper noun, 0 otherwise; X1: number of capital letters in target and surrounding words; X2: 1 if the target word starts with a capital letter, 0 otherwise.

They attend Stony Brook University. Next to the brook Gandalf lay thinking.
The trail was very stony. Her degree is from SUNY Stony Brook.
The Taylor Series was first described by Brook Taylor, the mathematician.
They attend Binghamton.

| x2 | x1 | y |
|----|----|---|
| 1  | 2  | 1 |
| 0  | 1  | 0 |
| 0  | 0  | 0 |
| 1  | 6  | 1 |
| 1  | 2  | 1 |
| 1  | 1  | 1 |
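A sketch of fitting both features with simple per-example gradient updates (in practice we would call a library such as scikit-learn’s LogisticRegression, which is faster and regularizes by default):

```python
import math

# Two-feature logistic regression fit by per-example gradient updates,
# on the (x1, x2, y) table from the slides.
def p(row, b):
    """P(Y=1 | x1, x2) with b = [b0, b1, b2]."""
    z = b[0] + b[1] * row[0] + b[2] * row[1]
    return 1.0 / (1.0 + math.exp(-z))

X = [(2, 1), (1, 0), (0, 0), (6, 1), (2, 1), (1, 1)]  # (x1, x2) rows
y = [1, 0, 0, 1, 1, 1]                                # labels from the table
b = [0.0, 0.0, 0.0]
for _ in range(2000):
    for row, yi in zip(X, y):
        err = yi - p(row, b)          # residual for this example
        b[0] += 0.1 * err
        b[1] += 0.1 * err * row[0]
        b[2] += 0.1 * err * row[1]

# "They attend Binghamton.": x1=1, x2=1 -> high probability of proper noun.
print(round(p((1, 1), b), 3))
```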
Machine Learning Goal: Generalize to new data
[Diagram: Training Data → Model → Testing Data. Does the model hold up?]
Last concept for logistic regression! Logistic Regression - Regularization

X =
| 0.5  | 0   | 0.6  | 1 | 0   | 0.25 |
| 0    | 0.5 | 0.3  | 0 | 0   | 0    |
| 0    | 0   | 1    | 1 | 1   | 0.5  |
| 0    | 0   | 0    | 0 | 1   | 1    |
| 0.25 | 1   | 1.25 | 1 | 0.1 | 2    |

Y = [1, 1, 0, 0, 1]
Fitting all 6 features yields:

1.2 + -63*x1 + 179*x2 + 71*x3 + 18*x4 + -59*x5 + 19*x6 = logit(Y)
“overfitting”: the extreme betas chase the training data and will not generalize.
Overfitting (1-d non-linear example)
[Figure: an underfit model (left) and an overfit model (right) on the same data]
(image credit: Scikit-learn; in practice data are rarely this clear)
What if only 2 predictors?

X (columns x1, x2) =
| 0.5  | 0   |
| 0    | 0.5 |
| 0    | 0   |
| 0    | 0   |
| 0.25 | 1   |

Y = [1, 1, 0, 0, 1]
With only 2 predictors, the fit is:

0 + 2*x1 + 2*x2 = logit(Y)

(a better fit: no extreme betas needed)
L1 Regularization (“The Lasso”): zeros out features by adding a penalty that keeps the betas from perfectly fitting the data.
Set the betas to maximize the penalized likelihood:

β̂ = argmax_β [ log L(β) - λ · Σj |βj| ]

Sometimes written as:

β̂ = argmin_β [ -log L(β) + λ · ||β||1 ]
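The effect of the penalty can be checked numerically on the 2-predictor table above. The two candidate beta vectors and λ = 0.1 are illustrative choices, not fit results:

```python
import math

# Penalized objective: log-likelihood minus lambda * sum |beta_j|
# (the intercept is typically not penalized).
X = [(0.5, 0), (0, 0.5), (0, 0), (0, 0), (0.25, 1)]
y = [1, 1, 0, 0, 1]

def log_lik(b0, b1, b2):
    ll = 0.0
    for (x1, x2), yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def penalized(b0, b1, b2, lam=0.1):
    return log_lik(b0, b1, b2) - lam * (abs(b1) + abs(b2))

small = (0, 2, 2)       # modest betas
big   = (0, 20, 20)     # extreme betas that chase the training data
print(log_lik(*small) < log_lik(*big))      # -> True: raw likelihood prefers extreme betas
print(penalized(*small) > penalized(*big))  # -> True: the penalty reverses the preference
```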
L2 Regularization (“Ridge”): shrinks the betas by adding a penalty that keeps them from perfectly fitting the data.

Set the betas to maximize the penalized likelihood:

β̂ = argmax_β [ log L(β) - λ · Σj βj² ]

Sometimes written as:

β̂ = argmin_β [ -log L(β) + λ · ||β||2² ]
Machine Learning Goal: Generalize to new data
[Diagram: Training Data → Model → Testing Data (does the model hold up?), with a Development Data set used to set the regularization parameters before testing.]
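The train / development / test split can be sketched in a few lines (the 80/10/10 proportions and the fixed seed are assumptions; the slides do not specify them):

```python
import random

# Shuffle once, then cut into train / development / test portions.
def split(examples, seed=0, train_frac=0.8, dev_frac=0.1):
    rng = random.Random(seed)          # fixed seed for reproducibility
    examples = examples[:]             # don't mutate the caller's list
    rng.shuffle(examples)
    n_train = int(len(examples) * train_frac)
    n_dev = int(len(examples) * dev_frac)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test

data = list(range(100))
train, dev, test = split(data)
print(len(train), len(dev), len(test))  # -> 80 10 10
```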
Logistic Regression - Review
● Classification: P(Y | X)
● Learn logistic curve based on example data
  ○ training + development + testing data
● Set betas based on maximizing the likelihood
  ○ “shifts” and “twists” the logistic curve
● Multivariate features
● Separation represented by a hyperplane
● Overfitting
● Regularization
Example
See notebook on website.