TRANSCRIPT
NLP Programming Tutorial 9 – Advanced Discriminative Learning
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Review: Classifiers and the Perceptron
Prediction Problems
Given x, predict y
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person
● This is binary classification
Given
Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
Predict
Yes!
Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
No!
Mathematical Formulation
y = sign(w⋅φ(x)) = sign(Σ_{i=1..I} w_i⋅φ_i(x))

● x: the input
● φ(x): vector of feature functions {φ1(x), φ2(x), …, φI(x)}
● w: the weight vector {w1, w2, …, wI}
● y: the prediction, +1 if “yes”, -1 if “no”
  ● (sign(v) is +1 if v >= 0, -1 otherwise)
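As a sketch, the prediction formula can be written in Python; here `w` and `phi` are assumed to be dicts mapping feature names to values, matching the pseudocode used later in this tutorial, and the example weights are hypothetical:

```python
def predict_one(w, phi):
    """Return sign(w . phi): +1 if the score is >= 0, -1 otherwise."""
    score = 0.0
    for name, value in phi.items():
        score += w.get(name, 0.0) * value  # unseen features have weight 0
    return 1 if score >= 0 else -1

# Hypothetical weights and features for illustration
w = {"priest": 2.0, "site": -3.0}
print(predict_one(w, {"priest": 1}))             # 2.0 >= 0  → +1
print(predict_one(w, {"priest": 1, "site": 1}))  # 2.0 - 3.0 < 0 → -1
```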
Online Learning

create map w
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y
            update_weights(w, phi, y)

● In other words:
  ● Try to classify each training example
  ● Every time we make a mistake, update the weights
● There are many different online learning algorithms
  ● The simplest is the perceptron
Perceptron Weight Update

w ← w + y φ(x)

update_weights(w, phi, y)
    for name, value in phi:
        w[name] += value * y

● In other words:
  ● If y=1, increase the weights for features in φ(x)
    – Features for positive examples get a higher weight
  ● If y=-1, decrease the weights for features in φ(x)
    – Features for negative examples get a lower weight
→ Every time we update, our predictions get better!
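Putting the update rule together with the online learning loop above gives a minimal runnable perceptron in Python. The feature dicts and toy data here are hypothetical; `create_features` is assumed to have been applied already, so each example is a `(phi, y)` pair:

```python
def predict_one(w, phi):
    """sign(w . phi): +1 if >= 0, else -1."""
    score = sum(w.get(name, 0.0) * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def update_weights(w, phi, y):
    """w <- w + y * phi, applied feature by feature."""
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

def train_perceptron(data, iterations=10):
    """data: list of (phi, y) pairs with y in {+1, -1}."""
    w = {}
    for _ in range(iterations):
        for phi, y in data:
            if predict_one(w, phi) != y:   # only update on mistakes
                update_weights(w, phi, y)
    return w

# Toy data in the spirit of the Wikipedia example
data = [({"priest": 1}, 1), ({"site": 1}, -1)]
w = train_perceptron(data)
```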
Averaged Perceptron
Perceptron Instability

● The perceptron is unstable with non-separable data
● Example: on a set of O and X points that no single line can separate, every update overcorrects for the latest mistake, and the weight vector cycles without ever settling:
  {1, 1} → {2, -2} → {-1, -1} → {0, 0} → {1, 1} → …
  [animation: scatter plot of O and X points, with the decision boundary jumping after each update]
Result of Perceptron Training

● A long list of weights that never converges
● Accuracy is greatly influenced by the stopping point

{1, 1} {2, -2} {-1, -1} {0, 0} {1, 1} {1, 1} {1, 1} {2, -2} {-1, -1} {0, 0} {1, 1} {1, 1} {1, 1} …

● Stopping at some weights (e.g. {1, 1}) is not so bad; stopping at others (e.g. {0, 0}) is really bad!
Averaged Perceptron Idea

● Just take the average of the weights!

average( {1, 1} {2, -2} {-1, -1} {0, 0} {1, 1} {1, 1} {1, 1} {2, -2} {-1, -1} {0, 0} {1, 1} {1, 1} {1, 1} … ) → {0.67, 0}

[figure: the averaged weights give a stable boundary over the O and X points]
Averaged Perceptron in Code

create map w
create map avg                                  ★
set updates = 0                                 ★
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y
            update_weights(w, phi, y)
            updates += 1                        ★
            avg = (avg * (updates-1) + w) / updates   ★

● Change the average after every update
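A runnable Python sketch of this, keeping the average as a second weight map; the elementwise division in the pseudocode is done per feature, and the toy data is hypothetical:

```python
def predict_one(w, phi):
    score = sum(w.get(n, 0.0) * v for n, v in phi.items())
    return 1 if score >= 0 else -1

def update_weights(w, phi, y):
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

def train_averaged(data, iterations=10):
    """Averaged perceptron: keep a running average of the weights,
    recomputed after every update as avg = (avg*(updates-1) + w) / updates."""
    w, avg, updates = {}, {}, 0
    for _ in range(iterations):
        for phi, y in data:
            if predict_one(w, phi) != y:
                update_weights(w, phi, y)
                updates += 1
                for name in set(avg) | set(w):   # elementwise running average
                    avg[name] = (avg.get(name, 0.0) * (updates - 1)
                                 + w.get(name, 0.0)) / updates
    return avg

avg = train_averaged([({"site": 1}, -1)], iterations=3)
```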
Classification Margins
Choosing between Equally Accurate Classifiers

● Which classifier is better? Dotted or Dashed?

[figure: O and X points correctly separated by two candidate lines, one dotted and one dashed]
Choosing between Equally Accurate Classifiers

● Which classifier is better? Dotted or Dashed?
● Answer: Probably the dashed line.
● Why?: It has a larger margin.

[figure: the same points; the dashed line passes farther from the nearest examples]
What is a Margin?

● The distance between the classification plane and the nearest example:

[figure: separating line with the distance to the nearest O and X examples marked]
Support Vector Machines

● The most famous margin-based classifier
  ● Hard Margin: Explicitly maximize the margin
  ● Soft Margin: Allow for some mistakes
● Usually trained with batch learning
  ● Batch learning: slightly higher accuracy, more stable
  ● Online learning: simpler, less memory, faster convergence
● Learn more about SVMs:
  http://disi.unitn.it/moschitti/material/Interspeech2010-Tutorial.Moschitti.pdf
● Batch learning libraries: LIBSVM, LIBLINEAR, SVMlight
Online Learning with a Margin

● Penalize not only mistakes, but also correct answers under a margin

create map w
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        val = w * phi * y
        if val <= margin                        ★
            update_weights(w, phi, y)

● (A correct classifier will always make w * phi * y > 0)
● If margin = 0, this is the perceptron algorithm
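A minimal Python version of this loop; the margin value and toy data here are illustrative:

```python
def update_weights(w, phi, y):
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

def train_with_margin(data, margin=1.0, iterations=10):
    """Update whenever y * (w . phi) <= margin, not only on outright
    mistakes. With margin=0 this is exactly the perceptron."""
    w = {}
    for _ in range(iterations):
        for phi, y in data:
            val = sum(w.get(n, 0.0) * v for n, v in phi.items()) * y
            if val <= margin:
                update_weights(w, phi, y)
    return w

# One positive example: updates continue until its score clears the margin
w = train_with_margin([({"priest": 1}, 1)], margin=1.0, iterations=5)
```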
Regularization
Cannot Distinguish Between Large and Small Classifiers

● For these examples:

-1 he saw a bird in the park
+1 he saw a robbery in the park

● Which classifier is better?

Classifier 1: he +3, saw -5, a +0.5, bird -1, robbery +1, in +5, the -3, park -2
Classifier 2: bird -1, robbery +1

● Probably classifier 2! It doesn't use irrelevant information.
Regularization

● A penalty on adding extra weights
● L2 regularization:
  ● Big penalty on large weights, small penalty on small weights
  ● High accuracy
● L1 regularization:
  ● Uniform penalty whether the weight is large or small
  ● Will cause many weights to become zero → small model

[figure: the L1 penalty |w| and the L2 penalty w² plotted over weight values from -2 to 2]
L1 Regularization in Online Learning

● On each update, reduce every weight by a constant c

update_weights(w, phi, y, c)
    for name, value in w:
        if abs(value) <= c:
            w[name] = 0                  ★ if the absolute value is <= c, set the weight to zero
        else:
            w[name] -= sign(value) * c   ★ if value > 0, decrease by c; if value < 0, increase by c
    for name, value in phi:
        w[name] += value * y
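The same update in runnable Python with dict-based weights; `sign` is defined by hand since Python has no built-in, and the example weights are hypothetical:

```python
def sign(v):
    return 1 if v >= 0 else -1

def update_weights_l1(w, phi, y, c):
    """First shrink every weight toward zero by c (zeroing the small ones),
    then apply the usual perceptron update."""
    for name in list(w):            # list() so we can modify w while iterating
        if abs(w[name]) <= c:
            w[name] = 0.0
        else:
            w[name] -= sign(w[name]) * c
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

w = {"rare_feature": 0.05, "bird": 1.0}
update_weights_l1(w, {"bird": 1}, -1, c=0.1)
# rare_feature is clipped to 0; bird shrinks to 0.9, then the update makes it -0.1
```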
Example

● Every turn, we Regularize, then Update, Regularize, then Update, …
● Regularization: c = 0.1
● Updates: {1, 0} on the 1st and 5th turns, {0, -1} on the 3rd turn

Step   Change        w
R1     {0, 0}        {0, 0}
U1     {1, 0}        {1, 0}
R2     {-0.1, 0}     {0.9, 0}
U2     {0, 0}        {0.9, 0}
R3     {-0.1, 0}     {0.8, 0}
U3     {0, -1}       {0.8, -1}
R4     {-0.1, 0.1}   {0.7, -0.9}
U4     {0, 0}        {0.7, -0.9}
R5     {-0.1, 0.1}   {0.6, -0.8}
U5     {1, 0}        {1.6, -0.8}
R6     {-0.1, 0.1}   {1.5, -0.7}
U6     {0, 0}        {1.5, -0.7}
Efficiency Problems

● Typical number of features:
  ● Each sentence (phi): 10–1,000
  ● Overall (w): 1,000,000–100,000,000
● The regularization loop runs over all of w, not just phi:

update_weights(w, phi, y, c)
    for name, value in w:        ← this loop is VERY SLOW!
        if abs(value) <= c:
            w[name] = 0
        else:
            w[name] -= sign(value) * c
    for name, value in phi:
        w[name] += value * y
Efficiency Trick

● Regularize only when the value is used!
● This is called “lazy evaluation”, and is used in many applications

getw(w, name, c, iter, last)
    if iter != last[name]:    # apply the penalty for all skipped iterations at once
        c_size = c * (iter - last[name])
        if abs(w[name]) <= c_size:
            w[name] = 0
        else:
            w[name] -= sign(w[name]) * c_size
        last[name] = iter
    return w[name]
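A runnable Python version of the trick; `last` records the iteration at which each weight was last regularized, so all pending penalties are applied in one step when the weight is next needed (the parameter is named `it` here only to avoid shadowing Python's built-in `iter`):

```python
def sign(v):
    return 1 if v >= 0 else -1

def getw(w, name, c, it, last):
    """Lazily apply the L1 penalty for every iteration skipped since this
    weight was last touched, then return its current value."""
    if it != last.get(name, 0):
        c_size = c * (it - last.get(name, 0))  # total pending penalty
        if abs(w.get(name, 0.0)) <= c_size:
            w[name] = 0.0                      # clipped all the way to zero
        else:
            w[name] -= sign(w[name]) * c_size
        last[name] = it
    return w.get(name, 0.0)

w, last = {"bird": 1.0}, {}
v = getw(w, "bird", c=0.1, it=3, last=last)   # three penalties applied at once
```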
Choosing the Regularization Constant

● The regularization constant c has a large effect
● Large value:
  ● small model
  ● lower score on training set
  ● less overfitting
● Small value:
  ● large model
  ● higher score on training set
  ● more overfitting
● Choose the best regularization value on a development set
  ● e.g. 0.0001, 0.001, 0.01, 0.1, 1.0
Exercise
Exercise

● Write a program:
  ● train-svm: Creates an SVM model with L1 regularization constant 0.001 and margin 1
● Train a model on data-en/titles-en-train.labeled
● Predict the labels of data-en/titles-en-test.word
● Grade your answers and compare them with the perceptron
  ● script/grade-prediction.py data-en/titles-en-test.labeled your_answer
● Extra challenge:
  ● Try many different regularization constants
  ● Implement the efficiency trick
Thank You!