Lecture 10: SVM and MIRA
DESCRIPTION
Outline: margin, maximizing margin, the norm, support vector machines (SVM), Margin Infused Relaxed Algorithm (MIRA)
TRANSCRIPT
Machine Learning for Language Technology Lecture 10: SVM and MIRA
Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials
Margin
Maximizing Margin (i)
Maximizing Margin (ii)
Maximizing Margin (iii)
Max Margin = Min Norm
Maximizing the margin
Linear Classifiers: Repetition & Extension
• The notion of margin: a way of predicting what will be a good separation on the test set.
• Intuitively, if we make the margin between opposite groups as wide as possible, our chances of guessing correctly on the test set should increase.
• The generalization error on unseen test data is proportional to the inverse of the margin: the larger the margin, the smaller the generalization error.
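The geometric margin described above can be computed directly: it is the smallest distance from any training point to the separating hyperplane. A minimal sketch on hypothetical toy data (the weight vectors and points are illustrative, not from the slides):

```python
# Geometric margin of a linear classifier w.x + b = 0:
#   min_i  y_i * (w.x_i + b) / ||w||
# The larger this minimum, the wider the gap between the two classes.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def geometric_margin(w, b, data):
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in data)

# Hypothetical linearly separable toy set; labels are +1 / -1.
data = [([2.0, 2.0], 1), ([3.0, 1.0], 1),
        ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]

# Two separating hyperplanes: the second leaves a wider margin,
# so by the argument above it should generalize better.
print(geometric_margin([1.0, 0.0], 0.0, data))  # narrower margin
print(geometric_margin([1.0, 1.0], 0.0, data))  # wider margin
```

Both weight vectors separate the data; the point is that separators are not all equally good, and the margin is the quantity that distinguishes them.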
Support Vector Machines (SVM) (i)
Support Vector Machines (SVM) (ii)
Margin Infused Relaxed Algorithm (MIRA)
MIRA
Perceptron vs. SVMs/MIRA
Perceptron: if the training set is separable by some margin, the Perceptron will find a weight vector that separates the data, but it will not necessarily pick the vector that maximizes the margin. If we are lucky, it will be a vector with the largest margin, but there is no guarantee.
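A minimal Perceptron sketch on hypothetical toy data illustrates the point above: training stops as soon as *some* separating weight vector is found, with no margin guarantee.

```python
# Minimal Perceptron: update on every mistake, stop when an epoch
# passes with no mistakes (i.e. the data is separated).
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def perceptron(data, epochs=100):
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * dot(w, x) <= 0:          # misclassified: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                   # converged: stop immediately
            break
    return w

# Hypothetical toy set; labels are +1 / -1.
data = [([2.0, 2.0], 1), ([3.0, 1.0], 1),
        ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w = perceptron(data)
assert all(y * dot(w, x) > 0 for x, y in data)  # separates the data ...
print(w)  # ... but the result depends on update order, not on margin
```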
SVMs/MIRA: these seek the weight vector that maximizes the margin. Here the margin is normalized to 1: we constrain the weight vector so that every training example is classified correctly with a margin of at least 1, keep that margin fixed, and minimize the norm of the weights. That is, we want the smallest weight vector that gives us margin 1.
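The "keep the margin fixed at 1, minimize the norm" trick can be made concrete: rescale any separating weight vector so its smallest functional margin is exactly 1; the geometric margin is then 1/||w||, so maximizing the margin is the same as minimizing the norm. A sketch on hypothetical toy data:

```python
# Rescale a separating weight vector so that
#   min_i y_i * (w.x_i) == 1   (functional margin fixed to 1).
# The geometric margin of the rescaled classifier is then 1/||w||.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def normalize_margin(w, data):
    m = min(y * dot(w, x) for x, y in data)   # functional margin
    assert m > 0, "w must separate the data"
    return [wi / m for wi in w]               # margin is now exactly 1

# Hypothetical toy set; labels are +1 / -1.
data = [([2.0, 2.0], 1), ([3.0, 1.0], 1),
        ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w = normalize_margin([1.0, 1.0], data)
norm = math.sqrt(dot(w, w))
print(1.0 / norm)  # geometric margin: smaller norm => larger margin
```

Because the margin is pinned to 1, comparing classifiers by margin reduces to comparing them by norm, which is exactly the SVM objective described above.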
In practice we do not minimize the norm itself but the norm squared divided by 2, ||w||^2 / 2, which makes the math easier (trust the people who suggested this).
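For MIRA, which the slides name but do not spell out, the standard single-example update (an assumption here, not taken from these slides) has a simple closed form: keep the new weights as close as possible to the old ones subject to classifying the current example with margin at least 1. A sketch:

```python
# Standard single-example MIRA update (assumed formulation):
# minimize ||w_new - w||^2  subject to  y * (w_new . x) >= 1.
# Closed-form step size: tau = max(0, (1 - y*(w.x)) / ||x||^2).
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def mira_update(w, x, y):
    loss = max(0.0, 1.0 - y * dot(w, x))    # hinge loss at (x, y)
    if loss == 0.0:
        return w                            # margin already >= 1: no change
    tau = loss / dot(x, x)                  # smallest sufficient step
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
w = mira_update(w, [2.0, 1.0], 1)
print(w, dot(w, [2.0, 1.0]))  # the example now sits exactly at margin 1
```

The update is "relaxed" in the sense that it takes the smallest step that satisfies the margin constraint, rather than the fixed-size step the Perceptron takes.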
Summary
The end