
Introduction to Machine Learning, Fall 2013

Perceptron (6)

Prof. Koby Crammer

Department of Electrical Engineering

Technion

1

2

Online Learning

Tyrannosaurus rex

3

Online Learning

Triceratops

4

Online Learning

Tyrannosaurus rex

Velociraptor

5

Formal Setting – Binary Classification

• Instances – Images, Sentences

• Labels – Parse trees, Names

• Prediction rule – Linear prediction rules

• Loss– No. of mistakes

6

Online Framework

• Initialize classifier
• Algorithm works in rounds
• On round t the online algorithm:
  – Receives an input instance
  – Outputs a prediction
  – Receives a feedback label
  – Computes loss
  – Updates the prediction rule
• Goal: suffer small cumulative loss (a code sketch of this protocol follows below)
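A minimal Python sketch of this round-by-round protocol. The `examples` stream, the dimension `d`, and the `update` callback are illustrative placeholders, not part of the slides:

```python
import numpy as np

def online_protocol(examples, update, d):
    """Round-by-round protocol: predict, get label, suffer loss, update.

    `examples` yields (x_t, y_t) pairs with y_t in {-1, +1};
    `update` maps (w, x, y, y_hat) to the next weight vector.
    """
    w = np.zeros(d)                              # initialize classifier
    cumulative_loss = 0
    for x_t, y_t in examples:                    # the algorithm works in rounds
        y_hat = 1.0 if w @ x_t >= 0 else -1.0    # output a prediction
        cumulative_loss += int(y_hat != y_t)     # receive label, compute zero-one loss
        w = update(w, x_t, y_t, y_hat)           # update the prediction rule
    return w, cumulative_loss                    # goal: small cumulative loss
```

The Perceptron of the later slides is obtained by plugging in the mistake-driven update w + y*x for the `update` callback.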

7

Why Online Learning?

• Fast
• Memory efficient – processes one example at a time
• Simple to implement
• Formal guarantees – mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive

• But: not as good as a well-designed batch algorithm

8

Update Rules

• Online algorithms are based on an update rule which defines $w_{t+1}$ from $w_t$ (and possibly other information)

• Linear classifiers: find $w_{t+1}$ from $w_t$ based on the input $(x_t, y_t)$

• Some Update Rules :

– Perceptron (Rosenblatt)

– ALMA (Gentile)

– ROMMA (Li & Long)

– NORMA (Kivinen et al.)

– MIRA (Crammer & Singer)

– EG (Littlestone and Warmuth)

– Bregman Based (Warmuth)

– Numerous online convex programming algorithms

9

Today

• The Perceptron Algorithm:
  – Agmon 1954
  – Rosenblatt 1958–1962
  – Block 1962, Novikoff 1962
  – Minsky & Papert 1969
  – Freund & Schapire 1999
  – Blum & Dunagan 2002

10

The Perceptron Algorithm

• Prediction: $\hat{y}_t = \operatorname{sign}(w_t \cdot x_t)$

• If no mistake ($\hat{y}_t = y_t$):
  – Do nothing: $w_{t+1} = w_t$

• If mistake ($\hat{y}_t \ne y_t$):
  – Update: $w_{t+1} = w_t + y_t x_t$ (see the code sketch below)

• Margin after update: $y_t (w_{t+1} \cdot x_t) = y_t (w_t \cdot x_t) + \|x_t\|^2$
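A short Python sketch of this rule (mistake-driven: the weights change only when the prediction is wrong); the example format is the same illustrative one used above:

```python
import numpy as np

def perceptron(examples, d):
    """Perceptron: predict sign(w . x); on a mistake, update w <- w + y * x."""
    w = np.zeros(d)
    mistakes = 0
    for x_t, y_t in examples:                    # y_t in {-1, +1}
        y_hat = 1.0 if w @ x_t >= 0 else -1.0    # prediction
        if y_hat != y_t:                         # mistake: update
            w = w + y_t * x_t
            mistakes += 1
        # if no mistake: do nothing
    return w, mistakes
```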

11

Geometrical Interpretation

12

Relative Loss Bound

• For any competitor prediction function $u$
• We bound the loss suffered by the algorithm with the loss suffered by $u$:

  $\sum_{t=1}^{T} \ell(w_t; (x_t, y_t)) \;\le\; C \sum_{t=1}^{T} \ell(u; (x_t, y_t)) + R$

• Left-hand side: the cumulative loss suffered by the algorithm over its sequence of prediction functions $w_1, \dots, w_T$
• Right-hand side: the cumulative loss of the competitor $u$

13

Relative Loss Bound

• For any competitor prediction function $u$
• We bound the loss suffered by the algorithm with the loss suffered by $u$
• The bound is an inequality, so there can be a possibly large gap between the two sides
• The additive term is the regret (extra loss); the multiplicative factor is the competitiveness ratio

14

Relative Loss Bound

• For any competitor prediction function $u$
• We bound the loss suffered by the algorithm with the loss suffered by $u$
• Both cumulative losses grow with $T$, while the regret term remains constant

15

Relative Loss Bound

• For any competitor prediction function $u$
• We bound the loss suffered by the algorithm with the loss suffered by $u$
• The competitor can be the best prediction function in hindsight for the data sequence

16

Remarks

• If the input is inseparable, then the problem of finding a separating hyperplane which attains fewer than M errors is NP-hard (Open Hemisphere)

• Obtaining a zero-one loss bound with a unit competitiveness ratio is as hard as finding a constant-factor approximation for the Open Hemisphere problem

• Instead, we bound the number of mistakes the Perceptron makes by the hinge loss of any competitor

17

Definitions

• Any competitor $u$
• The parameter vector $u$ can be chosen using the input data

• The parameterized hinge loss of $u$ on $(x_t, y_t)$:

  $\ell_\gamma(u; (x_t, y_t)) = \max\{0,\ \gamma - y_t (u \cdot x_t)\}$

• True hinge loss: $\gamma = 1$
• 1-norm of the hinge loss: $L_\gamma(u) = \sum_{t=1}^{T} \ell_\gamma(u; (x_t, y_t))$ (a code sketch of these quantities follows below)
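A small Python sketch of these quantities (the function and variable names are illustrative):

```python
import numpy as np

def hinge_loss(u, x, y, gamma=1.0):
    """Parameterized hinge loss of u on (x, y): max(0, gamma - y * (u . x))."""
    return max(0.0, gamma - y * (u @ x))

def cumulative_hinge_loss(u, examples, gamma=1.0):
    """1-norm of the hinge loss: the sum of per-example hinge losses."""
    return sum(hinge_loss(u, x, y, gamma) for x, y in examples)
```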

18

Geometrical Assumption

• All examples are bounded in a ball of radius R: $\|x_t\| \le R$ for all $t$

19

Perceptron’s Mistake Bound

• Bounds on the number of mistakes M (with the competitor scaled so that $\gamma = 1$):

  $M \le \bigl(R \|u\| + \sqrt{L_1(u)}\bigr)^2$

• If the sample is separable ($L_1(u) = 0$ for some $u$) then

  $M \le R^2 \|u\|^2$

(an empirical sanity check follows below)
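As a sanity check of the separable-case bound, the sketch below builds synthetic separable data (R = 1, margin gamma = 0.1; all values illustrative, not from the slides), runs the Perceptron until it stops making mistakes, and compares the mistake count with R²/γ²:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1000
u_star = rng.normal(size=d)
u_star /= np.linalg.norm(u_star)

# Synthetic separable data: ||x|| <= R = 1, margin at least gamma w.r.t. u_star.
gamma = 0.1
X, Y = [], []
while len(X) < n:
    x = rng.uniform(-1.0, 1.0, size=d)
    x /= max(1.0, np.linalg.norm(x))         # keep the example inside the unit ball
    m = float(u_star @ x)
    if abs(m) >= gamma:                      # enforce the margin
        X.append(x)
        Y.append(1.0 if m > 0 else -1.0)

# Run the Perceptron over repeated passes until a full pass has no mistakes.
w = np.zeros(d)
mistakes = 0
changed = True
while changed:
    changed = False
    for x, y in zip(X, Y):
        if y * (w @ x) <= 0:                 # mistake (ties count as mistakes)
            w = w + y * x
            mistakes += 1
            changed = True

R = 1.0
print("mistakes:", mistakes, "  bound R^2 / gamma^2:", (R / gamma) ** 2)
```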

20

Proof - Intuition

• Two views:
  – The angle between $w_t$ and $u$ decreases as mistakes are made

  – The cumulative sum of the potential defined on the next slides is fixed, so
    as we make more mistakes, our solution gets better

[FS99, SS05]

21

Proof

• Define the potential (for a scale parameter $\lambda > 0$ chosen later):

  $\Delta_t = \|w_t - \lambda u\|^2 - \|w_{t+1} - \lambda u\|^2$

• Bound its cumulative sum $\sum_{t=1}^{T} \Delta_t$ from above and below

[C04]

22

Proof

• Bound from above:

  $\sum_{t=1}^{T} \Delta_t = \|w_1 - \lambda u\|^2 - \|w_{T+1} - \lambda u\|^2 \le \lambda^2 \|u\|^2$

  (telescopic sum; $w_1$ is the zero vector; the subtracted term is non-negative)

23

Proof

• Bound from below:
  – No error on the t-th round: $w_{t+1} = w_t$, so $\Delta_t = 0$

  – Error on the t-th round: $w_{t+1} = w_t + y_t x_t$, so
    $\Delta_t = -2 y_t (w_t \cdot x_t) + 2\lambda\, y_t (u \cdot x_t) - \|x_t\|^2$

24

Proof

• We bound each term:
  – $-2 y_t (w_t \cdot x_t) \ge 0$ (a mistake means $y_t (w_t \cdot x_t) \le 0$)
  – $y_t (u \cdot x_t) \ge \gamma - \ell_\gamma(u; (x_t, y_t))$ (definition of the hinge loss)
  – $\|x_t\|^2 \le R^2$ (the radius assumption)

25

Proof

• Bound from below:
  – No error on the t-th round: $\Delta_t = 0$

  – Error on the t-th round: $\Delta_t \ge 2\lambda\bigl(\gamma - \ell_\gamma(u; (x_t, y_t))\bigr) - R^2$

• Cumulative bound (M = number of mistakes):

  $\sum_{t=1}^{T} \Delta_t \ge 2\lambda\gamma M - 2\lambda L_\gamma(u) - M R^2$

26

Proof

• Putting both bounds together:

  $2\lambda\gamma M - 2\lambda L_\gamma(u) - M R^2 \le \lambda^2 \|u\|^2$

• We use the first (parameterization) degree of freedom and scale $u$ so that $\gamma = 1$

• Bound:

  $M (2\lambda - R^2) \le \lambda^2 \|u\|^2 + 2\lambda L_1(u)$

27

Proof

• General bound (for any $\lambda > R^2/2$):

  $M \le \dfrac{\lambda^2 \|u\|^2 + 2\lambda L_1(u)}{2\lambda - R^2}$

• Choose: $\lambda = R^2$

• Simple bound:

  $M \le R^2 \|u\|^2 + 2 L_1(u)$

  (up to scaling, the right-hand side is the objective of SVM: a squared-norm term plus the total hinge loss)

28

Proof

• Better bound: optimize the value of $\lambda$ instead of fixing $\lambda = R^2$ (a sketch of this optimization follows below)
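A sketch of that optimization (my own filling-in, assuming the general bound above): minimize $f(\lambda) = \frac{\lambda^2 \|u\|^2 + 2\lambda L_1(u)}{2\lambda - R^2}$ over $\lambda > R^2/2$.

```latex
% Sketch (not from the slides): optimize the general bound over lambda.
\[
  f'(\lambda) = 0
  \;\Longleftrightarrow\;
  \lambda^2 - \lambda R^2 - \frac{L_1(u)\,R^2}{\|u\|^2} = 0
  \;\Longrightarrow\;
  \lambda^{*} = \frac{R^2 + \sqrt{R^4 + 4 L_1(u) R^2 / \|u\|^2}}{2}.
\]
% In the separable case (L_1(u) = 0) this gives lambda* = R^2,
% recovering the bound M <= R^2 ||u||^2.
```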

29

Remarks

• Bound does not depend on dimension of the feature vector

• The bound holds for all sequences. It is not tight for most real-world data

• But there exists a (worst-case) setting for which it is tight

30

Three Bounds

[Figure: plot comparing the three mistake bounds]

31

Separable Case

• Assume there exists $u$ such that $y_t (u \cdot x_t) \ge \gamma > 0$ for all examples. Then all bounds are equivalent

• The Perceptron makes a finite number of mistakes until convergence (not necessarily to $u$)

32

Separable Case – Other Quantities

• Use the 1st (parameterization) degree of freedom

• Scale the competitor $u$ such that $\|u\| = 1$

• Define the margin $\gamma = \min_t\, y_t (u \cdot x_t)$

• The bound becomes

  $M \le \dfrac{R^2}{\gamma^2}$

  (a worked numeric instance follows below)
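A worked instance with illustrative numbers (not from the slides):

```latex
% R = 1 (examples in the unit ball), margin gamma = 0.1:
\[
  M \;\le\; \frac{R^2}{\gamma^2} \;=\; \frac{1}{0.01} \;=\; 100 ,
\]
% so the Perceptron makes at most 100 mistakes on such a sequence,
% regardless of its length.
```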

33

Separable Case - Illustration

34

Separable Case – Illustration

When the margin is smaller:
– The Perceptron will make more mistakes
– Finding a separating hyperplane is more difficult

35

Inseparable Case

• A difficult (inseparable) problem implies a large value of the cumulative hinge loss $L_1(u)$ for every competitor $u$

• In this case the Perceptron will make a large number of mistakes

36

Perceptron Algorithm

• Extremely easy to implement
• Relative loss bounds for the separable and inseparable cases; minimal assumptions (no i.i.d. assumption)
• Easy to convert to a well-performing batch algorithm (under i.i.d. assumptions)

• Quantities in the bound are not compatible: number of mistakes vs. hinge loss
• The margin of examples is ignored by the update
• The same update is used for the separable and inseparable cases

Concluding Remarks

• Batch vs. Online
  – Batch: two phases, training and then test
  – Online: a single continuous process

• Statistical assumption
  – Batch: a distribution over examples
  – Online: holds for all sequences

• Conversions (a sketch of one online-to-batch conversion follows below)
  – Online -> Batch
  – Batch -> Online
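One standard online-to-batch conversion is to average the weight vectors produced over the online rounds (the averaged Perceptron); a minimal sketch, assuming the same illustrative (x, y) example format as above:

```python
import numpy as np

def averaged_perceptron(examples, d, passes=1):
    """Online-to-batch by averaging: run the Perceptron online, return the mean weights."""
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(passes):
        for x, y in examples:                # y in {-1, +1}
            if y * (w @ x) <= 0:             # mistake: Perceptron update
                w = w + y * x
            w_sum += w                       # accumulate for the average
    return w_sum / (passes * len(examples))
```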

37
