
Page 1:

COMP 551 – Applied Machine Learning
Lecture 8: Instance-based learning

Associate Instructor: Herke van Hoof ([email protected])

Slides mostly by: Joelle Pineau ([email protected])

Class web page: www.cs.mcgill.ca/~jpineau/comp551

Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

Page 2:

Today's quiz

Q1. Consider the following dataset. Let "Day" and "Weather" be the input features and "GoHiking?" be the output.

a) What is the entropy of this set of examples? H(D) = ??

b) What is the information gain of feature "Weather"? IG = H(D) – H(D | Weather) = ??

c) What is the information gain of feature "Day"? IG = H(D) – H(D | Day) = ??

Q2. Give a decision tree that correctly represents the following Boolean function: Y = [X1 AND X2] OR [X2 AND X3]. (Many possible correct answers.)

Day    Weather  GoHiking?
Mon    Sunny    No
Tues   Cloudy   No
Wed    Rain     No
Thurs  Rain     No
Fri    Sunny    No
Sat    Sunny    No
Sun    Sunny    Yes

Page 3:

Today's quiz

Q1. Consider the following dataset (table repeated below). Let "Day" and "Weather" be the input features and "GoHiking?" be the output.

a) What is the entropy of this set of examples?
H(D) = -(1/7)*log2(1/7) - (6/7)*log2(6/7)

b) What is the information gain of feature "Weather"?
IG = H(D) – H(D | Weather)
   = H(D) – [ (4/7)*( -(3/4)*log2(3/4) - (1/4)*log2(1/4) ) + (2/7)*0 + (1/7)*0 ]

c) What is the information gain of feature "Day"?
IG = H(D) – H(D | Day) = H(D) – 0 = H(D)
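These numbers are easy to check numerically. A minimal sketch (our own helper, not course code; counts taken from the table below):

```python
import math

def entropy(counts):
    """Entropy in bits of a label distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

H_D = entropy([1, 6])   # 1 Yes, 6 No  ->  ~0.592 bits
# H(D | Weather): Sunny has 1 Yes / 3 No, Cloudy 0/1, Rain 0/2
H_D_given_W = (4/7) * entropy([1, 3]) + (1/7) * entropy([1]) + (2/7) * entropy([2])
print(H_D, H_D - H_D_given_W)   # entropy of D, then IG(Weather) ~ 0.128 bits
```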

Q2. Give a decision tree that correctly represents the following Boolean function: Y = [X1 AND X2] OR [X2 AND X3]. (Many possible correct answers.)

Day    Weather  GoHiking?
Mon    Sunny    No
Tues   Cloudy   No
Wed    Rain     No
Thurs  Rain     No
Fri    Sunny    No
Sat    Sunny    No
Sun    Sunny    Yes

[Figure: one valid decision tree. Root tests X2: if X2 = 0, predict 0; if X2 = 1, test X1. If X1 = 1, predict 1; if X1 = 0, test X3 and predict 1 if X3 = 1, else 0.]

Page 4:

A complete (artificial) example

• An artificial binary classification problem with two real-valued input features:

Page 5:

A complete (artificial) example

• An artificial binary classification problem with two real-valued input features:

What label should we predict for this example (marked * in the figure)?

Page 6:

Parametric supervised learning

• Example: logistic regression. Input: dataset of labeled examples.

• From this, learn a parameter vector of a fixed size such that some error measure based on the training data is minimized.

• These methods are called parametric, and their main goal is to summarize the data using the parameters.

– Parametric methods are typically global = one set of parameters for the entire data space.

Page 7:

Instance-based learning methods

• Key idea: just store all training examples <xi, yi>.

• When a query is made, locally compute the value y of the new instance based on the values of the most similar points.

Page 8:

Instance-based learning methods

• Key idea: just store all training examples <xi, yi>.

• When a query is made, locally compute the value y of the new instance based on the values of the most similar points.

• The regressor / classifier can no longer be represented by a fixed-size vector: the representation depends on the dataset.

Page 9:

Instance-based learning methods

• Key idea: just store all training examples <xi, yi>.

• When a query is made, locally compute the value y of the new instance based on the values of the most similar points.

• The regressor / classifier can no longer be represented by a fixed-size vector: the representation depends on the dataset.

• Different algorithms exist for computing the value of the new point based on the existing values.

Page 10:

Non-parametric learning methods

• Key idea: just store all training examples <xi, yi>.

• When a query is made, compute the value of the new instance based on the values of the closest (most similar) points.

• Requirements:

– A distance function.

– How many closest points (neighbors) to look at?

– How do we compute the value of the new point based on the existing values?

Page 11:

Simple idea: Connect the dots!

What kind of distance metric?

• Euclidean distance

• Maximum/minimum difference along any axis

• Weighted Euclidean distance (with weights based on domain knowledge):

d(x, x') = Σ_{j=1}^{n} u_j (x_j − x'_j)²

where x_j denotes the value of the j-th feature in the vector / instance x

• An arbitrary distance or similarity function d, specific for the application at hand (works best, if you have one)

[Plots: Wisconsin data set, classification. x-axis: tumor size (mm); y-axis: non-recurring (0) vs recurring (1); right panel shows the nearest-neighbor fit connecting the dots.]

Page 12:

Simple idea: Connect the dots!

[Plots: Wisconsin data set, classification, as on the previous page: tumor size (mm) vs non-recurring (0) / recurring (1), with the nearest-neighbor fit.]

Page 13:

Simple idea: Connect the dots!

[Plots: Wisconsin data set, regression. x-axis: nucleus size; y-axis: time to recurrence; right panel connects the dots.]

One-nearest neighbor

• Given: Training data {(xi, yi)}, i = 1…m, distance metric d on X.

• Learning: Nothing to do! (just store data)

• Prediction: for x ∈ X
– Find the nearest training sample to x: i* ∈ argmin_i d(xi, x)
– Predict y = y_{i*}.

Page 14:

Simple idea: Connect the dots!

[Plots: Wisconsin data set, regression (nucleus size vs time to recurrence), with the one-nearest-neighbor fit, as on the previous page.]

Page 15:

One-nearest neighbor

• Given: Training data X, distance metric d on X.

• Learning: Nothing to do! (Just store the data).

Page 16:

One-nearest neighbor

• Given: Training data X, distance metric d on X.

• Learning: Nothing to do! (Just store the data).

• Prediction: For x ∈ X

– Find the nearest training sample x_{i*}, where i* = argmin_i d(xi, x)

– Predict y = y_{i*}
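As a concrete illustration, a minimal one-nearest-neighbor sketch in Python/NumPy (our own toy code, not from the course; Euclidean distance assumed):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """Predict the label of query x with one-nearest-neighbor (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance from x to every stored example
    return y_train[np.argmin(dists)]             # label of the closest training point

# Toy usage: two 2-D classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(nn_predict(X, y, np.array([0.8, 0.9])))    # -> 1
```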

Page 17:

What does the approximator look like?

• What do you think the decision boundary looks like?

Page 18:

What does the approximator look like?

• Nearest-neighbor does not explicitly compute decision boundaries.

• But the effective decision boundaries are a subset of the Voronoi diagram for the training data.

• Each decision boundary is a line segment that is equidistant between two points of opposite classes.

[Figure: Voronoi diagram of the training points, with the effective nearest-neighbor decision boundaries highlighted.]

Page 19:

What does the approximator look like?

• Example

Page 20:

One-nearest neighbor

• Given: Training data X, distance metric d on X.

• Learning: Nothing to do! (Just store the data).

• Prediction: For x ∈ X

– Find the nearest training sample x_{i*}, where i* = argmin_i d(xi, x)

– Predict y = y_{i*}

Page 21:

What kind of distance metric?

Page 22:

What kind of distance metric?

• Euclidean distance.

• Weighted Euclidean distance (with weights based on domain knowledge):
d(x, x') = Σ_{j=1:m} w_j (x_j − x_j')²

Page 23:

What kind of distance metric?

• Euclidean distance.

• Weighted Euclidean distance (with weights based on domain knowledge):
d(x, x') = Σ_{j=1:m} w_j (x_j − x_j')²

• Maximum / minimum difference along any axis.

• An arbitrary distance or similarity function d, specific for the application at hand (works best, if you have one.)
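In code, the weighted squared Euclidean distance above is a one-liner; a small sketch (variable names and the example data are ours):

```python
import numpy as np

def weighted_sq_euclidean(x, x2, w):
    """d(x, x') = sum_j w_j (x_j - x'_j)^2, with per-feature weights w_j."""
    diff = x - x2
    return np.sum(w * diff * diff)

x  = np.array([1.0, 5.0])
xp = np.array([2.0, 3.0])
w  = np.array([1.0, 0.1])   # down-weight the second, less trusted feature
print(weighted_sq_euclidean(x, xp, w))  # 1*(1)^2 + 0.1*(2)^2 = 1.4
```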

Page 24:

Choice of distance metric is important!

[Embedded slides: Andrew W. Moore, "Instance-based learning", slides 11-12 (copyright 2001, 2003).]

Multivariate distance metrics: suppose the input vectors x1, x2, …, xn are two-dimensional. One can draw the nearest-neighbor regions in input space, e.g. for

Dist(xi, xj) = (xi1 − xj1)² + (xi2 − xj2)²   versus   Dist(xi, xj) = (xi1 − xj1)² + (3·xi2 − 3·xj2)²

The relative scalings in the distance metric affect region shapes.

Euclidean distance metric; other metrics: Mahalanobis, rank-based, correlation-based (Stanfill + Waltz; Maes' Ringo system).

Left: both attributes weighted equally; Right: second attribute weighted more.

Page 25:

Distance metric tricks

• You may need to do feature preprocessing:

– Scale the input dimensions (or normalize them).

– Remove noisy and irrelevant inputs.

– Determine weights for attributes based on cross-validation (or information-theoretic methods).

Page 26:

Distance metric tricks

• You may need to do feature preprocessing:

– Scale the input dimensions (or normalize them).

– Remove noisy and irrelevant inputs.

– Determine weights for attributes based on cross-validation (or information-theoretic methods).

• The distance metric is often domain-specific.

– E.g. string edit distance in bioinformatics.

– E.g. trajectory distance in time-series models for walking data.

• The distance can sometimes be learned.
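Scaling matters because nearest-neighbor distances are dominated by large-range features. A minimal standardization sketch (our own example data, not course code):

```python
import numpy as np

def standardize(X_train, X_query):
    """Z-score each feature using training statistics, then apply to queries."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12   # guard against constant features
    return (X_train - mu) / sigma, (X_query - mu) / sigma

# Height in cm vs salary in $: raw distances would be driven almost entirely by salary.
X = np.array([[170.0, 65000.0], [160.0, 45000.0], [180.0, 52000.0]])
q = np.array([[175.0, 50000.0]])
Xs, qs = standardize(X, q)   # now both features contribute comparably to distances
```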

Page 27:

k-nearest neighbor (kNN)

• In case of noise, a single bad label can cause a patch to be misclassified.

• Safer to look at more than one close point?

Page 28:

k-nearest neighbor (kNN)

• Given: Training data X, distance metric d on X.

• Learning: Nothing to do! (Just store the data).

• Prediction:

– For x ∈ X, find the k nearest training samples to x.

– Let their indices be i1, i2, …, ik.

– Predict: y = mean/median of {y_{i1}, y_{i2}, …, y_{ik}} for regression;
y = majority of {y_{i1}, y_{i2}, …, y_{ik}} for classification, or the empirical probability of each class.
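A compact kNN sketch covering both cases (our own toy code; the mean and majority rules follow the slide):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, regression=False):
    """k-nearest-neighbor prediction: mean of neighbors (regression) or majority vote."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]              # indices i1..ik of the k closest points
    if regression:
        return y_train[idx].mean()
    return Counter(y_train[idx]).most_common(1)[0][0]

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.5]), k=3))  # majority of labels {0, 1, 0} -> 0
```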

Page 29:

Classification, 2-nearest neighbor

[Plots: Wisconsin data, tumor size (mm) vs non-recurring (0) / recurring (1): 2-nearest-neighbor classification (empirical distribution) and 3-nearest-neighbor classification.]

Page 30:

Classification, 3-nearest neighbor

[Plot: Wisconsin data, tumor size (mm) vs non-recurring (0) / recurring (1), 3-nearest-neighbor classification.]

Page 31:

Classification, 5-nearest neighbor

[Plots: Wisconsin data, same axes: 5-nearest-neighbor and 10-nearest-neighbor classification.]

Page 32:

Classification, 10-nearest neighbor

[Plot: Wisconsin data, same axes, 10-nearest-neighbor classification.]

Page 33:

Classification, 15-nearest neighbor

[Plots: Wisconsin data, same axes: 15-nearest-neighbor and 20-nearest-neighbor classification.]

Page 34:

Classification, 20-nearest neighbor

[Plot: Wisconsin data, same axes, 20-nearest-neighbor classification.]

Page 35:

Regression, 2-nearest neighbor

[Plots: Wisconsin data, nucleus size vs time to recurrence: 2-nearest-neighbor (mean prediction) and 3-nearest-neighbor regression.]

Page 36:

Regression, 3-nearest neighbor

[Plot: Wisconsin data, nucleus size vs time to recurrence, 3-nearest-neighbor regression.]

Page 37:

Regression, 5-nearest neighbor

[Plots: Wisconsin data, nucleus size vs time to recurrence: 5-nearest-neighbor and 10-nearest-neighbor regression.]

Page 38:

Regression, 10-nearest neighbor

[Plot: Wisconsin data, nucleus size vs time to recurrence, 10-nearest-neighbor regression.]

Page 39:

What is the best regressor?

[Plots: Wisconsin data, nucleus size vs time to recurrence; k-nearest-neighbor regression fits for K = 2, K = 3, K = 5, and K = 10.]

Page 40:

Bias-variance trade-off

• What happens if k is low?

• What happens if k is high?

Page 41:

Bias-variance trade-off

• What happens if k is low? Very non-linear functions can be approximated, but we also capture the noise in the data. Bias is low, variance is high.

• What happens if k is high? The output is much smoother, and less sensitive to data variation. High bias, low variance.

• A validation set can be used to pick the best k.
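Picking k on a validation set, as the slide suggests: a minimal sketch that reuses the knn_predict helper sketched after page 28 (assumed to be in scope; the candidate grid is our choice):

```python
import numpy as np

def choose_k(X_tr, y_tr, X_val, y_val, candidates=(1, 3, 5, 10, 20)):
    """Return the k with the highest validation accuracy."""
    def accuracy(k):
        preds = [knn_predict(X_tr, y_tr, x, k=k) for x in X_val]
        return np.mean(np.array(preds) == y_val)
    return max(candidates, key=accuracy)
```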

Page 42:

Limitations of k-nearest neighbor (kNN)

• A lot of discontinuities!

• Sensitive to small variations in the input data.

• Can we fix this but still keep it (fairly) local?

Page 43:

k-nearest neighbor (kNN)

• Given: Training data X, distance metric d on X.

• Learning: Nothing to do! (Just store the data).

• Prediction:

– For x ∈ X, find the k nearest training samples to x.

– Let their indices be i1, i2, …, ik.

– Predict: y = mean/median of {y_{i1}, y_{i2}, …, y_{ik}} for regression;
y = majority of {y_{i1}, y_{i2}, …, y_{ik}} for classification, or the empirical probability of each class.

Page 44:

Distance-weighted (kernel-based) NN

• Given: Training data X, distance metric d on X, weighting function w : R → R.

• Learning: Nothing to do! (Just store the data).

• Prediction:

– Given input x.

– For each xi compute wi = w(d(xi, x)).

– Predict: y = (∑i wi yi) / (∑i wi).
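A sketch of this prediction rule with a Gaussian weighting function (one of the options listed on page 46; sigma and the names are our choices):

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x, sigma=1.0):
    """Distance-weighted NN: every point votes with weight w(d) = exp(-d^2 / sigma^2)."""
    d = np.linalg.norm(X_train - x, axis=1)
    w = np.exp(-(d ** 2) / sigma ** 2)
    return np.sum(w * y_train) / np.sum(w)   # y = sum_i w_i y_i / sum_i w_i

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
print(weighted_nn_predict(X, y, np.array([1.2]), sigma=0.5))
```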

Page 45:

Distance-weighted (kernel-based) NN

• Given: Training data X, distance metric d on X, weighting function w : R → R.

• Learning: Nothing to do! (Just store the data).

• Prediction:

– Given input x.

– For each xi compute wi = w(d(xi, x)).

– Predict: y = (∑i wi yi) / (∑i wi).

• How should we weigh the distances?

Page 46:

Some weighting functions

• w(d) = 1 / d(xi, x)

• w(d) = 1 / d(xi, x)²

• w(d) = 1 / (c + d(xi, x)²)

• w(d) = exp(−d(xi, x)² / σ²)

Page 47:

Gaussian weighting, small σ

[Plots: Wisconsin data, tumor size (mm) vs non-recurring (0) / recurring (1): Gaussian-weighted nearest neighbor with σ = 0.25 (small) and σ = 2 (medium).]

Page 48:

Gaussian weighting, medium σ

[Plot: Wisconsin data, same axes, Gaussian-weighted nearest neighbor with σ = 2.]

Page 49:

Gaussian weighting, large σ

[Plot: Wisconsin data, same axes, Gaussian-weighted nearest neighbor with σ = 5.]

All examples get to vote! The curve is smoother, but perhaps too smooth?

Locally-weighted linear regression

• Weighted linear regression: different weights in the error function for different points (see answer to homework 1).

• Locally weighted linear regression: weights depend on the distance to the query point.

• Uses a local linear fit (rather than just an average) around the query point.

• If the distance metric is well tuned, it can lead to really good results (can represent non-linear functions easily and faithfully).
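A minimal locally-weighted linear regression sketch illustrating the idea above (our own toy code; Gaussian weights, one query at a time):

```python
import numpy as np

def lwr_predict(X_train, y_train, x, sigma=1.0):
    """Fit a weighted least-squares line around query x, then evaluate it at x."""
    Xb = np.hstack([np.ones((len(X_train), 1)), X_train])      # add bias column
    d = np.linalg.norm(X_train - x, axis=1)
    W = np.diag(np.exp(-(d ** 2) / sigma ** 2))                 # weights shrink with distance
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y_train)  # weighted normal equations
    return np.array([1.0, *x]) @ theta

X = np.linspace(0, 6, 30).reshape(-1, 1)
y = np.sin(X).ravel()
print(lwr_predict(X, y, np.array([2.0]), sigma=0.5))  # close to sin(2) ~ 0.909
```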

Page 50:

Scaling up

• kNN in high-dimensional feature spaces?

– In high-dimensional spaces, the distance between near and far points appears similar.

– A few points ("hubs") show up repeatedly in the top kNN [Radovanovic et al., 2009].

• kNN with a larger number of datapoints?

Page 51:

Scaling up

• kNN in high-dimensional feature spaces?

– In high-dimensional spaces, the distance between points appears similar.

– A few points ("hubs") show up repeatedly in the top kNN [Radovanovic et al., 2009].

• kNN with a larger number of datapoints?

Page 52:

Scaling up

• kNN in high-dimensional feature spaces?

– In high-dimensional spaces, the distance between points appears similar.

– A few points ("hubs") show up repeatedly in the top kNN [Radovanovic et al., 2009].

• kNN with a larger number of datapoints?

– Can be implemented efficiently, O(log n) at retrieval time, if we use smart data structures:

• Condensation of the dataset (use prototypes).
• Hash tables in which the hashing function is based on the distance metric.
• KD-trees (tutorial: http://www.autonlab.org/autonweb/14665).
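For example, SciPy ships a KD-tree; a small sketch of fast retrieval without a linear scan (our example, not course code):

```python
import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(10000, 3)          # 10k stored examples in 3-D
tree = cKDTree(X)                     # build the tree once up front
query = np.random.rand(3)
dists, idx = tree.query(query, k=5)   # 5 nearest neighbors, roughly O(log n) per query
print(idx, dists)
```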

Page 53:

Instance-based learning

• Instance-based learning refers to techniques where previous samples are used directly to make predictions.

• What makes instance-based methods different?

– The model is typically non-parametric (no fixed parameter vector).

– The algorithms are typically lazy.

Page 54:

Lazy vs eager learning

• Lazy learning: Wait for the query before generalizing.

– E.g. nearest neighbor.

• Eager learning: Generalize before seeing the query.

– E.g. logistic regression, LDA, decision trees, neural networks.

• Which is faster?

– Training time?

– Query answering time?

Page 55:

Pros and cons of lazy and eager learning

• Eager learners create a global approximation.

• Lazy learners create many local approximations.

• If they use the same hypothesis space, a lazy learner can represent more complex functions (e.g., consider H = linear functions).

Page 56:

Pros and cons of lazy and eager learning

• Eager learners create a global approximation.

• Lazy learners create many local approximations.

• If they use the same hypothesis space, a lazy learner can represent more complex functions (e.g., consider H = linear functions).

• Lazy learning has a much faster training time.

• The eager learner does the work off-line.

Page 57:

Pros and cons of lazy and eager learning

• Eager learners create a global approximation.

• Lazy learners create many local approximations.

• If they use the same hypothesis space, a lazy learner can represent more complex functions (e.g., consider H = linear functions).

• Lazy learning has a much faster training time.

• A lazy learner typically has a slower query answering time (depends on the number of instances and the number of features) and requires more memory (it must store all the data).

• The eager learner does the work off-line.

Page 58:

Non-parametric methods

• The representation for a parametric method is specified in advance.

– Fixed-size representation.

• The representation for non-parametric methods depends on the dataset.

– The size of the representation is typically linear in the # of examples.

Page 59:

Pros and cons of non-parametric methods

• The representation for a parametric method is specified in advance.

– Good if a good representation is known in advance.

– Can easily leverage knowledge about structure.

• The representation for non-parametric methods depends on the dataset.

– High resolution where much data is available / decisions are complex.

– Useful if little is known about the data distribution (no good representation known).

– Still requires a good distance metric.

• Non-parametric methods often require complex computations.

• Non-parametric methods typically have larger storage requirements.

Page 60:

Lazy / eager and non-parametric

• Lazy / eager: Generalization before or after seeing the query?

• Parametric or not: fixed # of parameters, or determined by the data?

• Usually, parametric methods are also eager.

• Often, non-parametric methods are also lazy.

– But consider decision trees!

Page 61:

When to use instance-based learning

• Instances map to points in R^n, or else a given distance metric is available.

• Not too many attributes per instance (e.g. < 20); otherwise all points appear at a similar distance, and noise becomes a big issue.

• Not too many irrelevant attributes: easily fooled! (For most distance metrics.)

• The structure of the model is not known in advance.

• Uneven spread of data: provides a variable-resolution approximation (based on the density of points).

Page 62:

Application

Hays & Efros, Scene Completion Using Millions of Photographs, CACM, 2008.
http://graphics.cs.cmu.edu/projects/scene-completion/scene_comp_cacm.pdf

Page 63:

What you should know

• The difference between eager and lazy learning.

• The key idea of non-parametric learning.

• The k-nearest neighbor algorithm for classification and regression, and its properties.

• The distance-weighted NN algorithm.

Page 64:

What you should know

• The difference between eager and lazy learning.

• The key idea of non-parametric learning.

• The k-nearest neighbor algorithm for classification and regression, and its properties.

• The distance-weighted NN algorithm and locally-weighted linear regression.

Page 65:

Project 1 follow-up

• Please follow the instructions carefully!

– I spent ~5 hours since Friday cleaning up your submissions.

– Some submitted by email a few minutes/seconds late.

– Some submitted a single tar (w/ report, predictions, code).

– Some did not include their collaborators as co-authors.

– Some could not compress their code sufficiently.

– SUBMIT EARLY! SUBMIT OFTEN!

Page 66:

Project 2

• Available today. Due Oct. 23rd.

• Text classification task:

– Devise a machine learning algorithm to analyze short conversations and automatically classify them according to the language of the conversation.

– Conversations are taken from your collected corpora.

Page 67:

Tips for analyzing text

• Natural Language Toolkit: http://www.nltk.org/

• Common features?

– Bag of words

Page 68:

Tips for analyzing text

• Natural Language Toolkit: http://www.nltk.org/

• Common features?

– Bag of words

– Term frequency – inverse document frequency (TF-IDF)

TF(t, d) = frequency of a word t in a document d
IDF(t, D) = measure of how much information the word t provides across the corpus of documents D
TF-IDF(t, d, D) = TF(t, d) × IDF(t, D)
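A bare-bones TF-IDF sketch matching these definitions, using the common variant IDF(t, D) = log(N / df(t)) (our own toy code; the slide does not fix a specific IDF formula):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return per-document {word: tf-idf} dicts for a list of tokenized documents."""
    N = len(docs)
    df = Counter(w for doc in docs for w in set(doc))        # document frequency of each word
    scores = []
    for doc in docs:
        tf = Counter(doc)                                    # raw term frequency in this doc
        scores.append({w: tf[w] * math.log(N / df[w]) for w in tf})
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "the", "cat"]]
print(tf_idf(docs)[0])   # "the" scores 0 (appears everywhere); "cat"/"sat" score higher
```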

Page 69:

Tips for analyzing text

• Natural Language Toolkit: http://www.nltk.org/

• Common features?

– Bag of words

– Term frequency – inverse document frequency (TF-IDF)

– Hashing => turn a word into a fixed-length vector using a hashing function (see the sketch after this list).

– Word embeddings (more on this later in the course).

• Dimensionality reduction: don't consider all words; limit the size of the hash table / embedding dimension. (More on this also later.)
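The hashing trick mentioned above, sketched in a few lines (the dimension and the CRC32 hash are our assumptions for illustration):

```python
import zlib
import numpy as np

def hash_features(words, dim=32):
    """Map a bag of words to a fixed-length count vector via a hash function."""
    v = np.zeros(dim)
    for w in words:
        v[zlib.crc32(w.encode()) % dim] += 1.0   # collisions possible, rarer as dim grows
    return v

print(hash_features("the cat sat on the mat".split()))
```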

Page 70:

Locally weighted regression