![Page 1: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/1.jpg)
Rugby players, Ballet dancers,
and the
Nearest Neighbour Classifier
COMP24111 lecture 2
![Page 2: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/2.jpg)
A Problem to Solve with Machine Learning
Distinguish rugby players from ballet dancers.
You are provided with a few examples.
Fallowfield rugby club (16).
Rusholme ballet troupe (10).
Task: Generate a program which will correctly classify ANY player/dancer in the world.
Hint: We shouldn’t “fine-tune” our system too much so it only works on the local clubs.
![Page 3: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/3.jpg)
Taking measurements….
We have to process the people with a computer, so their details need to be in a computer-readable form.
What are the distinguishing characteristics?
1. Height
2. Weight
3. Shoe size
4. Sex
![Page 4: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/4.jpg)
Terminology
“Examples”
“Features”
Class, or “label”
![Page 5: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/5.jpg)
The Supervised Learning Pipeline
[Diagram: training data and labels → learning algorithm → model; testing data (no labels) → model → predicted labels]
![Page 6: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/6.jpg)
Taking measurements….
Person: 1, 2, 3, 4, 5, …, 16, 17, 18, 19, 20, …
Weight: 63kg, 55kg, 75kg, 50kg, 57kg, …, 85kg, 93kg, 75kg, 99kg, 100kg, …
Height: 190cm, 185cm, 202cm, 180cm, 174cm; 150cm, 145cm, 130cm, 163cm, 171cm
[Scatter plot with axes: height, weight]
![Page 7: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/7.jpg)
The Nearest Neighbour Rule
“TRAINING” DATA: the person, weight and height table, plotted on height and weight axes.
“TESTING” DATA
Who’s this guy? Player or dancer?
height = 180cm, weight = 78kg
![Page 8: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/8.jpg)
The Nearest Neighbour Rule
“TRAINING” DATA: the person, weight and height table, plotted on height and weight axes.
Test point: height = 180cm, weight = 78kg
1. Find nearest neighbour
2. Assign the same class
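The two steps of the nearest neighbour rule can be sketched in a few lines of Python. This is only a minimal sketch: the training values and the name `nearest_neighbour` are made up for illustration and are not the table from the slide. Only the test point (180cm, 78kg) comes from the lecture.

```python
import math

# Hypothetical training examples as (height_cm, weight_kg, label).
# These numbers are invented for illustration, not taken from the slide.
training = [
    (170.0, 55.0, "dancer"),
    (165.0, 50.0, "dancer"),
    (175.0, 60.0, "dancer"),
    (185.0, 95.0, "player"),
    (180.0, 85.0, "player"),
    (190.0, 100.0, "player"),
]

def nearest_neighbour(height, weight, examples):
    """Return the label of the single closest training example."""
    best_label = None
    best_dist = math.inf
    for h, w, label in examples:
        # Step 1: find the nearest neighbour (straight-line distance).
        d = math.sqrt((height - h) ** 2 + (weight - w) ** 2)
        if d < best_dist:
            best_dist, best_label = d, label
    # Step 2: assign the same class as that neighbour.
    return best_label

# The test point from the slide: height = 180cm, weight = 78kg.
prediction = nearest_neighbour(180.0, 78.0, training)
print(prediction)
```

With this invented data the closest example is (180, 85), so the test point gets that example's label.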
![Page 9: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/9.jpg)
Supervised Learning Pipeline for Nearest Neighbour
[Diagram: training data → learning algorithm (do nothing) → model (memorize the training data); testing data (no labels) → model → predicted labels]
![Page 10: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/10.jpg)
The K-Nearest Neighbour Classifier
“TRAINING” DATA: the person, weight and height table, plotted on height and weight axes.
Testing point x
For each training datapoint x’
    measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class
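The pseudocode above translates fairly directly into Python. The sketch below is one possible reading of it, not the lecturer's reference implementation. The function name `knn_classify`, the `(features, label)` data layout and the training values are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Classify x by majority vote among its k nearest training points.

    training is a list of (features, label) pairs.
    """
    # For each training datapoint x', measure distance(x, x').
    distances = []
    for x_prime, label in training:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))
        distances.append((d, label))
    # Sort distances and select the K nearest.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Assign the most common class. On a tie, Counter returns the
    # label that was encountered first, i.e. the nearer neighbour's.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (height_cm, weight_kg) examples, invented for illustration.
training = [
    ((170, 55), "dancer"),
    ((165, 50), "dancer"),
    ((175, 60), "dancer"),
    ((185, 95), "player"),
    ((180, 85), "player"),
    ((190, 100), "player"),
]

print(knn_classify((180, 78), training, k=3))
```

Note that there is no training step at all: every prediction scans the whole training set, which is worth keeping in mind when listing disadvantages.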
![Page 11: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/11.jpg)
Quick reminder: Pythagoras’ theorem
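. . .
measure distance(x,x’)
. . .
$$\text{distance}(x, x') = \sqrt{\sum_i (x_i - x'_i)^2}$$
a.k.a. “Euclidean” distance
[Diagram: right-angled triangle with sides a, b and hypotenuse c, drawn on height and weight axes]
$$a^2 + b^2 = c^2 \quad\text{so}\quad c = \sqrt{a^2 + b^2}$$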
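As a quick check of the distance formula, here is a small Python sketch. The function name `euclidean_distance` is an assumption, not something from the lecture. It works for any number of features, and the 3-4-5 triangle confirms it agrees with Pythagoras.

```python
import math

def euclidean_distance(x, x_prime):
    """Square root of the summed squared differences, feature by feature."""
    if len(x) != len(x_prime):
        raise ValueError("both points need the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))

# Pythagoras check: sides 3 and 4 give a hypotenuse of 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```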
![Page 12: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/12.jpg)
The K-Nearest Neighbour Classifier
Seems sensible.
But what are the disadvantages?
![Page 13: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/13.jpg)
The K-Nearest Neighbour Classifier
[Scatter plot of the “TRAINING” DATA on height and weight axes]
Here I chose k=3.
What would happen if I chose k=5?
What would happen if I chose k=26?
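One way to explore these questions is to try them out. The sketch below assumes a training set with the same class sizes as the problem statement (16 players, 10 dancers, so 26 examples in total). The measurements themselves are invented, and `knn_classify` is a hypothetical helper. With k=26 every training example votes, so the larger class wins no matter where the test point is.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Majority vote among the k training points nearest to x."""
    distances = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp))), label)
        for xp, label in training
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Invented (height_cm, weight_kg) data: 16 players and 10 dancers.
players = [((180 + i, 85 + i), "player") for i in range(16)]
dancers = [((160 + i, 50 + i), "dancer") for i in range(10)]
training = players + dancers

# A test point sitting right in the middle of the dancers.
x = (162, 52)
for k in (3, 5, 26):
    print(k, knn_classify(x, training, k))
```

For this data, k=3 and k=5 both answer "dancer", while k=26 answers "player" because the vote is simply 16 against 10.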
![Page 14: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/14.jpg)
The K-Nearest Neighbour Classifier
[Scatter plot of the “TRAINING” DATA on height and weight axes, with a boundary line between the red circles and the blue crosses]
Any point on the left of this “boundary” is closer to the red circles.
Any point on the right of this “boundary” is closer to the blue crosses.
This is called the “decision boundary”.
![Page 15: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/15.jpg)
Where’s the decision boundary?
[Scatter plot on height and weight axes]
Not always a simple straight line!
![Page 16: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/16.jpg)
Where’s the decision boundary?
[Scatter plot on height and weight axes]
Not always contiguous!
![Page 17: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/17.jpg)
The most important concept in Machine Learning
![Page 18: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/18.jpg)
The most important concept in Machine Learning
Looks good so far…
![Page 19: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/19.jpg)
The most important concept in Machine Learning
Looks good so far…
Oh no! Mistakes! What happened?
![Page 20: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/20.jpg)
The most important concept in Machine Learning
Looks good so far…
Oh no! Mistakes! What happened?
We didn’t have all the data.
We can never assume that we do.
This is called “OVER-FITTING” to the small dataset.
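A toy illustration of the same effect, under loudly stated assumptions: all the numbers below are invented, and the helper names `nearest_label` and `accuracy` are hypothetical. A 1-nearest-neighbour classifier scores perfectly on the data it memorised, because each point is its own nearest neighbour, yet it can still make mistakes on people it has not seen.

```python
import math

def nearest_label(x, training):
    """Label of the training example closest to x (1-nearest neighbour)."""
    def dist(example):
        features, _ = example
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, features)))
    return min(training, key=dist)[1]

def accuracy(data, training):
    """Fraction of (features, label) pairs in data classified correctly."""
    correct = sum(nearest_label(x, training) == label for x, label in data)
    return correct / len(data)

# Invented training data, including one unusually light player.
training = [
    ((165, 50), "dancer"), ((170, 55), "dancer"), ((175, 60), "dancer"),
    ((185, 95), "player"), ((180, 85), "player"), ((190, 100), "player"),
    ((172, 58), "player"),
]
# Invented people the classifier has never seen.
new_data = [((171, 57), "dancer"), ((173, 59), "dancer"), ((182, 90), "player")]

print(accuracy(training, training))  # 1.0 on the memorised data
print(accuracy(new_data, training))  # lower on unseen data
```

The single atypical training example drags two nearby dancers into the wrong class, which is the kind of mistake the slide is pointing at.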
![Page 21: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/21.jpg)
So, we have our first “machine learning” algorithm… ?
The K-Nearest Neighbour Classifier
Testing point x
For each training datapoint x’
    measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class
Make your own notes on its advantages / disadvantages.
I will ask for volunteers next time we meet…
![Page 22: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/22.jpg)
[Diagram: training data → learning algorithm (do nothing) → model (memorize the training data); testing data (no labels) → model → predicted labels]
Pretty dumb! Where’s the learning!
![Page 23: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/23.jpg)
Now, how is this problem like handwriting recognition?
[Scatter plot on height and weight axes]
![Page 24: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/24.jpg)
Let’s say the measurements are pixel values.
[Plot with axes pixel 1 value and pixel 2 value, each running from 0 to 255]
A two-pixel image: (190, 85)
![Page 25: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/25.jpg)
Three dimensions…
A three-pixel image: (190, 85, 202)
This 3-pixel image is represented by a SINGLE point in a 3-D space.
[3-D plot with axes pixel 1, pixel 2, pixel 3]
![Page 26: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/26.jpg)
Distance between images
[3-D plot with axes pixel 1, pixel 2, pixel 3]
A three-pixel image: (190, 85, 202)
Another 3-pixel image: (25, 150, 75)
Straight line distance between them?
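The question can be answered with the same Euclidean distance as before, now over three pixel values instead of height and weight. A minimal sketch using the two points from the slide (the variable names are just for illustration):

```python
import math

image_a = (190, 85, 202)  # the three-pixel image from the slide
image_b = (25, 150, 75)   # the other three-pixel image

# Squared differences per pixel: 165^2 + (-65)^2 + 127^2 = 47579.
squared = sum((a - b) ** 2 for a, b in zip(image_a, image_b))
distance = math.sqrt(squared)
print(round(distance, 2))  # about 218.13
```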
![Page 27: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/27.jpg)
[3-D plot with axes pixel 1, pixel 2, pixel 3]
A three-pixel image: (190, 85, 202)
A four-pixel image.
A five-pixel image.
4-dimensional space? 5-d? 6-d?
![Page 28: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/28.jpg)
A four-pixel image: (190, 85, 202, 10)
A different four-pixel image: (190, 85, 202, 10)
Same 4-dimensional vector!
Assuming we read pixels in a systematic manner, we can now represent any image as a single point in a high dimensional space.
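Reading pixels "in a systematic manner" can be as simple as going row by row, left to right. The sketch below assumes that reading order, and also assumes the two four-pixel images differ in layout (a 2 x 2 square and a 1 x 4 strip). The transcript does not say how the slide's images differ, so treat that part as a guess. The helper name `flatten` is hypothetical.

```python
def flatten(image):
    """Read pixels row by row, left to right, into one flat vector."""
    return [pixel for row in image for pixel in row]

# Two differently shaped images holding the same pixel values (assumed layouts).
square = [[190, 85],
          [202, 10]]          # a 2 x 2 image
strip = [[190, 85, 202, 10]]  # a 1 x 4 image

print(flatten(square))                    # [190, 85, 202, 10]
print(flatten(square) == flatten(strip))  # True
```

Once flattened, the classifier only ever sees the vector, so the reading order has to be the same for every image.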
![Page 29: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/29.jpg)
16 x 16 pixel image. How many dimensions?
![Page 30: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/30.jpg)
?
We can measure distance in 256-dimensional space.
$$\text{distance}(x, x') = \sqrt{\sum_{i=1}^{256} (x_i - x'_i)^2}$$
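The same calculation in code, as a hedged sketch: a 16 x 16 image is flattened row by row (an assumed reading order) into a 256-element vector, and the distance is the usual sum over all 256 pixels. The two test images here, all black and all white, are invented so the answer is easy to check by hand.

```python
import math

def flatten(image):
    """Read a 2-D grid of pixels row by row into one flat vector."""
    return [pixel for row in image for pixel in row]

def distance(x, x_prime):
    """Euclidean distance between two equal-length pixel vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))

# Two invented 16 x 16 images: every pixel 0, and every pixel 255.
blank = [[0] * 16 for _ in range(16)]
full = [[255] * 16 for _ in range(16)]

x, x_prime = flatten(blank), flatten(full)
print(len(x))                # 256
print(distance(x, x_prime))  # 4080.0, i.e. sqrt(256 * 255^2) = 16 * 255
```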
![Page 31: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/31.jpg)
Which is the nearest neighbour to our ‘3’ ?
[Candidate digit images, captioned: maybe, maybe, probably not]
![Page 32: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/32.jpg)
AT&T Research Labs
The USPS postcode reader – learnt from examples.
FAST recognition
NOISE resistant, can generalise to future unseen patterns
![Page 33: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2](https://reader030.vdocuments.mx/reader030/viewer/2022032600/56649da75503460f94a93a9e/html5/thumbnails/33.jpg)
Your lab exercise
Use K-NN to recognise handwritten USPS digits.
Biggest mistake by students in last year’s class?
Final lab session before reading week (see website for details).
i.e. you have 4 weeks.
…starting 2 weeks from now!