geometric view of data david kauchak cs 451 – fall 2013
![Page 1: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/1.jpg)
GEOMETRIC VIEW OF DATA
David Kauchak
CS 451 – Fall 2013
![Page 2: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/2.jpg)
Admin
Assignment 2 out
Assignment 1
Keep reading
Office hours today from: 2:35-3:20
Videos?
![Page 3: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/3.jpg)
Proper Experimentation
![Page 4: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/4.jpg)
Experimental setup
Real-world use of ML algorithms
[Diagram: past = training data (data with labels), used to learn; future = testing data (data without labels), used to predict]
How do we tell how well we’re doing?
![Page 5: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/5.jpg)
Real-world classification
Google has labeled training data, for example from people clicking the “spam” button, but when new messages come in, they’re not labeled
![Page 6: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/6.jpg)
Classification evaluation
[Figure: labeled data, shown as a table with Data and Label columns; the labels are 0, 0, 1, 1, 0, 1, 0]
Use the labeled data we already have to create a test set with known labels!
Why can we do this?
Remember, we assume there’s an underlying distribution that generates both the training and test examples
![Page 7: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/7.jpg)
Classification evaluation
[Figure: the labeled data (labels 0, 0, 1, 1, 0, 1, 0) split into training data and testing data]
![Page 8: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/8.jpg)
Classification evaluation
[Figure: the labeled data split into training data and testing data; the training data is used to train a classifier]
![Page 9: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/9.jpg)
Classification evaluation
[Figure: the testing data, with labels 1 and 0]
Pretend like we don’t know the labels
![Page 10: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/10.jpg)
Classification evaluation
[Figure: the testing data, with actual labels 1 and 0]
Pretend like we don’t know the labels
[Figure: the classifier classifies the testing data and predicts labels 1 and 1]
![Page 11: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/11.jpg)
Classification evaluation
[Figure: the testing data, with actual labels 1 and 0]
Pretend like we don’t know the labels
[Figure: the classifier classifies the testing data and predicts labels 1 and 1]
Compare predicted labels to actual labels
How could we score these for classification?
![Page 12: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/12.jpg)
Test accuracy
[Figure: a column of predictions next to a column of actual labels]
To evaluate the model, compare the predicted labels to the actual labels
Accuracy: the proportion of examples where we correctly predicted the label
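A minimal sketch of this accuracy computation, using the predicted labels (1, 1) and actual labels (1, 0) from the test-set slides above. The function name `accuracy` is only illustrative.

```python
def accuracy(predicted, actual):
    """Return the fraction of examples whose predicted label matches the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# The classifier predicted 1, 1 for the two test examples; the actual labels were 1, 0.
predicted_labels = [1, 1]
actual_labels = [1, 0]
test_accuracy = accuracy(predicted_labels, actual_labels)
print(test_accuracy)  # 0.5
```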
![Page 13: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/13.jpg)
Proper testing
[Diagram: learn from the training data, then evaluate the model on the test data]
One way to do algorithm development:
- try out an algorithm
- evaluate on test data
- repeat until happy with results
Is this ok?
No. Although we’re not explicitly looking at the examples, we’re still “cheating” by biasing our algorithm to the test data
![Page 14: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/14.jpg)
Proper testing
[Diagram: evaluate the model on the test data]
Once you look at/use test data it is no longer test data!
So, how can we evaluate our algorithm during development?
![Page 15: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/15.jpg)
Development set
[Diagram: the labeled data (data with labels) is split into all training data and test data; all training data is split further into training data and development data]
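One possible way to carve labeled data into training, development, and test sets is sketched below. The 20% fractions, the fixed seed, and the function name `split_labeled_data` are assumptions for illustration; the slides do not specify split sizes.

```python
import random


def split_labeled_data(examples, test_fraction=0.2, dev_fraction=0.2, seed=0):
    """Shuffle labeled examples, then split off test data and development data.

    The fractions are illustrative assumptions, not values from the lecture.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    # First split: labeled data -> test data + all training data.
    n_test = int(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    all_training = shuffled[n_test:]

    # Second split: all training data -> development data + training data.
    n_dev = int(len(all_training) * dev_fraction)
    dev_set = all_training[:n_dev]
    train_set = all_training[n_dev:]
    return train_set, dev_set, test_set


# Hypothetical labeled data: (example id, label) pairs.
labeled = [(i, i % 2) for i in range(100)]
train_set, dev_set, test_set = split_labeled_data(labeled)
print(len(train_set), len(dev_set), len(test_set))  # 64 16 20
```

The test set is set aside first so that nothing done during development can touch it.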
![Page 16: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/16.jpg)
Proper testing
[Diagram: learn from the training data, then evaluate the model on the development data]
Using the development data:
- try out an algorithm
- evaluate on development data
- repeat until happy with results
When satisfied, evaluate on test data
![Page 17: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/17.jpg)
Proper testing
[Diagram: learn from the training data, then evaluate the model on the development data]
Using the development data:
- try out an algorithm
- evaluate on development data
- repeat until happy with results
Any problems with this?
![Page 18: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/18.jpg)
Overfitting to development data
Be careful not to overfit to the development data!
[Diagram: all training data is split into training data and development data]
Often we’ll split off development data like this multiple times (in fact, on the fly). You can still overfit, but this helps avoid it
![Page 19: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/19.jpg)
Pruning revisited
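[Figure: the full unicycle decision tree (splitting on unicycle type, then terrain, then weather) shown alongside pruned versions: a single YES leaf, a tree that splits only on unicycle type (Mountain: YES, Normal: NO), and a tree that splits on unicycle type and then on terrain]
Which should we pick?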
![Page 20: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/20.jpg)
Pruning revisited
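[Figure: the full unicycle decision tree (splitting on unicycle type, then terrain, then weather) shown alongside pruned versions: a single YES leaf, a tree that splits only on unicycle type (Mountain: YES, Normal: NO), and a tree that splits on unicycle type and then on terrain]
Use development data to decide!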
![Page 21: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/21.jpg)
Machine Learning: A Geometric View
![Page 22: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/22.jpg)
Apples vs. Bananas

| Weight | Color | Label |
|---|---|---|
| 4 | Red | Apple |
| 5 | Yellow | Apple |
| 6 | Yellow | Banana |
| 3 | Red | Apple |
| 7 | Yellow | Banana |
| 8 | Yellow | Banana |
| 6 | Yellow | Apple |

Can we visualize this data?
![Page 23: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/23.jpg)
Apples vs. Bananas
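
| Weight | Color | Label |
|---|---|---|
| 4 | 0 | Apple |
| 5 | 1 | Apple |
| 6 | 1 | Banana |
| 3 | 0 | Apple |
| 7 | 1 | Banana |
| 8 | 1 | Banana |
| 6 | 1 | Apple |

Turn features into numerical values (read the book for a more detailed discussion of this)
[Figure: the examples plotted with Weight on the horizontal axis (0 to 10) and Color on the vertical axis (0 or 1); each point is marked A for apple or B for banana]
We can view examples as points in an n-dimensional space where n is the number of features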
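As a rough sketch, the apples-and-bananas examples can be turned into points like this. The mapping Red = 0, Yellow = 1 matches the numeric table above; the names `COLOR_TO_NUMBER` and `to_point` are made up for illustration.

```python
# Encoding used in the numeric version of the table: Red -> 0, Yellow -> 1.
COLOR_TO_NUMBER = {"Red": 0, "Yellow": 1}

raw_examples = [
    (4, "Red", "Apple"),
    (5, "Yellow", "Apple"),
    (6, "Yellow", "Banana"),
    (3, "Red", "Apple"),
    (7, "Yellow", "Banana"),
    (8, "Yellow", "Banana"),
    (6, "Yellow", "Apple"),
]


def to_point(weight, color):
    """Map one example's features to a point in 2-dimensional feature space."""
    return (float(weight), float(COLOR_TO_NUMBER[color]))


points = [to_point(weight, color) for weight, color, _ in raw_examples]
labels = [label for _, _, label in raw_examples]
print(points[0], labels[0])  # (4.0, 0.0) Apple
```

With two features, every example is a point in a 2-dimensional space; with n features the tuples would simply have n entries.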
![Page 24: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/24.jpg)
Examples in a feature space
[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3]
![Page 25: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/25.jpg)
Test example: what class?
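[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus an unlabeled test example]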
![Page 26: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/26.jpg)
Test example: what class?
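[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus an unlabeled test example]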
closest to red
![Page 27: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/27.jpg)
Another classification algorithm?
To classify an example d:
- Label d with the label of the closest example to d in the training set
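A minimal sketch of this closest-example rule, applied to the apples-and-bananas points. Using Euclidean distance (via `math.dist`, Python 3.8+) is an assumption at this point, and `nearest_neighbor_label` is an illustrative name.

```python
import math


def nearest_neighbor_label(d, training_set):
    """Return the label of the training example closest to point d.

    training_set is a list of (point, label) pairs. Euclidean distance is an
    assumption here; any distance function could be swapped in.
    """
    _, closest_label = min(training_set, key=lambda pair: math.dist(d, pair[0]))
    return closest_label


# (weight, color) points with Red = 0 and Yellow = 1.
training_set = [
    ((4, 0), "Apple"),
    ((5, 1), "Apple"),
    ((6, 1), "Banana"),
    ((3, 0), "Apple"),
    ((7, 1), "Banana"),
    ((8, 1), "Banana"),
    ((6, 1), "Apple"),
]

print(nearest_neighbor_label((3.5, 0), training_set))  # Apple
print(nearest_neighbor_label((7.6, 1), training_set))  # Banana
```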
![Page 28: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/28.jpg)
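What about this example?
[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a new test example]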
![Page 29: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/29.jpg)
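What about this example?
[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a new test example]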
closest to red, but…
![Page 30: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/30.jpg)
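What about this example?
[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a new test example]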
Most of the next closest are blue
![Page 31: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/31.jpg)
k-Nearest Neighbor (k-NN)
To classify an example d:
- Find k nearest neighbors of d
- Choose as the label the majority label within the k nearest neighbors
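The two steps can be sketched as follows. This is one straightforward way to write k-NN, assuming Euclidean distance and breaking vote ties in favor of whichever tied label appears first among the sorted neighbors. The points are hypothetical, chosen so that the closest example is red while most of the next closest are blue.

```python
import math
from collections import Counter


def knn_classify(d, training_set, k=3):
    """Classify point d by majority vote among its k nearest training examples.

    training_set is a list of (point, label) pairs.
    """
    # Step 1: find the k nearest neighbors of d.
    neighbors = sorted(training_set, key=lambda pair: math.dist(d, pair[0]))[:k]
    # Step 2: choose the majority label among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical data: one red example sits very close to the query,
# but the surrounding examples are blue.
training_set = [
    ((0.5, 0.0), "red"),
    ((1.0, 0.0), "blue"),
    ((0.0, 1.0), "blue"),
    ((-1.0, 0.0), "blue"),
    ((5.0, 5.0), "red"),
    ((6.0, 5.0), "red"),
]
query = (0.0, 0.0)

print(knn_classify(query, training_set, k=1))  # red
print(knn_classify(query, training_set, k=3))  # blue
```

Sorting the whole training set keeps the sketch short; it costs O(n log n) per query, which is fine for small data.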
![Page 32: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/32.jpg)
k-Nearest Neighbor (k-NN)
To classify an example d:
- Find k nearest neighbors of d
- Choose as the label the majority label within the k nearest neighbors
How do we measure “nearest”?
![Page 33: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/33.jpg)
Euclidean distance
In two dimensions, how do we compute the distance?
(a1, a2)
(b1, b2)
![Page 34: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/34.jpg)
Euclidean distance
In n-dimensions, how do we compute the distance?
(a1, a2, …, an)
(b1, b2,…,bn)
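For reference, the standard Euclidean distance between a = (a1, …, an) and b = (b1, …, bn) is D(a, b) = sqrt((a1 − b1)^2 + (a2 − b2)^2 + … + (an − bn)^2); the two-dimensional case is the same formula with n = 2. A small sketch, with `euclidean_distance` as an illustrative name:

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))        # 5.0  (two dimensions)
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0  (three dimensions)
```

The same function works for any n because it only loops over matching coordinates.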
![Page 35: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/35.jpg)
Decision boundaries
[Figure: examples marked as label 1, label 2, or label 3]
The decision boundaries are places in the feature space where the classification of a point/example changes
Where are the decision boundaries for k-NN?
![Page 36: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/36.jpg)
k-NN decision boundaries
k-NN gives locally defined decision boundaries between classes
[Figure: examples marked as label 1, label 2, or label 3, with the k-NN decision boundaries drawn between the classes]
![Page 37: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/37.jpg)
K Nearest Neighbour (kNN) Classifier
K = 1
![Page 39: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/39.jpg)
Choosing k
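[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]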
What is the label with k = 1?
![Page 40: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/40.jpg)
Choosing k
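[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]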
We’d choose red. Do you agree?
![Page 41: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/41.jpg)
Choosing k
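[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]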
What is the label with k = 3?
![Page 42: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/42.jpg)
Choosing k
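[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]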
We’d choose blue. Do you agree?
![Page 43: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/43.jpg)
Choosing k
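[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]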
What is the label with k = 100?
![Page 44: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/44.jpg)
Choosing k
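[Figure: examples plotted in a two-dimensional feature space (axes feature1 and feature2), each marked as label 1, label 2, or label 3, plus a test example]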
We’d choose blue. Do you agree?
![Page 45: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/45.jpg)
The impact of k
What is the role of k?
How does it relate to overfitting and underfitting?
How did we control this for decision trees?
![Page 46: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/46.jpg)
k-Nearest Neighbor (k-NN)
To classify an example d:
- Find k nearest neighbors of d
- Choose as the class the majority class within the k nearest neighbors
How do we choose k?
![Page 47: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/47.jpg)
How to pick k
Common heuristics:
- often 3, 5, 7
- choose an odd number to avoid ties
Use development data
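Picking k with development data can be sketched as trying each candidate k and keeping the one with the best development accuracy. The data below is hypothetical (two clusters plus one mislabeled "noise" point), and the helper names are illustrative; when several values of k tie, this sketch keeps the smallest.

```python
import math
from collections import Counter


def knn_classify(d, training_set, k):
    """Majority label among the k training examples nearest to d (Euclidean distance)."""
    neighbors = sorted(training_set, key=lambda pair: math.dist(d, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def dev_accuracy(k, training_set, dev_set):
    """Fraction of development examples that k-NN labels correctly."""
    correct = sum(1 for point, label in dev_set
                  if knn_classify(point, training_set, k) == label)
    return correct / len(dev_set)


def pick_k(candidates, training_set, dev_set):
    """Return the candidate k with the highest development accuracy (first one on ties)."""
    return max(candidates, key=lambda k: dev_accuracy(k, training_set, dev_set))


# Hypothetical training data: cluster "a" near (0, 0), cluster "b" near (5, 5),
# and one noisy "b" example sitting inside cluster "a".
training_set = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
    ((0.5, 0.5), "b"),
]
dev_set = [
    ((0.4, 0.4), "a"), ((0.6, 0.6), "a"),
    ((5.5, 5.5), "b"), ((5.2, 5.1), "b"),
]

print(dev_accuracy(1, training_set, dev_set))  # 0.5
best_k = pick_k([1, 3, 5, 7], training_set, dev_set)
print(best_k)  # 3
```

With k = 1 the single noisy example decides the label for nearby points; larger k lets the surrounding examples outvote it.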
![Page 48: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/48.jpg)
k-NN variants
To classify an example d:
- Find k nearest neighbors of d
- Choose as the class the majority class within the k nearest neighbors
Any variation ideas?
![Page 49: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/49.jpg)
k-NN variations
Instead of k nearest neighbors, count the majority from all examples within a fixed distance
Weighted k-NN:
- Right now, all of the k nearest examples are treated equally
- Weight the “vote” of the examples, so that closer examples have more vote/weight
- Often use some sort of exponential decay
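A sketch of the weighted variant. The slides only say "some sort of exponential decay", so the specific weight exp(−decay × distance) and the `decay` parameter are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_classify(d, training_set, k=3, decay=1.0):
    """k-NN where each neighbor's vote is weighted by exp(-decay * distance).

    The exact weighting function is an assumption; other decaying functions
    of distance would fit the same idea.
    """
    neighbors = sorted(training_set, key=lambda pair: math.dist(d, pair[0]))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        votes[label] += math.exp(-decay * math.dist(d, point))
    # The label with the largest total weight wins.
    return max(votes, key=votes.get)


# Hypothetical data: one very close red example, two farther blue examples.
training_set = [
    ((0.1, 0.0), "red"),
    ((2.0, 0.0), "blue"),
    ((0.0, 2.0), "blue"),
]
query = (0.0, 0.0)

# Unweighted 3-NN would pick blue (2 votes to 1); with distance weighting the
# close red example outweighs the two distant blue ones.
print(weighted_knn_classify(query, training_set, k=3))  # red
```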
![Page 50: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/50.jpg)
Decision boundaries for decision trees
[Figure: examples marked as label 1, label 2, or label 3]
What are the decision boundaries for decision trees like?
![Page 51: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/51.jpg)
Decision boundaries for decision trees
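[Figure: examples marked as label 1, label 2, or label 3, with decision-tree boundaries drawn as horizontal and vertical cuts]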
Axis-aligned splits/cuts of the data
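To see why the cuts are axis-aligned, here is a tiny hand-written decision tree over (weight, color) points from the apples-and-bananas example. Each test compares a single feature to a threshold, so each boundary is a line parallel to one axis. The tree and its thresholds (0.5 and 5.5) are made up for illustration, not learned.

```python
def tiny_tree_classify(point):
    """Hand-written decision tree; every test looks at exactly one feature."""
    weight, color = point
    if color < 0.5:    # cut parallel to the weight axis, at color = 0.5
        return "Apple"
    if weight < 5.5:   # cut parallel to the color axis, at weight = 5.5
        return "Apple"
    return "Banana"


# (weight, color) points with Red = 0 and Yellow = 1.
examples = [
    ((4, 0), "Apple"), ((5, 1), "Apple"), ((6, 1), "Banana"), ((3, 0), "Apple"),
    ((7, 1), "Banana"), ((8, 1), "Banana"), ((6, 1), "Apple"),
]
correct = sum(1 for point, label in examples if tiny_tree_classify(point) == label)
print(correct, "of", len(examples))  # 6 of 7
```

The one miss is unavoidable here: the points (6, 1) Banana and (6, 1) Apple have identical features, so no classifier based on these features can separate them.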
![Page 52: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/52.jpg)
Decision boundaries for decision trees
[Figure: examples marked as label 1, label 2, or label 3]
What types of data sets will DT work poorly on?
![Page 53: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/53.jpg)
Problems for DT
![Page 54: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/54.jpg)
Decision trees vs. k-NN
Which is faster to train?
Which is faster to classify?
Do they use the features in the same way to label the examples?
![Page 55: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/55.jpg)
Decision trees vs. k-NN
Which is faster to train?
k-NN doesn’t require any training!
Which is faster to classify?
For most data sets, decision trees
Do they use the features in the same way to label the examples?
k-NN treats all features equally! Decision trees “select” important features
![Page 56: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/56.jpg)
A thought experiment
What is a 100,000-dimensional space like?
You’re a 1-D creature, and you decide to buy a 2-unit apartment
2 rooms (very skinny rooms)
![Page 57: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/57.jpg)
Another thought experiment
What is a 100,000-dimensional space like?
Your job’s going well and you’re making good money. You upgrade to a 2-D apartment with 2 units per dimension
4 rooms (very flat rooms)
![Page 58: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/58.jpg)
Another thought experiment
What is a 100,000-dimensional space like?
You get promoted again and start having kids and decide to upgrade to another dimension.
Each time you add a dimension, the amount of space you have to work with goes up exponentially
8 rooms (very normal rooms)
![Page 59: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/59.jpg)
Another thought experiment
What is a 100,000-dimensional space like?
Larry Page steps down as CEO of Google and they ask you if you’d like the job. You decide to upgrade to a 100,000-dimensional apartment.
How much room do you have? Can you have a big party?
2^100,000 rooms (it’s very quiet and lonely…): roughly 10^30,000 rooms per person if you invited everyone on the planet
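The room counts in this thought experiment can be checked with a few lines of arithmetic. The sketch works with base-10 logarithms because 2^100,000 is far too large to print comfortably; the world population of 7 billion is an assumed round figure.

```python
import math


def log10_rooms(dimensions, units_per_dimension=2):
    """log10 of the number of rooms: units_per_dimension ** dimensions."""
    return dimensions * math.log10(units_per_dimension)


# Small apartments: 1-D, 2-D, 3-D with 2 units per dimension.
for dims in (1, 2, 3):
    print(dims, 2 ** dims)  # 2, 4, and 8 rooms respectively

WORLD_POPULATION = 7e9  # assumption: rough world population
log_rooms = log10_rooms(100_000)
log_rooms_per_person = log_rooms - math.log10(WORLD_POPULATION)

# Exponents of 10: about 10^30103 rooms, about 10^30093 rooms per person.
print(round(log_rooms), round(log_rooms_per_person))  # 30103 30093
```

Dividing by everyone on the planet removes only about 10 from an exponent of roughly 30,000, which is why the apartment stays so empty.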
![Page 60: GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013](https://reader036.vdocuments.mx/reader036/viewer/2022062717/56649e575503460f94b4f865/html5/thumbnails/60.jpg)
The challenge
Our intuitions about space/distance don’t scale with dimensions!
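One well-known way this shows up for k-NN is that distances become less informative in high dimensions: the nearest and farthest points end up almost equally far away. The experiment below is a rough illustration with uniformly random points, not something taken from the slides; the point count and seed are arbitrary.

```python
import math
import random


def nearest_to_farthest_ratio(dimensions, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point.

    Points are drawn uniformly from the unit cube. A ratio near 0 means the
    nearest neighbor is much closer than the farthest point; a ratio near 1
    means every point is about equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    return min(distances) / max(distances)


ratio_2d = nearest_to_farthest_ratio(2)
ratio_1000d = nearest_to_farthest_ratio(1000)
print(ratio_2d, ratio_1000d)
```

In two dimensions the ratio is small, so "nearest" is meaningful; in 1,000 dimensions it is much closer to 1.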