
Unsupervised Learning (Examples)

Javier Béjar

Term 2010/2011

Outline

1 Iris

2 Voting Records

3 Mushroom

4 Image Segmentation

Iris

Differentiate among three species of flowers (Iris)

4 continuous attributes

Attributes: Measures of characteristics of the flowers

150 instances

3 classes

96 % accuracy for supervised learning

Iris - Expectation/maximization

We use the EM algorithm looking for 3 clusters

Clusters are relatively clear; accuracy is slightly lower than with supervised learning

  0   1   2   <-- assigned to cluster
  0  50   0 | Iris-setosa
 50   0   0 | Iris-versicolor
 14   0  36 | Iris-virginica

Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica

Incorrectly clustered instances: 14.0 (9.3333 %)
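The error figure above can be reproduced from the class/cluster matrix. The following is a minimal sketch, assuming the evaluation picks the one-to-one assignment of clusters to classes that leaves the fewest errors; the function name classes_to_clusters_error is hypothetical and the exact procedure used by the tool that produced the output is not stated in the slides.

```python
from itertools import permutations


def classes_to_clusters_error(matrix):
    """matrix[i][j] = number of instances of class i assigned to cluster j.

    Tries every one-to-one assignment of clusters to classes and returns
    (errors, mapping), where mapping[j] is the class index given to cluster j.
    Assumes a square matrix (as many clusters as classes).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    best_correct, best_mapping = -1, None
    for mapping in permutations(range(n)):
        # Instances counted as correct under this cluster -> class assignment.
        correct = sum(matrix[mapping[j]][j] for j in range(n))
        if correct > best_correct:
            best_correct, best_mapping = correct, mapping
    return total - best_correct, best_mapping


# Rows: setosa, versicolor, virginica; columns: clusters 0, 1, 2 (EM result).
iris_em = [[0, 50, 0], [50, 0, 0], [14, 0, 36]]
errors, mapping = classes_to_clusters_error(iris_em)
print(errors, round(100 * errors / 150, 4), mapping)
```

Trying all permutations is only practical for a handful of clusters, which is enough for the examples in these slides.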

Iris - K-means

The K-means algorithm is applied looking for 3 clusters

Clusters are relatively clear, but the overlap between clusters affects prediction

  0   1   2   <-- assigned to cluster
  0  50   0 | Iris-setosa
 47   0   3 | Iris-versicolor
 14   0  36 | Iris-virginica

Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica

Incorrectly clustered instances: 17.0 (11.3333 %)
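For reference, the K-means procedure itself can be sketched in a few lines. This is an illustrative implementation of the standard assign/update loop (Lloyd's algorithm) on made-up 2D points, not the Iris data and not the implementation used to produce the results above; the function name kmeans and the fixed initial centroids are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, centroids, iterations=100):
    """Standard K-means loop on a list of equal-length numeric tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Made-up points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Because every point is forced into exactly one cluster, points lying where two groups overlap end up on whichever side has the nearer centroid, which is one way to read the extra errors in the table above.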

Voting Records

Classify members of the US House of Representatives by their votes

16 binary attributes

Attributes: Vote of each representative on different proposals (budget, immigration, taxes, military aid, ...)

435 instances

2 classes

96.3 % accuracy for supervised learning

Visualization of the data set is very difficult (binary attributes!)

Voting Records - PCA

PCA is used to obtain a new set of attributes

The data set does not satisfy the assumptions for applying PCA (non-Gaussian data)

The first 3 components explain 60 % of the variance (the first one explains 45 %; all components are needed to reach 95 % of the variance)
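Statements such as "the first 3 components explain 60 % of the variance" come from accumulating the eigenvalues of the covariance matrix in decreasing order. The sketch below shows that bookkeeping only; the eigenvalues are invented for illustration and are NOT those of the voting data, and the function name components_needed is hypothetical.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Invented eigenvalues (assumption): shares are 50 %, 20 %, 10 %, 10 %, 10 %.
example = [5, 2, 1, 1, 1]
print(components_needed(example, 0.60))
print(components_needed(example, 0.95))
```

When the variance is spread over many components, as reported for this data set, a high threshold can only be reached by keeping nearly all of them, so PCA gives little reduction.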

Voting Records - Expectation-maximization

The EM algorithm is applied looking for 2 clusters

Clusters are not very clear; the error is large

   0    1   <-- assigned to cluster
  44  223 | democrat
 159    9 | republican

Cluster 0 <-- republican
Cluster 1 <-- democrat

Incorrectly clustered instances: 53.0 (12.1839 %)
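To make the EM step concrete for binary attributes, here is a minimal sketch of EM for a mixture of independent Bernoulli attributes. It is an assumption that this model is a reasonable stand-in for what was run on the voting data; the slides do not describe the model used. The data below are made up, and the name bernoulli_mixture_em and the initial probabilities are hypothetical.

```python
import math


def bernoulli_mixture_em(data, init_probs, iterations=50):
    """EM for a mixture of independent Bernoulli attributes.

    data: list of 0/1 lists.
    init_probs[c][d]: initial P(attribute d = 1 | cluster c).
    Returns (weights, probs, responsibilities).
    """
    k, n, dims = len(init_probs), len(data), len(data[0])
    weights = [1.0 / k] * k
    probs = [list(row) for row in init_probs]
    eps = 1e-6  # keeps probabilities away from 0 and 1 so log() is defined
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each cluster for each instance,
        # computed in log space for numerical stability.
        resp = []
        for x in data:
            logs = [
                math.log(weights[c]) + sum(
                    math.log(probs[c][d] if x[d] else 1.0 - probs[c][d])
                    for d in range(dims))
                for c in range(k)
            ]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
        # M-step: re-estimate cluster weights and per-attribute probabilities
        # from the soft assignments.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = max(mass / n, eps)
            for d in range(dims):
                p = sum(r[c] * x[d] for r, x in zip(resp, data)) / max(mass, eps)
                probs[c][d] = min(max(p, eps), 1.0 - eps)
    return weights, probs, resp


# Made-up yes/no votes on 4 proposals for 6 voters (not the real data).
votes = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
         [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
init = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
weights, probs, resp = bernoulli_mixture_em(votes, init)
# Hard labels: the most probable cluster for each voter.
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

Unlike K-means, each instance keeps a probability for every cluster, and the hard labels are only taken at the end.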

Voting Records - K-means

K-means algorithm is applied looking for 2 clusters

The error is larger than with EM because of the overlap between clusters

   0    1   <-- assigned to cluster
  50  217 | democrat
 157   11 | republican

Cluster 0 <-- republican
Cluster 1 <-- democrat

Incorrectly clustered instances: 61.0 (14.023 %)

Mushroom

Distinguish between poisonous and edible mushrooms

22 binary and nominal attributes

Attributes: Visible characteristics of the mushrooms

About 8000 instances

2 classes

100 % accuracy for supervised learning

Visualization using the original attributes is difficult (binary and nominal attributes!)

Mushroom - PCA

PCA is used to obtain a new set of attributes

The data set does not satisfy the assumptions for applying PCA (non-Gaussian data)

The first 10 components explain only 50 % of the variance. All components are needed to explain 95 % of the variance (PCA has 59 components).
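The slides do not say how the nominal attributes were turned into numbers before PCA. One common option, assumed here purely for illustration, is to expand each nominal attribute into one 0/1 indicator column per value; that would be consistent with PCA having more components than there are original attributes. The attribute values below are made up and the function name one_hot is hypothetical.

```python
def one_hot(rows):
    """Expand nominal attributes into 0/1 indicator columns.

    rows: list of equal-length tuples of nominal values.
    Returns (column_names, encoded_rows).
    """
    n_attrs = len(rows[0])
    # Sorted set of values observed for each attribute.
    values = [sorted({row[a] for row in rows}) for a in range(n_attrs)]
    names = [f"attr{a}={v}" for a in range(n_attrs) for v in values[a]]
    encoded = [
        [1 if row[a] == v else 0 for a in range(n_attrs) for v in values[a]]
        for row in rows
    ]
    return names, encoded


# Made-up examples with two nominal attributes (not the real data).
toy = [("convex", "brown"), ("bell", "white"), ("convex", "yellow")]
names, encoded = one_hot(toy)
print(names)
print(encoded)
```

Two nominal attributes become five numeric columns here, so the dimensionality seen by PCA grows with the number of distinct values.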

Mushroom - Expectation/maximization

The EM algorithm is applied looking for 2 clusters

Clusters are not very clear; the error is large

It is probably more interesting to look for more clusters and analyze them (the data set has more structure than the supervised labels show)

    0     1   <-- assigned to cluster
 4208     0 | e
  836  3080 | p

Cluster 0 <-- e
Cluster 1 <-- p

Incorrectly clustered instances: 836.0 (10.2905 %)

Mushroom - Expectation/maximization + attribute selection

We are cheating :-) (the attribute selection uses the class labels)

A wrapper using decision trees is used to find the relevant attributes (5 relevant attributes)

The EM algorithm is applied looking for 2 clusters

    0     1   <-- assigned to cluster
 4000   208 | e
  528  3388 | p

Cluster 0 <-- e
Cluster 1 <-- p

Incorrectly clustered instances: 736.0 (9.0596 %)

Mushroom - K-means

The K-means algorithm is applied looking for 2 clusters

The result is awful: the overlap between classes is large and there is no good partition of the data

    0     1   <-- assigned to cluster
 1234  2974 | e
 2093  1823 | p

Cluster 0 <-- p
Cluster 1 <-- e

Incorrectly clustered instances: 3057.0 (37.6292 %)

Image Segmentation

Clustering for Image Processing
