machine learning ii: unsupervised learning€¦ · 2 our agenda today unsupervised learning and...

47
Machine Learning II: Unsupervised Learning Machine Learning II: Unsupervised Learning CSEP 573 © CSE AI Faculty

Upload: others

Post on 03-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

Machine Learning II:Unsupervised LearningMachine Learning II:

Unsupervised Learning

CSEP 573

© CSE AI Faculty

Page 2: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

2

Our agenda todayOur agenda todayUnsupervised Learning and Applications

• Clustering– Application: Image Segmentation

• Density Estimation and EM algorithm• Dimensionality Reduction

– Principal Component Analysis (PCA)– Applications: Image Compression, Face Recognition

Guest Lecture by Rawichote Chalodhorn: Applications of Learning in Robotics

Page 3: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

3

Motivation: Image Segmentation in Computer Vision

Motivation: Image Segmentation in Computer Vision

Goal: Partition an image into its constituent “objects”

Some slides adapted from Steve Seitz, Linda Shapiro

Page 4: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

4

Idea: Image histogramsIdea: Image histogramsHow many “orange” pixels are in this image?

• Look at the histogram• A histogram counts the number of occurrences of each color– Given an image– The histogram is

i.e., for each color value c on the x-axis, plot # of pixels with that color on y-axis

Page 5: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

5

Example Histogram of a Grayscale Image

Example Histogram of a Grayscale Image

How Many Modes Are There?• Easy to see, hard to compute

ImageHistogram

Intensity bins

Page 6: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

6

Histogram-based segmentationHistogram-based segmentation

Here’s what our image looks like if we use two colors (intensities)

Idea: Break the image into K regions (segments) by• reducing the number of colors to K • assigning each pixel to the closest color

K = 2

Page 7: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

7

ClusteringClustering• Idea in previous slide can be formalized as clustering• Problem: Given unlabeled data points {p1,p2,…,pN}, assign each data point pj to one of K clusters

• points within a cluster are “similar” (according to some metric)

• Example of unsupervised learning (no label given)

2D data points {p1, p2,…} K = 3 clusters

Page 8: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

8

Why Clustering?Why Clustering?• Lots of Applications!

• Biology: Discovering gene clusters with similar expression patterns, grouping homologous DNA sequences, etc.

• Marketing: Grouping customers with similar traits for segmenting the market, product positioning etc.

• Vision: Image segmentation, feature learning for recognition,…

• Search result grouping (e.g, clusty.com)• Social network analysis (discovering user

communities with similar interests)• Crime analysis (identification of “hot spots”)• Many more!

Page 9: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

9

Clustering: The ProblemClustering: The Problem

Given: Unlabeled dataGoal: Assign each point to the cluster it is most similar toSuppose we are given the number of clusters K (= 3 here)

How do we assign each point to a cluster?

Page 10: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

10

Suppose you are given the cluster centers ci

Suppose you are given the cluster centers ci

Q: how do you assign points to a cluster?A: for each point p,

• Compute distance to cluster centers• Choose the closest ci

Page 11: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

11

Suppose you are given the cluster centers ci

Suppose you are given the cluster centers ci

But wait…you are not given the cluster centers!How do you find them?

Page 12: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

12

Finding the cluster centerFinding the cluster center

Given a cluster of points, we can easily compute its center(How?)

Page 13: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

13

A chicken-or-egg problem?A chicken-or-egg problem?Given cluster centers, we can assign pointsTo find centers, we need points assigned to a cluster

Page 14: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

14

A way out of the impasseA way out of the impasse

Why not alternate?(between finding centers and

assigning points)

Page 15: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

15

Alternate between 2 stepsAlternate between 2 steps

I. Given current estimate of cluster centers ci:Assign each point p to closest ci

II. Given current assignment of points to clusters:Choose ci to be the mean of all the points in the cluster

Page 16: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

16

K-means clusteringK-means clusteringAlgorithm

1. Randomly initialize the cluster centers, c1,...,cK

2. Determine cluster membership • For each point p, find the closest ci

• Put p into cluster i3. Re-estimate cluster centers

• Set ci to be the mean of points in cluster i4. If ci have changed, go to 2 else done.

Java demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

Page 17: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

17

0

1

2

3

4

5

0 1 2 3 4 5

c1

c2

c3

K-means clustering exampleK-means clustering exampleRandomly initialize the cluster centers

Page 18: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

18

0

1

2

3

4

5

0 1 2 3 4 5

c1

c2

c3

K-means clustering exampleK-means clustering exampleDetermine cluster membership

Page 19: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

19

0

1

2

3

4

5

0 1 2 3 4 5

c1

c2

c3

K-means clustering exampleK-means clustering exampleRe-estimate cluster centers

Page 20: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

20

0

1

2

3

4

5

0 1 2 3 4 5

Result of first iteration

c2 c3

K-means clustering exampleK-means clustering example

c1

Page 21: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

21

0

1

2

3

4

5

0 1 2 3 4 5

c3

Second iteration

c1

c2

K-means clustering exampleK-means clustering example

Page 22: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

22

0

1

2

3

4

5

0 1 2 3 4 5

c1

c2

c3

Result of second iterationK-means clustering exampleK-means clustering example

Page 23: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

23

K-means clusteringK-means clusteringProperties

• Will always converge to some solution• Can be a “local minimum”

• does not always find the global minimum of objective function:

Page 24: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

24

K-means as probability density estimationK-means as probability density estimationK-means can be formalized as estimating the unknown probability density of a data set

• Model data as a mixture of K Gaussians• Estimate not only means but also (co)variances

Data

μA

2σA

∑=

Σ=K

jjjj GCpp

1),()()( μxModel

Page 25: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

25

K-Means and the EM AlgorithmK-Means and the EM Algorithm• The Expectation Maximization (EM) Algorithm is a general algorithm for unsupervised learning when there are hidden variables (e.g., clusters, non-evidence nodes in Bayesian networks, etc.)

• Like K-means, it involves iterating between 2 steps:• E (“expectation”) step that estimates posterior probabilities of hidden variables

• M (“maximization”) step that uses the result of E step to update model parameters

• Each iteration improves likelihood of data under the model (or keeps it the same)

• Guaranteed to converge (perhaps to local maximum)

Page 26: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

26

Not to be confused with…

The Expectation Minimization Algorithm

Not to be confused with…

The Expectation Minimization Algorithm

Page 27: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

27

Density estimation using EMDensity estimation using EM• EM for Gaussian mixtures (similar to K-means):

• Initialize K clusters: C1, …, CK (μj, Σj) and p(Cj) for each cluster j

• Repeat until convergence:– Estimate which cluster each data point belongs to

– Re-estimate cluster parameters

)|( ij xCp Expectation step

Maximization step)(),,( jjj CpΣμ

Page 28: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

28

EM algorithm: The detailsEM algorithm: The details

E step: Compute probability of membership in cluster based on output of previous M step (p(xi|Cj) = Gaussian(μj, Σj))

M step: Re-estimate cluster parameters based on output of E step

∑ ⋅

⋅=

⋅=

jjji

jji

i

jjiij CpCxp

CpCxpxp

CpCxpxCp

)()|()()|(

)()()|(

)|(

∑∑ −⋅−⋅

iij

Tjiji

iij

j xCp

xxxCp

)|(

)()()|( μμ

∑∑ ⋅

=

iij

ii

ij

j xCp

xxCp

)|(

)|(μ

N

xCpCp i

ij

j

∑=

)|()(

Page 29: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

29

Results from the EM algorithmResults from the EM algorithm

Input data:

μA

μB2σA

2σB

Page 30: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

Suppose we are not interested in density estimation but want to reduce

the dimensionality of our data

Suppose we are not interested in density estimation but want to reduce

the dimensionality of our data

Example application: Image compression

Page 31: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

31

Redundancy in ImagesRedundancy in ImagesMost natural images (e.g., images of faces) are highly redundant

• Nearby pixels tend to have similar intensities and are therefore highly correlated

• Why?• Due to physical regularities of natural structures generating the image

Can we use unsupervised learning to capture and reduce this redundancy?

Page 32: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

32

A Simple 2D ExampleA Simple 2D Example

Suppose pixel 1 and pixel 2 are two neighboring pixels• What does the plot above suggest?• The two pixels are highly correlated for this data set of images

This and next few slides adapted from Steve Seitz, Linda Shapiro

Pixel 1

Pixel 2

Page 33: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

33

Linear subspacesLinear subspaces

Suppose we fit a line v1 Let v2 be orthogonal to v1

Convert an input x into v1, v2coordinates

What does the v2 coordinate measure?- distance to line (position along v2 axis)- near 0 for these pts

What does the v1 coordinate measure?- position along v1 axis- use it to specify which point it is

Pixel 1

Pixel 2

Page 34: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

34

Dimensionality reductionDimensionality reduction

• We can represent the points with only their v1 coordinates– since v2 coordinates are all essentially 0

– Reduce dimensionality of data from 2D to 1D• This makes it cheaper to store and compare points• Bigger deal for higher dimensional inputs (like images!)

Pixel 1

Pixel 2

Page 35: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

35

How do we find v1, v2, … ?How do we find v1, v2, … ?

Consider the variation along some direction v for all of the N points:

What unit vector v maximizes var?

Pixel 1

Pixel 2

v2 is then the unit vector orthogonal to v1

arg1/N

Page 36: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

36

How do we find v1, v2, … ?How do we find v1, v2, … ?

We want to find a unit vector v that maximizes vTA v

A = Covariance matrix of data points

2Pixel 1

/N

/N

/N

/N

Pixel 2

Page 37: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

37

Finding v1 and v2: The MathFinding v1 and v2: The Math

v1 = argmaxv (vTA v) subject to vTv = 1Using Lagrange multiplier method,v1 = argmaxv [vTA v – λ(vTv – 1)]Setting derivative wrt v to 0, we get:Av = λv

Pixel 1

Pixel 2

Thus, v1 is eigenvector of A with largest eigenvalue λ1v2 is eigenvector of A with smaller eigenvalue λ2

Page 38: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

38

Principal Component Analysis (PCA)Principal Component Analysis (PCA)Suppose each of the N data points is L-dimensional

• Form L x L data covariance matrix A

• Compute eigenvectors of A– Eigenvectors of A define a new coordinate system

that is a rotation of the original coordinate system– Eigenvector with largest eigenvalue captures the most

variation among training vectors x– Eigenvector with smallest eigenvalue has least

variation

∑=

−−=N

i

TiiN

A1

))((1 xxxx

Page 39: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

39

Principal Component Analysis (PCA)Principal Component Analysis (PCA)

We can compress the data by only using the top few eigenvectors with largest eigenvalues– corresponds to choosing a “linear subspace”of the original data space

– represent points on a line, plane, “hyper-plane”

– these eigenvectors are known as principal component vectors

– procedure is known as Principal Component Analysis (PCA)

Page 40: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

40

Back in the vector space of faces…Back in the vector space of faces…

An image is a point in a high dimensional space• A P x Q pixels image is a point in RPQ

• Vectors in this space behave similarly to the data points in our 2D case

+=

Page 41: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

41

Dimensionality reductionDimensionality reduction

The space of all faces is a “subspace” of the space of all images• Suppose this subspace is K dimensional, K << PQ• We can find such a subspace using PCA• This is like fitting a “hyper-plane” to the set of faces

– spanned by eigenvectors v1, v2, ..., vK

– any face

Page 42: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

42

EigenfacesEigenfacesPCA extracts the eigenvectors v1, v2, v3, ... vK of covariance matrix A

Each one of these vectors is a direction in face space• what do these look like?

The eigenvectors for face images are called “eigenfaces”

Page 43: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

43

Choosing the reduced dimension KChoosing the reduced dimension K

K PQi =

eigenvalues

How many eigenfaces to use?Look at the decay of the eigenvalues

• the eigenvalue tells you the amount of variance “in the direction” of that eigenface

• ignore eigenfaces with low variance

# pixels in the image

Page 44: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

44

Application 1: Image CompressionApplication 1: Image CompressionThe eigenfaces v1, ..., vK span the space of faces

• An image is converted to eigenface coordinates using dot products (“projection”):

Reconstructed face

Compressed representation of face (K usually much smaller than PQ)

Input image (P x Q pixels)

Page 45: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

45

[Turk & Pentland 01]

Page 46: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

46

Application 2: Face RecognitionApplication 2: Face RecognitionAlgorithm:

1. Given a set of images with labels (e.g., names)• Run PCA and compute K eigenfaces• Calculate and store the K coefficients for

each image with its label

2. Given a new image x, calculate K coefficients

3. Verify that x is a face

4. If it is a face, who is it?• Find label of closest face in database• Nearest-neighbor in K-dimensional space

Page 47: Machine Learning II: Unsupervised Learning€¦ · 2 Our agenda today Unsupervised Learning and Applications • Clustering – Application: Image Segmentation • Density Estimation

BreakBreak

Next: Applications in Robotic Learning (Guest Lecture by Dr. Chalodhorn)