Introduction to Machine Learning for Microarray Analysis
Jennifer Listgarten, October 2001
(for the non-computer scientist)

Page 1

Introduction to Machine Learning for Microarray Analysis

Jennifer Listgarten
October 2001

(for the non-computer scientist)

Page 2

Outline

• Introduction to Machine Learning
• Clustering (Thoroughly)
• Principal Components Analysis (Briefly)
• Self-Organizing Maps (Briefly)

Perhaps another time:

• 'Supervised' vs. 'Unsupervised' Learning
• Neural Networks and Support Vector Machines

Page 3

Outline

• Introduction to Machine Learning
• Clustering (Thoroughly)
• Principal Components Analysis (Briefly)
• Self-Organizing Maps (Briefly)

Page 4

What is Machine Learning?

Definition: The ability of a computer to recognize patterns that have occurred repeatedly and improve its performance based on past experience.

Page 5

Questions for Machine Learning

• Which genes are co-regulated?
• Which genes have similar functional roles?
• Do certain gene profiles correlate with diseased patients?
• (Which genes are upregulated/downregulated?)

Page 6

The Data: How to Think About It

In Machine Learning, each data point is a vector.

Example:

Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

(Each entry, e.g. gene_3, is the expression ratio for one gene.)

Page 7

Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

Each vector 'lives' in a high-dimensional space.

N is normally much larger than 2, so we can't visualize the data directly.
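To make the vector picture concrete, here is a minimal sketch; the gene values are invented for illustration:

```python
import numpy as np

# One patient's profile as a vector; each entry is the expression
# ratio of one gene (values invented for illustration).
patient_x = np.array([0.8, 1.3, 2.1, 0.4])   # gene_1 .. gene_4, so N = 4

print(patient_x.shape)   # (4,) -- a single point in R^4
```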

Page 8

Our Goal

Tease out the structure of our data from the high-dimensional space in which it lives.

[Figure: two groups of points in that space - breast cancer patients and healthy patients.]

Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

Page 9

Two Ways to Think of Data for Microarray Experiments

1. All genes for one patient make a vector:
Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

Page 10

Two Ways to Think of Data for Microarray Experiments

1. All genes for one patient make a vector:
Patient_X = (gene_1, gene_2, gene_3, …, gene_N)

2. All experiments for one gene make a vector:
Gene_X = (experiment_1, experiment_2, …, experiment_N)
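The two views are just the two orientations of one expression matrix. A minimal sketch, with toy numbers invented here:

```python
import numpy as np

# Toy expression matrix: one row per patient, one column per gene.
data = np.array([[0.8, 1.3, 2.1],    # patient 1
                 [1.1, 0.9, 1.7],    # patient 2
                 [0.7, 1.5, 2.3]])   # patient 3

patient_vectors = data      # view 1: Patient_X = one row (all genes)
gene_vectors    = data.T    # view 2: Gene_X = one row of the transpose
                            #         (all experiments for that gene)
```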

Page 11

Outline

• Introduction to Machine Learning
• Clustering
  – Hierarchical Clustering
  – K-means Clustering
  – Stability of Clusters
• Principal Components Analysis
• Self-Organizing Maps

Page 12

Clustering Data

Want 'similar' data to group together.

Problems:

• Don't know which definition of 'similar' to use in order to extract useful information.
• Without external validation, difficult to tell if clustering is meaningful.
• How many clusters?

Page 13

Similarity

• Metric: formal name for 'measure of similarity' between 2 points.
• Every clustering algorithm (hierarchical, k-means, etc.) needs to decide on a metric.
• Can argue in favour of various metrics, but no correct answer.

Page 14

Some Metrics

• Euclidean distance: $d(X, Y) = \sqrt{\sum_i (X_i - Y_i)^2}$

• Correlation based: $r(X, Y) = \frac{1}{N} \sum_i \frac{(X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X \sigma_Y}$

• Mutual Information based: $MI(X, Y) = \sum p(X, Y) \log \frac{p(X, Y)}{p(X)\, p(Y)}$

X and Y here are two vectors (e.g. two patients).
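As a concrete reference, here is one way these three metrics might be computed; a sketch, not the presentation's code, and the mutual-information estimate assumes a simple histogram discretization:

```python
import numpy as np

def euclidean(x, y):
    """d(X, Y) = sqrt(sum_i (X_i - Y_i)^2)."""
    return np.sqrt(np.sum((x - y) ** 2))

def correlation(x, y):
    """(1/N) sum_i (X_i - mean_X)(Y_i - mean_Y) / (sigma_X sigma_Y)."""
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n * x.std() * y.std())

def mutual_information(x, y, bins=10):
    """Estimate MI by discretizing the two vectors into a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution p(X, Y)
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(X)
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(Y)
    nz = pxy > 0                               # avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
```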

Page 15

Outline

• Introduction to general Machine Learning
• Clustering
  – Hierarchical Clustering
  – K-means Clustering
  – Stability of Clusters
• Principal Components Analysis
• Self-Organizing Maps

Page 16

Outline of Hierarchical Clustering

1. Start by making every data point its own cluster.
2. Repeatedly combine the 'closest' clusters until only one is left.

(example to follow)

Page 17

Simple Example - Step 1

Initially each datum is its own cluster.

[Figure: seven high-dimensional data points (patients) A-G, each circled as its own cluster.]

Page 18

Simple Example - Step 2

Combine the two closest clusters.

[Figure: E and F now share one cluster. Dendrogram so far: (E, F).]

Page 19

Simple Example - Step 3

Again: combine the next two closest clusters.

[Figure: B and C now share one cluster. Dendrogram so far: (E, F), (B, C).]

Page 20

Simple Example - Step 4

And again...

[Figure: D joins the (E, F) cluster. Dendrogram leaf order: D E F B C.]

Page 21

Simple Example - Step 5

And again...

[Figure: A joins the (B, C) cluster. Dendrogram leaf order: D E F B C A.]

Page 22

Simple Example - Step 6

And again...

[Figure: G joins the (D, E, F) cluster. Dendrogram leaf order: G D E F B C A.]

Page 23

Simple Example - Step 7

And again...

[Figure: the last two clusters merge into one. Dendrogram leaf order: G D E F B C A.]

Page 24

Simple Example - Step 7 (continued)

[Figure: the finished dendrogram (leaf order G D E F B C A), with the metric scale on the vertical axis.]

Page 25

Digression: 'Distance' Between Clusters

3 common ways:

1. Single-Linkage (distance between the closest pair of points, one from each cluster)
2. Complete-Linkage (distance between the farthest pair of points, one from each cluster)
3. Average-Linkage (average distance over all pairs of points, one from each cluster)
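For reference, SciPy implements all three linkage rules; a minimal sketch on invented data, with labels A-G echoing the example above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy data: 7 'patients' (rows) in a 5-gene expression space.
rng = np.random.default_rng(0)
points = rng.normal(size=(7, 5))

# method = 'single', 'complete', or 'average' is the cluster 'distance';
# metric = 'euclidean' is the point-to-point measure from Page 14.
tree = linkage(points, method='average', metric='euclidean')

# Draws the tree into the current axes (requires matplotlib to display).
dendrogram(tree, labels=list('ABCDEFG'))
```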

Page 26

What We Get Out of HC

• A hierarchical set of clusters.
• A dendrogram showing which data points are most closely related, as defined by the metric.

[Figure: two dendrograms over the points A-G.]

Page 27

What We Get Out of HC

• Can we tell how data points are related by looking at the horizontal positions of the data points...?
• Must be careful about interpretation of the dendrogram - example to follow.

[Figure: a dendrogram over the points A-G.]

Page 28

[Figure: dendrogram with leaf order G D E F B C A.]

Notice that we can swap branches while maintaining the tree structure.

Page 29

[Figure: dendrogram with leaf order G D F E B C A.]

Notice that we can swap branches while maintaining the tree structure.

Page 30

[Figure: dendrogram with leaf order G D F E B C A.]

Again...

Page 31

[Figure: dendrogram with leaf order B C A G D F E.]

Again...

Page 32

[Figure: dendrogram with leaf order B C A G D F E.]

Again...

How many ways to swap branches if there are N data points?

Page 33

[Figure: dendrogram with leaf order B C A G D F E.]

Again...

How many ways to swap branches if there are N data points?

2^(N-1)

For N = 100: 2^(N-1) = 2^99 ≈ 6.3 × 10^29

Page 34

Two data points that were close together in one tree may be far apart in another.

[Figure: tree 1 with leaf order G D F E B C A; tree 2 with leaf order B C A G D F E.]

1. G and A far apart
2. G and A close

Page 35

Two data points that were close together in one tree may be far apart in another.

[Figure: the same two trees.]

1. G and A far apart
2. G and A close

There is a way to help overcome the arbitrariness of the branches: Self-Organizing Maps (SOMs) - discussed later.

Page 36

Lesson Learned

Be careful not to overinterpret the results of hierarchical clustering (along the horizontal axis).

Page 37

What is HC Used For?

• Typically, grouping genes that are co-regulated. (Could use it for grouping patients too.)
• While useful, it is a relatively simple, unsophisticated tool.
• It is more of a visualization tool than a mathematical model.

Page 38

Outline

• Introduction to Machine Learning
• Clustering
  – Hierarchical Clustering
  – K-means Clustering
• Principal Components Analysis
• Self-Organizing Maps

Page 39

K-Means Clustering

Goal: Given a desired number of clusters,

• find the cluster centres
• find which data points belong to each cluster

Page 40

[Figure: data points (patients) with three cluster centres marked.]

Must specify that we want 3 clusters.

Page 41

Outline of K-Means Clustering

• Step 1 - Decide how many clusters (let's say 3).
• Step 2 - Randomly choose 3 cluster centres.

Page 42

Outline of K-Means Clustering

• Step 3 - Choose a metric.
• Step 4 - Assign each point to the cluster that it is 'closest' to (according to the metric).

Page 43

Outline of K-Means Clustering

• Step 5 - Recalculate cluster centres using the means of the points that belong to each cluster.
• Step 6 - Repeat until convergence (or a fixed number of steps, etc.).

[Figure: newly calculated cluster centres.]

Page 44

Outline of K-Means Clustering

Another step... assign points to clusters.

[Figure: points reassigned to their closest centres.]

Page 45

Outline of K-Means Clustering

And the final step... reposition the means.

[Figure: newly calculated cluster centres.]

Page 46

Outline of K-Means Clustering

And the final step... reassign points. (The full loop is sketched in code below.)

[Figure: points reassigned to their closest centres.]
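A minimal NumPy sketch of steps 1-6, assuming the Euclidean metric and ignoring the empty-cluster edge case; this is an illustration, not the presentation's code:

```python
import numpy as np

def kmeans(data, k=3, n_steps=100, seed=0):
    """Plain k-means: steps 1-6 from the slides, Euclidean metric."""
    rng = np.random.default_rng(seed)
    # Step 2: random initialization -- pick k data points as centres.
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_steps):
        # Step 4: assign each point to its closest centre.
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 5: recalculate each centre as the mean of its points.
        # (K-median variant from the next slide: use np.median instead.)
        new = np.array([data[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centres):   # Step 6: stop at convergence
            break
        centres = new
    return centres, labels
```

In practice one would run this from several random restarts and keep the best result, as the comments on Page 49 suggest.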

Page 47

Variations of K-Means:

• K-Median Clustering (uses the median instead of the mean to find new cluster positions).

[Figure: use the median/mean/etc. to reposition the cluster centre.]

Page 48

Related to K-Means Clustering:

• Mixture of Gaussians (now clusters have a width as well) - a Gaussian probability distribution instead of a metric. Other differences too.

$p(X \mid \mathrm{cluster}_k) = \frac{1}{(2\pi)^{d/2} \sigma_k^d} \exp\left( -\frac{\lVert X - \mu_k \rVert^2}{2\sigma_k^2} \right)$

Soft partition (vs. hard).
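To show what a 'soft partition' means, here is a sketch of the cluster posteriors (responsibilities) under the isotropic-Gaussian model above; the centres, widths, and mixture weights are assumed given:

```python
import numpy as np

def responsibilities(data, centres, sigmas, weights):
    """Soft assignment: p(cluster_k | X) via Bayes' rule, using the
    isotropic Gaussian p(X | cluster_k) from the slide."""
    d = data.shape[1]
    # sq[i, k] = squared distance from point i to centre k
    sq = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    lik = np.exp(-sq / (2 * sigmas**2)) / ((2 * np.pi) ** (d / 2) * sigmas**d)
    post = weights * lik                          # prior x likelihood
    return post / post.sum(axis=1, keepdims=True)  # each row sums to 1
```

Unlike k-means, each point gets a fractional membership in every cluster rather than a single hard label.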

Page 49

Comments on K-Means Clustering

• May not converge nicely; need multiple random restarts.
• Results are straightforward (unlike hierarchical clustering).
• Still a relatively simple tool - not much mathematical modelling.

Page 50

Comments on K-Means Clustering

• Earlier: showed random initialization.
• Can run hierarchical clustering to initialize the K-Means Clustering algorithm.
• This can help with convergence problems as well as speed up the algorithm.

Page 51

Outline

• Introduction to Machine Learning
• Clustering
  – Hierarchical Clustering
  – K-means Clustering
  – Stability of Clusters
• Principal Components Analysis
• Self-Organizing Maps

Page 52

Stability of Clusters

• Ideally, want different (good) clustering techniques to provide similar results.
• Otherwise the clustering is likely arbitrary, not modelling any true structure in the data set.
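One common way to quantify agreement between two clusterings is the adjusted Rand index; a sketch using scikit-learn, with invented labels:

```python
from sklearn.metrics import adjusted_rand_score

# Cluster labels for the same 7 patients from two different methods.
labels_hc     = [0, 0, 1, 1, 2, 2, 2]   # e.g. a cut through a dendrogram
labels_kmeans = [1, 1, 0, 0, 2, 2, 2]   # e.g. k-means with k = 3

# 1.0 = identical partitions (up to renaming); ~0 = chance-level agreement.
print(adjusted_rand_score(labels_hc, labels_kmeans))   # -> 1.0 here
```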

Page 53

Outline

• Introduction to Machine Learning
• Clustering (Thoroughly)
• Principal Components Analysis (Briefly)
• Self-Organizing Maps (Briefly)

Page 54

Principal Components Analysis (PCA)

• Mathematical technique to reduce the dimensionality of the data.

Page 55

PCA - Dimension Reduction

[Figure: a 3D object projected down to 2D in two different ways.]

• Some projections are more informative than others.
• While projections differ, the object remains unchanged in the original space.

Page 56

Why?

• Instead of clustering 20,000-dimensional data, cluster 100-dimensional data.
• Typically some dimensions are redundant.
• Might eliminate noise, get more meaningful clusters (?)

Page 57

Why?

• Instead of clustering 20,000-dimensional data, cluster 100-dimensional data.
• Typically some dimensions are redundant.
• Might eliminate noise, get more meaningful clusters (?)

Fewer dimensions means it is easier to initialize ML algorithms and get good results.

Page 58

Simple 2D Example

[Figure: 2D points plotted against axes X and Y.]

One could cluster these 2D points in 2 dimensions.

Page 59

Simple 2D Example

[Figure: the same 2D points.]

One could cluster these 2D points in 2 dimensions.

But... what if...

Page 60

Simple 2D Example

[Figure: the points 'squashed' onto the X axis by geometric projection.]

We squashed them into 1D.

Page 61

Simple 2D Example

[Figure: the 2D points and their 1D projection onto the X axis.]

• The Y-dimension was redundant.
• Only needed the X variable to cluster nicely.

Page 62

Another 2D Example

[Figure: 2D points plotted against axes X and Y.]

It is not obvious which dimension to keep now.

Page 63

Another 2D Example

[Figure: the points projected onto the X axis and onto the Y axis.]

There is no axis onto which we can project to get good separation of the data...

Page 64

Another 2D Example

[Figure: the points projected onto a diagonal direction.]

But if we project the data onto a combination (a linear combination) of the two dimensions... it works out nicely.

Page 65

That Was the Intuition Behind PCA. Outline of PCA:

• Step 1 - Find the direction that accounts for the largest amount of variation in the data set; call this E_1.

[Figure: E_1, the first principal component, drawn through the data.]

Page 66

Outline of PCA:

• Step 1 - Find the direction that accounts for the largest amount of variation in the data set; call this E_1.
• Step 2 - Find the direction which is perpendicular (orthogonal/uncorrelated) to E_1 and accounts for the next largest amount of variation in the data set; call this E_2.

Page 67

Outline of PCA (continued):

• Step 3 - Find the 3rd best direction, which is orthogonal to the other 2 directions; call this E_3.
• ...
• Step N - Find the Nth such direction (if there were N dimensions to begin with).

Page 68

PCA - Some Linear Algebra...

Singular Value Decomposition (SVD) of the covariance matrix:

$C = U \Lambda V^T$

• $C$ is the covariance matrix of the original data (entries $c_{11}, c_{12}, \ldots, c_{NN}$).
• The columns of $U$ are the principal components (the 2nd column is the 2nd principal component, and so on).
• $\Lambda$ is diagonal (zeros off the diagonal); its diagonal entries give the variance in each component.
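A sketch of this recipe in NumPy, taking the SVD of the covariance matrix as on the slide; this is one of several equivalent ways to compute PCA:

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD of the covariance matrix (C = U Lambda V^T)."""
    centred = data - data.mean(axis=0)
    cov = np.cov(centred, rowvar=False)       # C: covariance of the data
    u, variances, _ = np.linalg.svd(cov)      # columns of u: E_1, E_2, ...
    frac = variances[:n_components].sum() / variances.sum()
    reduced = centred @ u[:, :n_components]   # project onto top directions
    return reduced, frac                      # frac: variance retained

# e.g. reduced, frac = pca(expression_matrix, 100)
```

The returned fraction makes the "might retain 95% of the original information" claim on the next slide checkable for a given data set.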

Page 69

Principal Components Analysis

• A typical dimensionality reduction might be: 1 million → 200,000, which might retain 95% of the original information.
• The reduction achievable depends on the data set.

Page 70

PCA - Useful for Clustering?

• It turns out that PCA can lead to worse clustering than simply using the original data (though not necessarily).
• PCA is often used in conjunction with other techniques, such as Artificial Neural Networks, Support Vector Machines, etc.

Page 71

PCA - Interesting Side Note

• PCA is the basis for most of the Face Recognition systems currently in use.
• e.g. security in airports, etc.

[Figure: principal 'face' directions.]

Page 72

Outline

• Introduction to Machine Learning
• Clustering (Thoroughly)
• Principal Components Analysis (Briefly)
• Self-Organizing Maps (Briefly)

Page 73

Self-Organizing Maps (SOMs)

• Way to visualize high-dimensional data in a low-dimensional space.
• Commonly view data in 1 or 2 dimensions with SOMs.

Page 74

Self-Organizing Maps (SOMs)

Can think of SOMs as a cross between K-Means Clustering and PCA:

Page 75

Self-Organizing Maps (SOMs)

Can think of SOMs as a cross between K-Means Clustering and PCA:

K-Means: find cluster centres.
PCA: reduce the dimensionality of the cluster centres (i.e. impose a 'structure' on the relationship between cluster centres).

(At the same time!)
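A minimal 1D SOM sketch, assuming a simple shrinking neighbourhood and a fixed learning rate; this is deliberately bare-bones, not a production SOM:

```python
import numpy as np

def som_1d(data, n_nodes=4, n_epochs=50, lr=0.5, seed=0):
    """Minimal 1D self-organizing map: n_nodes cluster centres arranged
    on a line, so neighbouring nodes end up near similar data."""
    rng = np.random.default_rng(seed)
    nodes = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    for epoch in range(n_epochs):
        # Neighbourhood radius shrinks over time (never below 1).
        radius = max(1.0, n_nodes / 2 * (1 - epoch / n_epochs))
        for x in data[rng.permutation(len(data))]:
            best = np.linalg.norm(nodes - x, axis=1).argmin()  # winning node
            for j in range(n_nodes):
                # Neighbours on the 1D line move too, less the farther away.
                h = np.exp(-((j - best) ** 2) / (2 * radius**2))
                nodes[j] += lr * h * (x - nodes[j])
    return nodes   # ordered centres: node 0 sits next to node 1, etc.
```

The returned array is the 1D chain of centres sketched on the next slides; sorting data points by their winning node (and by position within it) gives the ordering discussed below.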

Page 76

Self-Organizing Maps

Example: impose a 1D structure on the cluster centres.

[Figure: cluster centres in a 5000-dimensional data space strung along a 1-dimensional line.]

Page 77

Self-Organizing Maps

• This imposes an ordering on the cluster centres.

[Figure: the same picture, with the cluster centres numbered 1-4 along the 1D line.]

Page 78

Self-Organizing Maps

• This imposes an ordering on the data points.
• Data points from Cluster 1 come before points in Cluster 2, etc.
• Then order based on proximity to neighbouring clusters.

[Figure: the data points, numbered 1-19, laid out along the 1D line of cluster centres.]

Page 79

Self-Organizing Maps

What is important to know for our immediate interest: a SOM imposes a unique ordering on the data points.

[Figure: as before, the data points numbered 1-19 along the 1D SOM.]

Page 80

SOMs and Hierarchical Clustering

Recall the problem of arbitrariness in the order of the branches in Hierarchical Clustering? Can use SOMs to help.

[Figure: hierarchical clustering dendrogram with leaf order G D F E B C A.]

Page 81

SOMs, Eisen, Cluster

Eisen's Cluster can use the ordering from the SOM to do hierarchical clustering:

1) Run a 1D SOM on the data set.
2) Build the dendrogram using the ordering from the SOM.

[Figure: the dendrogram alongside the SOM's ordering of the data points (1-19).]

Page 82

Self-Organizing Maps

• Not normally what SOMs are used for (i.e. hierarchical clustering).
• Mainly used for visualization, and as a first step before further analysis.
• Can also use 2D and 3D SOMs.

Page 83

Concluding Remarks

I hope that I managed to give:

1) Some idea of what machine learning is about ('structure' in high-dimensional spaces).
2) The intuition behind several techniques, and familiarity with their names.

Page 84

Page 85

Main Ideas of PCA

• For N-dimensional data, find N directions.
• Can represent any one of our original data points using (a linear combination of) these N directions, i.e.

$\mathrm{Patient\_X} = a_1 E_1 + a_2 E_2 + a_3 E_3 + \ldots + a_N E_N$   ($E_N$: the Nth direction)

$\mathrm{Patient\_X} \rightarrow (a_1, a_2, a_3, \ldots, a_N)$

Page 86

Main Ideas of PCA

• Key Idea: can represent extremely well any one of our original data points using fewer than N directions:

$\mathrm{Patient\_X} \approx a_1 E_1 + a_2 E_2 + a_3 E_3 + \ldots + a_{N-d} E_{N-d}$

$\mathrm{Patient\_X} \rightarrow (a_1, a_2, a_3, \ldots, a_{N-d})$

($N-d$: fewer than N directions.)
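A sketch of this compression, assuming E holds the principal directions as columns (e.g. from the SVD on Page 68) and patient_x is a centred data point:

```python
import numpy as np

def compress(patient_x, E, keep):
    """Represent a point with its first 'keep' PCA coefficients."""
    a = E[:, :keep].T @ patient_x     # coefficients a_1 .. a_keep
    approx = E[:, :keep] @ a          # Patient_X ~ a_1 E_1 + ... + a_keep E_keep
    return a, approx

# The reconstruction error ||patient_x - approx|| shrinks as 'keep' grows to N.
```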

Page 87

Another 2D Example

[Figure: the earlier X-Y scatter and its projection onto a linear combination of the two dimensions.]

But if we project the data onto a combination (a linear combination) of the two dimensions... it works out nicely.

Page 88

Concluding Remarks

Understand the algorithms behind your analysis. Blind application can lead to erroneous conclusions.