LISA Short Course Series
Multivariate Clustering Analysis in R

Yuhyun Song
Nov 03, 2015

Laboratory for Interdisciplinary Statistical Analysis

Collaboration:

Visit our website to request personalized statistical advice and assistance with:

Designing Experiments • Analyzing Data • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, Minitab...)

LISA statistical collaborators aim to explain concepts in ways useful for your research.

Great advice right now: Meet with LISA before collecting your data.

All services are FREE for VT researchers. We assist with research—not class projects or homework.

LISA helps VT researchers benefit from the use of Statistics

www.lisa.stat.vt.edu

LISA also offers:

Educational Short Courses: Designed to help graduate students apply statistics in their research.

Walk-In Consulting: Available Monday-Friday from 1-3 pm in the Old Security Building (OSB); Tuesday, Thursday, and Friday from 10 am-12 pm in the GLC; and Wednesday from 10 am-12 pm in Hutcheson, for questions that take less than 30 minutes.

OUTLINE

1. Data
2. What is multivariate analysis?
3. What is clustering analysis?
4. Clustering algorithms
   Hierarchical agglomerative clustering algorithm
   Partitioning clustering algorithms
      K-means clustering
      Partitioning Around Medoids (PAM)
5. Cluster Validation

DATA: Twitter data

• Can be downloaded from the website http://www.rdatamining.com
• Contains 320 tweets by @RDataMining

DATA: Twitter data

• Text data needs several data-munging procedures, since text data is categorical.
- Transforming text
    Changing letters to lower case
    Removing punctuation, numbers, and stop words
- Stemming words
- Building a term-document matrix containing word frequencies

We will implement the above procedures before we apply clustering algorithms to the data matrix!
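The munging steps above can be sketched in base R. This is only an illustration under stated assumptions: the three tweets and the short stop-word list are made up (they are not the course data), and stemming is left out because base R has no stemmer. A text-mining package would normally handle all of these steps.

```r
# Hypothetical example tweets (not taken from the course data set)
tweets <- c("Clustering 101: R makes clustering easy!",
            "Text mining in R, with 2 examples.",
            "The R examples of text clustering")

# A tiny illustrative stop-word list (an assumption; real lists are much longer)
stop_words <- c("the", "of", "in", "with", "a", "an")

clean <- tolower(tweets)                     # change letters to lower case
clean <- gsub("[[:punct:]]", " ", clean)     # remove punctuation
clean <- gsub("[[:digit:]]", " ", clean)     # remove numbers
tokens <- strsplit(trimws(clean), "\\s+")    # split each tweet into words
tokens <- lapply(tokens, function(w) w[!(w %in% stop_words)])  # drop stop words

# Term-document matrix: rows = terms, columns = tweets, cells = word counts
terms <- sort(unique(unlist(tokens)))
tdm <- sapply(tokens, function(w) table(factor(w, levels = terms)))
rownames(tdm) <- terms
colnames(tdm) <- paste0("tweet", seq_along(tweets))
tdm
```

Each column of `tdm` describes one tweet by its word frequencies, which is the numeric form the clustering algorithms need.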

Multivariate Data Analysis

• Univariate Data Analysis
– used when one outcome variable is measured for each object.

• Multivariate Data Analysis
– used when more than one outcome variable is measured for each object.
– refers to any statistical technique used to analyze data that arise from more than one variable.
– concerned with the study of associations among sets of measurements.

Multivariate Data Analysis

Method | Objectives | Exploratory vs. Confirmatory
Principal Components Analysis | Dimension reduction | Exploratory
Factor Analysis | Understand patterns of intercorrelation | Both
Multidimensional Scaling Analysis | Create a spatial representation from object similarities | Mainly exploratory
Classification Analysis | Build classification rules for predefined groups | Both
Clustering Analysis | Create groupings from object similarities | Exploratory

Clustering Analysis

• What is a natural grouping among characters?

• Segmenting characters into groups is subjective: the same characters can be grouped as Villains vs. Heroes, or as Males vs. Females.

Clustering Analysis

• Cluster: a collection of data objects
– Objects are similar to one another within the same cluster.
– Objects are dissimilar to the objects in other clusters.

• Cluster analysis
– Finding similarities between data according to the characteristics found in the data, and grouping a set of data objects in such a way that objects in the same group are more similar to each other than to objects in other groups.
– Goal: minimize intra-cluster distances and maximize inter-cluster distances.

• Unsupervised learning: no predefined classes

Two Types of Clustering Analysis

• Hierarchical Clustering: Objects are partitioned into nested groups that are organized as a hierarchical tree.

• Partitioning Clustering: Objects are partitioned into non-overlapping groups and each object belongs to one group only.

Data Structure

• Data matrix
– n × p matrix, where n is the number of data objects and p is the number of variables
– most suitable for partitioning methods

• Similarity/dissimilarity (distance) matrix
– n × n matrix calculated from the data matrix
– most suitable for hierarchical agglomerative methods

Dissimilarity (Distance) Measures

• A distance measure is a numerical measure that indicates how different two objects are; the lower its value, the more similar the objects are.

• Given two data objects X1 and X2, the distance between X1 and X2 is a real number denoted by d(X1, X2).

• Common distance measures between data objects i and j, each measured on p variables, where x_{ik} is the value of variable k for object i:

• Euclidean Distance:

$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$$

• Manhattan Distance:

$$d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$$

• Minkowski Distance:

$$d(i,j) = \left(|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \dots + |x_{ip} - x_{jp}|^q\right)^{1/q}$$
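As a quick check of the three distance measures, here is a small sketch using R's built-in `dist()` function. The two objects are hypothetical. Note that `dist()` names the Minkowski power `p`, which plays the role of q in the formula above (it is not the number of variables).

```r
# Two hypothetical objects measured on 2 variables
X <- rbind(x1 = c(0, 0),
           x2 = c(3, 4))

d_euc <- dist(X, method = "euclidean")          # sqrt(3^2 + 4^2) = 5
d_man <- dist(X, method = "manhattan")          # 3 + 4 = 7
d_min <- dist(X, method = "minkowski", p = 3)   # (3^3 + 4^3)^(1/3)

c(euclidean = as.numeric(d_euc),
  manhattan = as.numeric(d_man),
  minkowski = as.numeric(d_min))
```

With q = 1 the Minkowski distance reduces to the Manhattan distance, and with q = 2 to the Euclidean distance.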

Hierarchical Agglomerative Clustering

• Hierarchical Agglomerative Clustering produces a sequence of nested clustering solutions, organized in a hierarchical tree structure.

• It uses a distance matrix for clustering, and the solution is visualized by a dendrogram.

• This method does not require the number of clusters k as an input.

Hierarchical Agglomerative Clustering

• Distance between clusters:

Single linkage (min): smallest distance between an object in one cluster and an object in the other, i.e., d(Ci, Cj) = min d(Xip, Xjq)

Complete linkage (max): largest distance between an object in one cluster and an object in the other, i.e., d(Ci, Cj) = max d(Xip, Xjq)

Average linkage: average distance between an object in one cluster and an object in the other, i.e., d(Ci, Cj) = avg d(Xip, Xjq)

Here Xip ranges over the objects in cluster Ci and Xjq over the objects in cluster Cj.
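The three linkage functions differ only in how they summarize the pairwise distances between two clusters. A minimal sketch with two hypothetical one-dimensional clusters:

```r
# Hypothetical 1-D example: cluster Ci = {1, 2}, cluster Cj = {5, 9}
Ci <- c(1, 2)
Cj <- c(5, 9)

# All pairwise distances between a member of Ci and a member of Cj
pair_d <- abs(outer(Ci, Cj, "-"))

single_link   <- min(pair_d)    # closest pair:  |2 - 5| = 3
complete_link <- max(pair_d)    # farthest pair: |1 - 9| = 8
average_link  <- mean(pair_d)   # (4 + 8 + 3 + 7) / 4 = 5.5
```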

Hierarchical Agglomerative Clustering

• Given a data set of n data objects, the Hierarchical Agglomerative Clustering algorithm is implemented in the following steps:

Step 1. Calculate the distance matrix for the n data objects
Step 2. Set each object as a cluster
Step 3. Repeat until the number of clusters is 1
    Step 3.1. Merge the two closest clusters
    Step 3.2. Update the distance matrix using the linkage function
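The steps above can be sketched as a naive loop. This is an illustration with single linkage, not an efficient implementation: instead of updating the distance matrix in place (Step 3.2), the hypothetical helper `naive_single_linkage()` recomputes each cluster-to-cluster distance from the original matrix, which gives the same result for single linkage.

```r
# Naive agglomerative clustering with single linkage.
# Returns the distance at which each merge happens.
naive_single_linkage <- function(D) {
  D <- as.matrix(D)                           # Step 1: distance matrix
  clusters <- as.list(seq_len(nrow(D)))       # Step 2: each object is a cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {              # Step 3: repeat until 1 cluster
    best <- c(1, 2)
    best_d <- Inf
    for (i in 1:(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # single linkage: smallest distance between members of the two clusters
        d_ij <- min(D[clusters[[i]], clusters[[j]]])
        if (d_ij < best_d) {
          best_d <- d_ij
          best <- c(i, j)
        }
      }
    }
    # Step 3.1: merge the two closest clusters
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_d)
  }
  heights
}

x <- c(1, 2, 4, 8)                    # four hypothetical 1-D objects
merge_heights <- naive_single_linkage(dist(x))
merge_heights
```

In practice R's built-in `hclust()` does this work; for the same input, `hclust(dist(x), method = "single")$height` should report the same merge distances.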

Hierarchical Agglomerative Clustering

Example: Given 5 data objects A, B, C, D, and E, with the following distance matrix:

Distance Matrix

       A      B      C      D      E
A      0      0.71   2.69   3.20   6.4
B      0.71   0      2.06   2.6    5.7
C      2.69   2.06   0      1      3.9
D      3.20   2.6    1      0      3.2
E      6.4    5.7    3.9    3.2    0

Hierarchical Agglomerative Clustering

After merging A and B (the closest pair, at distance 0.71), update the distance matrix by using the single linkage function:

d((A,B), C) = min(d(A,C), d(B,C)) = min(2.69, 2.06) = 2.06
d((A,B), D) = min(d(A,D), d(B,D)) = min(3.20, 2.6) = 2.6
d((A,B), E) = min(d(A,E), d(B,E)) = min(6.4, 5.7) = 5.7

Updated distance matrix:

         (A,B)   C      D      E
(A,B)    0       2.06   2.6    5.7
C        2.06    0      1      3.9
D        2.6     1      0      3.2
E        5.7     3.9    3.2    0

Hierarchical Agglomerative Clustering

After merging C and D (at distance 1), update the distance matrix by using the single linkage function:

d((C,D), (A,B)) = min(d(C,(A,B)), d(D,(A,B))) = min(2.06, 2.6) = 2.06
d((C,D), E) = min(d(C,E), d(D,E)) = min(3.9, 3.2) = 3.2

Updated distance matrix:

         (A,B)   (C,D)   E
(A,B)    0       2.06    5.7
(C,D)    2.06    0       3.2
E        5.7     3.2     0

Hierarchical Agglomerative Clustering

1. In the beginning we have 5 clusters.
2. We merge clusters A and B into cluster (A, B) at distance 0.71.
3. We merge clusters C and D into cluster (C, D) at distance 1.
4. We merge clusters (A, B) and (C, D) into ((A, B), (C, D)) at distance 2.06.
5. We merge clusters ((A, B), (C, D)) and E at distance 3.2.
6. The last cluster contains all the objects, which concludes the computation.

Dendrogram: leaves A, B, C, D, E; merge heights (Dist.) 0.71, 1, 2.06, and 3.2.

Hierarchical Agglomerative Clustering

• How do we decide the number of clusters? Cut the tree.

Dendrogram: leaves A, B, C, D, E; merge heights (Dist.) 0.71, 1, 2.06, and 3.2; horizontal cut lines marked K=2, K=3, and K=4.
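The five-object example can be reproduced with R's built-in `hclust()` and `cutree()`. The sketch below types in the example distance matrix by hand (the C-B entry is taken as 2.06, by symmetry with B-C) and assumes single linkage, as in the worked example.

```r
labels <- c("A", "B", "C", "D", "E")
D <- matrix(c(0,    0.71, 2.69, 3.20, 6.4,
              0.71, 0,    2.06, 2.6,  5.7,
              2.69, 2.06, 0,    1,    3.9,
              3.20, 2.6,  1,    0,    3.2,
              6.4,  5.7,  3.9,  3.2,  0),
            nrow = 5, byrow = TRUE, dimnames = list(labels, labels))

hc <- hclust(as.dist(D), method = "single")
hc$height                  # distances at which the merges happen
plot(hc)                   # draws the dendrogram

k2 <- cutree(hc, k = 2)    # cut the tree into 2 clusters
k3 <- cutree(hc, k = 3)    # cut the tree into 3 clusters
```

Cutting at K = 2 separates E from the other four objects, and K = 3 additionally splits (A, B) from (C, D), which matches the merge order in the worked example.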

R: Hierarchical Agglomerative Clustering

• Let’s build a data matrix of the word frequencies that enumerates the number of times that each word occurs in each tweet (document) in R. Then, we will cluster words in tweets by a Hierarchical Agglomerative Clustering algorithm.
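A minimal sketch of what clustering words could look like, assuming a small made-up term-document matrix rather than the real tweets. The choice of Euclidean distance and average linkage is also an assumption, not necessarily what the course script uses.

```r
# Hypothetical term-document matrix: 5 terms (rows) x 4 tweets (columns)
tdm <- rbind(r        = c(2, 1, 0, 0),
             data     = c(1, 2, 0, 0),
             mining   = c(1, 1, 0, 1),
             cluster  = c(0, 0, 2, 1),
             analysis = c(0, 0, 1, 2))
colnames(tdm) <- paste0("tweet", 1:4)

d  <- dist(tdm)                          # distances between terms (rows)
hc <- hclust(d, method = "average")      # linkage choice is an assumption
plot(hc, main = "Clustering of terms")   # dendrogram of the words
groups <- cutree(hc, k = 2)              # cut into two groups of words
groups
```

Because the rows of the matrix are terms, the dendrogram groups words that tend to occur in the same tweets.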

Partitioning Algorithm

• Partitioning method: Construct a partition of n data objects into a set of K clusters. Given a pre-determined K, find a partition of K clusters that optimizes the chosen partitioning criterion.

k-means (MacQueen, 1967): Each cluster is represented by the center of the cluster.

PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw, 1987): Each cluster is represented by one of the data objects in the cluster.

K-means clustering

• Given a set of observations, K-means clustering aims to partition n observations into K clusters S = {S_1, ..., S_K} by minimizing the within-cluster sum of squares (WCSS):

$$\underset{S}{\arg\min} \sum_{i=1}^{K} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

where \mu_i is the mean of the points in S_i.

• Each cluster is associated with a centroid.
– Each point is assigned to the cluster with the closest centroid.
– Initial K centroids are chosen randomly.
– The centroid is the mean of the points in the cluster.

• The number of clusters, K, must be specified.

K-means clustering

• Given K, the K-means algorithm is implemented in four steps:

Step 1. Partition objects into K nonempty subsets

Step 2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)

Step 3. Assign each object to the cluster with the nearest seed point

Step 4. Go back to Step 2; stop when there are no more new assignments
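The four steps can be written out directly in R. This is a teaching sketch on six hypothetical points, and it assumes no cluster ever becomes empty. In practice one would call the built-in `kmeans()` function instead.

```r
# Six hypothetical points in 2-D forming two obvious groups
X <- rbind(c(1, 1), c(1.5, 2), c(1, 0.5),
           c(8, 8), c(9, 8.5), c(8.5, 9))
K <- 2

# Step 1: an initial partition into K nonempty subsets (alternating labels)
cluster <- rep(1:K, length.out = nrow(X))

repeat {
  # Step 2: centroids = column means of each current cluster (K x 2 matrix)
  centroids <- t(sapply(1:K, function(k) {
    colMeans(X[cluster == k, , drop = FALSE])
  }))
  # Step 3: squared distance of every point to every centroid (n x K matrix),
  # then assign each point to the nearest centroid
  d2 <- sapply(1:K, function(k) rowSums(sweep(X, 2, centroids[k, ])^2))
  new_cluster <- max.col(-d2, ties.method = "first")
  # Step 4: stop when no assignment changes, otherwise go back to Step 2
  if (all(new_cluster == cluster)) break
  cluster <- new_cluster
}

cluster      # final cluster label of each point
centroids    # final centroids
```

Starting from the deliberately poor alternating partition, the loop moves the first three points into one cluster and the last three into the other after one reassignment.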

K-means clustering

• How to determine the number of clusters in K-means clustering?
– Fit K-means clustering with different values of K and calculate the within-cluster sum of squares (WSS) for each.
– Draw a scree plot of WSS against K.
– Choose the number of clusters where there is a sharp drop in WSS (the "elbow" of the plot).
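A scree plot of this kind might be produced as follows. The sketch uses the built-in `iris` measurements purely as stand-in data (an assumption; the course uses the tweet matrix), and the range of K values and `nstart = 20` are arbitrary choices.

```r
set.seed(123)                  # K-means starts from random centroids
X <- scale(iris[, 1:4])        # built-in data, used only for illustration

max_k <- 6
wss <- numeric(max_k)
# K = 1: every point is in one cluster, so WSS is the total sum of squares
wss[1] <- sum(scale(X, center = TRUE, scale = FALSE)^2)
for (k in 2:max_k) {
  wss[k] <- kmeans(X, centers = k, nstart = 20)$tot.withinss
}

plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Within-cluster sum of squares (WSS)")
```

The elbow is read off the plot by eye, so this step remains a judgment call rather than an automatic rule.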

Clustering Analysis: K-means clustering

Reference: "Kmeans animation withoutWatermark" by Incheol, licensed under CC BY-SA 4.0 via https://commons.wikimedia.org/wiki/File:Kmeans_animation_withoutWatermark.gif#/media/File:Kmeans_animation_withoutWatermark.gif

Partitioning Around Medoids (PAM)

• The PAM algorithm partitions the n objects into K clusters by finding the clustering solution that minimizes the overall dissimilarity between the representative of each cluster and its members.

• Each cluster is associated with a medoid.
– Each point is assigned to the cluster with the closest medoid.
– The K medoids are K representative data objects.
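In R, PAM is available as `pam()` in the `cluster` package. A minimal sketch on six hypothetical points shows that, unlike K-means centroids, the medoids are actual rows of the data:

```r
library(cluster)   # provides pam()

# Six hypothetical points in 2-D forming two obvious groups
X <- rbind(c(1, 1), c(1.5, 1.5), c(2, 2),
           c(8, 8), c(8.5, 8.5), c(9, 9))

fit <- pam(X, k = 2)
fit$id.med        # row indices of the medoids
fit$medoids       # the medoids themselves (rows of X)
fit$clustering    # cluster label of each object
```

Here the middle point of each group is expected to be chosen as medoid, since it has the smallest total distance to the other members of its group.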

Partitioning Around Medoids (PAM)

• In PAM, the swapping cost is used as an objective function:
– For each pair of a medoid m and a non-medoid object h, measure whether h is better than m as a medoid.
– Use the squared-error criterion

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$

where m_i is the medoid of cluster C_i.
– Compute E_h − E_m
– Negative: swapping brings benefit
• Choose the minimum swapping cost
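The swapping cost can be computed by hand for a small case. In this sketch the data points, the helper name `sq_error()`, and the particular medoid sets are all hypothetical; E is evaluated once with the current medoids and once after the candidate swap.

```r
# Six hypothetical points in 2-D forming two obvious groups
X <- rbind(c(1, 1), c(1.5, 1.5), c(2, 2),
           c(8, 8), c(8.5, 8.5), c(9, 9))
D <- as.matrix(dist(X))            # pairwise Euclidean distances

# Squared-error criterion E for a given set of medoid row indices:
# each point contributes its squared distance to its nearest medoid
sq_error <- function(D, medoids) {
  sum(apply(D[, medoids, drop = FALSE], 1, min)^2)
}

E_m <- sq_error(D, c(3, 5))        # current medoids: rows 3 and 5
E_h <- sq_error(D, c(2, 5))        # candidate: replace medoid 3 by object 2
swap_cost <- E_h - E_m             # negative means the swap brings benefit
swap_cost
```

Here E drops from 3.5 to 2, so the swapping cost is -1.5 and object 2 is a better medoid than object 3.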

Partitioning Around Medoids (PAM)

• Given K, PAM is implemented in 6 steps:

Step 1. Randomly pick K data points as initial medoids

Step 2. Assign each data point to its nearest medoid x

Step 3. Calculate the objective function: the sum of dissimilarities of all points to their nearest medoids (squared-error criterion)

Step 4. Randomly select a non-medoid point y

Step 5. Swap x with y if the swap reduces the objective function

Step 6. Repeat Steps 3 to 5 until there is no change

Cluster Validation

– Why is cluster validation necessary?
• Clustering algorithms will define clusters even if there is no natural cluster structure. In higher dimensions, it is not easy to detect whether there are natural cluster structures. Thus, we need approaches to determine whether there is non-random structure in the data and how well the results of a clustering fit the data.

The Silhouette Coefficient

• The Silhouette Coefficient (SC): a method for interpreting and validating the consistency within clusters of data. It quantifies the quality of a clustering.

• How to compute the SC?
• For an individual point i:
– Calculate a = avg. distance of i to the other points in its cluster
– Calculate b = min (avg. distance of i to the points in another cluster)
– The silhouette coefficient for a point is then given by

s = 1 – a/b if a < b (or s = b/a – 1 if a ≥ b, not the usual case)

• Can calculate the average Silhouette Coefficient for a cluster or for a whole clustering. The closer to 1, the better.
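The calculation above can be sketched by hand in R. The data, the cluster labels, and the helper name `silhouette_point()` are hypothetical, and the sketch assumes every cluster has at least two members (otherwise a is undefined).

```r
# Six hypothetical points in 2-D with a given clustering
X <- rbind(c(1, 1), c(1.5, 1.5), c(2, 2),
           c(8, 8), c(8.5, 8.5), c(9, 9))
cluster <- c(1, 1, 1, 2, 2, 2)
D <- as.matrix(dist(X))

silhouette_point <- function(i, D, cluster) {
  own <- cluster == cluster[i]
  own[i] <- FALSE                       # exclude the point itself
  a <- mean(D[i, own])                  # avg. distance within its own cluster
  others <- setdiff(unique(cluster), cluster[i])
  # b: smallest average distance to the points of any other cluster
  b <- min(sapply(others, function(k) mean(D[i, cluster == k])))
  if (a < b) 1 - a / b else b / a - 1
}

s <- sapply(seq_len(nrow(X)), silhouette_point, D = D, cluster = cluster)
s          # silhouette coefficient of each point
mean(s)    # average silhouette coefficient of the clustering
```

The `silhouette()` function in the `cluster` package computes the same quantity and also provides a plot method.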

The Silhouette Coefficient

Range of avg. SC | Interpretation
0.71-1.0 | A strong structure has been found
0.51-0.70 | A reasonable structure has been found
0.26-0.50 | The structure is weak and could be artificial. Try additional methods of data analysis.
<0.26 | No substantial structure has been found

R: K-means clustering and PAM

• We will cluster tweets by K-means clustering and PAM.

• Then, we will visualize the silhouette plot to see the quality of clustering solutions.
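A minimal sketch of this workflow, assuming a small made-up document-term matrix in place of the real tweets. K = 2 and `nstart = 10` are arbitrary choices for illustration.

```r
library(cluster)   # provides pam() and silhouette()

# Hypothetical document-term counts: 6 tweets (rows) x 3 terms (columns)
dtm <- rbind(c(3, 0, 0), c(2, 1, 0), c(3, 1, 0),
             c(0, 0, 3), c(0, 1, 2), c(0, 1, 3))
dimnames(dtm) <- list(paste0("tweet", 1:6), c("r", "data", "mining"))

set.seed(42)                                  # K-means uses random starts
km <- kmeans(dtm, centers = 2, nstart = 10)   # K-means clustering of tweets
pm <- pam(dtm, k = 2)                         # PAM clustering of tweets

d <- dist(dtm)
sil_km  <- silhouette(km$cluster, d)
sil_pam <- silhouette(pm$clustering, d)

plot(sil_pam, main = "Silhouette plot: PAM, K = 2")
avg_km  <- mean(sil_km[, "sil_width"])    # average SC for K-means
avg_pam <- mean(sil_pam[, "sil_width"])   # average SC for PAM
```

The two average silhouette coefficients can then be compared against the interpretation ranges to judge each clustering solution.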

Reference

• RDataMining: http://www.rdatamining.com

• Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. 2006.

• Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. Springer, 2001.

Please don't forget to fill out the sign-in sheet and to complete the survey that will be sent to you by email.

Thank you!