part i. how to perform a cluster analysis on directional data

24
Part I. How to Perform A Cluster Analysis on Directional Data Phillip Hendrickson

Upload: hedia

Post on 13-Jan-2016

22 views

Category:

Documents


0 download

DESCRIPTION

Part I. How to Perform A Cluster Analysis on Directional Data. Phillip Hendrickson. Caveat. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Part I. How to Perform A Cluster Analysis on Directional Data

Part I.How to Perform A Cluster Analysis on Directional Data

Phillip Hendrickson

Page 2: Part I. How to Perform A Cluster Analysis on Directional Data

Caveat

This presentation aims at explaining how a cluster analysis would work on directional data in the first part. Part II reviews my actual project which is regrettably incomplete at this time due to lack of funding for necessary datasets. In lieu of this shortcoming, I will briefly describe how I would further my project along by incorporating directional data.

Page 3: Part I. How to Perform A Cluster Analysis on Directional Data

Introduction

Cluster analysis in the spatial realm has many applications, but the general principal behind any statistical analysis in the spatial world relates to the first law of geography where according to Waldo Tobler, "Everything is related to everything else, but near things are more related than distant things."

Page 4: Part I. How to Perform A Cluster Analysis on Directional Data

Introduction (con’t)

Knowing the first law of Geography, it is difficult to disagree that cluster analysis is a frequently used exploratory procedure in the geophysical world with arguably one of its biggest applications seen in climatology.

Page 5: Part I. How to Perform A Cluster Analysis on Directional Data

Recap of Cluster Analysis

In class, we touched on how to perform a cluster analysis in which we look at the observations and perform a sequence of steps to see which observations look alike.

Remember:1. Develop a classification

2. Investigate grouping schemes

3. Data exploration

4. Hypothesis Testing

Page 6: Part I. How to Perform A Cluster Analysis on Directional Data

Steps involved in Cluster Analysis

1. Select a sample which reflects the population

2. Select variables (what is important, what should you measure, what can you measure)

3. Create a similarity matrix (PMCC or covariance) that has symmetry, equality, distinguishability, and triangle inequality.

4. Select grouping method (hierarchical agglomerative or k-means)

Page 7: Part I. How to Perform A Cluster Analysis on Directional Data

The Problem with Spatial Data (in some cases)

The steps to conduct a spatial clustering are no different expect that you will automatically have two variables (one for your latitude and one for your longitude).

So what is being introduced here that is different?

How do you analyze directional data?

Page 8: Part I. How to Perform A Cluster Analysis on Directional Data

Directional Data

In many situations a study may require the analysis of directional data (also referred to as aspect). For example, let’s say you want to track a storm event. Already you know you will need to handle wind vectors. Or perhaps you’re interested in geological events such as landslides or earthquakes, chances are good that you will need to process slope angles.

Page 9: Part I. How to Perform A Cluster Analysis on Directional Data

The Problem With Directional Data

Statistics for linear data will not work for directional data because there is no accounting for circular nature (1º and 359º are only 2º apart)

If you tried to force linearity you would essentially cut the circle in half which would negate the observations you cut out and prevent any proper clustering.

Page 10: Part I. How to Perform A Cluster Analysis on Directional Data

Exploit Periodicity

Lund 1999 suggests a method for univariate directional clustering that uses the circular nature of directional data to find an optimal number of clusters from the dataset.

Page 11: Part I. How to Perform A Cluster Analysis on Directional Data

Location and Dipersion

A measure given by the first trigonometric moment

Asks if there exists a positive finite Borel measure on the unit circle whose first n trigonometric moments take some specified values

Page 12: Part I. How to Perform A Cluster Analysis on Directional Data

Parameters

μ - mean direction Measures the location of the distribution

p - mean resultant length of ϴ Measures the dispersion also referred to as r bar Values close to 0 indicate large dispersion and

values close to 1 signify a high degree of concentration in the data.

Page 13: Part I. How to Perform A Cluster Analysis on Directional Data

How to Identify Clusters

We use the measure of dispersion Clusters are made with reference to the

largest arc lengths between observations

Lund 1999 FIG. 1

Mean resultant length of the entire dataset would be 0.05 , but separated into two clusters, the mean resultant lengths are 0.88 and 0.87

Page 14: Part I. How to Perform A Cluster Analysis on Directional Data

How To Measure Significance of Clusters

k clusters are determined by the k largest spaces denoted . To calculate the significance of the clusters:

Large values of represent a high level of clustering between clusters.

Page 15: Part I. How to Perform A Cluster Analysis on Directional Data

The Optimal Amount of Clusters

This will be the number of k clusters which maximizes the significance.

Values of significance can be negative, but occur only when data is evenly distributed.

Page 16: Part I. How to Perform A Cluster Analysis on Directional Data

Part II.Spatial Clustering of Ice Storm Vulnerability in New England

Page 17: Part I. How to Perform A Cluster Analysis on Directional Data

Abstract

The New England region of the United States is periodically hit by severe winter weather which endangers county residents, damages transportation infrastructure, and devastates local economies.

This project implements the use of cluster analysis and GIS to perform an exploratory investigation in order to understand which counties within the New England states are under the most strain from these potentially disastrous events.

The scope of this project looks at all recorded icing and glazing events from 1993 to 2009 in the New England region and forces regional clustering through spatial data.

Page 18: Part I. How to Perform A Cluster Analysis on Directional Data

Data

My cluster analysis uses data from NOAA covering a 16 year period which lists the counties by state that were hit by each ice storm or glazing event from 1993- 2009.

US census data was provided by John Mackenzie from the FREC Department.

Page 19: Part I. How to Perform A Cluster Analysis on Directional Data

Methods

Extensive data mining was performed using Microsoft Excel

Storm events were matched and joined to counties by FIPS code using VLOOKUP

A pivot table was constructed to get total ice storm events per county in the New England region

Using JMP a cluster analysis was performed Variables included:

Population age 65 and up Population density Poverty Number of households Number of events in 16 year period Decimal degrees for longitude and latitude of county

centroids

Page 20: Part I. How to Perform A Cluster Analysis on Directional Data

Methods (con’t)

Using ArcMap, a Hot Spot analysis was performed and weighted by the clusters calculated from JMP.

Page 21: Part I. How to Perform A Cluster Analysis on Directional Data

Results

Cluster analysis in JMP returned an optimal amount of 10 clusters using the “Elbow Method” introduced by Robert Thorndike in 1953

Page 22: Part I. How to Perform A Cluster Analysis on Directional Data

Results (con’t)

Proposed vulnerability ranking regarding frequency of events:

5

10

3

6

9

7

1

2

4

8

Page 23: Part I. How to Perform A Cluster Analysis on Directional Data

Further Analysis

Incorporate TIGER and DEM data into this analysis to find the most vulnerable counties with regard to paved road surfaces (a large factor that should dictate an areas vulnerability)

Use DEM data to get aspect from the topology. With more time, use directional clustering mentioned in part I to discover which elevations and slope faces are more frequently affected by ice storm events.

Page 24: Part I. How to Perform A Cluster Analysis on Directional Data

Conclusions

In hindsight I would have selected fewer clusters for my analysis, but in terms of an exploratory project, I would say this cluster analysis met with success. In the future it may be interesting to perform this same analysis for other regions of the CONUS. With more time and reliable datasets, I would also like to perform a multivariate directional cluster analysis by incorporating wind data and slope aspect into this project.