Geographic Knowledge Discovery in Spatial Interaction With Self-Organizing Maps

Ph.D. Dissertation Defense

Jun Yan
Geography Department, SUNY at Buffalo
July 29, 2004

Dissertation Committee: Dr. Jean-Claude Thill (Chair), Dr. Ling Bian, Dr. David Mark

Outline

Background

Spatial Interaction Data

Methodology: Self-Organizing Maps, Visual Data Mining

Case studies

Conclusions and Future Research

Background

Information technologies

More tools available

More data available

Two Legs!!!

Data-rich vs computation-rich:

challenge?

opportunity !!!

Background (Cont.)

Data Mining & Knowledge Discovery: “useful information from large databases”

useful, novel, valid, understandable

Geographic data mining (GDM) and geographic knowledge discovery (GKD)?

Background (Cont.)

Mining techniques: statistics, pattern recognition, machine learning, visualization, high performance computing …

Knowledge discovery process

[Figure: knowledge discovery process. Components: User, Controller, Interface, DBMS, DB, Selection, Target Data, Data Mining, Evaluation, Discoveries, Domain Knowledge, Knowledge Base]

Background (Cont.)

Finding all the patterns in a database autonomously? Unrealistic,

because the patterns could be too many and mostly uninteresting

Data mining: an iterative, interactive, semi-automated process

people direct what is to be mined

Visualization: Geovisualization (GVis)

visual data mining !!!

Visualization in KDD Process

Selecting Application Domain

Selecting Target Data

Processing Data

Extracting Information/Knowledge

Interpretation and Evaluation

Understanding basic data distribution, selecting meaningful target datasets

Locating missing data, noise removing, data smoothing

Parameters setting, process tracking, process steering

Interpretation, reporting, comparison, validity checking

Background (Cont.)

Machine learning & Neural Networks

[Figure: a learning algorithm takes examples and (sometimes) background knowledge and produces a concept description or other knowledge; a neural network maps inputs to outputs through an input layer, a hidden layer, and an output layer]

Background (Cont.)

Objectives:

Explore the effectiveness of neural networks in GKD

Examine the roles of GVis in GKD

Spatial Interaction Data

What is spatial interaction? Pairs of places

Elemental: trips made by individuals

Aggregate: flows from origins to destinations

Examples: migration, freight shipment, movement of capital & information …

Spatial Interaction Data (Cont.)

Elemental level

Trip table

            Origin     Destination   Distance
Trip 1
Trip 2
Trip 3

Aggregate level

Basic O-D matrix

            Region 1   Region 2   Region 3
Region 1
Region 2
Region 3

Dyadic O-D matrix

                      Type 1   Type 2   Type 3
Region 1>Region 1
Region 1>Region 2
Region 1>Region 3
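The step from the elemental trip table to the aggregate basic O-D matrix can be sketched in a few lines of Python. This is only an illustration: the trip records, the region names, and the helper build_od_matrix are hypothetical and are not taken from the dissertation data.

```python
from collections import Counter

# Hypothetical elemental-level trip table: (origin, destination, distance).
trips = [
    ("Region 1", "Region 2", 120.0),
    ("Region 1", "Region 2", 118.5),
    ("Region 2", "Region 3", 40.0),
    ("Region 3", "Region 1", 75.2),
]

def build_od_matrix(trips, regions):
    """Count trips per (origin, destination) pair and lay the counts out
    as a square matrix with origins as rows and destinations as columns."""
    counts = Counter((origin, dest) for origin, dest, _ in trips)
    return [[counts[(o, d)] for d in regions] for o in regions]

regions = ["Region 1", "Region 2", "Region 3"]
od_matrix = build_od_matrix(trips, regions)
for name, row in zip(regions, od_matrix):
    print(name, row)
```

A dyadic O-D matrix would follow the same idea, with one row per origin-destination pair and one column per interaction type.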

Spatial Interaction Data (Cont.)

Exploring the Patterns of Interaction

Very necessary!!!

Existing Exploratory Data Analysis (EDA): lack of interactivity

Challenges:

a large number of interactions

wide range of interaction magnitudes

multiple semantics

Spatial Interaction Data (Cont.)

[Figure: O-D matrices arranged along three dimensions: origin, destination, and interaction semantics]

Multidimensionality!!!

Spatial Interaction Data (Cont.)

[Figure panels: Electronic products; Machinery; Vehicle and parts; Photographic products]

Methodology

Self-Organizing Maps (SOM)

Visual Data Mining (VDM):

SOM as core DM engine

Interactivity

Self-Organizing Maps

A crucial task of KDD: reduce data complexity

1) Data Quantization: number of records, here number of spatial interactions

2) Data Projection: number of variables, here number of interaction semantics

By reducing data complexity, identification of meaningful geographic structures becomes possible

Traditional multivariate statistical methods have their limitations

Self-Organizing Maps (Cont.)

[Figure: competitive network with an input layer and a competitive output layer; the output is one winning node among losing nodes]

1. A special type of competitive neural network;

2. Based on some measure of dissimilarity in the attribute space;

3. Capable of reducing data complexity on two dimensions simultaneously;

4. Actually an unsupervised pattern classifier.
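The competition among output nodes can be illustrated with a small sketch. It assumes Euclidean distance as the dissimilarity measure, which the slide does not specify; the function name find_winner and the toy weight vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def find_winner(x, weights):
    """Return the index of the output node whose weight vector is least
    dissimilar to the input x (the winning node)."""
    return min(range(len(weights)), key=lambda k: euclidean(x, weights[k]))

# Three output nodes, each holding a weight vector in a 2-D attribute space.
weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.9]]
winner = find_winner([0.6, 1.0], weights)
print("winning node:", winner)
```

Every other node is a losing node for this input; no class labels are involved, which is why the map acts as an unsupervised classifier.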

Self-Organizing Maps (Cont.)

$$m_k(t+1) = m_k(t) + \alpha(t)\, h_{ck}(t)\, \bigl(x - m_k(t)\bigr)$$

1. The best matching unit (BMU) changes its value to fit the input data;

2. Its neighboring nodes change their values to fit the input data as well, but the magnitude of the change decreases with distance from the BMU;

3. Like a flexible net;

4. Similar data will be located close to each other in the mapping.
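One training step of this kind can be sketched as follows, using the standard SOM update in which each node k moves toward the input x by a fraction alpha * h_ck of the difference, where c is the BMU. The Gaussian neighborhood function, the parameter values, and the name som_update are assumptions made for illustration; the slide does not state which neighborhood function was used.

```python
import math

def som_update(weights, grid, x, alpha, sigma):
    """Apply one update m_k <- m_k + alpha * h_ck * (x - m_k) to every node k.

    weights: list of weight vectors, one per node (modified in place)
    grid:    list of node positions on the map, same order as weights
    alpha:   learning rate; sigma: neighborhood width (assumed Gaussian)
    Returns the index c of the best matching unit.
    """
    # c is the node whose weight vector is closest to x.
    c = min(range(len(weights)),
            key=lambda k: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[k])))
    for k, m_k in enumerate(weights):
        # Squared distance on the map between node k and the BMU.
        d2 = sum((a - b) ** 2 for a, b in zip(grid[k], grid[c]))
        h_ck = math.exp(-d2 / (2.0 * sigma ** 2))
        weights[k] = [wi + alpha * h_ck * (xi - wi) for xi, wi in zip(x, m_k)]
    return c

# A 1 x 3 map: node positions and initial weight vectors.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_update(weights, grid, x=[0.9, 0.9], alpha=0.5, sigma=1.0)
print("BMU:", bmu, "weights:", weights)
```

The BMU receives the full step, while nodes farther away on the map receive a smaller factor h_ck, which is the "flexible net" behavior described in the list.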

Visual Data Mining

Visualization Forms

Assignment

Focusing

Brushing

Colormap manipulation

Dynamic linking

Interaction Forms

Operation

Framework

Visualization Forms

Case Studies

Airline Origin and Destination Survey Market Table (DB1BMarket): http://www.bts.org

10% of air flight itineraries

Geographic scale: airport level, 280 metros in Contiguous US

Temporal range: 1993 to 2002

Two case studies on DB1BMarket:

Cross-sectional analysis

Temporal changes
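Later slides analyze airline market share per market. A minimal sketch of how such shares could be derived from itinerary records is given below. The record layout (origin, destination, carrier, passengers), the airport codes, and the function market_shares are hypothetical; they are not the actual DB1BMarket schema or the preprocessing used in the dissertation.

```python
from collections import defaultdict

def market_shares(records):
    """records: iterable of (origin, destination, carrier, passengers).
    Returns {(origin, destination): {carrier: share of passengers}}."""
    totals = defaultdict(float)
    by_carrier = defaultdict(lambda: defaultdict(float))
    for origin, dest, carrier, pax in records:
        totals[(origin, dest)] += pax
        by_carrier[(origin, dest)][carrier] += pax
    return {market: {c: p / totals[market] for c, p in carriers.items()}
            for market, carriers in by_carrier.items()}

# Hypothetical sample records with positive passenger counts.
records = [
    ("BUF", "DCA", "US", 30),
    ("BUF", "DCA", "WN", 10),
    ("BNA", "ATL", "DL", 50),
]
shares = market_shares(records)
print(shares[("BUF", "DCA")])
```

Each market then yields one vector of carrier shares, which is the kind of multi-attribute record a SOM can take as input.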

Clustering Analysis

[Figure: SOM feature map (distance values) with clusters labeled 1 to 9; cluster 9 is subdivided into 9-1, 9-2, 9-3, 9-4, and 9-5]

1. A cluster is an area of low values (distance) surrounded by areas of high values (distance).

2. There are several clusters in the feature map.
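The distance values behind such a feature map can be sketched as the mean distance between each node's prototype vector and those of its grid neighbors. This is a simplified variant assuming a rectangular grid, 4-connected neighbors, and Euclidean distance; the function distance_map is hypothetical and is not the software used for the slides.

```python
import math

def distance_map(weights):
    """weights[r][c] is the prototype vector at grid cell (r, c).
    Returns, per cell, the mean distance to its 4-connected neighbors."""
    rows, cols = len(weights), len(weights[0])

    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [(r + dr, c + dc)
                         for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if neighbors:
                out[r][c] = sum(dist(weights[r][c], weights[nr][nc])
                                for nr, nc in neighbors) / len(neighbors)
    return out

# A 1 x 4 map: two groups of similar prototypes with a gap between them.
weights = [[[0.0], [0.1], [5.0], [5.1]]]
dmap = distance_map(weights)
print(dmap)
```

Low values mark cells that sit inside a cluster; the high values at the two middle cells mark the boundary between the two groups.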

Clustering Analysis (Cont.)

A cluster is a valley in a 3-D map

Cluster Analysis (Cont.)

Market Share

Contribution

Cluster Analysis (Cont.)

Cluster #   Cluster Property (Airline)
1           America West (HP)
2           US Air (US)
3           Continental (CO), Continental Express (RU)
4           Northwest (NW), Mesaba (XJ)
5           Horizon (QX)
6           United (UA)
7           Air Wisconsin (ZW)
8           American (AA), American Eagle (MQ)
9-1         No dominant airlines
9-2         Southwest (WN)
9-3         Comair (OH)
9-4         Delta (DL)
9-5         Delta (DL), Atlantic Southeast (EV)

[Figure: feature map with clusters labeled by airline code: AA MQ, ZW, UA, QX, NW XJ, CO RU, US, HP, WN, QX DL, DL, EV, and "Multiple"]

Cluster Analysis (Cont.)

Markets with US Airways Market Share >= 50%

Markets Represented by Cluster 2

Cluster 2

Cluster Analysis: Markets From Nashville

[Figure: markets from Nashville, labeled by airline code: AA, US, NW, UA, DL, CO, RU, WN, EV]

Cluster Analysis: Markets From Nashville (Cont.)

[Figure: markets from Nashville, labeled by airline code: AA, US, NW, UA, DL, CO, RU, WN, EV]

Association Analysis

Market Share

Average Airfare

Association Analysis (Cont.)

American Delta

Association Analysis (Cont.)

Average Airfare, Delta (without competition from Airtran)

Average Airfare, Delta (with competition from Airtran)

Temporal Changes

Temporal Changes (Cont.)

[Figure panels: AA 1993; TWA 2001; AA 2001; AA 2002]

Temporal Changes (Cont.)

Continental share

Northwest share

Temporal Changes: Trajectory

Market from Buffalo to DC

[Figure: trajectory of the market on the SOM across the years 93, 96, 98, 00, and 01, shown on three panels: US Airways share, Southwest share, US Airways fare]
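A trajectory of this kind can be sketched by mapping one market's yearly attribute vector to its best matching node and listing the node positions in time order. The trained weights, the grid positions, the yearly values, and the function trajectory below are all hypothetical illustrations, not results from the case study.

```python
def trajectory(yearly_vectors, weights, grid):
    """For each year (in ascending order), find the map position of the node
    whose weight vector is closest to that year's attribute vector."""
    path = []
    for year in sorted(yearly_vectors):
        x = yearly_vectors[year]
        c = min(range(len(weights)),
                key=lambda k: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[k])))
        path.append((year, grid[c]))
    return path

# Hypothetical trained map with three nodes; attributes are
# (share of carrier A, share of carrier B) in one market.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.9, 0.0], [0.6, 0.3], [0.3, 0.6]]

# Hypothetical yearly observations for a single market.
yearly_vectors = {1993: [0.95, 0.0], 1996: [0.65, 0.25], 2001: [0.35, 0.55]}
path = trajectory(yearly_vectors, weights, grid)
print(path)
```

Drawing the resulting positions as a connected line over a component plane gives a picture of how the market moved across the map over time.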

Conclusions

Data rich environment: large databases, and high dimensionality

Data complexity reduction is crucial

Results suggest that SOM:

summarizes the overall data distribution well

is capable of detecting clustered structures

can be used to analyze the properties of clustered structures

can be used to study the associations among input variables

Conclusions (Cont.)

Interactive visual data mining can:

examine subset data more closely

study relationships among interaction types

analyze how detected clusters are distributed in the actual geographic space

Help us gain a better understanding of the factors and spatial processes behind

Future Research

SOM/VDM analysis:

DB1BMarket

Other types of spatial interaction data

Data at elemental level

Improved VDM environment:

Human subject testing

Seamlessly coupled

Thank You!

Questions? Comments?

Contact: [email protected]

Background (Cont.)

Geographic databases fit the profile:

massive volume: GIS, GPS, Remote Sensing …

high dimensionality

Geographic data mining (GDM) and geographic knowledge discovery (GKD)?

Current topic in GIS research

Background (Cont.)

Roles of Visualization

[Figure: roles of visualization over time, from data driven to model driven]

Exploratory analysis: visual exploration & visual data mining

Knowledge construction: visual knowledge construction & refinement

Analysis and modeling: visual model tracking, model steering

Evaluation of results: data presentation, visualization of uncertainty


Modeling Flows

Spatial interaction models: “Gravity Models”

Other geographic factors:

Geographic relationships among origins?

Geographic relationships among destinations?

Association among types of interaction?

Modeling Flows

Spatial interaction models: “Gravity Models”

Push: origin

Pull: destination

Transportation cost: distance decay

$$I_{ij} = k\,\frac{P_i P_j}{d_{ij}^{a}} = k\, P_i P_j\, d_{ij}^{-a}$$
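The gravity formula can be checked with a small worked example. The default values k = 1 and a = 2, the populations, and the distances are assumptions chosen for illustration; the slide leaves k and a unspecified.

```python
def gravity_flow(p_i, p_j, d_ij, k=1.0, a=2.0):
    """Predicted interaction I_ij = k * P_i * P_j * d_ij ** (-a).

    p_i, p_j: sizes of origin and destination (push and pull)
    d_ij:     distance between them (must be positive)
    k, a:     scaling constant and distance-decay exponent (assumed values)
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return k * p_i * p_j * d_ij ** (-a)

flow_near = gravity_flow(1000, 2000, 10.0)
flow_far = gravity_flow(1000, 2000, 20.0)
print(flow_near, flow_far)
```

With a = 2, doubling the distance cuts the predicted flow to one quarter, which is the distance-decay effect named on the slide.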

Spatial Interaction Data (Cont.)

Limitations of Traditional Multivariate Methods

Data Projection: factor analysis, projection pursuit, multi-dimensional scaling

Data Quantization: partitioning methods, hierarchical methods

Traditional methods: linearity; stationary; normal distribution; limited data amount; one dimension compression

Data at hand: non-linear; non-stationary; distribution unknown; sparse; large data amount; multi-dimensional

Visualization Forms

Interaction Forms

Data Distribution

1. Similar data distributions

2. But greatly reduced number of low values

3. SOM prototype represents original data well

Cluster Analysis (Cont.)

Markets with Southwest Market Share >= 50%

Markets Represented by Cluster 9-2

Cluster 9-2

Markets with Southwest Market Share >= 20%

Temporal Changes (Cont.)

US Airways share

American share

Temporal Changes (Cont.)

Delta share

United share

Temporal Changes (Cont.)

Temporal Trend: Trajectory (Cont.)

Market from Buffalo to NYC

[Figure: trajectory of the market on the SOM across the years 93, 96, 00, and 01, shown on three panels: US Airways share, JetBlue share, US Airways fare]

Temporal Trend: Trajectory (Cont.)

Market from Buffalo to Atlanta

[Figure: trajectory of the market on the SOM between the years 93 and 98, shown on three panels: Airtran Airways share, Delta share, Delta fare]