title: spatial data mining in geo-business
Title: Spatial Data Mining in Geo-Business
Overview
• Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
• Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
• Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
• Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
• Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
• Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
• Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
• Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
• Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction
Paper available online at www.innovativegis.com/basis/present/GeoTec08/
Density Surface Analysis
Geo-Coding: Customer Street Address to Customer GIS Location
Vector to Raster: counts the number of customers (points) within each grid cell, giving Customer Counts (# per cell)
Roving Window: calculates the total number of customers within a roving window (customer density), giving Density Surface Totals
Classify: produces Classified Density Levels (Density Map)
Displays: 2D grid display of customer counts; 2D perspective display of density contours; 3D surface plot
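The counting and roving-window steps above can be sketched in a few lines of Python. This is only an illustration: the 4 x 3 grid, the customer cells, and the square window with a radius of one cell are all assumptions made for the example (the slides do not state the window shape or size).

```python
from collections import Counter


def count_per_cell(cells, n_cols, n_rows):
    """Vector-to-raster step: count the customers that fall in each grid cell."""
    counts = Counter(cells)
    return [[counts.get((c, r), 0) for c in range(n_cols)] for r in range(n_rows)]


def roving_window_total(grid, radius=1):
    """Sum the counts inside a square window centred on every cell (assumed shape)."""
    n_rows, n_cols = len(grid), len(grid[0])
    density = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0
            for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                    total += grid[rr][cc]
            density[r][c] = total
    return density


# Hypothetical customer locations, already expressed as (column, row) cells.
customer_cells = [(0, 0), (0, 0), (1, 0), (1, 1), (3, 2)]
counts = count_per_cell(customer_cells, n_cols=4, n_rows=3)
density = roving_window_total(counts, radius=1)
```

The window is clipped at the grid edge, so border cells sum fewer neighbours than interior cells.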
Identifying Pockets of High Density
Customer Density (Map Surface)
Customer Density (Non-spatial Statistics)
Unusually High = Mean + 1 Standard Deviation
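A minimal sketch of the "Mean + 1 Standard Deviation" rule follows. The density values are hypothetical, and the use of the population standard deviation (rather than the sample form) is an assumption, since the slide does not say which one is meant.

```python
from statistics import mean, pstdev


def unusually_high_mask(surface):
    """Flag cells whose value exceeds mean + 1 standard deviation of the surface."""
    values = [v for row in surface for v in row]
    cutoff = mean(values) + pstdev(values)  # population st. dev. is an assumption
    mask = [[1 if v > cutoff else 0 for v in row] for row in surface]
    return cutoff, mask


# Hypothetical customer density surface (customers per roving window).
density_surface = [[1, 2, 2], [2, 3, 9], [1, 2, 5]]
cutoff, mask = unusually_high_mask(density_surface)
```

Cells flagged with 1 are the "pockets" of unusually high density; everything else is 0.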
Grid-based Analysis Frame (Keystone Concept)
Customer Database (non-spatial)
…GeoCoding plots each customer's address on the streets map: Vector (point)
…V to R Conversion plots each customer's location in the analysis frame (grid): Raster (cell)
…appends Lat, Lon, Column, Row location to customer records
Customer Database (spatial): Latitude, Longitude, C, R
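The step that appends Column and Row to a geocoded record might look like the sketch below. The grid origin, the cell size in degrees, the lower-left row numbering, and the record fields are all hypothetical; a real analysis frame would normally use a projected coordinate system.

```python
import math


def to_grid_cell(lat, lon, lat_min, lon_min, cell_size_deg):
    """Convert a latitude/longitude pair to (column, row), counted from the lower-left corner."""
    col = math.floor((lon - lon_min) / cell_size_deg)
    row = math.floor((lat - lat_min) / cell_size_deg)
    return col, row


# Hypothetical geocoded customer record and analysis-frame origin.
customer = {"id": "C001", "lat": 40.012, "lon": -104.975}
cell = to_grid_cell(customer["lat"], customer["lon"],
                    lat_min=40.0, lon_min=-105.0, cell_size_deg=0.01)
customer["column"], customer["row"] = cell  # append C, R to the customer record
```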
Surface Modeling (Spatial Interpolation)
Point Samples: “Spikes” (labeled sample value 66.3; Avg = 42.9)
Surface Map: “Spikes ‘n Blanket”
…“maps the variance” by using geographic position to help explain the differences in the sample values.
IDW Interpolation (Inverse Distance Weighted)
Sampled Data: points #1 to #16, with X marking the grid location
1) Identify data points in window: #11 value = 56.9, #14 value = 22.5, #15 value = 52.3, #16 value = 66.3
2) Calculate distance from location to data points (Pythagorean Theorem): #11 distance = 22.80, #14 distance = 26.08, #15 distance = 6.32, #16 distance = 14.14
3) Weight-average values in the window based on distance to grid location, (1/Distance)^2 * Value: “closer has more influence”
4) Assign weight-averaged value: 53.35
5) Move window to next grid location and repeat
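The weight-averaging in the IDW steps above can be checked with a short Python sketch. It uses the four distances and values listed on the slide; the function name and the `power` parameter are illustrative, not part of the original.

```python
def idw_estimate(samples, power=2):
    """Inverse-distance-weighted average of (distance, value) pairs."""
    weights = [1.0 / distance ** power for distance, _ in samples]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, samples))
    return weighted_sum / sum(weights)


# (distance, value) for samples #11, #14, #15 and #16 from the slide.
window_samples = [(22.80, 56.9), (26.08, 22.5), (6.32, 52.3), (14.14, 66.3)]
estimate = idw_estimate(window_samples)
```

Rounded to two decimals, `estimate` agrees with the 53.35 assigned in step 4. Sample #15 dominates because it is by far the closest point.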
Average vs. IDW Interpolated Surface
Difference Surface (IDW - Average): Min = -26.1, Max = 29.5
Reds: Avg > IDW
Greens: Avg < IDW
IDW vs. Krig Interpolated Surfaces
Difference Surface (IDW - Krig): Min = -14.8, Max = 5.0
Reds: Krig > IDW
Greens: Krig < IDW
Assessing Relationships Among Maps
Housing Density (Units/ac): South has Lower Density
Home Value ($K): South has Higher Values
Home Age (Years): South has Newer Homes
Geographic Space: relative spatial position of measurements
Data Space: relative numerical magnitude of measurements (Density, Value, Age)
Comparison Point #1: D = Low (2.4 units/ac), V = High ($407,000), A = Low (18.3 years)
Least Similar Point #2: D = High (4.8 units/ac), V = Low ($190,000), A = High (51.2 years)
Data Similarity is inversely proportional to Data Distance
…as data distance increases, the map values for two locations are less similar
Assessing Map Similarity
“Data Distance” determines similarity among data patterns
Comparison Point = 2.4, 407, 18.3
Least Similar Point = 4.8, 190, 51.2
…the farthest away point in data space (least similar) is set to 0 and the comparison point is set to 100
…all other Data Distances are scaled in terms of their relative similarity as “percent similar” to the comparison point (0 to 100)
Percent Similar scale: 0 (least similar point) to 100 (comparison point)
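The percent-similar scaling can be sketched as follows. The slides do not say how data distance is measured, so plain Euclidean distance on the raw (Density, Value, Age) values is an assumption here; in practice the layers would usually be standardized first so that Home Value does not dominate. The third location is hypothetical.

```python
import math


def data_distance(a, b):
    """Euclidean distance between two data patterns (an assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def percent_similar(comparison, locations):
    """Scale distances so the comparison point is 100 and the farthest point is 0."""
    distances = [data_distance(comparison, loc) for loc in locations]
    farthest = max(distances)
    if farthest == 0:
        return [100.0 for _ in distances]  # every location matches the comparison point
    return [100.0 * (1.0 - d / farthest) for d in distances]


comparison_point = (2.4, 407.0, 18.3)   # D, V ($K), A from the slide
locations = [
    (2.4, 407.0, 18.3),   # the comparison point itself
    (4.8, 190.0, 51.2),   # least similar point from the slide
    (3.0, 350.0, 25.0),   # hypothetical third location
]
similarity = percent_similar(comparison_point, locations)
```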
Identifying Data Patterns of Interest
Housing Density: Mean = 3.56, +StDev = 0.80, LevelMin = 4.36 (Unusually High)
Home Value: Mean = 257.0, -StDev = 67.2, LevelMax = 189.8 (Unusually Low)
Level-Slicing Classifier (two variables)
Data Space: Unusually High Housing Density and Unusually Low Home Value
Geographic Space: Unusually High Density and Low Value
Level-Slicing Classifier (three variables)
…common “data zones” can be mapped by identifying specific levels of each mapped variable, then adding the binary maps
Data Space …identifies combinations of selected measurements (high D, low V, high A)
Geographic Space …locates combinations of selected measurements
(high D, low V, high A): 1 + 2 + 4 = 7
(high D, low V but not high A): 1 + 2 + 0 = 3
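The binary-map addition behind the 1 + 2 + 4 codes can be sketched as below. The density and value cut-offs are the LevelMin and LevelMax figures from the earlier slide; the age cut-off and the three grid cells are hypothetical, as the slides do not give them.

```python
def level_slice(density, value, age, d_min, v_max, a_min):
    """Add three binary maps coded 1 (high D), 2 (low V) and 4 (high A)."""
    codes = []
    for d_row, v_row, a_row in zip(density, value, age):
        row = []
        for d, v, a in zip(d_row, v_row, a_row):
            high_d = 1 if d >= d_min else 0
            low_v = 2 if v <= v_max else 0
            high_a = 4 if a >= a_min else 0
            row.append(high_d + low_v + high_a)
        codes.append(row)
    return codes


# One-row hypothetical grid: three cells with different data patterns.
density = [[4.8, 4.8, 2.4]]
value = [[150.0, 150.0, 407.0]]
age = [[51.2, 20.0, 18.3]]
codes = level_slice(density, value, age, d_min=4.36, v_max=189.8, a_min=40.0)
```

Because the three codes are powers of two, each sum from 0 to 7 identifies exactly one combination of conditions.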
Spatial Data Clustering
…“data clusters” are identified as groups of neighboring data points in Data Space, and then mapped as corresponding grid cells in Geographic Space
Data Space …plots and identifies groups of similar data values
Geographic Space …maps common data patterns (clusters)
Relatively high D, low V and high A
Relatively low D, high V and low A
Two Clusters
Three Clusters
Four Clusters
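The slides do not name the clustering algorithm, so the sketch below uses k-means purely as a stand-in for "grouping neighboring points in data space". The six (D, V, A) patterns, the naive seeding from the first k points, and the fixed iteration count are all assumptions for illustration.

```python
import math


def k_means(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centre, then recompute centres."""
    centers = [points[i] for i in range(k)]  # naive seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Hypothetical (Density, Value in $K, Age) patterns for six grid cells.
cells = [
    (4.8, 190.0, 51.2), (2.4, 407.0, 18.3), (4.5, 200.0, 48.0),
    (2.6, 390.0, 20.0), (4.6, 185.0, 50.0), (2.2, 420.0, 17.0),
]
labels, centers = k_means(cells, k=2)
```

Writing each cell's label back to its grid position gives the cluster map in geographic space. Label 0 here gathers the relatively high D, low V, high A cells.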
Spatial Regression (prediction equation)
…relationship between Loan Concentration (dependent variable) and the independent variables Housing Density, Home Value and Home Age; each map ranges from Low to High
Loan Concentration vs. Housing Density: Y = 26 - 5.7 * Xdensity [R^2 = 40%]
Loan Concentration vs. Home Value: Y = -13 + 0.074 * Xvalue [R^2 = 46%]
Loan Concentration vs. Home Age: Y = 17 - 0.074 * Xage [R^2 = 23%]
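A single-variable prediction equation of this kind can be fitted by ordinary least squares, as sketched below. The four data pairs are hypothetical and were picked so the fit returns the same intercept and slope as the density equation on the slide. They do not reproduce its R^2 of 40%, which reflects the scatter in the real map data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = intercept + slope * X, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical cell values: housing density (X) and loan concentration (Y).
housing_density = [1.0, 2.0, 3.0, 4.0]
loan_concentration = [20.0, 15.0, 9.0, 3.0]
intercept, slope, r_squared = fit_line(housing_density, loan_concentration)

# Apply the fitted equation to a new cell with density 2.4 units/ac.
predicted = intercept + slope * 2.4
```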
Competition Analysis (Spatial Analysis Steps)
Step 1: Build travel time maps for entire market area
• Compute travel time from every location to our store
• This requires grid-based map analysis software
• Update customer record with travel time to our store
• Add this to every non-customer record in trading area
Step 2: Repeat for every competitor
• Update every customer record with travel time to competitor store
• Add to every non-customer record in trading area
Step 3: Compute Travel Time Gain for travel to main store
• Every customer and non-customer record is updated
• The greater gain indicates lower travel effort to visit our store
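The slides do not define Travel Time Gain numerically. The sketch below assumes it is the travel time to the nearest competitor minus the travel time to our store, which is consistent with "greater gain indicates lower travel effort to visit our store". The record fields and the minutes shown are hypothetical.

```python
def travel_time_gain(our_minutes, competitor_minutes):
    """Assumed definition: nearest-competitor travel time minus our-store travel time."""
    return min(competitor_minutes) - our_minutes


# Hypothetical customer (C) and non-customer (N) records with travel times in minutes.
records = [
    {"id": "C001", "our_store": 8.0, "competitors": [12.0, 15.5]},
    {"id": "N014", "our_store": 20.0, "competitors": [9.0, 25.0]},
]
for record in records:
    record["gain"] = travel_time_gain(record["our_store"], record["competitors"])
```

Under this definition a positive gain means our store is the closer option, and a negative gain means a competitor is closer.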
Predictive Modeling (Spatial Statistics Steps)
Step 4: Build analytic dataset from customer data
• Geocoding information
• Transactions, sales, product category purchases
• Visitation frequency, recency, spend
• Customer Segment, travel times, demographics
Step 5: Build predictive models
• Probability of Visitation (not possible for this demo)
• Probability of Purchase by Product Category
• Expected Sales and Transactions
• Use store travel time and all competitive differences
Step 6: Map the scores
• The distribution of the scores provides visual evidence of the effects of travel time and competitive pressure
• Spatial hypotheses can be tested and evaluated
Map Analysis Framework
Mapping and Geo-query
While discrete sets of points, lines and polygons have served our mapping demands for over 8,000 years and keep us from getting lost, the expression of mapped data as continuous spatial distributions (surfaces) provides a new foothold for the contextual and numerical analysis of mapped data: “Thinking with Maps”
Visit www.innovativegis.com/basis/present/GeoTec08/ to download this PowerPoint slide set
Weighted Average Calculations for Inverse Distance Weighting (IDW) Spatial Interpolation Technique
Evaluating Interpolation Performance
…Residual Analysis is used to evaluate interpolation performance for the Average, IDW and Krig surfaces (Krig at .03 Normalized Error is best)
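Residual analysis compares estimates against sample values that were held back from the interpolation. The sketch below assumes that "Normalized Error" means the mean absolute residual divided by the mean of the actual values; the slides do not give the formula, and the test values and estimates are hypothetical.

```python
def residual_summary(actual, predicted):
    """Residuals (predicted - actual), mean absolute error, and an assumed normalized error."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
    normalized_error = mean_abs_error / (sum(actual) / len(actual))
    return residuals, mean_abs_error, normalized_error


# Hypothetical held-back test samples and the estimates from two surfaces.
actual = [50.0, 40.0, 60.0, 50.0]
average_estimates = [50.0, 50.0, 50.0, 50.0]  # whole-area average at every location
idw_estimates = [48.0, 43.0, 57.0, 51.0]

_, _, average_error = residual_summary(actual, average_estimates)
_, _, idw_error = residual_summary(actual, idw_estimates)
```

The surface with the smaller normalized error tracks the held-back samples more closely, which is the basis for ranking the Average, IDW and Krig results.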