the spatial scan statistic
DESCRIPTION
National Institute of Statistical Sciences Workshop on Statistics and Counterterrorism G. P. Patil November 20, 2004 New York University. The Spatial Scan Statistic. Move a circular window across the map. Use a variable circle radius, from zero up - PowerPoint PPT PresentationTRANSCRIPT
11
National Institute of Statistical SciencesNational Institute of Statistical Sciences
Workshop on Statistics and CounterterrorismWorkshop on Statistics and Counterterrorism
G. P. PatilG. P. Patil
November 20, 2004November 20, 2004New York UniversityNew York University
2
Spatiallydistributedresponsevariables
Hotspotanalysis
PrioritizationDecisionsupportsystems
Geoinformatic spatio-temporal data from a variety of dataproducts and data sources with agencies, academia, and industry
Masks, filters
Indicators, weights
Masks, filters
Geoinformatic Surveillance System
Spatiallydistributedresponsevariables
Hotspotanalysis
PrioritizationDecisionsupportsystems
Geoinformatic spatio-temporal data from a variety of dataproducts and data sources with agencies, academia, and industry
Masks, filters
Indicators, weights
Masks, filters
Geoinformatic Surveillance System
3
Agency Databases Thematic Databases Other Databases
HomelandSecurity
Disaster Management
Public Health
Ecosystem Health
Other CaseStudies
Statistical Processing: Hotspot Detection, Prioritization, etc.
Data Sharing, Interoperable Middleware
Standard or De Facto Data Model, Data Format, Data Access
Arbitrary Data Model, Data Format, Data Access
Application Specific De Facto Data/Information Standard
Agency Databases Thematic Databases Other Databases
HomelandSecurity
Disaster Management
Public Health
Ecosystem Health
Other CaseStudies
Statistical Processing: Hotspot Detection, Prioritization, etc.
Data Sharing, Interoperable Middleware
Standard or De Facto Data Model, Data Format, Data Access
Arbitrary Data Model, Data Format, Data Access
Application Specific De Facto Data/Information Standard
4
55
The Spatial Scan StatisticThe Spatial Scan Statistic
Move a circular window across the Move a circular window across the map.map.
Use a variable circle radius, from zero Use a variable circle radius, from zero upup
to a maximum where 50 percent of to a maximum where 50 percent of the population is included.the population is included.
6A small sample of the circles used
77
Detecting Emerging Detecting Emerging ClustersClusters Instead of a circular window in two Instead of a circular window in two
dimensions, we use a cylindrical window dimensions, we use a cylindrical window in three dimensions.in three dimensions.
The base of the cylinder represents The base of the cylinder represents space, while the height represents time.space, while the height represents time.
The cylinder is flexible in its circular The cylinder is flexible in its circular base and starting date, but we only base and starting date, but we only consider those cylinders that reach all consider those cylinders that reach all the way to the end of the study period. the way to the end of the study period. Hence, we are only considering ‘alive’ Hence, we are only considering ‘alive’ clusters.clusters.
88
West Nile Virus Surveillance West Nile Virus Surveillance in New York Cityin New York City
2000 Data: Simulation/Testing of 2000 Data: Simulation/Testing of Prospective Surveillance SystemProspective Surveillance System 2001 Data: Real Time 2001 Data: Real Time Implementation of Daily Prospective Implementation of Daily Prospective SurveillanceSurveillance
99
Major epicenter on Staten IslandMajor epicenter on Staten Island
Dead bird surveillance system: June 14Dead bird surveillance system: June 14 Positive bird report: July 16 (coll. July 5)Positive bird report: July 16 (coll. July 5) Positive mosquito trap: July 24 (coll. Positive mosquito trap: July 24 (coll.
July 7)July 7) Human case report: July 28 (onset July Human case report: July 28 (onset July
20)20)
West Nile Virus West Nile Virus Surveillance in New York Surveillance in New York CityCity
10
1111
Hospital Emergency Hospital Emergency Admissions in New York CityAdmissions in New York City
Hospital emergency admissions data from a Hospital emergency admissions data from a majority of New York City hospitals.majority of New York City hospitals.
At midnight, hospitals report last 24 hour of At midnight, hospitals report last 24 hour of
data to New York City Department of Healthdata to New York City Department of Health A spatial scan statistic analysis is performed A spatial scan statistic analysis is performed
every morningevery morning If an alarm, a local investigation is If an alarm, a local investigation is
conductedconducted
1212
Issues
1313
Geospatial Surveillance
1414
Spatial Temporal Surveillance
1515
Syndromic Crisis-Index Surveillance
1616
Hotspot Prioritization
17
1818
National ApplicationsNational Applications
BiosurveillanceBiosurveillance Carbon ManagementCarbon Management Coastal ManagementCoastal Management Community Community
InfrastructureInfrastructure Crop SurveillanceCrop Surveillance Disaster ManagementDisaster Management Disease SurveillanceDisease Surveillance Ecosystem HealthEcosystem Health Environmental JusticeEnvironmental Justice Sensor NetworksSensor Networks Robotic NetworksRobotic Networks
Environmental Environmental ManagementManagement
Environmental PolicyEnvironmental Policy Homeland SecurityHomeland Security Invasive SpeciesInvasive Species Poverty PolicyPoverty Policy Public HealthPublic Health Public Health and Public Health and
EnvironmentEnvironment Syndromic SurveillanceSyndromic Surveillance Social NetworksSocial Networks Stream NetworksStream Networks
1919
Geographic Surveillance and Hotspot Detection for Homeland Security: Cyber Security and Computer Network Diagnostics Securing the nation's computer networks from cyber attack is an important aspect of Homeland Security. Project develops diagnostic tools for detecting security attacks, infrastructure failures, and other operational aberrations of computer networks. Geographic Surveillance and Hotspot Detection for Homeland Security: Tasking of Self-Organizing Surveillance Mobile Sensor Networks Many critical applications of surveillance sensor networks involve finding hotspots. The upper level set scan statistic is used to guide the search by estimating the location of hotspots based on the data previously taken by the surveillance network.Geographic Surveillance and Hotspot Detection for Homeland Security: Drinking Water Quality and Water Utility Vulnerability New York City has installed 892 drinking water sampling stations. Currently, about 47,000 water samples are analyzed annually. The ULS scan statistic will provide a real-time surveillance system for evaluating water quality across the distribution system.Geographic Surveillance and Hotspot Detection for Homeland Security: Surveillance Network and Early Warning Emerging hotspots for disease or biological agents are identified by modeling events at local hospitals. A time-dependent crisis index is determined for each hospital in a network. The crisis index is used for hotspot detection by scan statistic methodsGeographic Surveillance and Hotspot Detection for Homeland Security: West Nile Virus: An Illustration of the Early Warning Capability of the Scan Statistic West Nile virus is a serious mosquito-borne disease. The mosquito vector bites both humans and birds. Scan statistical detection of dead bird clusters provides an early crisis warning and allows targeted public education and increased mosquito control.Geographic Surveillance and Hotspot Detection for Homeland Security: Crop Pathogens and Bioterrorism Disruption of American agriculture and our food system could be catastrophic to the nation's stability. This project has the specific aim of developing novel remote sensing methods and statistical tools for the early detection of crop bioterrorism.Geographic Surveillance and Hotspot Detection for Homeland Security: Disaster Management: Oil Spill Detection, Monitoring, and Prioritization The scan statistic hotspot delineation and poset prioritization tools will be used in combination with our oil spill detection algorithm to provide for early warning and spatial-temporal monitoring of marine oil spills and their consequences.Geographic Surveillance and Hotspot Detection for Homeland Security: Network Analysis of Biological Integrity in Freshwater Streams This study employs the network version of the upper level set scan statistic to characterize biological impairment along the rivers and streams of Pennsylvania and to identify subnetworks that are badly impaired. Center for Statistical Ecology and Environmental Statistics
G. P. Patil, Director
2020
Attractive FeaturesAttractive Features Identifies arbitrarily shaped clustersIdentifies arbitrarily shaped clusters Data-adaptive zonation of candidate hotspotsData-adaptive zonation of candidate hotspots Applicable to data on a networkApplicable to data on a network Provides both a point estimate as well as a confidence set Provides both a point estimate as well as a confidence set
for the hotspotfor the hotspot Uses hotspot-membership rating to map hotspot boundary Uses hotspot-membership rating to map hotspot boundary
uncertaintyuncertainty Computationally efficientComputationally efficient Applicable to both discrete and continuous syndromic Applicable to both discrete and continuous syndromic
responsesresponses Identifies arbitrarily shaped clusters in the spatial-temporal Identifies arbitrarily shaped clusters in the spatial-temporal
domaindomain Provides a typology of space-time hotspots with Provides a typology of space-time hotspots with
discriminatory surveillance potentialdiscriminatory surveillance potential
Hotspot Detection Hotspot Detection InnovationInnovation
Upper Level Set Scan Upper Level Set Scan StatisticStatistic
2121
Candidate Zones for HotspotsCandidate Zones for Hotspots
GoalGoal: Identify geographic : Identify geographic zone(s)zone(s) in which a response is in which a response is significantly elevated relative to the rest of a regionsignificantly elevated relative to the rest of a region
A list of candidate zones A list of candidate zones Z Z is specified is specified a prioria priori– This listThis list becomes part of the parameter space and the becomes part of the parameter space and the
zone must be estimated from within this listzone must be estimated from within this list– Each candidate zone should generally be spatially Each candidate zone should generally be spatially
connected, e.g., a union of contiguous spatial units or connected, e.g., a union of contiguous spatial units or cellscells
– Longer lists of candidate zones are usually preferableLonger lists of candidate zones are usually preferable– Expanding circles or ellipses about specified centers Expanding circles or ellipses about specified centers
are a common method of generating the listare a common method of generating the list
2222
Scan Statistic Zonation for Scan Statistic Zonation for Circles and Space-Time Circles and Space-Time CylindersCylinders
Cholera outbreak along a river flood-plain
•Small c irc les miss much of the outbreak•La rge circles include many unwanted cells
Outbreak e xpanding in time
•Small cy linders miss much of the outbreak•La rge cylinders include many unwanted cells
Space
Tim
eCholera outbreak along a river flood-plain
•Small c irc les miss much of the outbreak•La rge circles include many unwanted cells
Outbreak e xpanding in time
•Small cy linders miss much of the outbreak•La rge cylinders include many unwanted cells
Space
Tim
e
2323
ULS Candidate ULS Candidate ZonesZones
QuestionQuestion:: Are there data-driven (rather than Are there data-driven (rather than a prioria priori) ways of ) ways of selecting the list of candidate zones?selecting the list of candidate zones?
Motivation for the questionMotivation for the question: A human being can look at a map : A human being can look at a map and quickly determine a reasonable set of candidate zones and quickly determine a reasonable set of candidate zones and eliminate many other zones as obviously uninteresting. and eliminate many other zones as obviously uninteresting. Can the computer do the same thing?Can the computer do the same thing?
A data-driven proposalA data-driven proposal: Candidate zones are the connected : Candidate zones are the connected
components of the upper level sets of the response surface. components of the upper level sets of the response surface. The candidate zones have a tree structure (echelon tree is a The candidate zones have a tree structure (echelon tree is a subtree), subtree),
which may assist in automated detection of multiple, but which may assist in automated detection of multiple, but
geographically separate, elevated zones.geographically separate, elevated zones. Null distributionNull distribution: If the list is data-driven (i.e., random), its : If the list is data-driven (i.e., random), its
variability must be accounted for in the null distribution. A variability must be accounted for in the null distribution. A new list must be developed for each simulated data set.new list must be developed for each simulated data set.
2424
Data-adaptive approach to reduced parameter space Data-adaptive approach to reduced parameter space 00
Zones in Zones in 00 are are connected componentsconnected components of of upper level setsupper level sets of the of the
empirical intensity function empirical intensity function GGa a = = YYaa / / AAaa
Upper level set (ULS) at level Upper level set (ULS) at level gg consists of all cells consists of all cells aa where where GGaa gg
Upper level setsUpper level sets may be disconnected. Connected components are may be disconnected. Connected components are
the candidate zones in the candidate zones in 00
These connected components form a rooted tree under set These connected components form a rooted tree under set inclusion. inclusion. – Root node = entire region Root node = entire region R R – Leaf nodes = local maxima of empirical intensity surfaceLeaf nodes = local maxima of empirical intensity surface– Junction nodes occur when connectivity of ULS changes withJunction nodes occur when connectivity of ULS changes with
falling intensity levelfalling intensity level
ULS Scan StatisticULS Scan Statistic
25
Upper Level Set (ULS) of Intensity Surface
Region R
Intensity G
g
1Z 2Z 3Z
Hotspot zones at level g(Connected Components of upper level set)
26
Changing Connectivity of ULS as Level Drops
Region R
Intensity G
g
4Z 5Z 6Z
1Z 2Z 3Z
g
27
ULS Connectivity Tree
Intensity G
g
g4Z 5Z 6Z
1Z 2Z 3Z
Schematicintensity “surface”
N.B. Intensity surface is cellular (piece-wise constant), with only finitely many levelsA, B, C are junction nodes where multiple zones coalesce into a single zone
A
B
C
2828
A confidence set of hotspots on the ULS tree. The A confidence set of hotspots on the ULS tree. The different connected components correspond to different connected components correspond to different hotspot loci while the nodes within a different hotspot loci while the nodes within a connected component correspond to different connected component correspond to different delineations of that hotspotdelineations of that hotspot
MLE
Junction Node
Alternative Hotspot Locus
AlternativeHotspot Delineation
Tessellated Region R
MLE
Junction Node
Alternative Hotspot Locus
AlternativeHotspot Delineation
Tessellated Region R
2929
Network Analysis of Network Analysis of Biological Integrity in Biological Integrity in Freshwater StreamsFreshwater Streams
30
New York CityWater Distribution
Network
31
NYC Drinking Water Quality Within-City Sampling Stations
• 892 sampling stations• Each station about 4.5 feet high and
draws water from a nearby water main• Sampling frequency increased after 9-11
Currently, about 47,000 water samples analyzed annually
• Parameters analyzed: Bacteria Chlorine levels pH Inorganic and organic pollutants Color, turbidity, odor Many others
3232
Network-Based SurveillanceNetwork-Based Surveillance
Subway system surveillanceSubway system surveillance Drinking water distribution system Drinking water distribution system
surveillancesurveillance Stream and river system surveillanceStream and river system surveillance Postal System SurveillancePostal System Surveillance Road transport surveillanceRoad transport surveillance Syndromic SurveillanceSyndromic Surveillance
3333
Syndromic SurveillanceSyndromic Surveillance
Symptoms of disease such as diarrhea, Symptoms of disease such as diarrhea, respiratory problems, headache, etcrespiratory problems, headache, etc
Earlier reporting than diagnosed Earlier reporting than diagnosed diseasedisease
Less specific, more noiseLess specific, more noise
34
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSA
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSA
AdmissionsRecords
AdmissionsRecords
4
3
2
1
a
a
a
a
LSA
2
1
v
v….….Method of
-machines
Formal Language Measure
Crisis Index
High dimensional vector
Reduced dimensional vector
Hospital Behavior Model Crisis Behavior
Model
Symbol Stream
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSA
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSAa, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
a, P(a|0)b, P(b|0)
a
aa
0
1 2
3 4 5
6 7 8 9
b, P(b|2) a, P(a|2)
b, P(b|3) a, P(a|3)
…… abaaabaaababababaaabab ….
1. Language Samp le 2. Tree of all substrings of length lwith transition probabilities
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
a
a p(a|1),b p(b|1)
a, p(a|0)b, p(b|0)0
12
3. PFSA
AdmissionsRecords
AdmissionsRecords
4
3
2
1
a
a
a
a
LSA
2
1
v
v….….Method of
-machines
Formal Language Measure
Crisis Index
High dimensional vector
Reduced dimensional vector
Hospital Behavior Model Crisis Behavior
Model
Symbol Stream
(left) The overall procedure, leading from admissions records to the crisis index for a hospital. The hotspot detection algorithm is then applied to the crisis index values defined over the hospital network. (right) The -machine procedure for converting an event stream into a parse tree and finally into a probabilistic finite state automaton (PFSA).
Syndromic Surveillance
35
Experimental Validation
Pressure sensitive floor
1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16
17 18 19 20 21 22
23 24 25 26 27
0
1
2
3
4
5
6
7
0 1 2 3 4 5 6 7 8 9 10
28 29
Formal Language Events:a – green to red or red to greenb – green to tan or tan to greenc – green to blue or blue to greend – red to tan or tan to rede – blue to red or red to bluef – blue to tan or tan to blue
1
0
2
4 3
8 5
7 6
a
c
f
dc
f
a9
10
1112d
a
fd
cf
a
dc
f
a
d c11
00
2
44 33
88 5
77 66
a
c
f
dc
f
a99
1010
11111212d
a
fd
cf
a
dc
f
a
d c
Counter-ClockwiseClockwise
00 a, b, c, d, e, fWall following Random walk
Analyze String RejectionsTarget Behavior
3636
Emergent Surveillance Plexus Emergent Surveillance Plexus (ESP)(ESP)
Surveillance Sensor Network Testbed Surveillance Sensor Network Testbed Autonomous Ocean Sampling NetworkAutonomous Ocean Sampling Network
Types of HotspotsTypes of Hotspots
Hotspots due to multiple, localized, Hotspots due to multiple, localized, stationary sourcesstationary sources
Hotspots corresponding to areas of Hotspots corresponding to areas of interest in a stationary mapped fieldinterest in a stationary mapped field
Time-dependent, localized hotspotsTime-dependent, localized hotspots
Hotspots due to moving point sourcesHotspots due to moving point sources
37
Ocean SAmpling MObile Network OSAMON
3838
OOceancean SASAmpling mpling MOMObilebile NNetwork etwork OSAMONOSAMON Feedback Feedback LoopLoop
Network sensors gather preliminary dataNetwork sensors gather preliminary data
ULS scan statistic uses available data to ULS scan statistic uses available data to estimate hotspotestimate hotspot
Network controller directs sensor vehicles to Network controller directs sensor vehicles to new locationsnew locations
Updated data is fed into ULS scan statistic Updated data is fed into ULS scan statistic systemsystem
3939
SASAmplingmpling MOMObilebile NNetworksetworks (SAMON)(SAMON) Additional Application ContextsAdditional Application Contexts Hotspots for radioactivity and chemical or biological agents Hotspots for radioactivity and chemical or biological agents
to prevent or mitigate the effects of terrorist attacks or to to prevent or mitigate the effects of terrorist attacks or to detect nuclear testingdetect nuclear testing
Mapping elevation, wind, bathymetry, or ocean currents to Mapping elevation, wind, bathymetry, or ocean currents to better understand and protect the environmentbetter understand and protect the environment
Detecting emerging failures in a complex networked system Detecting emerging failures in a complex networked system like the electric grid, internet, cell phone systems like the electric grid, internet, cell phone systems
Mapping the gravitational field to find underground Mapping the gravitational field to find underground chambers or tunnels for rescue or combat missionschambers or tunnels for rescue or combat missions
40
Mote, Smart Dust: Small, flexible, low-cost sensor node
Sensor DevicesSensor Devices
RF Component of Alcohol Sensor
Miniaturized Spec Node Prototype
Giner’s Transdermal Alcohol Sensor
41
Scalable Wireless Geo-Telemetrywith Miniature Smart Sensors
Geo-telemetry enabled sensor nodes deployed by a UAV into a wireless ad hoc mesh network: Transmitting data and coordinates to TASS and GIS support systems
42
Architectural Block Diagram of Geo-Telemetry Enabled Sensor Node with Mesh Network Capability
43
Standards Based Geo-Processing Model
44
UAV Capable of Aerial Survey
45
Data Fusion Hierarchy for Smart Sensor Network with Scalable Wireless
Geo-Telemetry Capability
4646
Wireless Sensor Networks for Habitat Monitoring
47
Target Tracking in Distributed Sensor Networks
48
Video Surveillance and Data StreamsVideo Surveillance and Data Streams
49
Video Surveillance and Data StreamsVideo Surveillance and Data StreamsTurning Video into InformationTurning Video into Information
Measuring Behavior by SegmentsMeasuring Behavior by Segments
• Customer Intelligence
• Enterprise Intelligence
• Entrance Intelligence
• Media Intelligence
• Video Mining Service
50
Deterministic Finite Automata (DFA)
a
ab
b
b
c
cstart
Directed Graph (loops & multiple edges permitted) such that:• Nodes are called States
• Edges are called Transitions
• Distinguished initial (or starting) state
• Transitions are labeled by symbols from a given finite alphabet, = {a, b, c, . . . }
• The same symbol can label several transitions
• A given symbol can label at most one transition from a given state (deterministic)
51
Deterministic Finite Automata (DFA)Formal Definition
a
ab
b
b
c
cstart
Quadruple (Q, q0 , , ) such that:
• Q is a finite set of states
• is a finite set of symbols, called the alphabet
• q0Q is the initial state
• : Q Q {Blocked} is the transition function:
(q, a) = Blocked if there is no transition from q labeled by a
(q, a) = q' if a is a transition from q to q'
52
DFA and Strings
a
ab
b
b
c
cstart
Any path through the graph starting from the initial state determines a string from the alphabet. Example: The blue dashed path determines the string a b c a
Conversely, any string from the alphabet is either blocked or determines a path through the graph. Example: The following strings are blocked: c, aa, ac, abb, etc. Example: The following strings are not blocked: a, b, ab, bb, etc.
The collection of all unblocked strings is called the language accepted or determined by the DFA (all states are “final” in our approach)
53
Strings and Languages
= (finite) alphabet
* = set of all (finite) strings from
A language is any subset of *.
Not all languages can be determined by a DFA.
Different DFAs can accept the same language
*
* 0 1 2
1
Let ( -fold cartesian product).
consists of all strings of length .
Then, decomposes as
i
i
i
i
i
i
54
Probabilistic Finite Automata (PFA)
A PFA is a DFA (Q, q0 , , ) with a probability attached to each
transition such that the sum of the probabilities across all transitions from a given node is unity.
Formally, p: Q [0, 1] such that
• p(q, a) = 0 if and only if (q, a) = Blocked
• ( , ) 1 for all a
p q a q Q
Multiplying branch probabilities lets us assign a probability value (q0, s) to each string s in *. E.G., (q0, abca)=(.8)1(.6)(.4)=.192
q0
a, .4b, .2
b, 1
b, .5
c, .6
c, .5start
a, .8
55
Properties of (q0, s)• For fixed q0, (q0, s) is a measure on *
• Support of is the language accepted by the DFA
• For fixed q0, (q0, s) is a probability measure on i
( i
= strings of length i )
This probability measure is written as (i).• Given a probability distribution w(i) across string lengths i,
defines a probability measure across *, called the w-weighted probability measure of the PFA. If all w(i) are positive, then the support of is also the language accepted by the underlying DFA.
( )0 0
0
( , ) ( ) ( , )i
i
q s w i q s
56
Distance Between Two PFA
Let A and B be two PFAs on the same alphabet
Let w(i) be a probability distribution across string lengths i
Let A and B be the w-weighted probability measures of A and B
Define the distance between A and B as the variational distance between the probability measures A and B :
d(A, B) = || A B ||
57
Key Crop Areas
Crops
NOAAWeather
ThreatLocations
Plants Infected Non-infected Sentinel
GroundCameras
Air/SpacePlatforms
HyperspectralImagery
SignatureLibrary
DataProcessing
AnomalyReport
Crop Attack Decision Support System
GroundTruthing
Site Identification Module
Signature Development Module
58
Crop Biosurveillance/Biosecurity
59
HyperspectralImagery
SignatureLibrary
Image Segmentation(hyperclustering)
Proxy Signal(per segment)
DiseaseSignature
SimilarityIndex
(per segment)
Tessellation(segmentation)of raster grid
SignatureSimilarity
Map
Hotspot/AnomalyDetection
Crop Biosurveillance/BiosecurityData Processing Module
6060
We also present a prioritization innovation. It lies We also present a prioritization innovation. It lies in the ability for prioritization and ranking of in the ability for prioritization and ranking of hotspots based on multiple indicator and hotspots based on multiple indicator and stakeholder criteria without having to integrate stakeholder criteria without having to integrate indicators into an index, using Hasse diagrams indicators into an index, using Hasse diagrams and partial order sets. This leads us to early and partial order sets. This leads us to early warning systems, and also to the selection of warning systems, and also to the selection of investigational areas.investigational areas.
Prioritization InnovationPrioritization InnovationPartial Order Set RankingPartial Order Set Ranking
61
HUMAN ENVIRONMENT INTERFACELAND, AIR, WATER INDICATORS
RANK COUNTRY LAND AIR WATER
1 Sweden
2 Finland
3 Norway
5 Iceland
13 Austria
22 Switzerland
39 Spain
45 France
47 Germany
51 Portugal
52 Italy
59 Greece
61 Belgium
64 Netherlands
77 Denmark
78 United Kingdom
81 Ireland
69.01
76.46
27.38
1.79
40.57
30.17
32.63
28.34
32.56
34.62
23.35
21.59
21.84
19.43
9.83
12.64
9.25
35.24
19.05
63.98
80.25
29.85
28.10
7.74
6.50
2.10
14.29
6.89
3.20
0.00
1.07
5.04
1.13
1.99
100
98
100
100
100
100
100
100
100
82
100
98
100
100
100
100
100
for land - % of undomesticated land, i.e., total land area-domesticated (permanent crops and pastures, built up areas, roads, etc.)for air - % of renewable energy resources, i.e., hydro, solar, wind, geothermalfor water - % of population with access to safe drinking water
62
1 2 3 4 5 6 7
8 9 10
11
12
13
14
15
16
17
18
19 20
21
22
23
24
25 26 27
28
29
30
31
32
33
34
35
36
37
38
39
40
41 42
43
44
45 46 47 48
49
50
51
52
53
54
55
56
57 58
59
60
61 62 63 64
65
66
67
68
69
70
71
72 73
74 75
76
77 78 79
80
81
82
83 84 85
86
87
88
89 90 91 92
93 94
95
96
97
98 99
100
101
102
103
104
105
106
107
108 109
110
111
112 113
114
115
116
117
118
119
120
121
122
123
124
125 126
127 128
129
130
131
132
133
134
135
136
137
138
139
140 141
Hasse Diagram (all countries)
63
Hasse Diagram(Western Europe)
Norway
Sweden Finland Austria Switz.
Italy Portugal Greece Spain
France
Den.
Ireland
UK Germany
Belgium Neth.
64
Ranking Partially Ordered Sets – 5Linear extension decision tree
a b
dc
e f
a
c
e b
b
d
f f
d
e d
f
e
e
f
c
f
d
e d
f
e
e
f
d
f
e
e
f
c
f
e
e
f
c
c
f
d
e d
f
e
e
f
d
f
e
e
f
c
b a
b
a
d
Jump Size: 1 3 3 2 3 5 4 3 3 2 4 3 4 4 2 2
Poset(Hasse Diagram)
6565
Cumulative Rank Frequency Operator – 5Cumulative Rank Frequency Operator – 5An Example of the ProcedureAn Example of the ProcedureIn the example from the preceding slide, there are a total of 16 linear extensions, giving the following cumulative frequency table.
RankRank
ElemenElementt
11 22 33 44 55 66
a a 99 1414 1616 1616 1616 1616
b b 77 1212 1515 1616 1616 1616
c c 00 44 1010 1616 1616 1616
d d 00 22 66 1212 1616 1616
e e 00 00 11 44 1010 1616
f f 00 00 00 00 66 1616
Each entry gives the number of linear extensions in which the element (row label) receives a rank equal to or better that the column heading
6666
Cumulative Rank Frequency Operator – 6Cumulative Rank Frequency Operator – 6An Example of the ProcedureAn Example of the Procedure
0
4
8
12
16
1 2 3 4 5 6
Rank
Cum
ulat
ive
Fre
quen
cy
abcdef
16
The curves are stacked one above the other and the result is a linear ordering of the elements: a > b > c > d > e > f
67
Cumulative Rank Frequency Operator – 7An example where F must be iterated
Original Poset(Hasse Diagram)
a f
eb
c g d
h
a
f
e
b
ad
c
h
g
a
f
e
b
ad
c
h
g
F F 2
68
Incorporating Judgment Poset Cumulative Rank Frequency Approach
• Certain of the indicators may be deemed more important than the others
• Such differential importance can be accommodated by the poset cumulative rank frequency approach
• Instead of the uniform distribution on the set of linear extensions, we may use an appropriately weighted probability distribution , e.g.,
)()()()( 22110 ppnwnwnww
69
70
71
72
7373
Space-Time Poverty Hotspot Space-Time Poverty Hotspot TypologyTypology
Federal Anti-Poverty Programs have had little Federal Anti-Poverty Programs have had little success in eradicating pockets of persistent success in eradicating pockets of persistent povertypoverty
Can spatial-temporal patterns of poverty Can spatial-temporal patterns of poverty hotspots provide clues to the causes of poverty hotspots provide clues to the causes of poverty and lead to improved location-specific anti-and lead to improved location-specific anti-poverty policy ?poverty policy ?
74
Covariate Adjustment
Known Covariate Effects (age, population size, etc.)
count in cell
Poisson( ) where for cell
Hotspot Hy
numeri
pothesis Te
cal covaria
sting Model
te adjustment
a
a a a a
a
Y a
Y A unknown a
A known
H
relative risk
0
1
: are equal for all cells (constant relative risk)
: take two distinct values, an elevated value in an unknown zone
and a smaller value out
a
a
a
H Z
side
All connected components of upper level sets
List of can
of th a
didate zones (ULS app
e cellular dj suuste rfac
roach)
e /d a a
Z
Z
Y A
75
Covariate AdjustmentGiven Covariates, Unknown Effects
Poisson( ) where for cell
covariate adjustm
GLM Model
ent
vector of covariate values for
a a a a
a
a
Y A unknown a
A unknown
known
relative risk
X
0
cell
vector of covariate effects
Model: log( ) or log( ) lo
Hotspot Hypothesis Testing M
g( )
: are equal for all cells (constant relative
o
i )
l
k
de
r s
T Ta a a a a a a
a
a
unknown
A A
H a
H
β
X β X β
1 : take two distinct values, an elevated value in an unknown zone
and a smaller value outside
List of candidate zones (ULS approach)
All connec
a Z
Z
Z
ted components of upper level sets
of the cellular surface /
Here the model must be fitted under the null hypothesis before
determining the adjustments
adjusted
and the
a a
a
Y A
A candidate zones Z
76
Incorporating Spatial Autocorrelation Ignoring autocorrelation typically results in:
under-assessment of variability over-assessment of significance (H0 rejected too frequently)
How can we account for possible autocorrelation?
GLMM (SAR) Model
Ya = count in cell a Ya distributed as Poisson a = log(E[Ya])
The Ya are conditionally independent given the a
The a are jointly Gaussian with a Simultaneous AutoRegressive (SAR) specification( )a a ab a a a
b
W 2
Here, [ ]
are iid (0, )
is a spatial weight expressing the "degree of association"
between cells and (Take 0 and 1)
Thus, the residual
a a
a
ab
aa a
a
E
N
W
a b W W
for cell is a (by ) weighted average
of the residuals for neighboring cells plus a disturbance term a
a
a deflated
77
Incorporating Spatial Autocorrelation
1
12
2
2
( )
( )
( )
MVN ,
SAR Model:
Matrix Form:
Unknown Parameters:
Special Ca
(
ses:
) ( )
, ,
a a ab a a ab
T
a
W
η μ W η μ ε
η μ I W ε
η μ I W I W
2
0 classical (iid) spatial scan
( is not identifiable here)
0, 0 classical scanoverdispersed
78
Incorporating Spatial Autocorrelation
12
0
Poisson(exp( )) Poisson(exp( )exp( )) Poisson( )
where MVN , (
GLMM (SAR) Model
Hotspot Hypothesis Testing Mo
) ( )
: are equal (to ) for all cells (constan
l
t
de
a a a a a a a
T
a
Y A
H a
η μ I W I W
1
relative risk)
: take two distinct values, an elevated value in an unknown zone
and a smalle
List of candid
r value ou
ate zones
tsi
(ULS
a p
de
p
a
Z
H Z
Z
All connected components of upper level sets
of the cellular surface / where exp( )
Here the model must be fitted under the null hypothesi
r
s
oach
( = ) before
adjus
ted
)
a a a a
a
Y A A
determining the adjustments and the candidate zones aA Z
79
Spatial Autocorrelation Plus Covariates
12
In the SAR model, Poisson(exp( ))
where MVN [ ], ( ) ( ) ,
express the mean of as
[ ]
and formula Hotspot Hypotheste the in terms
of t
is Testing
he co st
M e
n
od l
a a
T
Y
E
E
η η I W I W
η
η μ Xβ
ant term in this expression.μ
80
CAR Model
2 * ** 1
In the CAR model, Poisson(exp( )) where
MVN [ ], and is diagonal( )
However, parameters in CAR and SAR have very different interpretation
.
In
s.
CAR, the conditional va
a aY
E
AIη W Aη
2 *
riances are
Var( | , )
which (strangely) do not depend on the autocorrelation parameter .
In , the conditional variances are
SAR
Var( |
a b aa
a
b a A
2 2 2
2
, ) 1
This expression is intuitively appealing since the conditional variances
are decreasing functions of and are smallest for cells with many
strongly associated neighbors (relati
b babb a W
a
2vely large for many ).baW b
The entire formulation is similar for Conditional AutoRegressive (CAR) specs except that the form of the variance-covariance matrix of is changes.
81
Spatiallydistributedresponsevariables
Hotspotanalysis
PrioritizationDecisionsupportsystems
Geoinformatic spatio-temporal data from a variety of dataproducts and data sources with agencies, academia, and industry
Masks, filters
Indicators, weights
Masks, filters
Geoinformatic Surveillance System
Spatiallydistributedresponsevariables
Hotspotanalysis
PrioritizationDecisionsupportsystems
Geoinformatic spatio-temporal data from a variety of dataproducts and data sources with agencies, academia, and industry
Masks, filters
Indicators, weights
Masks, filters
Geoinformatic Surveillance System
82
Agency Databases Thematic Databases Other Databases
HomelandSecurity
Disaster Management
Public Health
Ecosystem Health
Other CaseStudies
Statistical Processing: Hotspot Detection, Prioritization, etc.
Data Sharing, Interoperable Middleware
Standard or De Facto Data Model, Data Format, Data Access
Arbitrary Data Model, Data Format, Data Access
Application Specific De Facto Data/Information Standard
Agency Databases Thematic Databases Other Databases
HomelandSecurity
Disaster Management
Public Health
Ecosystem Health
Other CaseStudies
Statistical Processing: Hotspot Detection, Prioritization, etc.
Data Sharing, Interoperable Middleware
Standard or De Facto Data Model, Data Format, Data Access
Arbitrary Data Model, Data Format, Data Access
Application Specific De Facto Data/Information Standard