Post on 05-Dec-2014
Exascale Computing and Experimental Sensor Data
Overview given at Brookhaven National Laboratory, April 18, 2014
Joel Saltz Stony Brook University
joel.saltz@stonybrook.edu
Integrate Information from Sensors, Images, Cameras
• Multi-dimensional spatial-temporal datasets
– Radiology and Microscopy Image Analyses
– Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation
– Biomass monitoring and disaster surveillance using multiple types of satellite imagery
– Weather prediction using satellite and ground sensor data
– Analysis of Results from Large Scale Simulations
– Square Kilometer Array
– Google Self Driving Car
• Correlative and cooperative analysis of data from multiple sensor modalities and sources
• Equivalent from standpoint of data access patterns – need to develop new generation of data skeletons/mini-apps/data dwarfs
Spatio-temporal Sensor Integration, Analysis, Classification
• Multi-scale material/tissue structural, molecular, functional characterization. Design of materials with specific structural, energy storage properties, brain, regenerative medicine, cancer
• Integrative multi-scale analyses of the earth, oceans, atmosphere, cities, vegetation etc – cameras and sensors on satellites, aircraft, drones, land vehicles, stationary cameras
• Digital astronomy
• Hydrocarbon exploration, exploitation, pollution remediation
• Aerospace – wind tunnels, acquisition of data during flight
• Solid printing integrative data analyses
• Autonomous vehicles, e.g. self driving cars
• Data generated by numerical simulation codes – PDEs, particle methods
• Fit model with data
Typical Computational/Analysis Tasks Spatio-temporal Sensor Integration, Analysis, Classification
• Data Cleaning and Low Level Transformations
• Data Subsetting, Filtering, Subsampling
• Spatio-temporal Mapping and Registration
• Object Segmentation
• Feature Extraction
• Object/Region/Feature Classification
• Spatio-temporal Aggregation
• Diffeomorphism type mapping methods (e.g. optimal mass transport)
• Particle filtering/prediction
• Change Detection, Comparison, and Quantification
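Two of the tasks listed above, subsampling and change detection, can be illustrated with a minimal sketch. It assumes the sensor data is a small 2D grid of floats held in nested lists; the function names and the fixed threshold are hypothetical choices for illustration, not part of any system described in the talk.

```python
from typing import List

Grid = List[List[float]]


def subsample(grid: Grid, step: int) -> Grid:
    """Keep every `step`-th row and column (simple spatial subsampling)."""
    return [row[::step] for row in grid[::step]]


def change_mask(before: Grid, after: Grid, threshold: float) -> List[List[bool]]:
    """Mark cells whose absolute difference between two time steps exceeds `threshold`."""
    return [
        [abs(a - b) > threshold for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def changed_fraction(mask: List[List[bool]]) -> float:
    """Quantify change as the fraction of cells flagged in the mask."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


# Two 4x4 acquisitions of the same area at different times (assumed toy data).
before = [[0.0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = 5.0
after[2][2] = 5.0

mask = change_mask(subsample(before, 2), subsample(after, 2), threshold=1.0)
fraction = changed_fraction(mask)
```

Real pipelines would register the two acquisitions before differencing; the sketch assumes they are already aligned.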
Detect and track changes in data during production
Invert data for reservoir properties
Detect and track reservoir changes
Assimilate data & reservoir properties into the evolving reservoir model
Use simulation and optimization to guide future production
Coupled data acquisition, data analysis, modeling, prediction and correction – data assimilation, particle filtering etc.
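As a rough illustration of the particle filtering mentioned above, here is a minimal bootstrap particle filter for a scalar state. The random-walk state model, the Gaussian observation noise, and all parameter values are assumptions made for the sketch; a reservoir model would be far higher dimensional.

```python
import math
import random
from typing import List


def gaussian_likelihood(observed: float, predicted: float, std: float) -> float:
    """Unnormalised Gaussian likelihood of `observed` given `predicted`."""
    z = (observed - predicted) / std
    return math.exp(-0.5 * z * z)


def particle_filter(
    observations: List[float],
    n_particles: int = 500,
    process_std: float = 0.2,
    obs_std: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Return the posterior mean estimate of the state after each observation."""
    rng = random.Random(seed)
    # Assumed prior: state uniformly distributed in [-10, 10].
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Correct: weight each particle by how well it explains the observation.
        weights = [gaussian_likelihood(y, p, obs_std) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = particle_filter([3.0] * 20)
```

The predict, correct, resample loop mirrors the acquire, analyze, model, correct cycle on the slide, with each new measurement pulling the evolving model back toward the data.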
Future State
• 100K – 1M pathology slides/hospital/year
• 2GB compressed per slide
• 1-10 slides used for Pathologist computer aided diagnosis
• 100-10K slides used in hospital Quality control
• Groups of 100K+ slides used for clinical research studies -- Combined with molecular, outcome data
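The storage implied by these figures follows from simple arithmetic. The sketch below multiplies the slide counts by the 2GB per-slide size quoted above, assuming decimal units (1 GB = 10^9 bytes); the function name is hypothetical.

```python
GB = 10**9
TB = 10**12
PB = 10**15


def yearly_bytes(slides_per_year: int, gb_per_slide: int = 2) -> int:
    """Compressed storage needed per hospital per year, in bytes."""
    return slides_per_year * gb_per_slide * GB


low = yearly_bytes(100_000)     # lower bound: 100K slides/year
high = yearly_bytes(1_000_000)  # upper bound: 1M slides/year

low_tb = low // TB    # 200 TB per hospital per year
high_pb = high // PB  # 2 PB per hospital per year
```

Under these assumptions a single hospital produces between 200 TB and 2 PB of compressed image data per year.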
Center for Comprehensive Informatics
Brain Tumor Pipeline Scaling on GT/ORNL NSF Keeneland (100 Nodes)
Runtime Support Objectives
• Coordinated mapping of data and computation to complex memory hierarchies
• Hierarchical work assignment with flexibility capable of dealing with data dependent computational patterns, fluctuations in computational speed associated with power management, faults
• Linked to comprehensible programming model – model targeted at abstract application class but not to application domain (In the sensor, image, camera case -- Region Templates)
• Software stack including coordinated compiler/runtime support/autotuning frameworks
HPC Segmentation and Feature Extraction Pipeline
Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky
Region Templates
• Provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box
• Data region object is a storage materialization of data types and stores the data elements in the region contained by a region template instance; region template instance may have multiple data regions.
• Allows for different data I/O, storage, and management strategies and implementations, while providing a homogeneous, unified interface to the application developer.
• Application operations interact with data regions and region templates to store and retrieve data elements, rather than explicitly handling the management, staging, and distribution of the data elements.
• Current implementations on nodes with multi-core CPUs and GPUs, distributed memory storage, and high bandwidth disk I/O.
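The container idea described above can be sketched in a few classes. This is not the Region Templates API: every class and method name below (`BoundingBox`, `InMemoryStore`, `DataRegion`, `RegionTemplate`) is hypothetical, and the in-memory dictionary stands in for whatever storage backend (distributed memory, disk) an actual implementation would plug in.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class BoundingBox:
    """Spatial (x, y) and temporal (t) extent."""
    x0: int
    y0: int
    x1: int
    y1: int
    t0: int
    t1: int

    def contains(self, other: "BoundingBox") -> bool:
        return (self.x0 <= other.x0 and other.x1 <= self.x1
                and self.y0 <= other.y0 and other.y1 <= self.y1
                and self.t0 <= other.t0 and other.t1 <= self.t1)


class InMemoryStore:
    """Stand-in storage backend; others would expose the same read/write calls."""
    def __init__(self) -> None:
        self._items: Dict[Any, Any] = {}

    def write(self, key: Any, value: Any) -> None:
        self._items[key] = value

    def read(self, key: Any) -> Any:
        return self._items[key]


@dataclass
class DataRegion:
    """Storage materialization of data elements inside a bounding box."""
    name: str
    bbox: BoundingBox
    store: Any = field(default_factory=InMemoryStore)

    def put(self, key: Any, element: Any) -> None:
        self.store.write((self.name, key), element)

    def get(self, key: Any) -> Any:
        return self.store.read((self.name, key))


@dataclass
class RegionTemplate:
    """Container holding one or more data regions within its bounding box."""
    name: str
    bbox: BoundingBox
    regions: Dict[str, DataRegion] = field(default_factory=dict)

    def add_region(self, region: DataRegion) -> None:
        if not self.bbox.contains(region.bbox):
            raise ValueError("data region lies outside the template bounding box")
        self.regions[region.name] = region


# Application code touches only the uniform put/get interface.
template = RegionTemplate("slide-1", BoundingBox(0, 0, 1000, 1000, 0, 1))
mask = DataRegion("nuclei-mask", BoundingBox(0, 0, 500, 500, 0, 1))
template.add_region(mask)
template.regions["nuclei-mask"].put((10, 20), 1)
```

Swapping `InMemoryStore` for a different backend would leave the application code unchanged, which is the point of the homogeneous interface.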
Region Template: Preliminary Experimental Evaluation
• Experimentally evaluated using pathology image analysis on the Keeneland system
• This application consists of a pipeline with Segmentation and Feature Computation stages; each stage is internally divided into finer-grained tasks for better scheduling on heterogeneous CPU-GPU equipped machines.
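The two-stage structure can be sketched as per-tile tasks handed to a worker pool. The thresholding "segmentation", the two features, and the thread pool are all simplifying assumptions for illustration; the sketch does not model CPU-GPU scheduling or the actual analysis algorithms.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Tile = List[List[float]]
Pixel = Tuple[int, int]


def segment(tile: Tile, threshold: float = 0.5) -> List[Pixel]:
    """Stage 1: return coordinates of pixels brighter than `threshold`."""
    return [(r, c) for r, row in enumerate(tile)
            for c, value in enumerate(row) if value > threshold]


def compute_features(tile: Tile, pixels: List[Pixel]) -> Dict[str, float]:
    """Stage 2: area and mean intensity of the segmented pixels."""
    if not pixels:
        return {"area": 0, "mean_intensity": 0.0}
    total = sum(tile[r][c] for r, c in pixels)
    return {"area": len(pixels), "mean_intensity": total / len(pixels)}


def process_tile(tile: Tile) -> Dict[str, float]:
    """One fine-grained task: segmentation followed by feature computation."""
    return compute_features(tile, segment(tile))


def run_pipeline(tiles: List[Tile], workers: int = 4) -> List[Dict[str, float]]:
    """Schedule one task per tile; results come back in tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, tiles))


tiles = [
    [[0.9, 0.1], [0.7, 0.2]],
    [[0.0, 0.0], [0.0, 0.0]],
]
results = run_pipeline(tiles)
```

Splitting work at tile granularity gives a scheduler many independent units to place on whichever device is free.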
Large Scale Data Management
Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
Highly optimized spatial query and analyses. Implemented in a variety of ways, including optimized CPU/GPU, Hadoop/HDFS and IBM DB2
Supported by two NLM R01 grants – Saltz/Foran
Spatial Centric – Sensor Data Feature “GIS”
Point query: human marked point inside a nucleus
Window query: return markups contained in a rectangle
Spatial join query: algorithm validation/comparison
Containment query: nuclear feature aggregation in tumor regions
Fusheng Wang
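The four query types above can be sketched with brute-force scans over axis-aligned boxes. This is an illustrative assumption only: real markups are polygons, production systems use spatial indexes rather than nested loops, and the `Box` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def inside(self, other: "Box") -> bool:
        return (other.xmin <= self.xmin and self.xmax <= other.xmax
                and other.ymin <= self.ymin and self.ymax <= other.ymax)

    def intersects(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def point_query(markups: List[Box], x: float, y: float) -> List[Box]:
    """Markups that contain a human-marked point."""
    return [m for m in markups if m.contains_point(x, y)]


def window_query(markups: List[Box], window: Box) -> List[Box]:
    """Markups fully contained in a rectangle."""
    return [m for m in markups if m.inside(window)]


def spatial_join(a: List[Box], b: List[Box]) -> List[Tuple[str, str]]:
    """Pairs of markups from two result sets that intersect."""
    return [(m.label, n.label) for m in a for n in b if m.intersects(n)]


def containment_query(nuclei: List[Box], regions: List[Box]) -> Dict[str, int]:
    """Aggregate (here: count) nuclei inside each tumor region."""
    return {r.label: sum(n.inside(r) for n in nuclei) for r in regions}


nuclei = [Box("n1", 1, 1, 2, 2), Box("n2", 5, 5, 6, 6), Box("n3", 20, 20, 21, 21)]
other_algorithm = [Box("m1", 1.5, 1.5, 2.5, 2.5)]
tumor_regions = [Box("t1", 0, 0, 10, 10)]
```

For example, `spatial_join(nuclei, other_algorithm)` pairs up the markups that two segmentation algorithms produced for the same nucleus.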
Algorithm Validation: Intersection between Two Result Sets (Spatial Join)
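One common way to score the intersection between two result sets is the Jaccard index, the intersection area divided by the union area. The slide does not say which measure is used, so the choice of Jaccard and the restriction to axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` tuples are assumptions made for this sketch.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(rect: Rect) -> float:
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of overlap between two rectangles (0.0 if they are disjoint)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def jaccard(a: Rect, b: Rect) -> float:
    """Intersection over union; 1.0 for identical shapes, 0.0 for disjoint ones."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# The same object as outlined by two different algorithms (assumed toy data).
result_a = (0.0, 0.0, 2.0, 2.0)
result_b = (1.0, 1.0, 3.0, 3.0)
score = jaccard(result_a, result_b)
```

Averaging such scores over all matched pairs from a spatial join gives a single agreement figure for two algorithms.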
PAIS: Example Queries
PAIS (Pathology Analytical Imaging Standards)
• PAIS Logical Model: 62 UML classes (markups, annotations, imageReferences, provenance)
• PAIS Data Representation: XML (compressed) or HDF5
• PAIS Databases: loading, managing, querying and sharing data; native XML DBMS or RDBMS + SDBMS
[Figure: PAIS domain model (UML class diagram). Classes shown: Annotation, GeometricShape, Calculation, Observation, Specimen, ImageReference, Provenance, User, PAIS, EquipmentGroup, AnatomicEntity, Subject, Field, Project, MicroscopyImageReference, DICOMImageReference, TMAImageReference, Markup, Inference, Region, WholeSlideImageReference, Patient, Surface, Collection, AnnotationReference]
VLDB 2012, 2013
Spatial Query, Change Detection, Comparison, and Quantification
Soft real time and streaming Sensor Data Analysis, Event Detection, Decision Support
• Integrated analyses of patient data – physiological streams, labs, medications, notes, Radiology, Pathology images, mobile health data feeds
• High frequency trading, arbitrage
• Real time monitoring of earthquakes, control of oilfields
• Control of industrial plants, aircraft engines
• Fusion – data capture, control, prediction of disruptions
• Internet of things
• Twitter feeds
• Intensive care alarms
Typical Computational Analysis Tasks: Streaming Sensor Data Analysis, Event Detection, Decision Support
• Prediction algorithms – Kalman, particle filtering
• Machine learning algorithms on aggregated data to develop model, use of model on streaming data for decision support
• Searching for rare events
• Statistical algorithms to distinguish signal from noise
• On the fly integration of multiple complementary data streams
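The Kalman prediction named in the list above can be sketched for a single scalar stream. The random-walk state model and the variance values are assumptions chosen for illustration; a real physiological or plant-control stream would need a model fitted to its own dynamics.

```python
from typing import List, Tuple


def kalman_1d(
    observations: List[float],
    x0: float = 0.0,
    p0: float = 1.0,
    process_var: float = 0.01,
    obs_var: float = 0.25,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter; returns (estimate, variance) after each observation."""
    x, p = x0, p0
    history = []
    for y in observations:
        # Predict: the random-walk model keeps the estimate, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + obs_var)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        history.append((x, p))
    return history


# A stream that settles at 3.0 (assumed toy data).
history = kalman_1d([3.0] * 50)
final_estimate, final_variance = history[-1]
```

Because each update touches only the latest sample and two running numbers, the filter suits streaming use: no history needs to be stored to keep the estimate current.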