Abstractions for Shared Sensor Networks
DMSN, September 2006
Michael J. Franklin
Mike Franklin, UC Berkeley EECS
Outline
• Perspective on shared infrastructure
• Scientific Applications
• Business Environments
• Data Cleaning as a Shared Service
• Other Core Services
• What's core and what isn't?
• Conclusions
Scientific Instruments
Cost: moderate
Users: one
Use: defense/navigation
Scheduling: ad hoc
Data Cleaning: cloth
Scientific Instruments
Cost: more
Users: one
Use: science
Scheduling: ad hoc
Data Cleaning: religious
Scientific Instruments
Cost: 100's K$ (1880s $)
Users: 100's
Use: science
Scheduling: by committee
Data Cleaning: grad students
Scientific Instruments
Cost: 100's M$ (2010s $)
Users: 1000's-millions
Use: science and education
Scheduling: mostly static (SURVEY)
Data Cleaning: mostly algorithmic
Key Point: Enabled by modern (future) Data Management!
Shared Infrastructure
• Sharing dictated by costs
  • Costs of hardware
  • Costs of deployment
  • Costs of maintenance
• Pooled Resource Management
  • Competitively Scheduled
  • Statically Scheduled (surveys)
• Data Cleaning
  • At the instrument
  • By the applications (or end users)
• Other Services
Shared Sensor Nets
• Macroscopes are expensive:
  • to design
  • to build
  • to deploy
  • to operate and maintain
• They will be shared resources:
  • across organizations
  • across apps w/in organizations
Q: What are the right abstractions to support them?
Traditional Shared Data Mgmt
[Diagram: operational systems (Point of Sale, Inventory, Data Feeds, etc.) feed an Extract/Transform/Load pipeline, with cleaning, auditing, ..., into a Data Warehouse and downstream Data Marts, which serve Business Intelligence: reports, dashboards, and ad hoc queries.]

All users/apps see only cleaned data: a.k.a. "TRUTH".
Shared SensorNet Services

We will need to understand the shared/custom tradeoffs for all of these:
• Data Cleaning
• Scheduling
• Monitoring
• Actuation
• Tasking/Programming
• Evolution
• Provisioning
• Quality Estimation
• Data Collection
• Query & Reporting
Data Cleaning as a Shared Service
Some Data Quality Problems with Sensors
1. (Cheap) sensors are failure and error prone (and people want their sensors to be really cheap).
2. Device interface is too low level for applications.
3. They produce too much (uninteresting) data.
4. They produce some interesting data, and it's hard to tell case #3 from case #4.
5. Sensitive to environmental conditions.
Problem 1a: Sensors are Noisy
• A simple RFID Experiment
• 2 adjacent shelves, 6 ft. wide
• 10 EPC-tagged items each, plus 5 moved between them
• RFID antenna on each shelf
Shelf RFID Test - Ground Truth
Actual RFID Readings
“Restock every time inventory goes below 5”
Prob 1b: Sensors "Fail Dirty"
• 3 temperature-sensing motes in the same room

[Chart: temperature readings over time, with labels "Outlier Mote" and "Average".]
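The "fail dirty" mote above shows why a plain average over redundant sensors is fragile: one stuck reading drags the whole average off. A minimal sketch of an outlier-robust alternative, assuming a median-deviation rule with an illustrative threshold (the function name and threshold are not from the talk):

```python
# Hypothetical sketch: average redundant mote readings while
# discarding a "fail dirty" outlier via deviation from the median.
# The threshold max_dev is an assumed, illustrative parameter.

def robust_average(readings, max_dev=5.0):
    """Average readings, dropping any that deviate from the
    median by more than max_dev degrees."""
    s = sorted(readings)
    median = s[len(s) // 2]
    kept = [r for r in readings if abs(r - median) <= max_dev]
    return sum(kept) / len(kept)

# A stuck mote reporting 90.0 among ~22-degree neighbors is ignored:
print(robust_average([22.1, 21.9, 90.0]))  # ≈ 22.0
```

The median is used as the reference point because, unlike the mean, it is not itself pulled toward the outlier.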
Problem 2: Low-level Interface

Lack of good support for devices increases the complexity of sensor-based applications.
Problems 3 and 4: The Wheat from the Chaff
Shelf RFID reports (50 times/sec):
• there are 100 items on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• there are 99 items on the shelf
• the 99 items are still on the shelf
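One simple way to separate the wheat from the chaff in a stream like the one above is to forward a shelf count downstream only when it changes. A minimal sketch, with illustrative names (not from the talk):

```python
# Illustrative sketch: suppress the redundant "still 100 items"
# reports by emitting a shelf count only when it differs from
# the previously emitted value.

def changes_only(counts):
    """Yield (index, count) only when count differs from the previous one."""
    prev = None
    for i, c in enumerate(counts):
        if c != prev:
            yield i, c
            prev = c

# Twenty "100" reports followed by three "99" reports collapse to two events:
stream = [100] * 20 + [99] * 3
print(list(changes_only(stream)))  # -> [(0, 100), (20, 99)]
```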
Problem 5: Environment
[Chart: Read Rate vs. Distance, Alien I2 tag in a room on the 4th floor of Soda Hall.]
[Chart: Read Rate vs. Distance, using the same reader and tag in the room next door.]
VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006]
• Goal: Hide messy details of underlying physical devices.
  • Error characteristics
  • Failure
  • Calibration
  • Sampling Issues
  • Device Management
  • Physical vs. Virtual
• Fundamental abstractions:
  • Spatial & temporal granules

"Metaphysical Data Independence"
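As a rough illustration of what hiding device details behind spatial and temporal granules could look like, here is a hypothetical sketch. The class, method names, and window parameter are invented for illustration; they are not the actual VICE API from Jeffery et al.:

```python
# Hypothetical sketch of a VICE-style virtual device: applications
# query a cleaned view at a chosen temporal granule and never see
# raw reader events, dropped readings, or device failures.

class VirtualShelf:
    def __init__(self, window_sec):
        self.window_sec = window_sec  # temporal granule for the cleaned view
        self.raw = []                 # raw (timestamp, tag_id) reader events

    def push_raw(self, ts, tag_id):
        """Raw, possibly lossy reader events enter below the API."""
        self.raw.append((ts, tag_id))

    def inventory(self, now):
        """Application-level view: tags seen within the last window."""
        cutoff = now - self.window_sec
        return {tag for ts, tag in self.raw if ts >= cutoff}

shelf = VirtualShelf(window_sec=5)
shelf.push_raw(0, "epc:1")
shelf.push_raw(3, "epc:2")
print(sorted(shelf.inventory(now=4)))  # -> ['epc:1', 'epc:2']
print(sorted(shelf.inventory(now=6)))  # -> ['epc:2']
```

The point of the abstraction is that the application sees only `inventory()`, so the device layer is free to change how raw events are cleaned without touching application code.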
VICE - A Virtual Device Layer

[Diagram: applications sit on top of a "Virtual Device (VICE) API" layer, which sits on top of physical RFID devices.]

The VICE API is a natural place to hide much of the complexity arising from physical devices.
The VICE Query Pipeline

[Diagram: the VICE query pipeline, whose stages (Clean, Smooth, Arbitrate, Validate, Analyze) operate over scopes ranging from a single tuple, to a window, to multiple receptors, and feed generalization steps such as joins with stored data and on-line data mining.]
RFID Smoothing w/Queries

[Chart: raw readings vs. smoothed output over time.]

• RFID data has many dropped readings
• Typically, use a smoothing filter to interpolate:

SELECT DISTINCT tag_id
FROM RFID_stream [RANGE '5 sec']
GROUP BY tag_id
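A rough Python rendering of the windowed-smoothing idea in the query: a tag counts as present if it was read at least once in the trailing window, which bridges intermittent dropped readings. The function name and sample readings are illustrative:

```python
WINDOW = 5.0  # seconds, matching RANGE '5 sec' in the query

def smoothed_count(readings, now, window=WINDOW):
    """readings: list of (timestamp, tag_id) raw reader events.
    Returns the number of distinct tags seen within the trailing
    window, i.e. the smoothed inventory count."""
    return len({tag for ts, tag in readings if now - ts <= window})

# Tag "B" suffers dropped readings after t=1, but the 5-second
# window bridges the gap, so the count stays at 2 until the
# window finally slides past B's last reading:
raw = [(0.0, "A"), (0.0, "B"), (1.0, "B"), (2.0, "A"), (4.0, "A")]
print([smoothed_count(raw, t) for t in (2.0, 4.0, 7.0)])  # -> [2, 2, 1]
```

This is exactly what a query like "Restock every time inventory goes below 5" needs: the smoothed count dips only when a tag is genuinely gone, not on every dropped reading.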
After VICE Processing
“Restock every time inventory goes below 5”
Adaptive Smoothing [Jeffery et al., VLDB 2006]
Ongoing Work: Spatial Smoothing
• With multiple readers, more complicated

[Diagram: two rooms, two readers per room, with read ranges A, B, C, D. Overlapping readers in the same room raise reinforcement questions (A? B? A ∪ B? A ∩ B?); readers in different rooms raise arbitration questions (A? C?). All are addressed by a statistical framework!]
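As a simplified stand-in for the statistical framework mentioned above, arbitration between overlapping readers could be sketched as a majority vote: assign each tag to the reader that detected it most often in the window. The heuristic and all names are assumptions for illustration, not the actual method:

```python
from collections import Counter

# Simplified arbitration sketch: when read ranges overlap, assign
# each tag to the reader that saw it most often in one window.
# This majority-vote heuristic is an illustrative stand-in for a
# proper statistical treatment.

def arbitrate(events):
    """events: list of (reader_id, tag_id) detections in one window.
    Returns {tag_id: reader_id} assignments."""
    votes = Counter(events)  # (reader, tag) -> detection count
    best = {}
    for (reader, tag), n in votes.items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (reader, n)
    return {tag: reader for tag, (reader, _) in best.items()}

# R1 saw tag A twice, R2 once: A is assigned to R1; B only to R2.
window = [("R1", "A"), ("R1", "A"), ("R2", "A"), ("R2", "B")]
print(arbitrate(window))  # -> {'A': 'R1', 'B': 'R2'}
```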
Problems with a single Truth
• If you knew what was going to happen, you wouldn't need sensors
  • upside down airplane
  • ozone layer hole
• Monitoring vs. Needle-in-a-haystack
• Probability-based smoothing may remove unlikely, but real events!
Risks of too little cleaning
• GIGO
• Complexity: burden on app developers
• Efficiency (repeated work)
• Too much opportunity for error
Risks of too much cleaning

The appearance of a hole in the earth's ozone layer over Antarctica, first detected in 1976, was so unexpected that scientists didn't pay attention to what their instruments were telling them; they thought their instruments were malfunctioning.
(National Center for Atmospheric Research)

In fact, the data were rejected as unreasonable by data quality control algorithms.
One Truth for Sensor Nets?
• How clean is "clean-enough"?
• How much cleaning is too much?
• Answers are likely to be:
  • domain-specific
  • sensor-specific
  • application-specific
  • user-specific
  • all of the above?

How to split between shared and application-specific cleaning?
Fuzzy Truth

One solution is to make the shared interface richer. Probabilistic Data Management is also the key to "Calm Computing".
Adding Quality Assessment
A. Das Sarma, S. Jeffery, M. Franklin, J. Widom, “Estimating Data Stream Quality for Object-Detection Applications”, 3rd Intl ACM SIGMOD Workshop on Information Quality in Info Sys, 2006
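One simple way to surface a quality estimate alongside cleaned output is to report, for each tag, the fraction of interrogation cycles in which it actually responded. This read-rate metric is an illustrative assumption, not the estimator of the cited paper:

```python
# Illustrative sketch: attach a per-tag quality score to cleaned
# output, computed as the observed read rate over a window of
# interrogation cycles. A low score flags a tag whose presence
# is uncertain; the metric itself is an assumption.

def tag_confidence(readings, cycles):
    """readings: list of (cycle_no, tag_id) detections.
    cycles: total interrogation cycles in the window.
    Returns {tag_id: observed read rate in [0, 1]}."""
    seen = {}
    for cycle, tag in readings:
        seen.setdefault(tag, set()).add(cycle)
    return {tag: len(c) / cycles for tag, c in seen.items()}

# Tag "A" answered in all 4 cycles, tag "B" in only one:
raw = [(0, "A"), (1, "A"), (2, "A"), (3, "A"), (1, "B")]
print(tag_confidence(raw, cycles=4))  # -> {'A': 1.0, 'B': 0.25}
```

A richer shared interface could then hand applications these scored tuples instead of a single "truth", letting each application pick its own threshold.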
"Data Furnace" Architecture

Service Layer:
• Probabilistic Reasoning
• Uncertainty Management
• Data Model Learning
• Complex Event Processing
• Data Archiving and Streaming

Garofalakis et al., Data Engineering Bulletin, 3/06
Rethinking Service Abstractions

We will need to understand the shared/custom tradeoffs for all of these:
• Data Cleaning
• Scheduling
• Monitoring
• Actuation
• Tasking/Programming
• Evolution
• Provisioning
• Quality Estimation
• Query-Data Collection
Conclusions
• Much current sensor research is focused on the "single user" or "single app" model.
• Sensor networks will be shared resources.
• Can leverage some ideas from current shared Data Management infrastructures.
• But, new solutions, abstractions, and architectures will be required.