bolt: data management for connected homes -...
TRANSCRIPT
Bolt: Data management for connected homes
Trinabh Gupta*, Rayman Preet Singh† Amar Phanishayee, Jaeyeon Jung, Ratul Mahajan
*The University of Texas at AusBn
†University of Waterloo MicrosoE Research
Wearables and Personal devices
Sensors in commercial spaces
AutomoBve sensors Sensors and devices for home automaBon
Number of sensors, smart devices is growing
In 2008, number of sensors exceeded people
In 2020, 50 billion sensors.
hMp://share.cisco.com/internet-‐of-‐things.html
In 2017, 90 million homes with automaBon
hMps://www.abiresearch.com/press/90-‐million-‐homes-‐worldwide-‐will-‐employ-‐home-‐automa
Devices and sensors
for the home
HomeOS Mi Casa Verde Pla3orms
PreHeat DigiSwitch Energy Data AnalyBcs
Neighborhood Watch
Apps [Ubicomp 2011] [Medical Systems
2011] [Energy and Building 2012] [CSCW 2013]
Need a new data management system for connected homes
Slot 1
Day 1
Occupancy Sensors
Thermostat
Requirement: Support Bme-‐series data
Slot 2 . . .
Slot 96
Slot 1
Day 2 Slot 2
. . . Slot 96
Slot 1
Day 3 Slot 2
?
Applications generate time-series data and retrieve based on time windows
PreHeat IdenBfy days with closest occupancy paMern (least hamming distance) to predict future slot
Requirement: Leverage cloud servers for availability
Energy Data AnalyBcs
Applications access data from multiple homes
Energy Meter
Analyze and compare energy usage
Avg. for this home
Avg. of neighboring homes
Time AMributes Value Mon, 1 AM Temp = 20°C 0.9 Mon, 2 AM Temp = 20°C 1.1 Mon, 3 AM Temp = 22°C 1.2 Mon, 4 AM Temp = 22°C 1.2
Data from neighboring energy meters
Perform analysis even when homes are offline
Run by uBlity company
Time AMributes Value Sun, 11 AM humans Mon, 1 PM animal Tue, 3 PM car, red
Neighbor Neighbor
Requirement: Ensure confidenBality, integrity
Applications share sensitive home data
Seen a car in the last 24 hours?
Perform image similarity matching
DNW
• Support Bme-‐series data with efficient Bme and tags based retrieval
• Leverage reliable and available cloud storage to facilitate sharing
• Ensure data confidenBality and integrity
Recap of data management requirements
Existing systems are not suitable
Time series data processing [OpenTSDB]
Secure systems using untrusted storage [SUNDR 04, Depot 10, SPORC 10]
• Do not maintain confiden8ality or integrity of data
• Do not support 8me-‐series data
Query (start-‐Bme, end-‐Bme, …)
Exposes key-‐value API
App
App
Outline • ApplicaBons requirements and moBvaBon
• Design of Bolt • Key mechanisms to support requirements
• EvaluaBon • Feasibility of using Bolt for three applicaBons
Recall the data management requirements of apps for connected homes
How can we address these requirements simultaneously?
Support Bme-‐series, tagged data
Leverage cloud storage
Ensure data confidenBality,
integrity
Straw man: Store data in a cloud DB
• Cloud untrusted for data confidenBality and integrity • Cloud untrusted for computaBons (e.g., hamming distance, image similarity)
Query (start-‐Bme, end-‐Bme, …)
DB
App
Design guidelines: 1. End-‐points perform: encryp8on/decryp8on, data
integrity checking, query evalua8on 2. Use cloud providers for (just) storage
Straw man: Using secure key-value datastores
• Need support for temporal queries. • High per-‐data-‐record overhead.
• EncrypBon/decrypBon, integrity metadata / checks • Remote storage calls and transfers
• Individual data records do not compress well.
Record 1
Record 2
Record 3
Data records
Record 4
Block store (key-‐value API)
[SUNDR 04, Depot 10, SPORC 10] Logic for security
App
Design guideline: Batch con8guous data records, leverage workload query paKern
Overview of Bolt • Stream (append-‐only) abstracBon
• Records: <Bmestamp, [tag], value>
• Query (start-‐Bme, end-‐Bme, tag)
• Leverage cloud storage • Cloud resources untrusted for compute and storage • No cloud query engine with computaBon at endpoints
• Security and privacy guarantees • ConfidenBality, Tamper evidence, Freshness
Bolt Stream: Index + Log of <ts, tag, val>
ts1, tag1, val
ts2, tag1, val
ts3, tag2, val
t7, tag1, val
ts8, tag1, val
Stream Log (Disk)
ts5, tag2, val
ts6, tag2, val
ts4, tag1, val
tag1 ts1, O1 ts2, O2
ts3, O3 ts5, O5
ts4, O4
tag2
Stream Index (In memory, Disk backed)
ts7, O7 ts8, O8
ts6, O6
App
Tag Offsets Sorted by 8me
Remote Log & Index Compressed Encrypted
data
Compressed Encrypted
data
Compressed Encrypted
data
Batching data for efficiency ts, tag, val
ts, tag, val
ts, tag, val
ts, tag, val
ts, tag, val
Compressed Encrypted
data
Compressed Encrypted
Chun
k 1
Chun
k 2
Chun
k 3
Chun
k 1
Chun
k 2
Chun
k 3
Stream Log and Index
Improves storage and transfer efficiency. AmorBzes cost of compression, encrypBon, and hashing
ts, tag, val
ts, tag, val
ts, tag, val Compressed Encrypted
data
Index Encrypted Index Encrypted Index
Hash1
Hash2
Hash3
Hash
Inde
x
Signed Hash of
Hash1, Hash2, Hash3 &
Index Hash
Metadata
App
Reads use the index to download chunks
Temp = 22 1AM, O1 3AM, O3
2AM, O2 5AM, O5
4AM, O4
Temp = 24
Tag Offsets Sorted by 8me Index opBmized to lookup tags, Bmestamps
Stream Log
Compressed Encrypted
data
Compressed Encrypted
data
Compressed Encrypted
data
Chun
k 1
Chun
k 2
Chun
k 3
Encrypted Index Query (3 AM to 5AM, Temp = 22)
Lookups and computaBon are performed locally at home
Compressed Encrypted Chunk 2
Compressed Encrypted Chunk 3
App
Stream Log
Compressed Encrypted
data
Compressed Encrypted
data
Compressed Encrypted
data
Chun
k 1
Chun
k 2
Chun
k 3
Reduces number of remote calls, pre-‐fetches data for subsequent queries.
Query paMerns • Fixed Window • Sliding Window • Growing Window
Batching and prefetching on reads
App
Grant/Revoke Access
Secure sharing: Decentralized access control
Key Server
Owner Reader
Stream Name Key Info
Home1/App1 Enc(App1, Key-‐1) Enc(App2, Key-‐1)
Enc(App1, Key-‐2)
…
Enc(App1, Key-‐N)
App1 App2
Lazy revoca8on: [Cepheus 09]
• PotenBally many encrypBon keys per stream. SoluBon: Hash-‐based key regression [Fu et al. NDSS 06]
• Key server trusted to maintain principal -‐> public key mappings.
• Key server trusted to prevent rollback of key. Possible soluBon: Replicated key server
Addressing challenges in decentralized access control
Outline • ApplicaBons requirements and moBvaBon
• Design and key mechanisms of Bolt • Chunking • SeparaBon of Index from data • Decentralized access control • SegmentaBon for memory efficiency, key change (paper)
• EvaluaBon • Feasibility of using Bolt for real-‐world applicaBons
Implementation
• Integrated with HomeOS • laboEhings.codeplex.com
• Supports Windows Azure and Amazon S3
• Integrated Bolt with 5 applicaBons • 2 of these done by other developers • In use by HCI Researchers at MSR and Univ. of Michigan
What are the overheads in Bolt?
• Baseline: Flat file • No support for temporal range queries, security
• Experiment to understand • Query Bme breakup • Storage overhead
Overheads in Bolt
• Lookup during queries has < 1% overhead. • EncrypBon, hashing overhead is negligible. • Index storage adds
• 30% for datavalue sizes of 10 bytes • < 1% for datavalue sizes of 1KB
Serialize, write data
Lookup, Update index
compress, hash encrypt, chunks
Upload data, index
Append (Temp = 22, Val = 0.7)
~ 20 % ~ 75 % < 1% < 1 %
Refer to paper for detailed microbenchmarks
Analyze, compare energy usage
Energy Data AnalyBcs
Time AMributes Value Mon, 1 AM Temp = 20°C 0.9 Mon, 2 AM Temp = 20°C 1.1
… … … Tue, 4 AM Temp = 22°C 1.2
Energy Data Analytics
Energy Meter
Measure Bme taken to compare energy usage during last 30 days
Energy reading for last 30 days when external temperature was X°C
Avg. for this home
Avg. of neighboring homes
Current query retrieves data for subsequent query’s temperature values
Prefetching in chunks improves query latency
1.67
14
145
11
114
1129
1
10
100
1000
10000
1 10 100
Retrieval Tim
e (Secon
ds)
Number of homes
Bolt OpenTSDB
Time AMributes Value Sun, 11 AM humans Mon, 1 PM animal Tue, 3 PM car, red
Neighbor Neighbor
Measure query Bme across 10 homes looking at data from last 10 hours
Applications share sensitive home data
Seen a car in the last 24 hours?
Perform image similarity matching
DNW
Batching data in chunks improves query latency
Larger chunks result in fewer remote calls & RTTs.
0
1
2
3
4
5
6
7
100KB 1MB
Retrieval Tim
e (Secon
ds)
Chunk Size
Bolt
OpenTSDB Bolt
OpenTSDB
Bolt’s data storage efficiency
Bolt is 3-‐5x more space efficient than OpenTSDB.
Bolt OpenTSDB Preheat 1.5 8.2 DNW 37.9 212.4 EDA 4.6 14.4
Data in MBs
Summary • Emerging class of applicaBons for smart homes with a new set of data management requirements.
• Bolt addresses these efficiently by leveraging the nature of queries in this domain.
• Despite providing more than OpenTSDB (security guarantees), Bolt is up to 40x faster while requiring 3–5x less storage space.
Code: labo'hings.codeplex.com
Extra Slides
Compression for time-series data
0
1
2
3
4
5
6
7
10 100 1000 10000 100000 1000000
Compression
RaF
o
Chunk Size (in Bytes)
Home energy consumption - 1 month
Gzip
BZip2
Bolt’s Stream based API • Stream Types
• ValueDataStream (small values like temperature readings) • FileDataStream (large values like images and videos)
• OpenStream(StreamName, R/W, Policy) • StreamName: <HomeID, AppID, StreamID> • Policy: LocaBon (local, remote), Security (plain, encrypted)
• Append (tag, value) • Append (tag[], value)
• Get (tag) • Get (tag, start-‐Bme, end-‐Bme) • Get (tag, start-‐Bme, end-‐Bme, skip-‐interval)
Temporal range queries Sampling
Example: Tag = Temperature (e.g. 30 C)
Value = energy reading (e.g. kWh) Timestamp assigned on Append
Single Writer / Owner MulFple Readers
Scalability of segments
Efficient memory indices
Slot 1
Day 1
Occupancy Sensors
Thermostat
Slot 2 . . .
Slot 96
Slot 1
Day N
Slot 2
PreHeat: Occupancy prediction for efficient heating
PreHeat
. . . Slot 96 = ?
Predict Slot 96
Measure Bme to predict 96th slot
Slot 1
Day 2 Slot 2
. . . Slot 96
Bolt compatible with Preheat and easily meets its performance needs
Bolt is 40x faster than OpenTSDB.
1
10
100
1000
10000
1 2 4 8 16 32 64 128
Time to re
trieve data (m
s)
Length of Preheat Deployment (in days)
Bolt
OpenTSDB
Connected things everywhere
hMp://blogs.cisco.com/news/the-‐internet-‐of-‐things-‐infographic/
Current trend
Home AutomaBon, Security, and Monitoring, ABI Research, 2012
Shipment of home automaBon systems
0
5,000
10,000
15,000
20,000
25,000
2011 2012 2013 2014 2015 2016 2017
(000
s)Middle East / AfricaLatin America
Asia Pacific
ICONS
Devices and sensors
for the home
HomeOS Mi Casa Verde Pla3orms
PreHeat DigiSwitch Energy Data AnalyBcs
Neighborhood Watch
Apps
Need a new data management system for connected homes
[Ubicomp 2011] [Medical Systems 2011] [Energy and
Building 2012] [CSCW 2013]
Wasted data downloaded via chunking
Large chunks potenBally download data that is not useful to subsequent queries in DNW
26
46
20
30
40
50
60
70
80
90
100
100KB 1MB
% Retrie
ved Ch
unk Da
ta Unu
sed
Chunk Size
Energy Data AnalyBcs
Requires storing data on durable, available, elasBc storage.
360MB per sensor per
year MulBple
sensors per home Queries span
mulBple years
Time AMributes Value Mon, 1 AM Temp = 20°C 0.9 Mon, 2 AM Temp = 20°C 1.1 Mon, 3 AM Temp = 22°C 1.2 Mon, 4 AM Temp = 22°C 1.2
EDA generates large amounts of data
Energy Meter