Byzantine fault tolerant Cloud Storage for storing sensor data
Jos van der Til
WHAT WAS THAT FIRST SLIDE?
I KNOW SOME OF THESE WORDS!
SENSOR DATA
VARIES BY MEASUREMENT INTERVAL
VARIES BY DIMENSIONALITY
VARIES BY SIZE
IMPORTANT: IMAGES AND VIDEO ARE ALSO SENSOR DATA!
I KNOW SOME OF THESE WORDS!
CLOUD STORAGE
ACCESSIBLE FROM ANYWHERE
ACCESSIBLE ANYTIME
UNLIMITED STORAGE
PAY FOR WHAT YOU USE!
I KNOW SOME OF THESE WORDS!
FAULT TOLERANT
PROCESSES ONLY CRASH…RIGHT?
HOW BAD CAN IT GET?
BYZANTINE
HOW DO PROCESSES FAIL?
Fail stop
Crash
Send Omission
Receive Omission
General Omission
Arbitrary failures with message authentication
Arbitrary (Byzantine) failures
[Figure: architecture diagram. Labels: Sensor Network, Measurements, Sensor server (Storage Lib, Writer), Storage clouds, Processing server (Storage Lib, Reader).]
HOW DO PROCESSES FAIL?
READERS ARE PROCESSES
WRITERS ARE PROCESSES
but they are cool.
READERS can fail without causing damage.
WRITERS are only expected to fail by crashing.
HOW DO PROCESSES FAIL?
CLOUD PROVIDERS ARE PROCESSES
but they are NOT cool.
can leak your data
can corrupt your data
can delete your data
can stop responding to your requests
HAVE FULL CONTROL OVER YOUR DATA BUT CAN BEHAVE BYZANTINE
YOUR DATA IS STORED
IN A PROCESS THAT CAN FAIL
BYZANTINE
HOW TO ACHIEVE
BYZANTINE FAULT TOLERANCE?
DO NOT TRUST A SINGLE CLOUD!
DO TRUST MULTIPLE CLOUDS!
UPLOAD DATA TO ALL THE CLOUDS!
HOW MANY CLOUDS DO WE NEED?
𝑛 ≥ 3𝑓 + 1
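As a quick illustration of the bound n ≥ 3f + 1, the sketch below computes how many clouds are needed for a given number of faulty providers, and how many faults a given number of clouds can absorb. The function names `min_clouds` and `max_faulty` are hypothetical and chosen only for this example.

```python
def min_clouds(f: int) -> int:
    """Smallest number of clouds n that satisfies n >= 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f for which n >= 3f + 1 still holds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(4):
        print(f"f={f}: at least {min_clouds(f)} clouds")
```

Note that adding a fifth or sixth cloud does not raise the number of tolerated faults; only the seventh does.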
HOW IS DATA STORED?
DATA IS STORED IN A QUORUM OF CLOUD PROVIDERS
QUORUM OF 2f+1 PROVIDERS TO BE EXACT
DATA CANNOT BE RECONSTRUCTED FROM FEWER THAN f+1 PROVIDERS
ENCRYPTION
SECRET SHARING
SHOULD NOT REQUIRE n TIMES THE STORAGE SPACE
ERASURE CODING
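The f+1 threshold is the kind of guarantee a threshold secret sharing scheme gives. The slides do not say which scheme is used, so the sketch below assumes Shamir's scheme over a prime field, with an encryption key represented as an integer. `PRIME`, `split_secret` and `recover_secret` are illustrative names, not part of any library named in the talk.

```python
import secrets

# Assumption: a Mersenne prime large enough for a 126-bit secret.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, threshold: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= n:
        raise ValueError("need 1 <= threshold <= n")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    f = 1
    n = 3 * f + 1
    key = secrets.randbelow(PRIME)
    shares = split_secret(key, n, threshold=f + 1)
    print(recover_secret(shares[:f + 1]) == key)
```

With f = 1 and n = 4, each cloud would hold one share, and any two shares reconstruct the key while a single share on its own carries no information about it.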
SPACE EFFICIENCY WITH ERASURE CODING
$$\left(\frac{n}{f+1}\right)^{-1} = \frac{f+1}{n}$$
SPACE EFFICIENCY IN THE LIMIT (WITH n = 3f + 1)
$$\lim_{f\to\infty} \frac{f+1}{n} = \lim_{f\to\infty} \frac{f+1}{3f+1} = \frac{1}{3}$$
NOT BAD COMPARED TO FULL REPLICATION
$$\lim_{f\to\infty} \frac{1}{n} = 0$$
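To make the comparison concrete, the sketch below evaluates the erasure-coded efficiency (f + 1)/n and the full-replication efficiency 1/n for a few values of f, assuming n = 3f + 1 as on the earlier slide. The function names are hypothetical.

```python
from fractions import Fraction


def erasure_efficiency(f: int) -> Fraction:
    """Fraction of stored bytes that is payload when each of the
    n = 3f + 1 clouds holds a fragment of size 1/(f + 1)."""
    return Fraction(f + 1, 3 * f + 1)


def replication_efficiency(f: int) -> Fraction:
    """Fraction of stored bytes that is payload when every one of the
    n = 3f + 1 clouds holds a full copy."""
    return Fraction(1, 3 * f + 1)


if __name__ == "__main__":
    for f in (1, 2, 10, 100):
        print(f, erasure_efficiency(f), replication_efficiency(f))
```

The erasure-coded efficiency starts at 1/2 for f = 1 and decreases toward 1/3, whereas full replication starts at 1/4 and decreases toward 0.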
TRADITIONAL APPROACH
DATA IS A BLOCK!
I WANT TO READ THIS BLOCK
I WANT TO ENCRYPT THIS BLOCK
I WANT TO HASH THIS BLOCK
I WANT TO UPLOAD THIS BLOCK
I WANT TO DOWNLOAD THIS BLOCK
BLOCK DOES NOT FIT IN MEMORY
:(
NOW WHAT?
MY APPROACH
DATA IS A STREAM…of blocks!
Every block should fit into memory
Every block is processed independently
Every block has a checksum (think BitTorrent)
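A minimal sketch of the stream-of-blocks idea: read a file-like object in fixed-size blocks and attach a checksum to each block, so that only one block is in memory at a time. The block size, the choice of SHA-256 and the name `iter_blocks` are assumptions made for this example; the slides do not specify them.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

# Assumption: 1 MiB blocks; the real block size is not given in the slides.
BLOCK_SIZE = 1024 * 1024


def iter_blocks(stream: BinaryIO,
                block_size: int = BLOCK_SIZE) -> Iterator[tuple[bytes, str]]:
    """Yield (block, checksum) pairs, holding one block in memory at a time.

    Works for buffered streams such as files and BytesIO, where read()
    returns a short block only at the end of the stream.
    """
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block, hashlib.sha256(block).hexdigest()


if __name__ == "__main__":
    data = io.BytesIO(b"sensor-measurement" * 1000)
    for number, (block, checksum) in enumerate(iter_blocks(data, 4096)):
        print(number, len(block), checksum[:16])
```

Because each block carries its own checksum, it can be encrypted, uploaded or verified without waiting for the rest of the stream.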
MY APPROACH
HAS MORE OVERHEAD IN MEMORY AND STORAGE
REQUIRES LESS MEMORY FOR PROCESSING
CAN FAIL FASTER, SAVING BANDWIDTH
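The "fail faster" point can be sketched as follows: if blocks are verified as they arrive, a download can be aborted at the first corrupt block instead of after the whole object has been transferred. `verified_blocks` and `ChecksumMismatch` are hypothetical names, and SHA-256 is an assumed checksum.

```python
import hashlib
from typing import Iterable, Iterator


class ChecksumMismatch(Exception):
    """Raised as soon as one block does not match its expected checksum."""


def verified_blocks(blocks: Iterable[bytes],
                    checksums: Iterable[str]) -> Iterator[bytes]:
    """Yield blocks one by one, stopping at the first corrupt block.

    If `blocks` is a lazy iterator (for example a network download),
    the blocks after the corrupt one are never fetched.
    """
    for index, (block, expected) in enumerate(zip(blocks, checksums)):
        if hashlib.sha256(block).hexdigest() != expected:
            raise ChecksumMismatch(f"block {index} failed verification")
        yield block
```

A caller that receives `ChecksumMismatch` could then retry the same block from a different cloud rather than downloading the entire object again.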
WHEN IS THIS USED?
DATA IS PETABYTE SCALE
…OR AT LEAST A COUPLE OF 100 TERABYTES A YEAR
DATA IS CONTINUOUSLY ADDED
TIME BETWEEN BATCHES IS TOO LONG
KEEPING THOUSANDS OF MACHINES RUNNING IS EXPENSIVE
WHAT IF MY HADOOP CLUSTER IS DESTROYED?
WHEN IS THIS USED?
[Figure: architecture with a batch layer and a streaming layer. Labels: New Data, All data, Batch cluster, Batch view, Streaming cluster, Realtime view, Client, Query (to both views).]
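The figure suggests that a client query is answered from both the batch view and the realtime view. A minimal sketch of that merge, assuming both views are simple per-sensor counters (the view layout and the function name `query` are illustrative assumptions, not taken from the slides):

```python
from typing import Mapping


def query(sensor_id: str,
          batch_view: Mapping[str, int],
          realtime_view: Mapping[str, int]) -> int:
    """Combine the precomputed batch result with the recent realtime result.

    Assumption: both views hold measurement counts per sensor, and the
    realtime view only covers data that arrived after the last batch run.
    """
    return batch_view.get(sensor_id, 0) + realtime_view.get(sensor_id, 0)


if __name__ == "__main__":
    batch = {"sensor-1": 1_000_000, "sensor-2": 750_000}
    realtime = {"sensor-1": 42}
    print(query("sensor-1", batch, realtime))
```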
WHY NOT HADOOP?
HADOOP STILL HAS ITS PLACE
JUST NOT FOR STORAGE
OK…WHY NOT HADOOP STORAGE?
BATCH LAYER IS AN EXCELLENT PLACE FOR HADOOP PROCESSING
BUT HADOOP STORAGE IS EXPENSIVE!
BUT HADOOP STORAGE REQUIRES MAINTENANCE!
BUT HADOOP STORAGE IS ONLINE EVEN WHEN IDLE!
BUT HADOOP STORAGE CONSUMES LOTS OF ENERGY!
DOES THIS WORK?
PERFORMANCE
Requests done by 16 threads concurrently
8 core virtual machine
At least 4 GB RAM (but often > 16GB)
f = 1, thus n = 4
Two implementations:
Streaming DepSky-A
Streaming DepSky-CA
PERFORMANCE
Throughput downstream (per thread):
File size 4 MB: 750 KB/second (90th percentile)
File size 8 MB: 1 MB/second (90th percentile)
Throughput upstream (per thread):
File size 4 MB: 1.2 MB/second (90th percentile)
File size 8 MB: 1.7 MB/second (90th percentile)
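For a rough sense of scale, the per-thread figures can be turned into a transfer time per file and an upper bound on aggregate throughput. This is back-of-the-envelope arithmetic, not a measured result: it assumes 1 MB = 1024 KB and that the 16 threads scale perfectly linearly, neither of which is stated in the slides.

```python
THREADS = 16          # from the test setup
KB_PER_MB = 1024      # assumption: binary units


def transfer_seconds(file_size_mb: float, rate_kb_per_s: float) -> float:
    """Time for one thread to move one file at the given rate."""
    return file_size_mb * KB_PER_MB / rate_kb_per_s


def aggregate_kb_per_s(rate_kb_per_s: float, threads: int = THREADS) -> float:
    """Upper bound on total throughput if threads do not slow each other down."""
    return rate_kb_per_s * threads


if __name__ == "__main__":
    # Downstream, 4 MB files at 750 KB/s per thread.
    print(round(transfer_seconds(4, 750), 2), "seconds per file")
    print(aggregate_kb_per_s(750), "KB/s across all threads (upper bound)")
```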
AVAILABILITY
[Figure: success rate versus log2(Filesize (b)), from 0 to 25, per HTTP verb (GET, PUT, DELETE, LIST). Panel "Streaming DepSky-A": success rate axis from 0.9997 to 1.0000. Panel "Streaming DepSky-CA": success rate axis from 0.994 to 1.000.]
thanks!
Thesis available at:
http://www.cs.rug.nl/~aiellom/tesi/vdtil.pdf