Making Cloud Storage Provenance-Aware
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences
2/23/2009 Making a cloud Provenance-Aware - TaPP'09

The Cloud
- Next generation computing environment
- Cheap: pay as you go
  - Provision resources (storage, CPU) on an as-needed basis
- Provides the illusion of infinite resources
  - Companies with large batch-oriented tasks can get results quickly
- Cloud providers
  - Amazon Web Services (AWS)
  - Google AppEngine
  - Microsoft Azure

Provenance for the Cloud
- As apps move to the cloud, so will the data
  - Amazon hosts scientific data for free
- However, most cloud services are not designed to store provenance
- Why provenance?
  - Debug application results
  - Validate data sets
  - Improve search results
  - Regulatory compliance

Provenance Properties
- We identified the following properties
  - Read Correctness
  - Causal Ancestry Ordering
  - Queryable

Read Correctness
- Data must be what is described by the provenance
- Provenance accurately describes the data object
- Mechanisms
  - Atomicity: at storage time, either both provenance and data are stored, or neither is stored
  - Consistency: at retrieval time, the data returned should be consistent with its provenance

Causal Ancestry Ordering
- The provenance and data of an ancestor object must be recorded in the provenance system
- No dangling references
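
One way to avoid dangling references is to store every ancestor before any object derived from it. The sketch below illustrates that idea only; it is not taken from the slides, and the function name `upload_order` and the sample dependency map are hypothetical. It relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

def upload_order(depends_on):
    """Return object names ordered so each ancestor precedes its descendants.

    depends_on maps an object to the set of objects it was derived from.
    """
    return list(TopologicalSorter(depends_on).static_order())

# Hypothetical lineage: B was written by P, and P read A.
deps = {"B": {"P"}, "P": {"A"}, "A": set()}
order = upload_order(deps)
print(order)
```

Storing objects in this order means that any reference recorded in an object's provenance already points at something present in the store.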

Efficient Query
- Provenance must be accessible to users who want to verify properties of their data or simply be aware of its lineage
- If provenance is not readily accessible, it is of questionable value

Goal
- How do we design protocols around current cloud services such that these properties are satisfied?
- Setting
  - Provenance-Aware Storage System (PASS) tracks and collects provenance
  - Primarily considered AWS
  - Used 3 services: S3, SimpleDB, SQS

Outline
- Introduction
- PASS Background
- Protocol 1: Standalone S3
- Protocol 2: S3 + SimpleDB
- Protocol 3: S3 + SimpleDB + SQS
- Analysis
- Conclusion and Status

Provenance-Aware Storage System
- Observes system calls that applications make and captures relationships between objects
  - P: read A
    - Generates record: P depends on A
    - Caches the record
  - P: write B
    - Generates record: B depends on P
    - Stores both 'B depends on P' and 'P depends on A'
- Mirrors data locally and caches provenance until it needs to be sent to AWS
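
The read/write bookkeeping on this slide can be modeled in a few lines. This is a toy sketch, not PASS itself: the class name `ProvenanceObserver`, its method names, and the tuple format `(dependent, ancestor)` are all assumptions made for illustration.

```python
class ProvenanceObserver:
    """Toy model of the read/write bookkeeping described on the slide."""

    def __init__(self):
        self.cache = []    # records generated but not yet stored
        self.stored = []   # records made persistent together with written data

    def on_read(self, process, obj):
        # "P: read A" generates "P depends on A", which is cached for now.
        self.cache.append((process, obj))

    def on_write(self, process, obj):
        # "P: write B" generates "B depends on P" and stores it together
        # with the cached ancestry of P.
        self.stored.append((obj, process))
        self.stored.extend(r for r in self.cache if r[0] == process)
        self.cache = [r for r in self.cache if r[0] != process]

observer = ProvenanceObserver()
observer.on_read("P", "A")
observer.on_write("P", "B")
print(observer.stored)
```

Deferring the read record until a write happens keeps provenance for processes that never produce output out of the store.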

Simple Storage Service (S3)
- Object store: object sizes from 1 byte to 5GB
- Objects identified by URI
- SOAP or REST interface
- Operations: PUT, GET, HEAD, COPY, DELETE
  - PUT: stores an object and its metadata (2KB limit)
  - HEAD: retrieves the metadata of an object
- Cost: data storage + bandwidth + number of operations
- Eventual consistency

Architecture 1: Standalone S3
[Diagram: Application -> PASS -> S3; edge label: Prov+Data; layer labels: User, System]

Protocol 1: Standalone S3
[Sequence diagram: PASS, S3]
1. PASS -> S3: PUT (Prov > 1KB); S3 replies OK
2. PASS -> S3: PUT (Data); S3 replies OK

Properties
[Table: columns Arch, Read Correctness (Atomicity, Consistency), Causal Ordering, Efficient Query; row: S3; cell marks not present in the extracted text]

SimpleDB
- Service providing database functionality
- Data model: items described by attribute-value pairs
  - 256 attributes maximum, name/value < 1KB
- Operations: PutAttributes, Query, QueryWithAttributes, and SELECT
  - Query returns items
  - QueryWithAttributes returns both items and attributes
- Cost: bandwidth + storage + number of operations + machine hours

Architecture 2: S3 + SimpleDB
[Diagram: Application -> PASS -> S3 and SimpleDB; edge labels: Data, Prov; layer labels: User, System]

Protocol 2: S3 + SimpleDB
[Sequence diagram: PASS, S3, SimpleDB]
1. PASS -> S3: PUT (rec > 1KB); S3 replies OK
2. PASS -> SimpleDB: PutAttrs+; SimpleDB replies OK
3. PASS -> S3: PUT (Data); S3 replies OK

Properties
[Table: columns Arch, Read Correctness (Atomicity, Consistency), Causal Ordering, Efficient Query; rows: S3, SimpleDB; cell marks not present in the extracted text]

Simple Queue Service (SQS)
- Distributed messaging system
- Queues are identified by URL
- Operations: SendMessage, ReceiveMessage, DeleteMessage
- VisibilityTimeout: a message will not be available for x seconds after a ReceiveMessage
- Limits: 8KB message size; at most 10 messages can be received per ReceiveMessage call

Architecture 3: S3 + SimpleDB + SQS
[Diagram: Application, PASS, S3, SimpleDB, and an SQS queue (Queue1); edge labels: Data, Prov; layer labels: User, System]

Protocol 3: S3 + SimpleDB + SQS
[Sequence diagram: PASS, SQS, Commitd, SimpleDB, S3]
1. PASS -> S3: PUT (temp copy); S3 replies OK
2. PASS -> SQS: SndMsg+; SQS replies OK
3. Commitd -> SQS: RecvMsg+
4. Commitd -> SimpleDB: PutAttrs+; SimpleDB replies OK
5. Commitd -> S3: COPY; S3 replies OK
6. Commitd -> SQS: DelMsg+
7. Commitd -> S3: DEL (temp copy); S3 replies OK

Idempotency
- SimpleDB, S3, and SQS operations are idempotent
- If a commit daemon crashes, comes back up, and processes a transaction again, there will be no errors
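
The replay argument can be checked on a small model. The sketch below uses plain dictionaries as stand-ins for SimpleDB and S3; the function `apply_commit`, the transaction fields, and the guard around the copy step are assumptions of this sketch, not the commit daemon's actual code.

```python
import copy

def apply_commit(state, txn):
    """Apply one logged transaction to in-memory stand-ins for SimpleDB and S3."""
    # PutAttributes: writing the same attribute-value pairs again changes nothing.
    state["simpledb"].setdefault(txn["item"], {}).update(txn["attrs"])
    # COPY: only possible while the temporary object still exists (sketch assumption).
    if txn["temp_key"] in state["s3"]:
        state["s3"][txn["final_key"]] = state["s3"][txn["temp_key"]]
    # DELETE: removing an already-deleted key is not an error.
    state["s3"].pop(txn["temp_key"], None)
    return state

start = {"simpledb": {}, "s3": {"tmp/B": b"data"}}
txn = {"item": "B_1", "attrs": {"input": "A"},
       "temp_key": "tmp/B", "final_key": "B"}

once = apply_commit(copy.deepcopy(start), txn)
twice = apply_commit(copy.deepcopy(once), txn)   # replay after a simulated crash
print(once == twice)
```

Because every step either rewrites the same value or removes something that may already be gone, processing the transaction a second time leaves the stores unchanged.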

Properties
[Table: columns Arch, Read Correctness (Atomicity, Consistency), Causal Ordering, Efficient Query; rows: S3, SimpleDB, SQS; cell marks not present in the extracted text]

Analysis
- Extracted provenance by running three workloads
  - Linux compile
  - Blast
  - Provenance Challenge
- Computed the cost to store and query provenance
  - Number of operations
  - Bandwidth

Storage Cost
        Raw       P1                P2                 P3
Data    1.27GB    121.8MB (9.3%)    167.8MB (13.6%)    421.4MB (32.2%)
Ops     31,180    24,952 (80.0%)    168,514 (540.5%)   231,287 (741.7%)

P1 = S3
P2 = S3 + SimpleDB
P3 = S3 + SimpleDB + SQS
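
The percentages in the Ops row appear to express each protocol's operation count relative to the Raw column. The sketch below recomputes them under that reading; the constant and function names are hypothetical, and small differences in the last digit from the slide's figures would be consistent with rounding or truncation.

```python
RAW_OPS = 31_180
PROTOCOL_OPS = {"P1": 24_952, "P2": 168_514, "P3": 231_287}

def relative_ops(raw, ops):
    """Return each protocol's operation count as a percentage of the raw count."""
    return {name: 100.0 * count / raw for name, count in ops.items()}

percentages = relative_ops(RAW_OPS, PROTOCOL_OPS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```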

Query Cost
1. Dump the provenance of a given object
   - Ran it on all objects for statistical significance
2. Find all the files that were outputs of blast.
3. Find all the descendants of files derived from blast.
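
Query 3 is a transitive closure over dependency records. As a minimal sketch of that traversal, assuming the records are already available as an in-memory map (the function name `descendants` and the sample file names are hypothetical, and no cloud service is involved):

```python
from collections import deque

def descendants(depends_on, roots):
    """Return every object transitively derived from any object in roots."""
    # Invert "child depends on parent" into parent -> children.
    children = {}
    for child, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen, queue = set(), deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

deps = {
    "hits.out": {"blast"},
    "report": {"hits.out"},
    "summary": {"report"},
    "unrelated": {"other"},
}
derived = descendants(deps, {"blast"})
print(sorted(derived))
```

Each level of the traversal needs the children of the objects found so far, so the cost of the query depends on how cheaply the store can answer "which items depend on X".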

Query results
Query   S3 Data    S3 Ops    SimpleDB Data   SimpleDB Ops
1       121.8MB    56,132    51.24MB         71,825
2       121.8MB    56,132    2.8KB           6
3       121.8MB    56,132    13.8KB          31

Conclusions
- Identified the properties that need to be satisfied for storing provenance in the cloud
- Presented various protocols for storing provenance and data in the cloud
- The cost of storing provenance is reasonable

Status
- System almost ready
- Plan to submit it to the Symposium on Operating Systems Principles (SOSP)
- Really hard to drive up the cost
  - Jan bill = $1.95
  - Feb bill = $9.38

Extra

Protocol 1: Standalone S3
On file close:
- Convert the provenance into attribute-value pairs as required by S3
- If (sizeof(record) > 1KB)
  - Store the record in a separate S3 object
  - Replace the attribute-value pair with a pointer to this object
- Upload the file using PUT
  - Arguments: object, attributes
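
The steps above can be sketched against an in-memory stand-in for S3. Everything named here is hypothetical: the `FakeS3` class, the `store_p1` function, the overflow key format, and the `s3-pointer:` prefix. The sketch also assumes the 1KB test applies to the encoded attribute value.

```python
LIMIT = 1024  # 1KB threshold from the slide

class FakeS3:
    """In-memory stand-in for S3: key -> (body, metadata)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, metadata=None):
        self.objects[key] = (body, dict(metadata or {}))

def store_p1(s3, key, data, provenance):
    """On file close: upload the data with its provenance as object metadata."""
    attributes = {}
    for name, value in provenance.items():
        if len(value.encode()) > LIMIT:
            # Oversized record: store it separately and keep a pointer instead.
            overflow_key = f"{key}.prov.{name}"
            s3.put(overflow_key, value.encode())
            value = f"s3-pointer:{overflow_key}"
        attributes[name] = value
    # A single PUT carries both the object and its attributes.
    s3.put(key, data, attributes)

s3 = FakeS3()
store_p1(s3, "B", b"data", {"input": "A", "env": "x" * 2000})
print(s3.objects["B"][1])
```

Because data and provenance travel in one PUT, they are stored together or not at all; only the overflow objects are written separately.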

Protocol 2: S3 + SimpleDB
On file close:
1. Convert the provenance into attribute-value pairs as required by SimpleDB
   - Additional record: md5sum of (file contents + version)
2. If (sizeof(record) > 1KB)
   - Store the record in a separate S3 object
   - Replace the attribute-value pair with a pointer to this object
3. Issue PutAttributes: store the provenance
   - One item (= one PutAttributes call) per version of the object
4. Upload the file to S3 using PUT
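
A minimal sketch of steps 1, 3, and 4, with dictionaries standing in for SimpleDB and S3 and the overflow handling of step 2 left out. The item naming scheme, the way the version is appended before hashing, and the read-time check in `is_consistent` are assumptions of this sketch; the slide only states that an md5sum record is stored.

```python
import hashlib

def content_digest(data, version):
    """md5 over the file contents followed by the version (encoding assumed)."""
    return hashlib.md5(data + str(version).encode()).hexdigest()

def store_p2(simpledb, s3, name, version, data, provenance):
    item = f"{name}_{version}"                  # one item per version of the object
    attrs = dict(provenance, md5=content_digest(data, version))
    simpledb[item] = attrs                      # stands in for PutAttributes
    s3[name] = (data, version)                  # stands in for the S3 PUT

def is_consistent(simpledb, s3, name):
    """Check that the stored data matches the digest recorded in its provenance."""
    data, version = s3[name]
    attrs = simpledb.get(f"{name}_{version}")
    return attrs is not None and attrs["md5"] == content_digest(data, version)

simpledb, s3 = {}, {}
store_p2(simpledb, s3, "B", 1, b"data", {"input": "A"})
ok_after_store = is_consistent(simpledb, s3, "B")

s3["B"] = (b"newer data", 2)                    # data updated, provenance missing
ok_after_mismatch = is_consistent(simpledb, s3, "B")
print(ok_after_store, ok_after_mismatch)
```

A digest of this kind lets a reader detect when the data in S3 and the provenance in SimpleDB belong to different versions, which can happen because the two writes are separate calls.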

Protocol 3: S3 + SimpleDB + SQS (I)
Log phase: log data on a queue
1. Store a copy of the file in a temporary location on S3
2. Allocate a transaction ID (uuid)
3. Split the provenance into chunks of 8KB and enqueue them on an SQS queue
   - Tag each message with the transaction ID
   - One additional record that has a pointer to the temp S3 object
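
The log phase can be sketched with a list standing in for the SQS queue and a dictionary for S3. The message layout (`txn`, `seq`, `total`, `body`, `temp`, `final`), the temporary key format, and the function name `log_phase` are hypothetical. The sketch also ignores the space the tags themselves would take within the 8KB message limit.

```python
import uuid

CHUNK = 8 * 1024  # 8KB chunk size from the slide

def log_phase(queue, s3, name, data, provenance_bytes):
    """Log one transaction: temp copy on S3, then provenance chunks on the queue."""
    txn = str(uuid.uuid4())                      # 2. allocate a transaction ID
    temp_key = f"tmp/{txn}/{name}"
    s3[temp_key] = data                          # 1. temporary copy of the file
    chunks = [provenance_bytes[i:i + CHUNK]
              for i in range(0, len(provenance_bytes), CHUNK)]
    for seq, chunk in enumerate(chunks):         # 3. one tagged message per chunk
        queue.append({"txn": txn, "seq": seq,
                      "total": len(chunks), "body": chunk})
    # Additional record pointing at the temporary S3 object.
    queue.append({"txn": txn, "temp": temp_key, "final": name})
    return txn

queue, s3 = [], {}
txn_id = log_phase(queue, s3, "B", b"data", b"p" * 20000)
print(len(queue))
```

Nothing is visible at the object's permanent location until a later commit, so a client crash during this phase leaves only a temporary object and some queued messages behind.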

Protocol 3: S3 + SimpleDB + SQS (II)
Commit phase: move data from SQS to S3 and SimpleDB
1. ReceiveMessage: get messages from the queue and assemble the packets
2. Store the provenance in SimpleDB using the PutAttributes call
   - Take care of overflows
3. Execute an S3 COPY to copy the object from its temporary location to its permanent one
4. Delete the messages from SQS
5. Delete the temporary file copy
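
A sketch of the commit phase over the same kind of in-memory stand-ins: a list for the queue and dictionaries for SimpleDB and S3. The message fields (`txn`, `seq`, `total`, `body`, `temp`, `final`) and the rule for skipping incomplete transactions are assumptions of this sketch, and overflow handling is omitted.

```python
def commit_phase(queue, simpledb, s3):
    """Commit every transaction whose chunks and pointer record are all queued."""
    by_txn = {}
    for msg in queue:                                    # stands in for ReceiveMessage
        by_txn.setdefault(msg["txn"], []).append(msg)
    for txn, msgs in by_txn.items():
        pointer = next((m for m in msgs if "temp" in m), None)
        chunks = sorted((m for m in msgs if "seq" in m), key=lambda m: m["seq"])
        if pointer is None or not chunks or len(chunks) != chunks[0]["total"]:
            continue                                     # incomplete: leave it queued
        provenance = b"".join(m["body"] for m in chunks) # 1. assemble the packets
        simpledb[pointer["final"]] = provenance          # 2. PutAttributes
        s3[pointer["final"]] = s3[pointer["temp"]]       # 3. COPY temp -> permanent
        queue[:] = [m for m in queue if m["txn"] != txn] # 4. delete the messages
        del s3[pointer["temp"]]                          # 5. delete the temp copy

queue = [
    {"txn": "t1", "seq": 1, "total": 2, "body": b"world"},
    {"txn": "t1", "seq": 0, "total": 2, "body": b"hello"},
    {"txn": "t1", "temp": "tmp/t1/B", "final": "B"},
    {"txn": "t2", "seq": 0, "total": 2, "body": b"partial"},
]
simpledb, s3 = {}, {"tmp/t1/B": b"data"}
commit_phase(queue, simpledb, s3)
print(simpledb, s3, len(queue))
```

Messages are deleted only after the provenance and the permanent copy are in place, so a crash partway through leaves the transaction on the queue to be processed again.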