Provenance for the Cloud (USENIX Conference on File and Storage Technologies, FAST '10)

Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences

Upload: shelley

Post on 22-Feb-2016



Page 2: Outline

- Introduction
- Background
- Provenance System Property
- Architecture & Protocol
- Evaluation
- Conclusion & Comment

Page 3: Introduction

Problem to solve:
- Implement a provenance-aware storage system on top of current cloud stores (using Amazon's services)

Page 4: Background (1/3): Provenance

Data has two critical components:
- What it is (contents)
- Where it came from (ancestry)

Provenance is the description of how an object was derived: the metadata that describes the history of the object.

Why use provenance?
- Use case: Sloan Digital Sky Survey (SDSS)
- Debugging experimental results
- Detecting and avoiding faulty data propagation
- Improving text search results
- Security

Page 6: Background (2/3)

Provenance can be defined abstractly as a directed acyclic graph (DAG).
- Nodes
  - Objects: files, processes, tuples, data sets, etc.
  - Have attributes, such as command-line arguments, name, and version number
- Edges
  - Indicate a dependency between objects
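
The DAG model on this slide can be illustrated with a small in-memory sketch. This is only an illustration under assumptions: the class name `ProvenanceGraph`, its methods, and the file and process names are hypothetical and do not come from the paper or from PASS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A provenance node: a file, process, tuple, data set, etc."""
    name: str
    attributes: dict = field(default_factory=dict)


class ProvenanceGraph:
    """Hypothetical in-memory provenance DAG (illustrative only)."""

    def __init__(self):
        self.nodes = {}    # name -> Node
        self.parents = {}  # name -> set of names this node depends on

    def add_node(self, name, **attributes):
        self.nodes[name] = Node(name, attributes)
        self.parents.setdefault(name, set())

    def ancestors(self, name):
        """Return every node that `name` depends on, directly or indirectly."""
        seen = set()
        stack = list(self.parents.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_dependency(self, child, parent):
        """Record that `child` depends on `parent`, refusing edges that would form a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


g = ProvenanceGraph()
g.add_node("input.dat", version=1)
g.add_node("sort", argv=["sort", "input.dat"])
g.add_node("output.dat", version=1)
g.add_dependency("sort", "input.dat")   # the process read the input file
g.add_dependency("output.dat", "sort")  # the output file was written by the process
print(sorted(g.ancestors("output.dat")))
```

The cycle check in `add_dependency` is what keeps the graph acyclic; a real system would typically avoid cycles by versioning objects rather than rejecting edges.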

Page 7

[Figure: example provenance graph. Nodes: Justification Report; Data Collection Request (I1); Blood Test Request (I2); Patient Brain Death Notification (I3); Donor Data Request (I4); Donor Data (I5); Blood Test Request (I6); Blood Test Result (I7); Decision Request (I8); Donation Decision (I9). Edge labels: "is justified by", "is response to", "is caused by", "is based on".]

Page 8: Background (3/3): Eventual Consistency

- A weaker form of data consistency
- If no new updates are sent for a sufficiently long period of time, all replicas in the system can be expected to become consistent
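
A toy simulation can make the definition concrete. The sketch below is an assumption-laden illustration, not how any Amazon service replicates data: replicas are arranged in a ring, each holds a `(version, value)` pair, and in every round each replica adopts its neighbour's pair if that pair has a higher version.

```python
def gossip_round(replicas):
    """One round: replica i keeps the newer of its own pair and its right neighbour's pair."""
    n = len(replicas)
    return [
        max(replicas[i], replicas[(i + 1) % n], key=lambda pair: pair[0])
        for i in range(n)
    ]


def rounds_until_consistent(replicas):
    """Run rounds with no new updates until all replicas hold the same pair."""
    rounds = 0
    while len(set(replicas)) > 1:
        replicas = gossip_round(replicas)
        rounds += 1
    return rounds, replicas


# One replica has accepted a new write; the other three are still stale.
start = [(2, "new"), (1, "old"), (1, "old"), (1, "old")]
rounds, final = rounds_until_consistent(start)
print(rounds, final)
```

Until the loop finishes, a read served by a stale replica returns `"old"`, which is exactly the window in which data and provenance stored as separate objects can appear mismatched.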

Page 9: Provenance System Property (1/2)

- Provenance data coupling
  - An object and its provenance must match
  - The provenance must accurately and completely describe the data
- Multi-object causal ordering
  - Concerns the causal relationships among objects
  - A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent
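
Multi-object causal ordering amounts to persisting objects in a topological order of the provenance graph. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module; the function name `persist_in_causal_order`, the list used as a stand-in for durable storage, and the object names are hypothetical.

```python
from graphlib import TopologicalSorter


def persist_in_causal_order(depends_on, store):
    """Append objects to `store` so that every ancestor is written before its descendants.

    `depends_on` maps each object name to the set of objects it was derived from.
    """
    for name in TopologicalSorter(depends_on).static_order():
        store.append(name)  # a real system would write the object and its provenance here
    return store


depends_on = {
    "output.dat": {"sort"},
    "sort": {"input.dat"},
    "input.dat": set(),
}
order = persist_in_causal_order(depends_on, [])
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which cannot happen when the provenance graph is a DAG.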

Page 10

[Figure: the same example provenance graph as on Page 7 (Justification Report and items I1 to I9, linked by "is justified by", "is response to", "is caused by", and "is based on" edges).]

Page 11: Provenance System Property (2/2)

- Data-independent persistence
  - The system must retain an object's provenance even if the object itself is removed
- Efficient query
  - Provenance must be accessible to users who want to access or verify provenance properties of their data

Page 12: Architecture (1)

Page 13: Architecture (2): S3

Simple Storage Service (S3)
- Amazon's storage service
- An object store where objects can range in size from 1 byte to 5GB
- With each object, clients can store up to 2KB of metadata
- Accessed through a SOAP or REST API
  - PUT, GET, HEAD, COPY, DELETE

Page 14: Architecture (3): SimpleDB

SimpleDB
- An Amazon service that provides indexing and querying of data
- The data model consists of items that are described by <attribute, value> pairs
- Each item can have up to 256 <attribute, value> pairs
- Each attribute name and value can be as large as 1KB

Page 15: Architecture (4): SQS

Simple Queue Service (SQS)
- A distributed messaging system that lets users exchange messages between the distributed components of their systems
- Messages are limited to 8KB in size
- In this paper, SQS is used as a write-ahead log (WAL)

Page 16: Architecture (5): PASS

Provenance-Aware Storage System (PASS)
- A storage system that automatically collects, stores, manages, and provides search for provenance
- Monitors system calls
- Generates provenance and sends both provenance and data to PA-S3fs

Page 17: Architecture (6): PA-S3fs

Provenance-Aware S3 File System (PA-S3fs)
- Caches data and provenance on the client to reduce traffic to S3
- Sends data and provenance to the cloud

Page 18: Protocol (1)

Page 19: Protocol (2): Protocol 1 (P1)

Standalone cloud store
- Map each file to an S3 object and store its provenance as a separate S3 object
- Provenance object
  - Named with a UUID
  - Contains the name of the primary object
- Primary object metadata
  - Version number and UUID

Page 20: Protocol (3)

- P1 does not support data coupling
  - But it can detect decoupling
- Queries are inefficient
  - All provenance must be retrieved

[Figure: message sequence between Client and S3: PUT of the provenance, OK; PUT of the data, OK.]
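
The two-PUT sequence of P1 and its ability to detect (but not prevent) decoupling can be sketched with an in-memory object store. Everything here is an assumption made for illustration: `FakeS3` is a stand-in and not the AWS API, the metadata key `prov_uuid` is made up, and recording the version inside the provenance object as the basis for the check is this sketch's choice rather than something the slides specify.

```python
import uuid


class FakeS3:
    """In-memory stand-in for an object store (hypothetical, not the AWS API)."""

    def __init__(self):
        self.objects = {}  # key -> (body, metadata)

    def put(self, key, body, metadata=None):
        self.objects[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self.objects[key]


def p1_store(s3, name, data, provenance, version):
    """Write the provenance object first, then the primary object that points to it."""
    prov_key = str(uuid.uuid4())
    s3.put(prov_key, {"primary": name, "version": version, "records": provenance})
    s3.put(name, data, {"version": version, "prov_uuid": prov_key})
    return prov_key


def p1_is_coupled(s3, name):
    """Check that the primary object and its provenance describe the same version."""
    _, meta = s3.get(name)
    prov_key = meta.get("prov_uuid")
    if prov_key not in s3.objects:
        return False
    prov_body, _ = s3.get(prov_key)
    return prov_body["primary"] == name and prov_body["version"] == meta["version"]


s3 = FakeS3()
key = p1_store(s3, "out.dat", b"v1", ["written by sort"], version=1)
print(p1_is_coupled(s3, "out.dat"))

# Simulate an update where the data was overwritten but the matching provenance never arrived.
s3.put("out.dat", b"v2", {"version": 2, "prov_uuid": key})
print(p1_is_coupled(s3, "out.dat"))
```

Because the two PUTs are independent requests, nothing in this scheme makes them atomic; the check only reveals a mismatch after the fact.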

Page 21: Protocol (4)

Page 22: Protocol (5): Protocol 2 (P2)

Cloud store with a cloud database
- Store the provenance of an object as one SimpleDB item
- If a provenance value is larger than SimpleDB's 1KB limit
  - Store it as an S3 object
  - Save a pointer to that object as the attribute value

Page 23: Protocol (6)

- Provides efficient provenance queries
- Does not support data coupling

[Figure: message sequence among Client, S3, and SimpleDB. Messages shown: PUT of provenance values larger than 1KB to S3; BatchPutAttributes of the provenance to SimpleDB; PUT of the data to S3; each acknowledged with OK.]
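
The spill-to-S3 rule of P2 can be sketched as follows. This is a minimal illustration under assumptions: plain dictionaries stand in for S3 and SimpleDB, the item name `name_version`, the `prov/...` key layout, and the `s3:` pointer prefix are all made up, and the 256-pair limit per item is not handled.

```python
VALUE_LIMIT = 1024  # SimpleDB limit on an attribute value, in bytes


def p2_store(s3, simpledb, name, version, data, provenance):
    """Store provenance as one database item, spilling oversized values to the object store."""
    item = {}
    for attribute, value in provenance.items():
        if len(value.encode("utf-8")) > VALUE_LIMIT:
            key = f"prov/{name}/{version}/{attribute}"  # hypothetical key layout
            s3[key] = value
            item[attribute] = "s3:" + key  # pointer stored in place of the value
        else:
            item[attribute] = value
    simpledb[f"{name}_{version}"] = item  # one item per object version
    s3[name] = data  # the data itself goes to the object store
    return item


s3, simpledb = {}, {}
item = p2_store(
    s3, simpledb, "out.dat", 1, b"payload",
    {"type": "file", "env": "X" * 2000},
)
print(item["type"], item["env"])
```

Queries can then run against the database items instead of fetching every provenance object, which is where P2 gains its query efficiency over P1.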

Page 24: Protocol (7): Protocol 3 (P3)

Cloud store with a cloud database and a messaging service
- Use SQS as a write-ahead log (WAL)
  - Messages are limited to 8KB
  - Store large objects as temporary S3 objects and record a pointer to them in the WAL
- Commit daemon
  - Reads the log records
  - Assembles all the records belonging to a transaction
  - Ignores the records of a transaction if the client crashed

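Page 25

[Figure: P3 message sequence among Client, SQS, the commit daemon (Commitd), S3, and SimpleDB. Messages shown: PUT of a temporary data copy to S3; SendMessage of the provenance to SQS; ReceiveMessage by the commit daemon; PUT of provenance larger than 1KB to S3; BatchPutAttributes to SimpleDB; COPY of the data in S3; DELETE of the temporary object; deletion of the message; each acknowledged with OK.]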
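
The write-ahead-log idea behind P3 can be sketched with a list standing in for the queue. This is an illustration under assumptions, not the paper's implementation: the message fields `txn`, `seq`, and `total`, and both function names, are hypothetical, and records that exceed the message limit are rejected here rather than spilled to temporary S3 objects.

```python
from collections import defaultdict

MESSAGE_LIMIT = 8 * 1024  # queue message size limit, in bytes


def log_transaction(queue, txn_id, records):
    """Client side: append one log message per record, tagged with the transaction id."""
    total = len(records)
    for seq, record in enumerate(records):
        if len(record.encode("utf-8")) > MESSAGE_LIMIT:
            raise ValueError("record too large; store it as an object and log a pointer")
        queue.append({"txn": txn_id, "seq": seq, "total": total, "record": record})


def commit_complete_transactions(queue, committed):
    """Commit daemon: apply only transactions whose records were all logged."""
    by_txn = defaultdict(dict)
    for message in queue:
        by_txn[message["txn"]][message["seq"]] = message
    for txn, messages in by_txn.items():
        total = next(iter(messages.values()))["total"]
        if len(messages) == total:
            committed[txn] = [messages[i]["record"] for i in range(total)]
    # Delete the messages of committed transactions; incomplete ones are left untouched.
    queue[:] = [m for m in queue if m["txn"] not in committed]
    return committed


queue, committed = [], {}
log_transaction(queue, "t1", ["prov-a", "prov-b"])
# Simulate a client that crashed after logging 1 of 3 records of transaction t2.
queue.append({"txn": "t2", "seq": 0, "total": 3, "record": "prov-c"})
commit_complete_transactions(queue, committed)
print(committed, len(queue))
```

Because nothing is applied until a transaction's records are all present in the log, a client crash leaves either a complete transaction or one that the daemon skips.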

Page 26: Protocol (9)

Page 27: Evaluation (1): Workload

- CVSROOT nightly backup
  - I/O intensive
  - 240 operations
- Blast
  - Mix of compute and I/O operations
  - Provenance tree has a depth of 5
  - 10773 operations
- Challenge
  - Mix of compute and I/O operations
  - Provenance tree has a depth of 11
  - 6179 operations

Page 28: Evaluation (2)

[Figure panels: EC2 instance; local machine.]

Page 29: Evaluation (3): Query performance

- Q1: Retrieve all the provenance ever recorded
- Q2: Retrieve the provenance of all versions of one object
- Q3: Find all files that were directly output by Blast
- Q4: Find all the descendants of files derived from Blast
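
Q3 and Q4 differ in that Q3 is a single lookup while Q4 needs a graph traversal. A minimal sketch over an in-memory edge list, assuming made-up object names and hypothetical helper functions; it says nothing about how the queries were actually issued against S3 or SimpleDB.

```python
from collections import deque

# (object, depends_on) pairs; all names are invented for illustration.
edges = [
    ("blast", "query.fa"),
    ("hits.out", "blast"),
    ("report.txt", "hits.out"),
    ("summary.txt", "report.txt"),
    ("notes.txt", "editor"),
]


def direct_outputs(edges, process):
    """Q3-style query: objects that depend directly on `process`."""
    return {child for child, parent in edges if parent == process}


def descendants(edges, roots):
    """Q4-style query: everything derived, directly or indirectly, from `roots`."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, set()).add(child)
    seen = set()
    pending = deque(roots)
    while pending:
        current = pending.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                pending.append(child)
    return seen


q3 = direct_outputs(edges, "blast")
q4 = descendants(edges, q3)
print(sorted(q3), sorted(q4))
```

Each level of the traversal in `descendants` corresponds to another round of lookups against the provenance store, which is why deeper provenance trees make Q4 more expensive than Q3.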

Page 30: Evaluation (4)

Page 31: Conclusion

- Definition of the properties that provenance systems must exhibit
- Design and implementation of three protocols for storing provenance and data in the cloud
- All three protocols have reasonable time overhead and minimal financial overhead

Page 32: Comment

- Economy
  - Provenance cannot increase profit directly
  - Customer loyalty
- Security
  - Provenance can ensure the correctness of files
  - But it may contain sensitive information

Page 33

The End