Provenance Views for Module Privacy

Susan B. Davidson (U. Penn), Sanjeev Khanna (U. Penn), Tova Milo (Tel-Aviv U.), Debmalya Panigrahi (MIT), Sudeepa Roy (U. Penn)


Page 1: Provenance Views for Module Privacy

Susan B. Davidson U. Penn

Sanjeev Khanna U. Penn

Tova Milo Tel-Aviv U.

Debmalya Panigrahi MIT

Sudeepa Roy U. Penn

Provenance Views for Module Privacy

Page 2: Data-oriented Workflows Must Be Secure

Data-oriented Workflows Must Be Secure


Ref. Tova Milo’s keynote, PODS 2011


Page 3: Workflows

Workflows

• Vertices = Modules/Programs
• Edges = Dataflow
• In an execution of the workflow, data (values) appear on the edges

[Figure: an example workflow with modules Split Entries, Align Sequences, Curate Annotations (on functional data), Format-1, Format-2, Format-3, and Construct Trees; the edges carry data values d1, ..., d7, shown as DNA sequence strings.]

Page 4: Need for Provenance

Need for Provenance

• Enable sharing and reuse
• Ensure repeatability and debugging

[Figure: the biologist's workspace, showing the example workflow from source data s to the resulting tree t, with questions such as "Which sequences have been used to produce this tree?" and "How has this tree been generated?"]

Page 5: Need for Provenance vs. Need for Privacy

Need for Provenance vs. Need for Privacy

Workflow USER: "How has this result been produced?" (wants all data values)

Workflow OWNER:
• "My data is sensitive!"
• "My module is proprietary!"
• "The flow/structure should not be revealed!"

[Figure: the example workflow from s to t, annotated with the user's and the owner's concerns.]

Page 6: Module Privacy

Module Privacy

• Module f takes input x and produces output y = f(x)
• The user should not be able to guess (x, f(x)) pairs with high probability (over any number of executions)
• The output value f(x) is private, not the algorithm for f

[Figure: module f with inputs x1, x2, x3, x4 and outputs y1, y2, y3; f(x1, x2, x3, x4) = <y1, y2, y3>.]
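To make the setup concrete for the later slides, here is a minimal sketch (mine, not from the paper) of a module as a deterministic function from input tuples to output tuples, together with the relation R of its executions; the toy xor_module and the boolean domain are illustrative assumptions.

    from itertools import product

    def xor_module(x1, x2):
        # A toy "private" module with a single boolean output: y = x1 XOR x2.
        return (x1 ^ x2,)

    # Relation R for f: one row per execution, inputs followed by outputs.
    # Attributes: (x1, x2, y).
    R = [(x1, x2) + xor_module(x1, x2) for x1, x2 in product((0, 1), repeat=2)]
    # R == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]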

Page 7: Module Privacy: Motivation

Module Privacy: Motivation

• x = medical record of patient P; f = Check for AIDS; f(x) = "Does P have AIDS?"
• Patient P's concern: whether P has AIDS should not be inferred given his medical record
• Module owner's concern: no one should be able to simulate the module and use it elsewhere

[Figure: a medical workflow with modules Process Record, Check for AIDS, Check for Cancer, and Create Report; x and x' are processed records, the cancer check answers "Does P have cancer?", and the final output is a report.]

Page 8: Module Privacy in a Workflow

Module Privacy in a Workflow

• Private modules (the user has no a priori knowledge about them)
  o e.g., the module for AIDS detection
• Public modules (the user has full knowledge of them)
  o e.g., sorting and reformatting modules
• n modules are connected as a DAG, possibly with data sharing
• For a private module f with input x, f(x) should not be revealed

[Figure: a workflow with modules m1, m2, m3 over attributes a1, ..., a7, illustrating data sharing (one attribute feeding more than one module).]

Page 9: Module Privacy with Secure View

Module Privacy with Secure View

• Privacy definition: L-diversity [MGKV '06]
  o By hiding some input/output attributes, each x has L different equivalent possibilities for f(x)
  o The output view is called a "secure view"
• Why not differential privacy? [Dwork '06, DMNS '06, ...]
  o The usual random noise cannot be added, because scientific experiments must be repeatable
  o Any f should always map any x to the same f(x)

Page 10: Standalone Module Privacy

Standalone Module Privacy

Standalone module f with inputs x1, x2 and output y; functional dependency: x1, x2 → y.

Relation R for f (here y = (x1 ≠ x2)):

x1 x2 y
0  0  0
0  1  1
1  0  1
1  1  0

• A view: the projection of R on the visible attributes
• Possible world: a relation that agrees with R on the visible attributes (and respects the functional dependency)
• Privacy parameter Γ (e.g., Γ = 2)
• Γ-standalone-private view: every input x can be mapped to Γ different outputs by the possible worlds
  o e.g., the possible worlds shown on the slide include R itself and a relation that maps input (1, 1) to 1 instead of 0
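A brute-force check of this definition on the toy XOR module, under one simplified reading of "possible world" (a full truth table whose projection on the visible attributes equals that of R); this is my own sketch, not code from the paper.

    from itertools import product

    ATTRS = ("x1", "x2", "y")
    DOMAIN = tuple(product((0, 1), repeat=2))           # all inputs (x1, x2)
    R = {(x1, x2, x1 ^ x2) for x1, x2 in DOMAIN}        # y = (x1 != x2)

    def project(rel, visible):
        idx = [ATTRS.index(a) for a in visible]
        return {tuple(row[i] for i in idx) for row in rel}

    def is_gamma_private(visible, gamma):
        # Enumerate all 16 candidate worlds g: {0,1}^2 -> {0,1}; keep those
        # whose execution relation agrees with R on the visible attributes;
        # then demand at least gamma distinct possible outputs for every input.
        view = project(R, visible)
        outputs = {x: set() for x in DOMAIN}
        for table in product((0, 1), repeat=len(DOMAIN)):
            g = dict(zip(DOMAIN, table))
            world = {(x1, x2, g[(x1, x2)]) for (x1, x2) in DOMAIN}
            if project(world, visible) == view:
                for x, y in g.items():
                    outputs[x].add(y)
        return all(len(ys) >= gamma for ys in outputs.values())

    print(is_gamma_private(("x1", "x2", "y"), 2))   # False: everything is visible
    print(is_gamma_private(("x1", "x2"), 2))        # True: y is hidden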

Page 11: Workflow Module Privacy

Workflow Module Privacy

• A view: the same as before (a projection of R on the visible attributes)
• Possible world: a relation that agrees with R on the visible attributes (and respects ALL functional dependencies)
• Γ-workflow-private view: privacy for each private module, as before
• The workflow relation R has n functional dependencies, one per module

Workflow W (modules m1, m2, m3 over attributes a1, ..., a7), with functional dependencies:
1. a1, a2 → a3, a4, a5
2. a3, a4 → a6
3. a4, a5 → a7

Relation R:

a1 a2 a3 a4 a5 a6 a7
0  0  0  0  1  1  0
0  1  1  1  0  0  1
1  0  1  1  0  0  1
1  1  1  0  1  1  1
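A small sketch (mine, not from the paper) of the "respects ALL functional dependencies" part of the possible-world test, using the attribute names and dependencies listed above; agreement with R on the visible attributes would be checked separately.

    ATTRS = ("a1", "a2", "a3", "a4", "a5", "a6", "a7")

    # One functional dependency per module, as listed above:
    # (determinant attributes, dependent attributes).
    FDS = [
        (("a1", "a2"), ("a3", "a4", "a5")),   # module m1
        (("a3", "a4"), ("a6",)),              # module m2
        (("a4", "a5"), ("a7",)),              # module m3
    ]

    def satisfies(rel, lhs, rhs):
        # True if the attributes in lhs functionally determine those in rhs.
        li = [ATTRS.index(a) for a in lhs]
        ri = [ATTRS.index(a) for a in rhs]
        seen = {}
        for row in rel:                       # row = a tuple of a1..a7 values
            key = tuple(row[i] for i in li)
            val = tuple(row[i] for i in ri)
            if seen.setdefault(key, val) != val:
                return False
        return True

    def respects_all_fds(candidate_world):
        # FD half of the possible-world test; agreement with R on the
        # visible attributes must be checked separately.
        return all(satisfies(candidate_world, lhs, rhs) for lhs, rhs in FDS)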

Page 12: Secure-View Optimization Problem

Secure-View Optimization Problem

• Conflicting interests of the owner and the user: the user wants provenance, the owner wants privacy
• Hiding each data item/attribute has a cost
• Secure-view problem: minimize the total cost of the hidden attributes while guaranteeing Γ-workflow-privacy for all private modules
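In symbols (my phrasing of the slide's problem statement, with A the set of all attributes, c(a) the cost of hiding attribute a, and H the set of hidden attributes):

    \min_{H \subseteq A} \; \sum_{a \in H} c(a)
    \quad \text{subject to: the view on } A \setminus H \text{ is } \Gamma\text{-workflow-private for every private module.}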

Page 13: Let's Start with a Single Module

Let's Start with a Single Module

How hard is the secure-view problem for a standalone module?

PROBLEM-1: given a set V of visible attributes, decide whether V is safe.
• R given explicitly: communication complexity Ω(N), where N = #rows of R
• R given succinctly: computation is coNP-hard in k = #attributes of R

PROBLEM-2: given an oracle that answers "is V safe?", find a safe subset V* with minimum cost.
• Communication complexity: 2^Ω(k) oracle calls are needed

Page 14: Any Upper Bound?

Any Upper Bound?

• The trivial brute-force algorithm (sketched below) solves the problem in time O(2^k · N^2), where k = #attributes and N = #rows of R
• It can return ALL safe subsets, which is useful for the next step
• Not so bad in practice:
  o k is not too large for a single module
  o a module is reused in many workflows
  o expert knowledge from the module designers can be used to speed up the process
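A minimal sketch (mine, not the paper's code) of that brute-force step, parameterized by a standalone safety test such as the is_gamma_private check sketched earlier; it enumerates all 2^k visible subsets and records, for each safe one, the complementary set of hidden attributes.

    from itertools import combinations

    def all_safe_hidden_sets(attrs, gamma, is_safe):
        # is_safe(visible, gamma) -> bool is the standalone privacy test,
        # e.g. the is_gamma_private sketch from the standalone-module slide.
        safe = []
        for r in range(len(attrs) + 1):
            for visible in combinations(attrs, r):      # all 2^k subsets
                if is_safe(visible, gamma):             # each test reads R
                    safe.append(frozenset(attrs) - frozenset(visible))
        return safe   # candidate hidden sets, fed into the workflow-level step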

Page 15: Moving on to General Workflows

Moving on to General Workflows

• Workflows have arbitrary data sharing, arbitrary (DAG) connections, and interactions between private and public modules
• Trivial algorithms are not good: they lead to running time exponential in n
• Instead, we use the (list of) standalone safe subsets for the private modules
• First consider workflows in which all modules are private

Two steps:
1. Show that any combination of safe subsets for standalone privacy is also safe for workflow privacy (Composability)
2. Find the minimum-cost safe subset for the workflow (Optimization)

Page 16: Composability

Composability

• Key idea: when a module m is placed in a workflow and the same attribute subset V is hidden, the number of possible worlds shrinks, but the number of possible outputs for each input does not
• The proof involves showing the existence of a suitable possible world
• The "all-private workflow" assumption is necessary

Page 17: Optimally Combining Standalone Solutions

Optimally Combining Standalone Solutions

• Any combination of standalone safe subsets works; we want one with minimum cost
• Solve the optimization problem for the workflow, given the list of options for each individual module (a brute-force illustration follows below)
• Even the simplest version (no data sharing) is NP-hard
• In the paper: approximation algorithms and matching hardness results for different versions
  o Bounded data sharing admits a better approximation ratio
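An exhaustive illustration (mine, not from the paper) of the combining step: pick one standalone-safe hidden set per private module so that the union of hidden attributes has minimum total cost. The candidate sets and unit costs below are made up, and the paper's actual contribution is approximation algorithms, not this exponential search.

    from itertools import product

    def min_cost_combination(candidates, cost):
        # candidates: one list of frozensets (standalone-safe hidden sets) per module
        # cost: dict mapping each attribute to its hiding cost
        best_hidden, best_cost = None, float("inf")
        for choice in product(*candidates):             # one option per module
            hidden = frozenset().union(*choice)         # shared attrs counted once
            c = sum(cost[a] for a in hidden)
            if c < best_cost:
                best_hidden, best_cost = hidden, c
        return best_hidden, best_cost

    # Toy usage with the attribute names from the workflow slide (candidate
    # sets and costs are invented for illustration).
    candidates = [
        [frozenset({"a1", "a3"}), frozenset({"a4", "a5"})],   # options for m1
        [frozenset({"a4"}), frozenset({"a6"})],               # options for m2
        [frozenset({"a4"}), frozenset({"a7"})],               # options for m3
    ]
    cost = {a: 1 for a in ("a1", "a2", "a3", "a4", "a5", "a6", "a7")}
    print(min_cost_combination(candidates, cost))   # hides the shared {a4, a5} at cost 2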

Page 18: Workflows with Public Modules

Workflows with Public Modules

• Public modules are difficult to handle: composability does not work
  o Example: a private module f1 with output f1(x) = y feeds a public module f2 with f2(y) = y (the identity); since the user fully knows f2, y can be reconstructed from f2's side
• Solution: "privatize" some public modules
  o The names of privatized modules are not revealed
  o With privatization, composability works again
• Privatization has an additional cost, and leads to worse approximation results

Page 19: Related Work

Related Work

• Workflow privacy (mainly access control): Chebotko et al. '08; Gil et al. '07, '10
• Secure provenance: Braun et al. '08; Hasan et al. '07; Lyle-Martin '10
• Privacy-preserving data mining: surveys by Aggarwal-Yu '08 and Verykios et al. '04
• Privacy in statistical databases: survey by Dwork '08

Page 20: Conclusion and Future Work

Conclusion and Future Work

• This is a first step toward handling module privacy in a network of modules

Future directions:
1. Explore alternative notions of privacy and partial background knowledge
2. Explore alternative "privatization" techniques for public modules
3. Handle infinite or very large attribute domains

Page 21: Thank You


Thank You.

Questions?
