TRANSCRIPT
Foundations of Privacy
Lecture 9: History-Independent Hashing Schemes
(and applications)
Lecturer: Gil Segev
Election Day
- Elections for class president
- Each student whispers in Mr. Drew's ear
- Mr. Drew writes down the votes in order: Carol, Alice, Alice, Bob, ...
- Problem: Mr. Drew's notebook leaks sensitive information (the first student voted for Carol, the second for Alice, ...), which may compromise the privacy of the elections
Election Day
- A simple solution: a lexicographically sorted list of candidates with unary counters:
  Alice: 11, Bob: 1, Carol: 1
- What about more involved applications? Write-in candidates, votes which are subsets or rankings, ...
Learning From History
- A simple example: a sorted list is a canonical memory representation, but not really efficient...
- The two levels of a data structure: the "legitimate" interface and the memory representation
- History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
This Talk
- Part 1: An efficient history-independent hashing scheme
- Part 2: Application of history independence to electronic voting
Part 1:
An Efficient History-Independent Hashing Scheme
HI Cuckoo Hashing
A history-independent dictionary that simultaneously achieves the following:
- Efficiency: lookup time O(1) worst case; update time O(1) expected amortized; memory utilization 50% (25% with deletions)
- The strongest notion of history independence
- Simple and fast
Notions of History Independence
Naor and Teague (2001), following Micciancio (1997)
- Weak history independence (WHI): the memory is revealed at the end of an activity period. Any two sequences of operations S1 and S2 that lead to the same content must induce the same distribution on the memory representation.
- Strong history independence (SHI): the memory is revealed several times during an activity period. Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint must induce the same distribution on the memory representation at all these points.
- Completely randomizing the memory after each operation is not good enough.
Notions of History Independence
- Weak and strong history independence are not equivalent:
  - WHI for reversible data structures is possible without a canonical representation
  - There are provable efficiency gaps [BP06] (in restricted models)
- We consider strong history independence:
  - A canonical representation (up to initial randomness) implies SHI
  - The other direction was shown to hold for reversible data structures [HHMPR05]
SHI Dictionaries

  Scheme                  | Lookup time     | Update time   | Memory utilization           | Deletions          | Practical?
  ------------------------|-----------------|---------------|------------------------------|--------------------|-----------
  Naor & Teague '01       | O(1) worst case | O(1) expected | 99%                          |                    |
  Blelloch & Golovin '07  | O(1) expected   | O(1) expected | 99%                          | (mem. util. < 50%) | ?
  Blelloch & Golovin '07  | O(1) worst case | O(1) expected | < 9%                         | (mem. util. < 50%) |
  This work               | O(1) worst case | O(1) expected | < 50% (< 25% with deletions) |                    |
Our Approach
- Cuckoo hashing [PR01]: a simple and practical scheme with worst-case constant lookup time
- We force a canonical representation on cuckoo hashing, with no significant loss in efficiency
- Avoid rehashing! What happens when the hash functions fail? Rehashing is problematic in SHI data structures:
  - All hash functions need to be sampled in advance (a theoretical problem)
  - When an item is deleted, we may need to roll back to previous functions
- We use a secondary storage to reduce the failure probability exponentially [KMW08]
Cuckoo Hashing
- Tables T1 and T2 with hash functions h1 and h2
- Store x in one of T1[h1(x)] and T2[h2(x)]
- Insert(x): greedily insert into T1 or T2; if both are occupied, store x in T1 anyway and repeat in the other table with the previous occupant
[Figure: inserting x evicts y, whose reinsertion evicts z, and so on, until the chain ends in an empty slot: a successful insertion]
Cuckoo Hashing
- Tables T1 and T2 with hash functions h1 and h2
- Store x in one of T1[h1(x)] and T2[h2(x)]
- Insert(x): greedily insert into T1 or T2; if both are occupied, store x in T1 anyway and repeat in the other table with the previous occupant
[Figure: the eviction chain of x runs into a cycle among y, u, z, v: a failure, and a rehash is required]
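The insertion procedure above can be sketched as follows. This is plain cuckoo hashing, not yet the history-independent variant from the talk; the eviction bound MAX_LOOP is illustrative, and Python's seeded built-in hash stands in for the log(n)-wise independent functions the analysis assumes.

```python
import random

class CuckooHashTable:
    """Plain cuckoo hashing: two tables, two hash functions,
    and each key x lives in T1[h1(x)] or T2[h2(x)]."""

    MAX_LOOP = 32  # illustrative bound on the eviction chain

    def __init__(self, size):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        # Stand-in hash functions; the analysis assumes log(n)-wise
        # independence, which seeded built-in hashing only approximates.
        self.seeds = [random.randrange(1 << 30) for _ in range(2)]

    def _h(self, i, x):
        return hash((self.seeds[i], x)) % self.size

    def lookup(self, x):
        # Worst-case O(1): only two possible locations to probe.
        return any(self.tables[i][self._h(i, x)] == x for i in (0, 1))

    def insert(self, x):
        if self.lookup(x):
            return True
        i = 0
        for _ in range(self.MAX_LOOP):
            # Place x, evicting the previous occupant (if any),
            # which is then reinserted into the other table.
            slot = self._h(i, x)
            x, self.tables[i][slot] = self.tables[i][slot], x
            if x is None:
                return True  # the chain ended in an empty slot
            i = 1 - i
        return False  # chain too long: a rehash would be required
```

With r ≥ (1 + ε)n slots per table, the cuckoo-graph analysis that follows bounds the probability that an insertion fails.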
The Cuckoo Graph
- A set S ⊆ U containing n keys; hash functions h1, h2 : U → {1, ..., r}
- The cuckoo graph is the bipartite graph on two vertex sets of size r with an edge (h1(x), h2(x)) for every x ∈ S
- S is successfully stored if and only if every connected component has at most one cycle
- Main theorem: if r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n)
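The storability criterion on this slide can be tested directly: a component has at most one cycle exactly when it has no more edges than vertices, which union-find can track per component. A small illustrative sketch (the function name and interface are ours, not from the talk):

```python
def cuckoo_storable(keys, h1, h2, r):
    """Check whether a key set fits in cuckoo tables of size r each:
    in the bipartite cuckoo graph (edge (h1(x), h2(x)) per key x),
    every connected component may contain at most one cycle,
    i.e. no component has more edges than vertices."""
    parent = list(range(2 * r))  # T1 slots are 0..r-1, T2 slots are r..2r-1

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = [0] * (2 * r)  # edge count, valid at component roots
    verts = [1] * (2 * r)  # vertex count, valid at component roots
    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a == b:
            edges[a] += 1  # an extra edge inside one component
        else:
            parent[b] = a  # merge the two components
            edges[a] += edges[b] + 1
            verts[a] += verts[b]
    return all(edges[find(v)] <= verts[find(v)] for v in range(2 * r))
```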
The Canonical Representation
- Assume that S can be stored using h1 and h2; we force a canonical representation on the cuckoo graph
- It suffices to consider a single connected component
- Assume that S forms a tree in the cuckoo graph (the typical case)
- One location must be empty, and the choice of the empty location uniquely determines the locations of all elements
- Rule: h1(minimal element) is empty
[Figure: a tree component containing elements a, b, c, d, e]
The Canonical Representation
- Assume that S can be stored using h1 and h2; we force a canonical representation on the cuckoo graph
- It suffices to consider a single connected component
- Assume that S has one cycle: there are two ways to assign the elements on the cycle, and each choice uniquely determines the locations of all elements
- Rule: the minimal element on the cycle lies in T1
[Figure: a unicyclic component containing elements a, b, c, d, e]
The Canonical Representation
- Updates efficiently maintain the canonical representation
- Insertions:
  - New leaf: check whether the new element is smaller than the current minimum
  - New cycle: within the same component... or merging two components...
  - All cases are straightforward
  - Update time < size of the component = expected (small) constant
- Deletions: find the new minimum, split the component, ...
  - Requires connecting all elements in a component with a sorted cyclic list
  - Memory utilization drops to 25%
  - All cases are straightforward
Rehashing
- What if S cannot be stored using h1 and h2? This happens with probability Θ(1/n)
- Can we simply pick new functions? Failures are rare, but the worst-case performance is very bad:
  - "Canonical memory" implies we need to sample all hash functions in advance (a theoretical problem)
  - Whenever an item is deleted, we need to check whether we must roll back to previous hash functions
  - A bad item which is repeatedly inserted and deleted would cause a rehash on every operation!
Using a Stash
- Whenever an insertion fails, put a 'bad' item in a secondary data structure
  - Bad item: the smallest item that belongs to a cycle
  - The secondary data structure must be SHI in itself
- Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
- In practice, keeping the stash as a sorted list is probably the best solution; effectively, the query time is constant with (very) high probability
- In theory, the stash can be any SHI data structure with constant lookup time, e.g. a deterministic hashing scheme in which the elements are rehashed whenever the content changes [AN96, HMP01]
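A toy version of insertion with a stash, under simplifying assumptions: the stash is kept as a sorted Python list (matching the sorted-list suggestion above), and on failure we stash the item still in hand after the eviction chain, rather than the smallest item on a cycle as the actual scheme prescribes. All names are ours.

```python
import bisect
import random

class StashedCuckoo:
    """Cuckoo hashing with a stash. The stash is a sorted list, so the
    secondary structure itself has a canonical representation."""

    def __init__(self, size, max_loop=32):
        self.size, self.max_loop = size, max_loop
        self.t = [[None] * size, [None] * size]
        self.seeds = [random.randrange(1 << 30) for _ in range(2)]
        self.stash = []  # sorted list of overflow items

    def _h(self, i, x):
        return hash((self.seeds[i], x)) % self.size

    def lookup(self, x):
        # Two table probes plus a scan of the (whp constant-size) stash.
        return (any(self.t[i][self._h(i, x)] == x for i in (0, 1))
                or x in self.stash)

    def insert(self, x):
        if self.lookup(x):
            return
        i = 0
        for _ in range(self.max_loop):
            slot = self._h(i, x)
            x, self.t[i][slot] = self.t[i][slot], x
            if x is None:
                return  # eviction chain ended in an empty slot
            i = 1 - i
        bisect.insort(self.stash, x)  # chain too long: stash the item
```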
Conclusions and Open Problems
- Cuckoo hashing is a robust and flexible hashing scheme, easily 'molded' into a history-independent data structure
- We do not know how to do this for cuckoo hashing with more than 2 hash functions and/or more than 1 element per bucket
  - These variants offer better memory utilization and better performance, but the expected size of a connected component is not constant
- Constant worst-case operations?
Part 2:
Application of History Independence to Electronic Voting

Secure Vote Storage
- Mechanisms that operate in extremely hostile environments
- Without a "secure" mechanism, an adversary may be able to undetectably tamper with the records or compromise privacy
- Possible scenarios: poll workers may tamper with the device while in transit; malicious software embeds secret information in the public output; ...
Main Security Goals
- Tamper-evidence (integrity): prevent an adversary from undetectably tampering with the records
- History independence (privacy): the memory representation does not reveal the insertion order
- Subliminal-freeness (privacy): information cannot be secretly embedded into the data
Secure Vote Storage
Goal: a secure and efficient mechanism for storing an increasingly growing set of K elements taken from a large universe of size N
- Supports Insert(x) (cast a ballot), Seal() ("finalize" the elections) and RetrieveAll() (count the votes)

Our approach:
- Tamper-evidence by exploiting write-once memories: initialized to all 0's, and 0's can only be flipped to 1's
- Information-theoretic security: everything is public! No need for private storage
- A deterministic strategy in which each subset of elements determines a unique memory representation
  - The strongest form of history independence
  - A unique representation means information cannot be secretly embedded
Our Results
Previous approaches were either inefficient (required O(K²) space), randomized (enabled subliminal channels), or required private storage.

Main result: a deterministic, history-independent and write-once strategy for storing an increasingly growing set of K elements taken from a large universe of size N.

                 Space           Insertion time
  Explicit       K·polylog(N)    polylog(N)
  Non-explicit   K·log(N/K)      log(N/K)

Application to distributed computing: the first explicit, deterministic and non-adaptive conflict resolution algorithm which is optimal up to poly-logarithmic factors.
- Resolving conflicts in multiple-access channels is one of the classical distributed computing problems
- An explicit, deterministic and non-adaptive algorithm had been open since '85 [Komlos & Greenberg]
Previous Work
Molnar, Kohno, Sastry & Wagner '06
- Initiated the formal study of secure vote storage
- Tamper-evidence by exploiting write-once memories (PROM: initialized to all 0's, can only flip 0's to 1's)
- Encoding(x) = (x, wt2(x)): flipping any bit of x from 0 to 1 requires flipping a bit of wt2(x) from 1 to 0
- Logarithmic overhead
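The role of wt2 can be illustrated concretely. Assuming wt2(x) counts the 0-bits of x in binary (our reading of the slide), any 0-to-1 flip in x decrements this counter, and decrementing a binary number always requires clearing a 1-bit, which the PROM forbids:

```python
def zeros(x, width):
    """Number of 0-bits in the width-bit representation of x."""
    return width - bin(x).count('1')

def encode(x, width=8):
    # Hypothetical reading of the MKSW encoding: the record plus a
    # binary counter of its 0-bits. On a PROM (only 0 -> 1 flips),
    # raising any bit of x forces the counter to decrease, and any
    # binary decrement needs a 1 -> 0 flip, which the PROM forbids.
    return (x, zeros(x, width))

def flips_one_to_zero(old, new):
    """True if going from old to new clears at least one 1-bit."""
    return old & ~new != 0

# Every 0 -> 1 flip in x makes the stored counter inconsistent in a
# way that itself requires a forbidden 1 -> 0 flip:
x = 0b1010_0110
_, counter = encode(x)
for bit in range(8):
    if not x >> bit & 1:
        tampered = x | 1 << bit
        assert flips_one_to_zero(counter, zeros(tampered, 8))
```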
Previous Work
Molnar, Kohno, Sastry & Wagner '06
- A useful observation: store the elements in a sorted list
- Problem: we cannot sort in-place on write-once memories
- "Copy-over list": a deterministic and history-independent solution. On every insertion:
  - Compute the sorted list including the new element
  - Copy the sorted list to the next available memory position
  - Erase the previous list
  - This requires O(K²) space!
- Several other solutions, which are either randomized or require private storage

Bethencourt, Boneh & Waters '07
- A linear-space cryptographic solution: a "history-independent append-only" signature scheme
- Randomized, and requires private storage
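A runnable sketch of the copy-over list, with the write-once constraint modeled explicitly: a cell may only go from empty to a value, and "erasing" a cell means raising all of its bits, marked here with '#'. The class name and cell model are ours, not from the original.

```python
EMPTY, ERASED = None, "#"  # '#' models raising every bit of a cell

class CopyOverList:
    """Copy-over list on a write-once memory: each insertion writes the
    new sorted list after the old one, then erases the old copy. The
    live region depends only on the current set and the number of
    insertions, so the layout is history-independent, but K insertions
    can burn 1 + 2 + ... + K = O(K^2) cells."""

    def __init__(self, cells=64):
        self.mem = [EMPTY] * cells

    def _write(self, pos, value):
        assert self.mem[pos] is EMPTY, "write-once violation"
        self.mem[pos] = value

    def _live(self):
        # Locate the single non-erased list and its start offset.
        start = 0
        while start < len(self.mem) and self.mem[start] is ERASED:
            start += 1
        end = start
        while end < len(self.mem) and self.mem[end] not in (EMPTY, ERASED):
            end += 1
        return start, [self.mem[i] for i in range(start, end)]

    def insert(self, x):
        start, current = self._live()
        if x in current:
            return
        new = sorted(current + [x])
        pos = start + len(current)   # first unused cell
        for i, v in enumerate(new):  # write the new sorted copy
            self._write(pos + i, v)
        for i in range(start, pos):  # erase the old copy (raise all bits)
            self.mem[i] = ERASED

    def retrieve_all(self):
        return self._live()[1]
```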
Our Mechanism
- Global strategy: mapping elements to entries of a table
- Local strategy: resolving collisions separately in each entry
- Both strategies are deterministic, history-independent and write-once
The Local Strategy
- Store the elements mapped to each entry in a separate copy-over list
- ℓ elements require ℓ² pre-allocated memory cells, so only very small values of ℓ are affordable in the worst case!
- Can a deterministic global strategy guarantee that? The worst-case behavior of any fixed hash function is very poor: there is always a relatively large set of elements which are mapped to the same entry...
The Global Strategy
- A sequence of tables, where each table stores a fraction of the elements
- Each element is inserted into several entries of the first table
- When an entry overflows:
  - Elements that are not stored elsewhere are inserted into the next table
  - The entry is permanently deleted
[Figures: a subset of size K from a universe of size N is mapped into the first table; overflowing entries are deleted and their elements cascade into the next table]
- Unique representation: the elements determine the overflowing entries in the first table; elements mapped to non-overflowing entries are stored there; continue with the next table and the remaining elements
- Table sizes: a table of size ~K stores αK elements; a table of size ~(1-α)K stores α(1-α)K elements; then a table of size ~(1-α)²K; and so on
- Where do the hash functions come from?
The Global Strategy
- Identify the hash function of each table with a bipartite graph from the universe of size N to the table entries
- (K, α, ℓ)-bounded-neighbor expander: any set S of size K contains αK elements with a neighbor of degree ≤ ℓ with respect to S
[Figure: entries of the first table are marked as overflowing or low-degree with respect to S]
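The bounded-neighbor expander property can be checked by brute force on toy graphs. This sketch (our own helper, exponential in K and only for illustration) spells out the definition:

```python
from itertools import combinations

def is_bne(neighbors, N, K, alpha, ell):
    """Brute-force check of the (K, alpha, ell)-bounded-neighbor
    expander property: every set S of K left-vertices must contain at
    least alpha*K elements having some neighbor of degree <= ell with
    respect to S. neighbors[x] lists the right-neighbors of x."""
    for S in combinations(range(N), K):
        # Degree of each right-vertex with respect to S.
        deg = {}
        for x in S:
            for y in neighbors[x]:
                deg[y] = deg.get(y, 0) + 1
        good = sum(1 for x in S
                   if any(deg[y] <= ell for y in neighbors[x]))
        if good < alpha * K:
            return False
    return True
```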
Bounded-Neighbor Expanders
(K, α, ℓ)-bounded-neighbor expander: any set S of size K contains αK elements with a neighbor of degree ≤ ℓ with respect to S
Given N and K, we want to optimize the table size M, the degree bound ℓ, the fraction α and the left-degree D:

        Optimal       Extractor            Disperser
  M     K·log(N/K)    K·2^((loglog N)²)    K
  α     1/2           1/2                  1/polylog(N)
  ℓ     1             polylog(N)           O(1)
  D     log(N/K)      2^((loglog N)²)      polylog(N)
Open Problems
- Non-amortized insertion time: in our scheme insertions may have a cascading effect; construct a scheme with bounded worst-case insertion time
- Improved bounded-neighbor expanders
- The monotone encoding problem: find the minimal M such that subsets of size at most K taken from [N] can be mapped into subsets of [M] while preserving inclusions
  - Our non-constructive solution: K·log(N)·log(N/K) bits
  - Obvious lower bound: K·log(N/K) bits
  - Alon & Hod '07: M = O(K·log(N/K))
Conflict Resolution
- Problem: resolve the conflicts that arise when several parties transmit simultaneously over a single channel
- Goal: schedule retransmissions such that each of the conflicting parties eventually transmits individually; a party which successfully transmits halts
- Efficiency measure: the number of steps it takes to resolve any K conflicts among N parties
- An algorithm is non-adaptive if the choices of the parties in each step do not depend on previous steps
Conflict Resolution
Why require a deterministic algorithm?
- Radio Frequency Identification (RFID): many tags are simultaneously read by a single reader
- Inventory systems, product tracking, ...
- Tags are highly constrained devices: can they generate randomness?
The Algorithm
- Global strategy: mapping parties to time intervals
- Local strategy: resolving collisions separately in each interval
The Local Strategy
- Associate each party x ∈ [N] with a codeword C(x) taken from a superimposed code: no codeword is contained in the bit-wise OR of any ℓ-1 other codewords
- Party x transmits at step i if and only if C(x)_i = 1
- This resolves conflicts among any ℓ parties taken from [N]
- O(ℓ²·log N) steps using known explicit constructions
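The transmission rule on this slide can be simulated directly. For brevity this sketch uses the identity code (codeword length N, which is trivially superimposed for every ℓ); real explicit constructions achieve length O(ℓ²·log N). All names are ours.

```python
def resolve(conflicting, codewords):
    """Non-adaptive conflict resolution: party x transmits at step i
    iff bit i of its codeword is 1, and halts once it has transmitted
    alone. With an ell-superimposed code, each of up to ell conflicting
    parties is guaranteed a step in which it transmits solo."""
    active = set(conflicting)
    order = []  # successful transmissions, in schedule order
    for i in range(len(codewords[0])):
        senders = [x for x in active if codewords[x][i] == 1]
        if len(senders) == 1:  # exactly one sender: the step succeeds
            order.append(senders[0])
            active.discard(senders[0])
    return order

# Trivial superimposed code: party x is active only at step x.
N = 8
identity_code = [[1 if i == x else 0 for i in range(N)]
                 for x in range(N)]
```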
The Global Strategy
- A sequence of phases identified with bounded-neighbor expanders
- Each phase contains several time slots, and the graphs define the active parties at each slot
- Resolve the collisions in each slot using the local strategy
- Total: O(K·polylog(N)) steps
[Figures: phases over a universe of size N; slots end in either overflow or success]
Further Reading
- Moni Naor and Vanessa Teague. Anti-persistence: History-Independent Data Structures. ACM Symposium on Theory of Computing (STOC), 2001.
- Moni Naor, Gil Segev and Udi Wieder. History-Independent Cuckoo Hashing. International Colloquium on Automata, Languages and Programming (ICALP), 2008.
- Tal Moran, Moni Naor and Gil Segev. Deterministic History-Independent Strategies for Storing Information on Write-Once Memories. Theory of Computing, 2009.