TRANSCRIPT
Foundations of Privacy
Lecture 9: History-Independent Hashing Schemes
(and applications)
Lecturer: Gil Segev
Election Day
- Elections for class president
- Each student whispers in Mr. Drew's ear
- Mr. Drew writes down the votes in order: Carol, Alice, Alice, Bob, ...
- Problem: Mr. Drew's notebook leaks sensitive information (the first student voted for Carol, the second for Alice, ...), which may compromise the privacy of the elections
Election Day
- A simple solution: a lexicographically sorted list of candidates with unary counters:
  Alice: 11, Bob: 1, Carol: 1
- What about more involved applications? Write-in candidates, votes which are subsets or rankings, ...
Learning From History
- A simple example: a sorted list is a canonical memory representation, but not really efficient...
- The two levels of a data structure: the "legitimate" interface and the memory representation
- History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
This Talk
- Part 1: An efficient history-independent hashing scheme
- Part 2: Application of history independence to electronic voting
Part 1:
An Efficient History-Independent Hashing Scheme
HI Cuckoo Hashing
A history-independent dictionary that simultaneously achieves the following:
- Efficiency: lookup time O(1) worst case; update time O(1) expected amortized; memory utilization 50% (25% with deletions)
- The strongest notion of history independence
- Simple and fast
Notions of History Independence
Naor and Teague (2001), following Micciancio (1997)
- Weak history independence (WHI): the memory is revealed at the end of an activity period. Any two sequences of operations S1 and S2 that lead to the same content must induce the same distribution on the memory representation.
- Strong history independence (SHI): the memory is revealed several times during an activity period. Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint must induce the same distribution on the memory representation at all these points.
- Completely randomizing the memory after each operation is not good enough.
Notions of History Independence
- Weak and strong history independence are not equivalent:
  - WHI for reversible data structures is possible without a canonical representation
  - There are provable efficiency gaps [BP06] (in restricted models)
- We consider strong history independence:
  - A canonical representation (up to initial randomness) implies SHI
  - The other direction was shown to hold for reversible data structures [HHMPR05]
SHI Dictionaries

  Scheme                  | Lookup time     | Update time   | Memory utilization           | Deletions          | Practical?
  ------------------------|-----------------|---------------|------------------------------|--------------------|-----------
  Naor & Teague '01       | O(1) worst case | O(1) expected | 99%                          |                    |
  Blelloch & Golovin '07  | O(1) expected   | O(1) expected | 99%                          | (mem. util. < 50%) | ?
  Blelloch & Golovin '07  | O(1) worst case | O(1) expected | < 9%                         | (mem. util. < 50%) |
  This work               | O(1) worst case | O(1) expected | < 50% (< 25% with deletions) |                    |
Our Approach
- Cuckoo hashing [PR01]: a simple and practical scheme with worst-case constant lookup time
- We force a canonical representation on cuckoo hashing, with no significant loss in efficiency
- Avoid rehashing! What happens when the hash functions fail? Rehashing is problematic in SHI data structures:
  - All hash functions need to be sampled in advance (a theoretical problem)
  - When an item is deleted, we may need to roll back to previous functions
- We use a secondary storage to reduce the failure probability exponentially [KMW08]
Cuckoo Hashing
- Tables T1 and T2 with hash functions h1 and h2
- Store x in one of T1[h1(x)] and T2[h2(x)]
- Insert(x): greedily insert into T1 or T2; if both are occupied, store x in T1 anyway and repeat in the other table with the previous occupant
[Figure: inserting x evicts y, whose reinsertion evicts z, and so on, until the chain ends in an empty slot: a successful insertion]
Cuckoo Hashing
- Tables T1 and T2 with hash functions h1 and h2
- Store x in one of T1[h1(x)] and T2[h2(x)]
- Insert(x): greedily insert into T1 or T2; if both are occupied, store x in T1 anyway and repeat in the other table with the previous occupant
[Figure: the eviction chain of x runs into a cycle among y, u, z, v: a failure, and a rehash is required]
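The insertion procedure above can be sketched as follows. This is plain cuckoo hashing, not yet the history-independent variant from the talk; the eviction bound MAX_LOOP is illustrative, and Python's seeded built-in hash stands in for the log(n)-wise independent functions the analysis assumes.

```python
import random

class CuckooHashTable:
    """Plain cuckoo hashing: two tables, two hash functions,
    and each key x lives in T1[h1(x)] or T2[h2(x)]."""

    MAX_LOOP = 32  # illustrative bound on the eviction chain

    def __init__(self, size):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        # Stand-in hash functions; the analysis assumes log(n)-wise
        # independence, which seeded built-in hashing only approximates.
        self.seeds = [random.randrange(1 << 30) for _ in range(2)]

    def _h(self, i, x):
        return hash((self.seeds[i], x)) % self.size

    def lookup(self, x):
        # Worst-case O(1): only two possible locations to probe.
        return any(self.tables[i][self._h(i, x)] == x for i in (0, 1))

    def insert(self, x):
        if self.lookup(x):
            return True
        i = 0
        for _ in range(self.MAX_LOOP):
            # Place x, evicting the previous occupant (if any),
            # which is then reinserted into the other table.
            slot = self._h(i, x)
            x, self.tables[i][slot] = self.tables[i][slot], x
            if x is None:
                return True  # the chain ended in an empty slot
            i = 1 - i
        return False  # chain too long: a rehash would be required
```

With r ≥ (1 + ε)n slots per table, the cuckoo-graph analysis that follows bounds the probability that an insertion fails.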
The Cuckoo Graph
- A set S ⊆ U containing n keys; hash functions h1, h2 : U → {1, ..., r}
- The cuckoo graph is the bipartite graph on two vertex sets of size r with an edge (h1(x), h2(x)) for every x ∈ S
- S is successfully stored if and only if every connected component has at most one cycle
- Main theorem: if r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n)
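The storability criterion on this slide can be tested directly: a component has at most one cycle exactly when it has no more edges than vertices, which union-find can track per component. A small illustrative sketch (the function name and interface are ours, not from the talk):

```python
def cuckoo_storable(keys, h1, h2, r):
    """Check whether a key set fits in cuckoo tables of size r each:
    in the bipartite cuckoo graph (edge (h1(x), h2(x)) per key x),
    every connected component may contain at most one cycle,
    i.e. no component has more edges than vertices."""
    parent = list(range(2 * r))  # T1 slots are 0..r-1, T2 slots are r..2r-1

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = [0] * (2 * r)  # edge count, valid at component roots
    verts = [1] * (2 * r)  # vertex count, valid at component roots
    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a == b:
            edges[a] += 1  # an extra edge inside one component
        else:
            parent[b] = a  # merge the two components
            edges[a] += edges[b] + 1
            verts[a] += verts[b]
    return all(edges[find(v)] <= verts[find(v)] for v in range(2 * r))
```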
The Canonical Representation
- Assume that S can be stored using h1 and h2; we force a canonical representation on the cuckoo graph
- It suffices to consider a single connected component
- Assume that S forms a tree in the cuckoo graph (the typical case)
- One location must be empty, and the choice of the empty location uniquely determines the locations of all elements
- Rule: h1(minimal element) is empty
[Figure: a tree component containing elements a, b, c, d, e]
The Canonical Representation
- Assume that S can be stored using h1 and h2; we force a canonical representation on the cuckoo graph
- It suffices to consider a single connected component
- Assume that S has one cycle: there are two ways to assign the elements on the cycle, and each choice uniquely determines the locations of all elements
- Rule: the minimal element on the cycle lies in T1
[Figure: a unicyclic component containing elements a, b, c, d, e]
The Canonical Representation
- Updates efficiently maintain the canonical representation
- Insertions:
  - New leaf: check whether the new element is smaller than the current minimum
  - New cycle: within the same component... or merging two components...
  - All cases are straightforward
  - Update time < size of the component = expected (small) constant
- Deletions: find the new minimum, split the component, ...
  - Requires connecting all elements in a component with a sorted cyclic list
  - Memory utilization drops to 25%
  - All cases are straightforward
Rehashing
- What if S cannot be stored using h1 and h2? This happens with probability Θ(1/n)
- Can we simply pick new functions? Failures are rare, but the worst-case performance is very bad:
  - "Canonical memory" implies we need to sample all hash functions in advance (a theoretical problem)
  - Whenever an item is deleted, we need to check whether we must roll back to previous hash functions
  - A bad item which is repeatedly inserted and deleted would cause a rehash on every operation!
Using a Stash
- Whenever an insertion fails, put a 'bad' item in a secondary data structure
  - Bad item: the smallest item that belongs to a cycle
  - The secondary data structure must be SHI in itself
- Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
- In practice, keeping the stash as a sorted list is probably the best solution; effectively, the query time is constant with (very) high probability
- In theory, the stash can be any SHI data structure with constant lookup time, e.g. a deterministic hashing scheme in which the elements are rehashed whenever the content changes [AN96, HMP01]
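A toy version of insertion with a stash, under simplifying assumptions: the stash is kept as a sorted Python list (matching the sorted-list suggestion above), and on failure we stash the item still in hand after the eviction chain, rather than the smallest item on a cycle as the actual scheme prescribes. All names are ours.

```python
import bisect
import random

class StashedCuckoo:
    """Cuckoo hashing with a stash. The stash is a sorted list, so the
    secondary structure itself has a canonical representation."""

    def __init__(self, size, max_loop=32):
        self.size, self.max_loop = size, max_loop
        self.t = [[None] * size, [None] * size]
        self.seeds = [random.randrange(1 << 30) for _ in range(2)]
        self.stash = []  # sorted list of overflow items

    def _h(self, i, x):
        return hash((self.seeds[i], x)) % self.size

    def lookup(self, x):
        # Two table probes plus a scan of the (whp constant-size) stash.
        return (any(self.t[i][self._h(i, x)] == x for i in (0, 1))
                or x in self.stash)

    def insert(self, x):
        if self.lookup(x):
            return
        i = 0
        for _ in range(self.max_loop):
            slot = self._h(i, x)
            x, self.t[i][slot] = self.t[i][slot], x
            if x is None:
                return  # eviction chain ended in an empty slot
            i = 1 - i
        bisect.insort(self.stash, x)  # chain too long: stash the item
```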
Conclusions and Open Problems
- Cuckoo hashing is a robust and flexible hashing scheme, easily 'molded' into a history-independent data structure
- We do not know how to do this for cuckoo hashing with more than 2 hash functions and/or more than 1 element per bucket
  - These variants offer better memory utilization and better performance, but the expected size of a connected component is not constant
- Constant worst-case operations?
Part 2:
Application of History Independence to Electronic Voting

Secure Vote Storage
- Mechanisms that operate in extremely hostile environments
- Without a "secure" mechanism, an adversary may be able to undetectably tamper with the records or compromise privacy
- Possible scenarios: poll workers may tamper with the device while in transit; malicious software embeds secret information in the public output; ...
Main Security Goals
- Tamper-evidence (integrity): prevent an adversary from undetectably tampering with the records
- History independence (privacy): the memory representation does not reveal the insertion order
- Subliminal-freeness (privacy): information cannot be secretly embedded into the data
Secure Vote Storage
Goal: a secure and efficient mechanism for storing an increasingly growing set of K elements taken from a large universe of size N
- Supports Insert(x) (cast a ballot), Seal() ("finalize" the elections) and RetrieveAll() (count the votes)

Our approach:
- Tamper-evidence by exploiting write-once memories: initialized to all 0's, and 0's can only be flipped to 1's
- Information-theoretic security: everything is public! No need for private storage
- A deterministic strategy in which each subset of elements determines a unique memory representation
  - The strongest form of history independence
  - A unique representation means information cannot be secretly embedded
Our Results
Previous approaches were either inefficient (required O(K²) space), randomized (enabled subliminal channels), or required private storage.

Main result: a deterministic, history-independent and write-once strategy for storing an increasingly growing set of K elements taken from a large universe of size N.

                 Space           Insertion time
  Explicit       K·polylog(N)    polylog(N)
  Non-explicit   K·log(N/K)      log(N/K)

Application to distributed computing: the first explicit, deterministic and non-adaptive conflict resolution algorithm which is optimal up to poly-logarithmic factors.
- Resolving conflicts in multiple-access channels is one of the classical distributed computing problems
- An explicit, deterministic and non-adaptive algorithm had been open since '85 [Komlos & Greenberg]
Previous Work
Molnar, Kohno, Sastry & Wagner '06
- Initiated the formal study of secure vote storage
- Tamper-evidence by exploiting write-once memories (PROM: initialized to all 0's, can only flip 0's to 1's)
- Encoding(x) = (x, wt2(x)): flipping any bit of x from 0 to 1 requires flipping a bit of wt2(x) from 1 to 0
- Logarithmic overhead
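The role of wt2 can be illustrated concretely. Assuming wt2(x) counts the 0-bits of x in binary (our reading of the slide), any 0-to-1 flip in x decrements this counter, and decrementing a binary number always requires clearing a 1-bit, which the PROM forbids:

```python
def zeros(x, width):
    """Number of 0-bits in the width-bit representation of x."""
    return width - bin(x).count('1')

def encode(x, width=8):
    # Hypothetical reading of the MKSW encoding: the record plus a
    # binary counter of its 0-bits. On a PROM (only 0 -> 1 flips),
    # raising any bit of x forces the counter to decrease, and any
    # binary decrement needs a 1 -> 0 flip, which the PROM forbids.
    return (x, zeros(x, width))

def flips_one_to_zero(old, new):
    """True if going from old to new clears at least one 1-bit."""
    return old & ~new != 0

# Every 0 -> 1 flip in x makes the stored counter inconsistent in a
# way that itself requires a forbidden 1 -> 0 flip:
x = 0b1010_0110
_, counter = encode(x)
for bit in range(8):
    if not x >> bit & 1:
        tampered = x | 1 << bit
        assert flips_one_to_zero(counter, zeros(tampered, 8))
```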
Previous Work
Molnar, Kohno, Sastry & Wagner '06
- A useful observation: store the elements in a sorted list
- Problem: we cannot sort in-place on write-once memories
- "Copy-over list": a deterministic and history-independent solution. On every insertion:
  - Compute the sorted list including the new element
  - Copy the sorted list to the next available memory position
  - Erase the previous list
  - This requires O(K²) space!
- Several other solutions, which are either randomized or require private storage

Bethencourt, Boneh & Waters '07
- A linear-space cryptographic solution: a "history-independent append-only" signature scheme
- Randomized, and requires private storage
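A runnable sketch of the copy-over list, with the write-once constraint modeled explicitly: a cell may only go from empty to a value, and "erasing" a cell means raising all of its bits, marked here with '#'. The class name and cell model are ours, not from the original.

```python
EMPTY, ERASED = None, "#"  # '#' models raising every bit of a cell

class CopyOverList:
    """Copy-over list on a write-once memory: each insertion writes the
    new sorted list after the old one, then erases the old copy. The
    live region depends only on the current set and the number of
    insertions, so the layout is history-independent, but K insertions
    can burn 1 + 2 + ... + K = O(K^2) cells."""

    def __init__(self, cells=64):
        self.mem = [EMPTY] * cells

    def _write(self, pos, value):
        assert self.mem[pos] is EMPTY, "write-once violation"
        self.mem[pos] = value

    def _live(self):
        # Locate the single non-erased list and its start offset.
        start = 0
        while start < len(self.mem) and self.mem[start] is ERASED:
            start += 1
        end = start
        while end < len(self.mem) and self.mem[end] not in (EMPTY, ERASED):
            end += 1
        return start, [self.mem[i] for i in range(start, end)]

    def insert(self, x):
        start, current = self._live()
        if x in current:
            return
        new = sorted(current + [x])
        pos = start + len(current)   # first unused cell
        for i, v in enumerate(new):  # write the new sorted copy
            self._write(pos + i, v)
        for i in range(start, pos):  # erase the old copy (raise all bits)
            self.mem[i] = ERASED

    def retrieve_all(self):
        return self._live()[1]
```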
Our Mechanism
- Global strategy: mapping elements to entries of a table
- Local strategy: resolving collisions separately in each entry
- Both strategies are deterministic, history-independent and write-once
The Local Strategy
- Store the elements mapped to each entry in a separate copy-over list
- ℓ elements require ℓ² pre-allocated memory cells, so only very small values of ℓ are affordable in the worst case!
- Can a deterministic global strategy guarantee that? The worst-case behavior of any fixed hash function is very poor: there is always a relatively large set of elements which are mapped to the same entry...
The Global Strategy
- A sequence of tables, where each table stores a fraction of the elements
- Each element is inserted into several entries of the first table
- When an entry overflows:
  - Elements that are not stored elsewhere are inserted into the next table
  - The entry is permanently deleted
[Figures: a subset of size K from a universe of size N is mapped into the first table; overflowing entries are deleted and their elements cascade into the next table]
- Unique representation: the elements determine the overflowing entries in the first table; elements mapped to non-overflowing entries are stored there; continue with the next table and the remaining elements
- Table sizes: a table of size ~K stores αK elements; a table of size ~(1-α)K stores α(1-α)K elements; then a table of size ~(1-α)²K; and so on
- Where do the hash functions come from?
The Global Strategy
- Identify the hash function of each table with a bipartite graph from the universe of size N to the table entries
- (K, α, ℓ)-bounded-neighbor expander: any set S of size K contains αK elements with a neighbor of degree ≤ ℓ with respect to S
[Figure: entries of the first table are marked as overflowing or low-degree with respect to S]
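The bounded-neighbor expander property can be checked by brute force on toy graphs. This sketch (our own helper, exponential in K and only for illustration) spells out the definition:

```python
from itertools import combinations

def is_bne(neighbors, N, K, alpha, ell):
    """Brute-force check of the (K, alpha, ell)-bounded-neighbor
    expander property: every set S of K left-vertices must contain at
    least alpha*K elements having some neighbor of degree <= ell with
    respect to S. neighbors[x] lists the right-neighbors of x."""
    for S in combinations(range(N), K):
        # Degree of each right-vertex with respect to S.
        deg = {}
        for x in S:
            for y in neighbors[x]:
                deg[y] = deg.get(y, 0) + 1
        good = sum(1 for x in S
                   if any(deg[y] <= ell for y in neighbors[x]))
        if good < alpha * K:
            return False
    return True
```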
Bounded-Neighbor Expanders
(K, α, ℓ)-bounded-neighbor expander: any set S of size K contains αK elements with a neighbor of degree ≤ ℓ with respect to S
Given N and K, we want to optimize the table size M, the degree bound ℓ, the fraction α and the left-degree D:

        Optimal       Extractor            Disperser
  M     K·log(N/K)    K·2^((loglog N)²)    K
  α     1/2           1/2                  1/polylog(N)
  ℓ     1             polylog(N)           O(1)
  D     log(N/K)      2^((loglog N)²)      polylog(N)
Open Problems
- Non-amortized insertion time: in our scheme insertions may have a cascading effect; construct a scheme with bounded worst-case insertion time
- Improved bounded-neighbor expanders
- The monotone encoding problem: find the minimal M such that subsets of size at most K taken from [N] can be mapped into subsets of [M] while preserving inclusions
  - Our non-constructive solution: K·log(N)·log(N/K) bits
  - Obvious lower bound: K·log(N/K) bits
  - Alon & Hod '07: M = O(K·log(N/K))
Conflict Resolution
- Problem: resolve the conflicts that arise when several parties transmit simultaneously over a single channel
- Goal: schedule retransmissions such that each of the conflicting parties eventually transmits individually; a party which successfully transmits halts
- Efficiency measure: the number of steps it takes to resolve any K conflicts among N parties
- An algorithm is non-adaptive if the choices of the parties in each step do not depend on previous steps
Conflict Resolution
Why require a deterministic algorithm?
- Radio Frequency Identification (RFID): many tags are simultaneously read by a single reader
- Inventory systems, product tracking, ...
- Tags are highly constrained devices: can they generate randomness?
The Algorithm
- Global strategy: mapping parties to time intervals
- Local strategy: resolving collisions separately in each interval
The Local Strategy
- Associate each party x ∈ [N] with a codeword C(x) taken from a superimposed code: no codeword is contained in the bit-wise OR of any ℓ-1 other codewords
- Party x transmits at step i if and only if C(x)_i = 1
- This resolves conflicts among any ℓ parties taken from [N]
- O(ℓ²·log N) steps using known explicit constructions
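The transmission rule on this slide can be simulated directly. For brevity this sketch uses the identity code (codeword length N, which is trivially superimposed for every ℓ); real explicit constructions achieve length O(ℓ²·log N). All names are ours.

```python
def resolve(conflicting, codewords):
    """Non-adaptive conflict resolution: party x transmits at step i
    iff bit i of its codeword is 1, and halts once it has transmitted
    alone. With an ell-superimposed code, each of up to ell conflicting
    parties is guaranteed a step in which it transmits solo."""
    active = set(conflicting)
    order = []  # successful transmissions, in schedule order
    for i in range(len(codewords[0])):
        senders = [x for x in active if codewords[x][i] == 1]
        if len(senders) == 1:  # exactly one sender: the step succeeds
            order.append(senders[0])
            active.discard(senders[0])
    return order

# Trivial superimposed code: party x is active only at step x.
N = 8
identity_code = [[1 if i == x else 0 for i in range(N)]
                 for x in range(N)]
```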
The Global Strategy
- A sequence of phases identified with bounded-neighbor expanders
- Each phase contains several time slots, and the graphs define the active parties at each slot
- Resolve the collisions in each slot using the local strategy
- Total: O(K·polylog(N)) steps
[Figures: phases over a universe of size N; slots end in either overflow or success]
Further Reading
- Moni Naor and Vanessa Teague. Anti-persistence: History-Independent Data Structures. ACM Symposium on Theory of Computing (STOC), 2001.
- Moni Naor, Gil Segev and Udi Wieder. History-Independent Cuckoo Hashing. International Colloquium on Automata, Languages and Programming (ICALP), 2008.
- Tal Moran, Moni Naor and Gil Segev. Deterministic History-Independent Strategies for Storing Information on Write-Once Memories. Theory of Computing, 2009.