TRANSCRIPT
ETAPS Unifying Invited Talk, 9 April 2014
Operational Significance and Robustness in Quantitative Information Flow
Geoffrey Smith, Florida International University
Leakage of confidential information (13 March)
Motivating example: Password checker
- Check whether guess is equal to password:
- Secret input: password
- Observable output:
  - result
  - running time?
    - (If so, the length of the correct prefix is also leaked!)
result = true;
for (i = 0; i < N; i++) {
  if (password[i] != guess[i]) {
    result = false;
    break;
  }
}
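The checker is deterministic, so with the running time as the observable it induces a partition on passwords. A minimal sketch of this (my own illustration, not from the talk), assuming 3-character binary passwords and a fixed guess:

from itertools import product

def timing_output(password, guess):
    # Number of loop iterations performed by the checker above:
    # one per matching prefix character, plus one for the first mismatch.
    for i, (p, g) in enumerate(zip(password, guess)):
        if p != g:
            return i + 1
    return len(password)

guess = "101"
for bits in product("01", repeat=3):
    pw = "".join(bits)
    print(pw, "->", timing_output(pw, guess))
# Passwords with the same output share the same length of correct prefix.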
Motivating example: Crowds protocol [ReiterRubin98]
- Crowd members wish to communicate anonymously with a server.
- The initiator first sends the message to a randomly-chosen forwarder (possibly itself).
- Each forwarder forwards it again with probability pf, or sends it to the server with probability 1 - pf.
- But some crowd members are collaborators that report who sends them a message.
- Secret input: identity of initiator
- Observable output: first sender of a message to a collaborator (or no one)
[Figure: crowd members forwarding a message from initiator i to the Server]
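A minimal simulation sketch of the Crowds channel (my own illustration; the member count, collaborator set, and pf value are made-up parameters). The secret is the initiator; the observable is the first member a collaborator sees sending the message, or None if the message reaches the server unobserved:

import random

def crowds_run(initiator, members, collaborators, pf):
    sender = initiator
    receiver = random.choice(members)          # initiator picks a forwarder (possibly itself)
    while True:
        if receiver in collaborators:
            return sender                      # a collaborator reports who sent to it
        if random.random() < pf:
            sender, receiver = receiver, random.choice(members)   # forward again
        else:
            return None                        # delivered to the server; nothing observed

members = list(range(10))
collaborators = {8, 9}
counts = {}
for _ in range(100_000):
    y = crowds_run(0, members, collaborators, pf=0.8)
    counts[y] = counts.get(y, 0) + 1
print({y: c / 100_000 for y, c in counts.items()})   # estimated p(y | initiator = 0)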
Motivating example: Statistical database query
- Secret input: database of confidential entries
- Observable output: result of statistical query (possibly with noise added)
Motivating example: Timing attack on cryptography [BonehBrumley03]
- 1024-bit RSA private key recovered in 2 hours from standard OpenSSL implementation across LAN.
- Secret input: RSA private key
- Observable output: approximate timings of decryptions of a sequence of nonces
SSL Handshake (simplified)
[Figure: client sends Enc(pk, nonce); server responds Alert/OK]
Quantitative information flow
- Noninterference [Cohen77, GoguenMeseguer82] requires the observable output to be independent of the secret input.
- But (as seen in the examples) this is often infeasible.
- Quantitative information flow aims to provide theory justifying the intuition that some leaks are “small”.
- An active area of research for more than a decade [ClarkHuntMalacaria02].
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Information-theoretic channels
- The earlier examples can all be modeled as channels.
- Finite sets X (secret inputs), Y (observable outputs).
  - The choice of Y is subtle, and crucial!
- On input X, the channel probabilistically outputs Y.
- Channel matrix C gives the conditional probabilities p(y|x):

        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0

- The rows of C sum to 1.
- C is deterministic if each row contains exactly one 1.
Prior and posterior distributions
- We assume that X has some prior distribution π.
- We assume that adversary A knows both π and C.
- Seeing output y, A can inspect column y of C to update prior π to posterior distribution pX|y.
- For example, seeing output y4 reveals that X was x2.
- Leakage depends on the way C maps prior distribution π to a set of posterior distributions.

        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0
Example

Prior π:
  x1: 1/4   x2: 1/2   x3: 1/4

Channel matrix:
        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0

Joint matrix:
        y1    y2    y3    y4
  x1    1/4   0     0     0
  x2    0     1/8   1/4   1/8
  x3    1/8   1/12  1/24  0

Distribution on Y:
  pY    3/8   5/24  7/24  1/8

Posterior distributions:
        pX|y1  pX|y2  pX|y3  pX|y4
  x1    2/3    0      0      0
  x2    0      3/5    6/7    1
  x3    1/3    2/5    1/7    0

Hyper-distribution on X:
        3/8    5/24   7/24   1/8
  x1    2/3    0      0      0
  x2    0      3/5    6/7    1
  x3    1/3    2/5    1/7    0
Abstractly, a channel is a mapping from priors to hyper-distributions. [McIverMeinickeMorgan10]
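A minimal sketch (my own illustration) of this mapping, computing the joint matrix, the distribution on Y, and the posterior distributions for the example above with exact fractions:

from fractions import Fraction as F

pi = [F(1, 4), F(1, 2), F(1, 4)]
C = [[F(1), F(0), F(0), F(0)],
     [F(0), F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 3), F(1, 6), F(0)]]

# Joint matrix: joint[x][y] = pi[x] * C[x][y]
joint = [[pi[x] * C[x][y] for y in range(4)] for x in range(3)]

# Marginal on Y: column sums of the joint matrix
pY = [sum(joint[x][y] for x in range(3)) for y in range(4)]

# Posterior distributions: each column of the joint matrix, normalized
posteriors = [[joint[x][y] / pY[y] for x in range(3)] for y in range(4)]

print([str(p) for p in pY])                            # ['3/8', '5/24', '7/24', '1/8']
print([[str(q) for q in col] for col in posteriors])   # the columns of the posterior table above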
Quantifying leakage using Shannon entropy
- A classic measure of the “uncertainty” of a probability distribution is Shannon entropy [1948].
- H[π] = -Σx π[x] log π[x]   “prior entropy”
- H[π,C] = Σy p(y) H[pX|y]   “posterior entropy”
- Notice that H[π,C] is just the (weighted) average entropy in the hyper-distribution.
- Shannon leakage = H[π] - H[π,C]
  - This is mutual information, usually denoted by I(X;Y).
- But a key question for any leakage measure is its operational significance.
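A minimal sketch (my own illustration) of prior entropy, posterior entropy, and Shannon leakage, evaluated on the example channel and prior from the previous slides:

from math import log2

pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def entropy(dist):
    # Shannon entropy in bits, with the convention 0 * log 0 = 0.
    return -sum(p * log2(p) for p in dist if p > 0)

nx, ny = len(pi), len(C[0])
pY = [sum(pi[x] * C[x][y] for x in range(nx)) for y in range(ny)]
posteriors = [[pi[x] * C[x][y] / pY[y] for x in range(nx)] for y in range(ny)]

prior_H = entropy(pi)
posterior_H = sum(pY[y] * entropy(posteriors[y]) for y in range(ny))
print("Shannon leakage I(X;Y) =", prior_H - posterior_H, "bits")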
Operational significance of Shannon entropy
- Huffman code for X, when π = (1/2, 1/4, 1/8, 1/8):
  x1 : 0   x2 : 10   x3 : 110   x4 : 111
  [Figure: Huffman code tree with edges labeled 0 and 1 and leaves x1, x2, x3, x4]
- Average length = (1/2)·1 + (1/4)·2 + (1/4)·3 = 7/4 = H[π]
- Shannon’s source coding theorem: H[π] is the average number of bits required to represent X.
- But for confidentiality, we might worry more about X’s vulnerability to be guessed correctly by adversary A.
- If π = (1/2, 2^-1000, 2^-1000, ..., 2^-1000), then H[π] = 500.5 bits, but half the time A can guess X correctly in just one try!
Vulnerability and min-entropy leakage
- [Smith09] proposed to measure leakage based on X’s vulnerability to be guessed by A in one try.
- V[π] = maxx π[x]   “prior vulnerability”
- V[π,C] = Σy p(y) V[pX|y]   “posterior vulnerability”
  - V[π,C] is the average vulnerability in the hyper-distribution.
  - V[π,C] is the complement of the Bayes risk.
- Min-entropy leakage: L(π,C) = log ( V[π,C] / V[π] )
  - Note: -log V[π] is Rényi’s min-entropy H∞[π].
- Leaking i bits means increasing the vulnerability by a factor of 2^i.
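A minimal sketch (my own illustration) of these definitions on the same example; the posterior vulnerability is just the sum of the column maximums of the joint matrix:

from math import log2

pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

prior_V = max(pi)
posterior_V = sum(max(pi[x] * C[x][y] for x in range(len(pi)))
                  for y in range(len(C[0])))
leakage = log2(posterior_V / prior_V)
print(prior_V, posterior_V, leakage)   # 0.5, 0.75, about 0.585 bits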
Channel capacity
- For any leakage measure, it is useful to consider the maximum leakage over all priors π.
- Definition: Shannon capacity SC(C) = supπ I(X;Y)
- Definition: Min-capacity ML(C) = supπ L(π,C)
- Theorem: ML(C) is the log of the sum of the column maximums of C. It is realized on a uniform prior π.
- Corollary: ML(C) = 0 iff the rows of C are identical (i.e. C satisfies noninterference).
- Theorem: For deterministic C, ML(C) = SC(C).
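A minimal sketch (my own illustration) of the min-capacity theorem, computed on the example channel:

from math import log2

C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def min_capacity(M):
    # ML(M) = log of the sum of the column maximums of M (realized on a uniform prior).
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

print(min_capacity(C))   # log2(1 + 1/3 + 1/2 + 1/4) = log2(25/12), about 1.06 bits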
Channels in Cascade AB
- Formed by multiplying two channel matrices, or by factoring a channel matrix.
- Theorem [EspinozaSmith11]: L(π,AB) ≤ L(π,A)
  - Analogue of the classic data-processing inequality for mutual information.
- Theorem: ML(AB) ≤ min { ML(A), ML(B) }
[Figure: cascade X --A--> Y --B--> Z]
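A minimal sketch (my own illustration): a cascade is just matrix multiplication, and the min-capacity bound can be checked numerically. The matrices A and B below are made-up examples:

from math import log2

def cascade(A, B):
    # (AB)[x][z] = sum over y of A[x][y] * B[y][z]
    return [[sum(A[x][y] * B[y][z] for y in range(len(B)))
             for z in range(len(B[0]))]
            for x in range(len(A))]

def min_capacity(M):
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

A = [[0.7, 0.3, 0.0],
     [0.1, 0.6, 0.3],
     [0.0, 0.2, 0.8]]
B = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]

AB = cascade(A, B)
print(min_capacity(AB), "<=", min(min_capacity(A), min_capacity(B)))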
Repeated independent runs C(n)
- Theorem [KöpfSmith10]: ML(C(n)) ≤ |Y| log(n+1).
- Application to timing attacks on blinded cryptography:
  - Blinding randomizes the ciphertext before decryption, and de-randomizes after decryption.
  - As a result, the n-observation timing attack is a repeated independent runs channel C(n).
  - Hence the min-capacity grows only logarithmically in n.
[Figure: n independent copies of C, each mapping the same input X to an output Y]
(Useful only when C is probabilistic!)
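A minimal sketch (my own illustration) of the repeated-independent-runs construction and the capacity bound, using the small probabilistic channel from the earlier slides:

from math import log2
from itertools import product

C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def repeated(C, n):
    # C^(n): same secret input, n independent observations; outputs are tuples in Y^n.
    ys = range(len(C[0]))
    rows = []
    for row in C:
        new_row = []
        for combo in product(ys, repeat=n):
            p = 1.0
            for y in combo:
                p *= row[y]
            new_row.append(p)
        rows.append(new_row)
    return rows

def min_capacity(M):
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

for n in range(1, 6):
    print(n, round(min_capacity(repeated(C, n)), 3),
          "<=", round(len(C[0]) * log2(n + 1), 3))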
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Limitations of min-entropy leakage
- Min-entropy leakage implicitly assumes an operational scenario where adversary A benefits only by guessing X exactly, and in one try.
- But many other scenarios are possible:
  - Maybe A can benefit by guessing X partially or approximately.
  - Maybe A is allowed to make multiple guesses.
  - Maybe A is penalized for making a wrong guess.
- How can any single leakage measure be appropriate in all scenarios?
Gain functions and g-leakage [AlvimChatzikokolakisPalamidessiSmith12]
- We generalize min-entropy leakage by introducing gain functions to model the operational scenario.
- In any scenario, there is a finite set W of guesses that A can make about the secret.
- For each guess w and secret value x, there is a gain g(w,x) that A gets by choosing w when the secret’s actual value is x.
- Definition: gain function g : W × X → [0, 1]
- Example: Min-entropy leakage implicitly uses gid(w,x) = 1 if w = x, and 0 otherwise.
g-vulnerability and g-leakage
- Definition: Prior g-vulnerability: Vg[π] = maxw Σx π[x] g(w,x)
  “A’s maximum expected gain, over all possible guesses.”
- Posterior g-vulnerability: Vg[π,C] = Σy p(y) Vg[pX|y]
- g-leakage: Lg(π,C) = log ( Vg[π,C] / Vg[π] )
- g-capacity: MLg(C) = supπ Lg(π,C)
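A minimal sketch (my own illustration) of these definitions; plugging in the identity gain function gid recovers the ordinary vulnerability and min-entropy leakage of the running example:

from math import log2

def V_g(pi, X, W, g):
    # Prior g-vulnerability: best expected gain over all guesses w.
    return max(sum(pi[i] * g(w, x) for i, x in enumerate(X)) for w in W)

def V_g_post(pi, C, X, W, g):
    # Posterior g-vulnerability, computed from the joint matrix pi[x] * C[x][y].
    return sum(max(sum(pi[i] * C[i][y] * g(w, x) for i, x in enumerate(X)) for w in W)
               for y in range(len(C[0])))

def g_leakage(pi, C, X, W, g):
    return log2(V_g_post(pi, C, X, W, g) / V_g(pi, X, W, g))

X = ["x1", "x2", "x3"]
pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]
gid = lambda w, x: 1 if w == x else 0
print(g_leakage(pi, C, X, X, gid))   # about 0.585 bits, as before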
The power of gain functions
[Figures: guessing the gender of a person (Ann, Sue, Paul, Bob, Tom); a lab location given by GPS coordinates (N 39.95185, W 75.18749), guessed to within distance d; a password guessed from a dictionary (superman, apple-juice, johnsmith62, secret.flag, history123, ...)]
- Guessing a secret approximately: g(w,x) = 1 − dist(w,x)
- Guessing a property of a secret: g(w,x) = Is x of gender w?
- Guessing a part of a secret: g(w,x) = Does w match the high-order bits of x?
- Guessing a secret in 3 tries: g3(w,x) = Is x an element of set w of size 3?
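Two of these, sketched as Python gain functions that plug into the V_g functions above (my own illustration; the distance scale is a made-up parameter):

def g3(w, x):
    # Guessing in 3 tries: a guess w is a set of at most 3 candidates.
    return 1 if x in w else 0

def g_approx(w, x, scale=10.0):
    # Guessing approximately: gain decays linearly with distance, floored at 0.
    return max(0.0, 1.0 - abs(w - x) / scale)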
Distinguishing channels with gain functions
- Two channels on a uniformly distributed, 64-bit X:
    A. Y = X | 0x7;
    B. if (X % 8 == 0) Y = X; else Y = 1;
- A always leaks all but the last three bits of X.
- B leaks all of X one-eighth of the time, and almost nothing seven-eighths of the time.
- Both have min-entropy leakage of 61.0 bits out of 64.
- We can distinguish them with gain functions:
  - g3, which allows 3 tries, makes A worse than B.
  - gtiger, which gives a penalty for a wrong guess (allowing “⊥” to mean “don’t guess”), makes B worse.
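A scaled-down numerical check (my own illustration, using 8-bit rather than 64-bit secrets): both channels have essentially the same min-entropy leakage, but the 3-tries gain function g3 separates them, with A leaking more:

from math import log2

N = 8                       # scaled down from 64 bits
xs = range(2 ** N)
pi = 1 / 2 ** N             # uniform prior

def blocks(channel):
    # Both channels are deterministic: group secrets by output.
    b = {}
    for x in xs:
        b.setdefault(channel(x), []).append(x)
    return b.values()

def min_leakage(channel):
    # Posterior vulnerability: one maximum joint probability (= pi) per output.
    return log2(sum(pi for _ in blocks(channel)) / pi)

def g3_leakage(channel):
    # Best 3-element guess set per output covers min(3, block size) secrets.
    post_V = sum(min(3, len(b)) * pi for b in blocks(channel))
    return log2(post_V / (3 * pi))

A = lambda x: x | 0x7
B = lambda x: x if x % 8 == 0 else 1
print("min-entropy leakage:", min_leakage(A), min_leakage(B))   # both about N - 3
print("g3 leakage:", g3_leakage(A), g3_leakage(B))              # A leaks more under g3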
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Robustness worries
- Using g-leakage, we can express precisely a rich variety of operational scenarios.
- But we could worry about the robustness of our conclusions about leakage.
- The g-leakage Lg(π,C) depends on both π and g.
  - π models adversary A’s prior knowledge about X.
  - g models (among other things) what is valuable to A.
- How confident can we be about these?
- Can we minimize sensitivity to questionable assumptions about π and g?
Capacity results
- Capacity (the maximum leakage over all priors) eliminates assumptions about the prior π.
- Capacity relationships between different leakage measures are particularly useful.
- Theorem: Min-capacity is an upper bound on Shannon capacity: ML(C) ≥ SC(C).
- Theorem (“Miracle”): Min-capacity is an upper bound on g-capacity, for every g: ML(C) ≥ MLg(C).
  - Hence if C has small min-capacity, then it has small g-leakage under every prior and every gain function.
  - (But g does affect the prior g-vulnerability.)
Robust channel ordering
- Given channels A and B on secret input X, the question of which leaks more will ordinarily depend on the prior and the particular leakage measure used.
- Is there a robust ordering?
  - This could allow a stepwise refinement methodology.
  - This is arguably indispensable for security.
  - Anything that we think is “unlikely in practice” is arguably more likely, since adversaries are thinking about what we are thinking, and trying to exploit it!
- For deterministic channels, a robust ordering has long been understood: the Lattice of Information [LandauerRedmond93].
The Lattice of Information
- A deterministic channel from X to Y induces a partition on X: secrets are in the same block iff they map to the same output.
- Example: Ccountry maps a person x to the country of birth.
- Partition refinement ⊑: Subdivide zero or more of the blocks.
- Example: Cstate also includes the state of birth for Americans.
- Ccountry ⊑ Cstate
[Figure: Ccountry’s partition has blocks USA, France, Spain, Brazil, UK, China; Cstate’s partition subdivides the USA block into states (OR, CA, FL, OH, NY, DC)]
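A minimal sketch (my own illustration; the people and their birthplaces are made-up) of induced partitions and the refinement check:

def partition(xs, channel):
    # Blocks of the partition induced by a deterministic channel: group secrets by output.
    blocks = {}
    for x in xs:
        blocks.setdefault(channel(x), set()).add(x)
    return list(blocks.values())

def refined_by(A, B):
    # A ⊑ B: every block of B's partition lies inside some block of A's partition.
    return all(any(b <= a for a in A) for b in B)

people = ["alice", "bob", "chen", "dana"]
country = {"alice": "USA", "bob": "USA", "chen": "China", "dana": "France"}
state = {"alice": "USA/OR", "bob": "USA/FL", "chen": "China", "dana": "France"}

P_country = partition(people, lambda x: country[x])
P_state = partition(people, lambda x: state[x])
print(refined_by(P_country, P_state))   # True: Ccountry ⊑ Cstate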
Partition refinement and leakage
- If A ⊑ B, the adversary never prefers A to B.
- Interestingly, the converse also holds.
- Theorem [YasuokaTerauchi10, Malacaria11]: A ⊑ B iff A never leaks more than B, on any prior, under any of the usual leakage measures (Shannon-, min-, or guessing entropy).
- Hence ⊑ is an ordering on deterministic channels with both a structural and a leakage-testing characterization.
- Can we generalize it to probabilistic channels?
Composition refinement
- Note that Ccountry is the cascade of Cstate and Cmerge, where Cmerge post-processes by mapping all states to USA.
- Def: A ⊑o B (“A is composition refined by B”) if there exists a (post-processing) C such that A = BC.
- On deterministic channels, composition refinement ⊑o coincides with partition refinement ⊑.
- So ⊑o generalizes ⊑ to probabilistic channels.
Strong leakage ordering
- Def: A ≤min B if the min-entropy leakage of A never exceeds that of B, for any prior π.

  A       z1    z2              B       y1    y2    y3
  x1      2/3   1/3             x1      1/2   1/2   0
  x2      2/3   1/3             x2      1/2   0     1/2
  x3      1/4   3/4             x3      0     1/2   1/2

- It turns out that A ≤min B, even though A ⋢o B.
- Def: A ≤G B (“A never out-leaks B”) if the g-leakage of A never exceeds that of B, for any prior π and any gain function g.
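A minimal numerical check (my own illustration) that L(π,A) ≤ L(π,B) on randomly sampled priors; a sanity check of A ≤min B, not a proof:

import random
from math import log2

A = [[2/3, 1/3], [2/3, 1/3], [1/4, 3/4]]
B = [[1/2, 1/2, 0], [1/2, 0, 1/2], [0, 1/2, 1/2]]

def min_leakage(pi, C):
    prior_V = max(pi)
    post_V = sum(max(pi[x] * C[x][y] for x in range(len(pi)))
                 for y in range(len(C[0])))
    return log2(post_V / prior_V)

random.seed(0)
for _ in range(10_000):
    w = [random.random() for _ in range(3)]
    pi = [v / sum(w) for v in w]
    assert min_leakage(pi, A) <= min_leakage(pi, B) + 1e-9
print("L(pi, A) <= L(pi, B) held on all sampled priors")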
Relationship between ⊑o and ≤G
- Theorem [Generalized data-processing inequality]: If A ⊑o B then A ≤G B.
  - Shown in [ACPS12].
  - Intuitively, the adversary should never prefer BC to B.
- Theorem [“Coriaceous Conjecture”]: If A ≤G B then A ⊑o B.
  - Conjectured in [ACPS12], proved in [MMSEM14] using ideas in [McIverMeinickeMorgan10].
  - The Separating Hyperplane Lemma is crucial for constructing the g needed to prove the contrapositive.
- So we have an ordering of probabilistic channels, with both structural and leakage-testing significance.
Mathematical structure of channels under ⊑o
- ⊑o is only a pre-order on channel matrices.
- But channel matrices contain redundant structure with respect to their abstract denotation as mappings from priors to hyper-distributions.

  C       y1    y2    y3            D       z1    z2    z3
  x1      1     0     0             x1      2/5   0     3/5
  x2      1/4   1/2   1/4           x2      1/10  3/4   3/20
  x3      1/2   1/3   1/6           x3      1/5   1/2   3/10

  C and D are actually the same abstract channel!
- Theorem: On abstract channels, ⊑o is a partial order.
- But it is not a lattice.
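A minimal sketch (my own illustration) confirming this: map a prior through each matrix to its hyper-distribution (posteriors with their outer probabilities) and compare:

from fractions import Fraction as F

C = [[F(1), F(0), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 3), F(1, 6)]]
D = [[F(2, 5), F(0), F(3, 5)],
     [F(1, 10), F(3, 4), F(3, 20)],
     [F(1, 5), F(1, 2), F(3, 10)]]

def hyper(pi, M):
    # Hyper-distribution: {posterior (as a tuple): total outer probability}.
    h = {}
    for y in range(len(M[0])):
        col = [pi[x] * M[x][y] for x in range(len(pi))]
        py = sum(col)
        if py == 0:
            continue
        post = tuple(c / py for c in col)
        h[post] = h.get(post, F(0)) + py
    return h

pi = [F(1, 3), F(1, 3), F(1, 3)]        # any prior works; uniform shown here
print(hyper(pi, C) == hyper(pi, D))      # True: same abstract channel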
Limits of the information-theoretic perspective
- In all the leakage measures we have discussed, the particular names of outputs are abstracted away.
- We thus model information-theoretic rather than computationally-bounded adversaries.
- Consider channels taking as input an n-bit prime p:
  - A outputs p².
  - B randomly chooses an (n+1)-bit prime q and outputs pq.
- A and B both leak p completely.
  - They are the same as abstract channels.
- But, given standard assumptions about factorization, a computational measure of leakage would judge A to leak much more than B.
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Conclusion
- Min-entropy leakage and g-leakage allow the quantification of leakage with strong operational significance for confidentiality.
- Capacity and composition refinement support robust conclusions about leakage.
- Research directions:
  - Static analysis of leakage in programs
  - Leakage of integrity, rather than confidentiality
  - Computational measures of leakage
  - Leakage policies, and stepwise refinement of implementations from specifications
Many thanks to all my collaborators
Catuscia Palamidessi, Kostas Chatzikokolakis, Mário Alvim, Miguel Andrés, Carroll Morgan, Boris Köpf, Ziyuan Meng, Barbara Espinoza, Annabelle McIver
Questions?
Some references
- Landauer and Redmond, “A lattice of information”, CSFW 1993.
- Clark, Hunt, and Malacaria, “Quantitative analysis of the leakage of confidential data”, ENTCS 59, 2002.
- Köpf and Basin, “An information-theoretic model for adaptive side-channel attacks”, CCS 2007.
- Chatzikokolakis, Palamidessi, and Panangaden, “On the Bayes risk in information-hiding protocols”, JCS 2008.
- Smith, “On the foundations of quantitative information flow”, FOSSACS 2009.
- Köpf and Smith, “Vulnerability bounds and leakage resilience of blinded cryptography under timing attacks”, CSF 2010.
- McIver, Meinicke, and Morgan, “Compositional closure for Bayes risk in probabilistic noninterference”, ICALP 2010.
- Yasuoka and Terauchi, “Quantitative information flow — verification hardness and possibilities”, CSF 2010.
- Malacaria, “Algebraic foundations for information theoretical, probabilistic and guessability measures of information flow”, CoRR, vol. abs/1101.3453, 2011.
- Alvim, Chatzikokolakis, Palamidessi, and Smith, “Measuring information leakage using generalized gain functions”, CSF 2012.
- McIver, Meinicke, and Morgan, “A Kantorovich-monadic powerdomain for information hiding, with probability and nondeterminism”, LICS 2012.
- Köpf, Mauborgne, and Ochoa, “Automatic quantification of cache side-channels”, CAV 2012.
- Espinoza and Smith, “Min-entropy as a resource”, Information and Computation 227, 2013.
- McIver, Morgan, Smith, Espinoza, and Meinicke, “Abstract channels and their robust information-leakage ordering”, POST 2014.