TRANSCRIPT
ETAPS Unifying Invited Talk, 9 April 2014
Operational Significance and Robustness in Quantitative Information Flow
Geoffrey Smith, Florida International University
Leakage of confidential information (13 March)
Motivating example: Password checker
- Check whether guess is equal to password:
- Secret input: password
- Observable output:
  - result
  - running time?
    - (If so, the length of the correct prefix is also leaked!)
result = true;
for (i = 0; i < N; i++) {
  if (password[i] != guess[i]) {
    result = false;
    break;
  }
}
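The checker is deterministic, so with the running time as the observable it induces a partition on passwords. A minimal sketch of this (my own illustration, not from the talk), assuming 3-character binary passwords and a fixed guess:

from itertools import product

def timing_output(password, guess):
    # Number of loop iterations performed by the checker above:
    # one per matching prefix character, plus one for the first mismatch.
    for i, (p, g) in enumerate(zip(password, guess)):
        if p != g:
            return i + 1
    return len(password)

guess = "101"
for bits in product("01", repeat=3):
    pw = "".join(bits)
    print(pw, "->", timing_output(pw, guess))
# Passwords with the same output share the same length of correct prefix.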
Motivating example: Crowds protocol [ReiterRubin98]
- Crowd members wish to communicate anonymously with a server.
- The initiator first sends the message to a randomly-chosen forwarder (possibly itself).
- Each forwarder forwards it again with probability pf, or sends it to the server with probability 1 - pf.
- But some crowd members are collaborators that report who sends them a message.
- Secret input: identity of initiator
- Observable output: first sender of a message to a collaborator (or no one)
[Figure: crowd members forwarding a message from initiator i to the Server]
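A minimal simulation sketch of the Crowds channel (my own illustration; the member count, collaborator set, and pf value are made-up parameters). The secret is the initiator; the observable is the first member a collaborator sees sending the message, or None if the message reaches the server unobserved:

import random

def crowds_run(initiator, members, collaborators, pf):
    sender = initiator
    receiver = random.choice(members)          # initiator picks a forwarder (possibly itself)
    while True:
        if receiver in collaborators:
            return sender                      # a collaborator reports who sent to it
        if random.random() < pf:
            sender, receiver = receiver, random.choice(members)   # forward again
        else:
            return None                        # delivered to the server; nothing observed

members = list(range(10))
collaborators = {8, 9}
counts = {}
for _ in range(100_000):
    y = crowds_run(0, members, collaborators, pf=0.8)
    counts[y] = counts.get(y, 0) + 1
print({y: c / 100_000 for y, c in counts.items()})   # estimated p(y | initiator = 0)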
Motivating example: Statistical database query
- Secret input: database of confidential entries
- Observable output: result of statistical query (possibly with noise added)
Motivating example: Timing attack on cryptography [BonehBrumley03]
- 1024-bit RSA private key recovered in 2 hours from standard OpenSSL implementation across LAN.
- Secret input: RSA private key
- Observable output: approximate timings of decryptions of a sequence of nonces
SSL Handshake (simplified)
[Figure: client sends Enc(pk, nonce); server responds Alert/OK]
Quantitative information flow
- Noninterference [Cohen77, GoguenMeseguer82] requires the observable output to be independent of the secret input.
- But (as seen in the examples) this is often infeasible.
- Quantitative information flow aims to provide theory justifying the intuition that some leaks are “small”.
- An active area of research for more than a decade [ClarkHuntMalacaria02].
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Information-theoretic channels
- The earlier examples can all be modeled as channels.
- Finite sets X (secret inputs), Y (observable outputs).
  - The choice of Y is subtle, and crucial!
- On input X, the channel probabilistically outputs Y.
- Channel matrix C gives the conditional probabilities p(y|x):

        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0

- The rows of C sum to 1.
- C is deterministic if each row contains exactly one 1.
Prior and posterior distributions
- We assume that X has some prior distribution π.
- We assume that adversary A knows both π and C.
- Seeing output y, A can inspect column y of C to update prior π to posterior distribution pX|y.
- For example, seeing output y4 reveals that X was x2.
- Leakage depends on the way C maps prior distribution π to a set of posterior distributions.

        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0
Example

Prior π:
  x1: 1/4   x2: 1/2   x3: 1/4

Channel matrix:
        y1    y2    y3    y4
  x1    1     0     0     0
  x2    0     1/4   1/2   1/4
  x3    1/2   1/3   1/6   0

Joint matrix:
        y1    y2    y3    y4
  x1    1/4   0     0     0
  x2    0     1/8   1/4   1/8
  x3    1/8   1/12  1/24  0

Distribution on Y:
  pY    3/8   5/24  7/24  1/8

Posterior distributions:
        pX|y1  pX|y2  pX|y3  pX|y4
  x1    2/3    0      0      0
  x2    0      3/5    6/7    1
  x3    1/3    2/5    1/7    0

Hyper-distribution on X:
        3/8    5/24   7/24   1/8
  x1    2/3    0      0      0
  x2    0      3/5    6/7    1
  x3    1/3    2/5    1/7    0
Abstractly, a channel is a mapping from priors to hyper-distributions. [McIverMeinickeMorgan10]
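A minimal sketch (my own illustration) of this mapping, computing the joint matrix, the distribution on Y, and the posterior distributions for the example above with exact fractions:

from fractions import Fraction as F

pi = [F(1, 4), F(1, 2), F(1, 4)]
C = [[F(1), F(0), F(0), F(0)],
     [F(0), F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 3), F(1, 6), F(0)]]

# Joint matrix: joint[x][y] = pi[x] * C[x][y]
joint = [[pi[x] * C[x][y] for y in range(4)] for x in range(3)]

# Marginal on Y: column sums of the joint matrix
pY = [sum(joint[x][y] for x in range(3)) for y in range(4)]

# Posterior distributions: each column of the joint matrix, normalized
posteriors = [[joint[x][y] / pY[y] for x in range(3)] for y in range(4)]

print([str(p) for p in pY])                            # ['3/8', '5/24', '7/24', '1/8']
print([[str(q) for q in col] for col in posteriors])   # the columns of the posterior table above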
Quantifying leakage using Shannon entropy
- A classic measure of the “uncertainty” of a probability distribution is Shannon entropy [1948].
- H[π] = -Σx π[x] log π[x]   “prior entropy”
- H[π,C] = Σy p(y) H[pX|y]   “posterior entropy”
- Notice that H[π,C] is just the (weighted) average entropy in the hyper-distribution.
- Shannon leakage = H[π] - H[π,C]
  - This is mutual information, usually denoted by I(X;Y).
- But a key question for any leakage measure is its operational significance.
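A minimal sketch (my own illustration) of prior entropy, posterior entropy, and Shannon leakage, evaluated on the example channel and prior from the previous slides:

from math import log2

pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def entropy(dist):
    # Shannon entropy in bits, with the convention 0 * log 0 = 0.
    return -sum(p * log2(p) for p in dist if p > 0)

nx, ny = len(pi), len(C[0])
pY = [sum(pi[x] * C[x][y] for x in range(nx)) for y in range(ny)]
posteriors = [[pi[x] * C[x][y] / pY[y] for x in range(nx)] for y in range(ny)]

prior_H = entropy(pi)
posterior_H = sum(pY[y] * entropy(posteriors[y]) for y in range(ny))
print("Shannon leakage I(X;Y) =", prior_H - posterior_H, "bits")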
Operational significance of Shannon entropy
- Huffman code for X, when π = (1/2, 1/4, 1/8, 1/8):
  x1 : 0   x2 : 10   x3 : 110   x4 : 111
  [Figure: Huffman code tree with edges labeled 0 and 1 and leaves x1, x2, x3, x4]
- Average length = (1/2)·1 + (1/4)·2 + (1/4)·3 = 7/4 = H[π]
- Shannon’s source coding theorem: H[π] is the average number of bits required to represent X.
- But for confidentiality, we might worry more about X’s vulnerability to be guessed correctly by adversary A.
- If π = (1/2, 2^-1000, 2^-1000, ..., 2^-1000), then H[π] = 500.5 bits, but half the time A can guess X correctly in just one try!
Vulnerability and min-entropy leakage
- [Smith09] proposed to measure leakage based on X’s vulnerability to be guessed by A in one try.
- V[π] = maxx π[x]   “prior vulnerability”
- V[π,C] = Σy p(y) V[pX|y]   “posterior vulnerability”
  - V[π,C] is the average vulnerability in the hyper-distribution.
  - V[π,C] is the complement of the Bayes risk.
- Min-entropy leakage: L(π,C) = log ( V[π,C] / V[π] )
  - Note: -log V[π] is Rényi’s min-entropy H∞[π].
- Leaking i bits means increasing the vulnerability by a factor of 2^i.
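A minimal sketch (my own illustration) of these definitions on the same example; the posterior vulnerability is just the sum of the column maximums of the joint matrix:

from math import log2

pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

prior_V = max(pi)
posterior_V = sum(max(pi[x] * C[x][y] for x in range(len(pi)))
                  for y in range(len(C[0])))
leakage = log2(posterior_V / prior_V)
print(prior_V, posterior_V, leakage)   # 0.5, 0.75, about 0.585 bits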
Channel capacity
- For any leakage measure, it is useful to consider the maximum leakage over all priors π.
- Definition: Shannon capacity SC(C) = supπ I(X;Y)
- Definition: Min-capacity ML(C) = supπ L(π,C)
- Theorem: ML(C) is the log of the sum of the column maximums of C. It is realized on a uniform prior π.
- Corollary: ML(C) = 0 iff the rows of C are identical (i.e. C satisfies noninterference).
- Theorem: For deterministic C, ML(C) = SC(C).
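A minimal sketch (my own illustration) of the min-capacity theorem, computed on the example channel:

from math import log2

C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def min_capacity(M):
    # ML(M) = log of the sum of the column maximums of M (realized on a uniform prior).
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

print(min_capacity(C))   # log2(1 + 1/3 + 1/2 + 1/4) = log2(25/12), about 1.06 bits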
Channels in Cascade AB
- Formed by multiplying two channel matrices, or by factoring a channel matrix.
- Theorem [EspinozaSmith11]: L(π,AB) ≤ L(π,A)
  - Analogue of the classic data-processing inequality for mutual information.
- Theorem: ML(AB) ≤ min { ML(A), ML(B) }
[Figure: cascade X --A--> Y --B--> Z]
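A minimal sketch (my own illustration): a cascade is just matrix multiplication, and the min-capacity bound can be checked numerically. The matrices A and B below are made-up examples:

from math import log2

def cascade(A, B):
    # (AB)[x][z] = sum over y of A[x][y] * B[y][z]
    return [[sum(A[x][y] * B[y][z] for y in range(len(B)))
             for z in range(len(B[0]))]
            for x in range(len(A))]

def min_capacity(M):
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

A = [[0.7, 0.3, 0.0],
     [0.1, 0.6, 0.3],
     [0.0, 0.2, 0.8]]
B = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]

AB = cascade(A, B)
print(min_capacity(AB), "<=", min(min_capacity(A), min_capacity(B)))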
Repeated independent runs C(n)
- Theorem [KöpfSmith10]: ML(C(n)) ≤ |Y| log(n+1).
- Application to timing attacks on blinded cryptography:
  - Blinding randomizes the ciphertext before decryption, and de-randomizes after decryption.
  - As a result, the n-observation timing attack is a repeated independent runs channel C(n).
  - Hence the min-capacity grows only logarithmically in n.
[Figure: n independent copies of C, each mapping the same input X to an output Y]
(Useful only when C is probabilistic!)
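A minimal sketch (my own illustration) of the repeated-independent-runs construction and the capacity bound, using the small probabilistic channel from the earlier slides:

from math import log2
from itertools import product

C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]

def repeated(C, n):
    # C^(n): same secret input, n independent observations; outputs are tuples in Y^n.
    ys = range(len(C[0]))
    rows = []
    for row in C:
        new_row = []
        for combo in product(ys, repeat=n):
            p = 1.0
            for y in combo:
                p *= row[y]
            new_row.append(p)
        rows.append(new_row)
    return rows

def min_capacity(M):
    return log2(sum(max(row[y] for row in M) for y in range(len(M[0]))))

for n in range(1, 6):
    print(n, round(min_capacity(repeated(C, n)), 3),
          "<=", round(len(C[0]) * log2(n + 1), 3))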
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Limitations of min-entropy leakage
- Min-entropy leakage implicitly assumes an operational scenario where adversary A benefits only by guessing X exactly, and in one try.
- But many other scenarios are possible:
  - Maybe A can benefit by guessing X partially or approximately.
  - Maybe A is allowed to make multiple guesses.
  - Maybe A is penalized for making a wrong guess.
- How can any single leakage measure be appropriate in all scenarios?
Gain functions and g-leakage [AlvimChatzikokolakisPalamidessiSmith12]
- We generalize min-entropy leakage by introducing gain functions to model the operational scenario.
- In any scenario, there is a finite set W of guesses that A can make about the secret.
- For each guess w and secret value x, there is a gain g(w,x) that A gets by choosing w when the secret’s actual value is x.
- Definition: gain function g : W × X → [0, 1]
- Example: Min-entropy leakage implicitly uses gid(w,x) = 1 if w = x, and 0 otherwise.
g-vulnerability and g-leakage
- Definition: Prior g-vulnerability: Vg[π] = maxw Σx π[x] g(w,x)
  “A’s maximum expected gain, over all possible guesses.”
- Posterior g-vulnerability: Vg[π,C] = Σy p(y) Vg[pX|y]
- g-leakage: Lg(π,C) = log ( Vg[π,C] / Vg[π] )
- g-capacity: MLg(C) = supπ Lg(π,C)
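A minimal sketch (my own illustration) of these definitions; plugging in the identity gain function gid recovers the ordinary vulnerability and min-entropy leakage of the running example:

from math import log2

def V_g(pi, X, W, g):
    # Prior g-vulnerability: best expected gain over all guesses w.
    return max(sum(pi[i] * g(w, x) for i, x in enumerate(X)) for w in W)

def V_g_post(pi, C, X, W, g):
    # Posterior g-vulnerability, computed from the joint matrix pi[x] * C[x][y].
    return sum(max(sum(pi[i] * C[i][y] * g(w, x) for i, x in enumerate(X)) for w in W)
               for y in range(len(C[0])))

def g_leakage(pi, C, X, W, g):
    return log2(V_g_post(pi, C, X, W, g) / V_g(pi, X, W, g))

X = ["x1", "x2", "x3"]
pi = [0.25, 0.5, 0.25]
C = [[1, 0, 0, 0],
     [0, 0.25, 0.5, 0.25],
     [0.5, 1/3, 1/6, 0]]
gid = lambda w, x: 1 if w == x else 0
print(g_leakage(pi, C, X, X, gid))   # about 0.585 bits, as before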
The power of gain functions
[Figures: guessing the gender of a person (Ann, Sue, Paul, Bob, Tom); a lab location given by GPS coordinates (N 39.95185, W 75.18749), guessed to within distance d; a password guessed from a dictionary (superman, apple-juice, johnsmith62, secret.flag, history123, ...)]
- Guessing a secret approximately: g(w,x) = 1 − dist(w,x)
- Guessing a property of a secret: g(w,x) = Is x of gender w?
- Guessing a part of a secret: g(w,x) = Does w match the high-order bits of x?
- Guessing a secret in 3 tries: g3(w,x) = Is x an element of set w of size 3?
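Two of these, sketched as Python gain functions that plug into the V_g functions above (my own illustration; the distance scale is a made-up parameter):

def g3(w, x):
    # Guessing in 3 tries: a guess w is a set of at most 3 candidates.
    return 1 if x in w else 0

def g_approx(w, x, scale=10.0):
    # Guessing approximately: gain decays linearly with distance, floored at 0.
    return max(0.0, 1.0 - abs(w - x) / scale)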
Distinguishing channels with gain functions
- Two channels on a uniformly distributed, 64-bit X:
    A. Y = X | 0x7;
    B. if (X % 8 == 0) Y = X; else Y = 1;
- A always leaks all but the last three bits of X.
- B leaks all of X one-eighth of the time, and almost nothing seven-eighths of the time.
- Both have min-entropy leakage of 61.0 bits out of 64.
- We can distinguish them with gain functions:
  - g3, which allows 3 tries, makes A worse than B.
  - gtiger, which gives a penalty for a wrong guess (allowing “⊥” to mean “don’t guess”), makes B worse.
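A scaled-down numerical check (my own illustration, using 8-bit rather than 64-bit secrets): both channels have essentially the same min-entropy leakage, but the 3-tries gain function g3 separates them, with A leaking more:

from math import log2

N = 8                       # scaled down from 64 bits
xs = range(2 ** N)
pi = 1 / 2 ** N             # uniform prior

def blocks(channel):
    # Both channels are deterministic: group secrets by output.
    b = {}
    for x in xs:
        b.setdefault(channel(x), []).append(x)
    return b.values()

def min_leakage(channel):
    # Posterior vulnerability: one maximum joint probability (= pi) per output.
    return log2(sum(pi for _ in blocks(channel)) / pi)

def g3_leakage(channel):
    # Best 3-element guess set per output covers min(3, block size) secrets.
    post_V = sum(min(3, len(b)) * pi for b in blocks(channel))
    return log2(post_V / (3 * pi))

A = lambda x: x | 0x7
B = lambda x: x if x % 8 == 0 else 1
print("min-entropy leakage:", min_leakage(A), min_leakage(B))   # both about N - 3
print("g3 leakage:", g3_leakage(A), g3_leakage(B))              # A leaks more under g3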
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Robustness worries
- Using g-leakage, we can express precisely a rich variety of operational scenarios.
- But we could worry about the robustness of our conclusions about leakage.
- The g-leakage Lg(π,C) depends on both π and g.
  - π models adversary A’s prior knowledge about X.
  - g models (among other things) what is valuable to A.
- How confident can we be about these?
- Can we minimize sensitivity to questionable assumptions about π and g?
Capacity results
- Capacity (the maximum leakage over all priors) eliminates assumptions about the prior π.
- Capacity relationships between different leakage measures are particularly useful.
- Theorem: Min-capacity is an upper bound on Shannon capacity: ML(C) ≥ SC(C).
- Theorem (“Miracle”): Min-capacity is an upper bound on g-capacity, for every g: ML(C) ≥ MLg(C).
  - Hence if C has small min-capacity, then it has small g-leakage under every prior and every gain function.
  - (But g does affect the prior g-vulnerability.)
Robust channel ordering
- Given channels A and B on secret input X, the question of which leaks more will ordinarily depend on the prior and the particular leakage measure used.
- Is there a robust ordering?
  - This could allow a stepwise refinement methodology.
  - This is arguably indispensable for security.
  - Anything that we think is “unlikely in practice” is arguably more likely, since adversaries are thinking about what we are thinking, and trying to exploit it!
- For deterministic channels, a robust ordering has long been understood: the Lattice of Information [LandauerRedmond93].
The Lattice of Information
- A deterministic channel from X to Y induces a partition on X: secrets are in the same block iff they map to the same output.
- Example: Ccountry maps a person x to the country of birth.
- Partition refinement ⊑: Subdivide zero or more of the blocks.
- Example: Cstate also includes the state of birth for Americans.
- Ccountry ⊑ Cstate
[Figure: Ccountry’s partition has blocks USA, France, Spain, Brazil, UK, China; Cstate’s partition subdivides the USA block into states (OR, CA, FL, OH, NY, DC)]
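A minimal sketch (my own illustration; the people and their birthplaces are made-up) of induced partitions and the refinement check:

def partition(xs, channel):
    # Blocks of the partition induced by a deterministic channel: group secrets by output.
    blocks = {}
    for x in xs:
        blocks.setdefault(channel(x), set()).add(x)
    return list(blocks.values())

def refined_by(A, B):
    # A ⊑ B: every block of B's partition lies inside some block of A's partition.
    return all(any(b <= a for a in A) for b in B)

people = ["alice", "bob", "chen", "dana"]
country = {"alice": "USA", "bob": "USA", "chen": "China", "dana": "France"}
state = {"alice": "USA/OR", "bob": "USA/FL", "chen": "China", "dana": "France"}

P_country = partition(people, lambda x: country[x])
P_state = partition(people, lambda x: state[x])
print(refined_by(P_country, P_state))   # True: Ccountry ⊑ Cstate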
Partition refinement and leakage
- If A ⊑ B, the adversary never prefers A to B.
- Interestingly, the converse also holds.
- Theorem [YasuokaTerauchi10, Malacaria11]: A ⊑ B iff A never leaks more than B, on any prior, under any of the usual leakage measures (Shannon-, min-, or guessing entropy).
- Hence ⊑ is an ordering on deterministic channels with both a structural and a leakage-testing characterization.
- Can we generalize it to probabilistic channels?
Composition refinement
- Note that Ccountry is the cascade of Cstate and Cmerge, where Cmerge post-processes by mapping all states to USA.
- Def: A ⊑o B (“A is composition refined by B”) if there exists a (post-processing) C such that A = BC.
- On deterministic channels, composition refinement ⊑o coincides with partition refinement ⊑.
- So ⊑o generalizes ⊑ to probabilistic channels.
Strong leakage ordering
- Def: A ≤min B if the min-entropy leakage of A never exceeds that of B, for any prior π.

  A       z1    z2              B       y1    y2    y3
  x1      2/3   1/3             x1      1/2   1/2   0
  x2      2/3   1/3             x2      1/2   0     1/2
  x3      1/4   3/4             x3      0     1/2   1/2

- It turns out that A ≤min B, even though A ⋢o B.
- Def: A ≤G B (“A never out-leaks B”) if the g-leakage of A never exceeds that of B, for any prior π and any gain function g.
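A minimal numerical check (my own illustration) that L(π,A) ≤ L(π,B) on randomly sampled priors; a sanity check of A ≤min B, not a proof:

import random
from math import log2

A = [[2/3, 1/3], [2/3, 1/3], [1/4, 3/4]]
B = [[1/2, 1/2, 0], [1/2, 0, 1/2], [0, 1/2, 1/2]]

def min_leakage(pi, C):
    prior_V = max(pi)
    post_V = sum(max(pi[x] * C[x][y] for x in range(len(pi)))
                 for y in range(len(C[0])))
    return log2(post_V / prior_V)

random.seed(0)
for _ in range(10_000):
    w = [random.random() for _ in range(3)]
    pi = [v / sum(w) for v in w]
    assert min_leakage(pi, A) <= min_leakage(pi, B) + 1e-9
print("L(pi, A) <= L(pi, B) held on all sampled priors")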
Relationship between ⊑o and ≤G
- Theorem [Generalized data-processing inequality]: If A ⊑o B then A ≤G B.
  - Shown in [ACPS12].
  - Intuitively, the adversary should never prefer BC to B.
- Theorem [“Coriaceous Conjecture”]: If A ≤G B then A ⊑o B.
  - Conjectured in [ACPS12], proved in [MMSEM14] using ideas in [McIverMeinickeMorgan10].
  - The Separating Hyperplane Lemma is crucial for constructing the g needed to prove the contrapositive.
- So we have an ordering of probabilistic channels, with both structural and leakage-testing significance.
Mathematical structure of channels under ⊑o
- ⊑o is only a pre-order on channel matrices.
- But channel matrices contain redundant structure with respect to their abstract denotation as mappings from priors to hyper-distributions.

  C       y1    y2    y3            D       z1    z2    z3
  x1      1     0     0             x1      2/5   0     3/5
  x2      1/4   1/2   1/4           x2      1/10  3/4   3/20
  x3      1/2   1/3   1/6           x3      1/5   1/2   3/10

  C and D are actually the same abstract channel!
- Theorem: On abstract channels, ⊑o is a partial order.
- But it is not a lattice.
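A minimal sketch (my own illustration) confirming this: map a prior through each matrix to its hyper-distribution (posteriors with their outer probabilities) and compare:

from fractions import Fraction as F

C = [[F(1), F(0), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 3), F(1, 6)]]
D = [[F(2, 5), F(0), F(3, 5)],
     [F(1, 10), F(3, 4), F(3, 20)],
     [F(1, 5), F(1, 2), F(3, 10)]]

def hyper(pi, M):
    # Hyper-distribution: {posterior (as a tuple): total outer probability}.
    h = {}
    for y in range(len(M[0])):
        col = [pi[x] * M[x][y] for x in range(len(pi))]
        py = sum(col)
        if py == 0:
            continue
        post = tuple(c / py for c in col)
        h[post] = h.get(post, F(0)) + py
    return h

pi = [F(1, 3), F(1, 3), F(1, 3)]        # any prior works; uniform shown here
print(hyper(pi, C) == hyper(pi, D))      # True: same abstract channel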
Limits of the information-theoretic perspective
- In all the leakage measures we have discussed, the particular names of outputs are abstracted away.
- We thus model information-theoretic rather than computationally-bounded adversaries.
- Consider channels taking as input an n-bit prime p:
  - A outputs p².
  - B randomly chooses an (n+1)-bit prime q and outputs pq.
- A and B both leak p completely.
  - They are the same as abstract channels.
- But, given standard assumptions about factorization, a computational measure of leakage would judge A to leak much more than B.
Plan of the talk
- Motivation
- Information-theoretic channels and leakage
  - Shannon leakage
  - Vulnerability and min-entropy leakage
- Operational significance
  - Gain functions and g-leakage
- Robustness
  - Capacity
  - Robust channel ordering
- Conclusion
Conclusion
- Min-entropy leakage and g-leakage allow the quantification of leakage with strong operational significance for confidentiality.
- Capacity and composition refinement support robust conclusions about leakage.
- Research directions:
  - Static analysis of leakage in programs
  - Leakage of integrity, rather than confidentiality
  - Computational measures of leakage
  - Leakage policies, and stepwise refinement of implementations from specifications
Many thanks to all my collaborators
Catuscia Palamidessi, Kostas Chatzikokolakis, Mário Alvim, Miguel Andrés, Carroll Morgan, Boris Köpf, Ziyuan Meng, Barbara Espinoza, Annabelle McIver
Questions?
Some references
- Landauer and Redmond, “A lattice of information”, CSFW 1993.
- Clark, Hunt, and Malacaria, “Quantitative analysis of the leakage of confidential data”, ENTCS 59, 2002.
- Köpf and Basin, “An information-theoretic model for adaptive side-channel attacks”, CCS 2007.
- Chatzikokolakis, Palamidessi, and Panangaden, “On the Bayes risk in information-hiding protocols”, JCS 2008.
- Smith, “On the foundations of quantitative information flow”, FOSSACS 2009.
- Köpf and Smith, “Vulnerability bounds and leakage resilience of blinded cryptography under timing attacks”, CSF 2010.
- McIver, Meinicke, and Morgan, “Compositional closure for Bayes risk in probabilistic noninterference”, ICALP 2010.
- Yasuoka and Terauchi, “Quantitative information flow — verification hardness and possibilities”, CSF 2010.
- Malacaria, “Algebraic foundations for information theoretical, probabilistic and guessability measures of information flow”, CoRR, vol. abs/1101.3453, 2011.
- Alvim, Chatzikokolakis, Palamidessi, and Smith, “Measuring information leakage using generalized gain functions”, CSF 2012.
- McIver, Meinicke, and Morgan, “A Kantorovich-monadic powerdomain for information hiding, with probability and nondeterminism”, LICS 2012.
- Köpf, Mauborgne, and Ochoa, “Automatic quantification of cache side-channels”, CAV 2012.
- Espinoza and Smith, “Min-entropy as a resource”, Information and Computation 227, 2013.
- McIver, Morgan, Smith, Espinoza, and Meinicke, “Abstract channels and their robust information-leakage ordering”, POST 2014.