COMS30124 : Crypto and Information Theory
Elisabeth Oswald and Nigel Smart
Department of Computer Science, University of Bristol,
Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, United Kingdom.
11th October 2006
Elisabeth Oswald and Nigel Smart
COMS30124 : Crypto and Information Theory Slide 1
Outline
Computational Security
Recap on Probability Theory
Probability and Ciphers
Shannon’s Theorem
Entropy and Uncertainty
Entropy and Cryptography
Information Theory and Cryptography
Information Theory is one of the foundations of computer science.
Here we will examine its relationship to cryptography.
We will be following
- Chapter 4 of Smart - Cryptography, an Introduction
Other books are
- Chapter 2 of Stinson - Cryptography: Theory and Practice
- Chapter 1 of Welsh - Codes and Cryptography
Computational Security
A system is computationally secure if the best algorithm for breaking it requires N operations.
- Where N is a very big number.
- No practical system can be proved secure under this definition.
In practice we say a system is computationally secure if the best known algorithm for breaking it requires an unreasonably large amount of computer time.
Computational Security
Another, practical, approach is to reduce a well-studied hard problem to the problem of breaking the system.
- E.g.: the system is secure if a given integer n cannot be factored.
Systems of this form are often called provably secure.
- However, we only have a proof relative to some hard problem.
- Not an absolute proof.
Essentially we are bounding the computational power of the adversary.
- Even if the adversary has limited (but large) resources she still will not break the system.
Computational Security
When considering schemes which are computationally secure
- We need to be careful about the key sizes etc.
- We need to keep abreast of current algorithmic developments.
- At some point in the future we should expect our system to be broken (maybe many millennia hence though).
Most schemes in use today are computationally secure.
Unconditional Security
For unconditional security we place no bound on the computational power of the adversary.
In other words, a system is unconditionally secure if it cannot be broken even with infinite computing power.
- Some systems are unconditionally secure.
Other names for unconditionally secure are
- Perfectly secure
- Information theoretically secure
Examples
Of the ciphers we have seen, or of those we are to see later on, the following are not computationally secure
- Caesar cipher
- Substitution cipher
- Vigenère cipher
The following are computationally secure but not unconditionally secure.
- DES (?)
- AES
- RSA
One-time pad is unconditionally secure if used correctly.
Probability Diversion
To study perfect security we need to look a little at probability.
A random variable X is a variable which takes certain values with certain probabilities.
Examples:
- Let X be the random variable representing tosses of a fair coin
  - p(X = heads) = 1/2
  - p(X = tails) = 1/2
- Let X be the random variable representing letters in English text
  - p(X = a) = 0.082, p(X = e) = 0.127, p(X = z) = 0.001
Probability Diversion
Let X and Y be random variables.
- p(X = x) is the probability that X takes the value x.
- p(Y = y) is the probability that Y takes the value y.
The joint probability is defined as follows:
- p(X = x, Y = y) is the probability that X takes the value x and Y takes the value y.
X and Y are independent iff
- p(X = x, Y = y) = p(X = x) · p(Y = y) for all values of x and y.
Conditional Probability
The conditional probability is defined as follows:
- p(X = x | Y = y) is the probability that X takes the value x given that Y takes the value y.
We have
p(X = x , Y = y) = p(X = x | Y = y) · p(Y = y)
p(X = x , Y = y) = p(Y = y | X = x) · p(X = x)
Bayes’ Theorem
The following is one of the most crucial statements in probability.
Bayes' Theorem: If p(Y = y) > 0 then

p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y)
                 = p(Y = y | X = x) · p(X = x) / p(Y = y).

X and Y are independent iff p(X = x | Y = y) = p(X = x)
- i.e. the value of X does not depend on the value of Y.
Probability and Ciphers
Let P denote the set of possible plaintexts.
Let K denote the set of possible keys.
Let C denote the set of possible ciphertexts.
Let P, K, C be associated random variables with probabilities

p(P = m), p(K = k), p(C = c).

We make the reasonable assumption that P and K are independent.
The set of ciphertexts under the key k is defined by

C(k) = {e_k(m) : m ∈ P}.
Probability and Ciphers
We have the relationship

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c)).

For c ∈ C and m ∈ P we can compute the probability p(C = c | P = m).
This is the probability that c is the ciphertext given that m is the plaintext:

p(C = c | P = m) = Σ_{k : m = d_k(c)} p(K = k).

To break a cipher we want to know the probabilities of the plaintext given a certain ciphertext.
Probability and Ciphers
We can compute the probability of m being the plaintext given that c is the ciphertext:

p(P = m | C = c) = [p(C = c | P = m) · p(P = m)] / p(C = c)
                 = [p(P = m) · Σ_{k : m = d_k(c)} p(K = k)] / [Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c))].

This can be computed by anyone who knows the probability distributions of K, P and the encryption function.
Using these probabilities one may be able to deduce some information about the plaintext once one has seen the ciphertext.
Example
Suppose we have P = {a, b}, i.e. only two possible messages
- p(P = a) = 1/4 and p(P = b) = 3/4.
Suppose we have K = {k1, k2, k3}, i.e. three possible keys
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
Suppose we have C = {1, 2, 3, 4} with encryption given by

e_k(m) | a  b
k1     | 1  2
k2     | 2  3
k3     | 3  4
We can then compute
p(C = 1) = 1/8,
p(C = 2) = 7/16,
p(C = 3) = 1/4,
p(C = 4) = 3/16.
Example
We can now compute the conditional probabilities
p(P = a | C = 1) = 1      p(P = b | C = 1) = 0
p(P = a | C = 2) = 1/7    p(P = b | C = 2) = 6/7
p(P = a | C = 3) = 1/4    p(P = b | C = 3) = 3/4
p(P = a | C = 4) = 0      p(P = b | C = 4) = 1

Hence
- If we see the ciphertext 1 we know the message is a.
- If we see the ciphertext 4 we know the message is b.
- If we see the ciphertext 3 we guess the message is b.
- If we see the ciphertext 2 we guess the message is b.
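These tables can be reproduced mechanically from the formulas on the previous slides. The following Python sketch (illustrative code, not from the slides) recomputes p(C = c) and the posteriors p(P = m | C = c) for this toy cryptosystem using exact fractions:

```python
from fractions import Fraction as F

# Toy cipher from the slides: P = {a, b}, K = {k1, k2, k3}, C = {1, 2, 3, 4}.
p_P = {'a': F(1, 4), 'b': F(3, 4)}
p_K = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}
enc = {('k1', 'a'): 1, ('k1', 'b'): 2,
       ('k2', 'a'): 2, ('k2', 'b'): 3,
       ('k3', 'a'): 3, ('k3', 'b'): 4}

# p(C = c) = sum over pairs (k, m) with e_k(m) = c of p(K = k) * p(P = m).
p_C = {}
for (k, m), c in enc.items():
    p_C[c] = p_C.get(c, F(0)) + p_K[k] * p_P[m]

# Bayes: p(P = m | C = c) = p(C = c | P = m) * p(P = m) / p(C = c),
# where p(C = c | P = m) = sum of p(K = k) over keys with e_k(m) = c.
def posterior(m, c):
    lik = sum(p_K[k] for k in p_K if enc[(k, m)] == c)
    return lik * p_P[m] / p_C[c]
```

The `posterior` function is just Bayes' Theorem specialised to this cipher; for ciphertexts 1 and 4 it returns probability 1, matching the table above.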
Example - Conclusion
So in the previous example the ciphertext does reveal information about the plaintext.
This is exactly what we wish to avoid.
We want the ciphertext to give no information about the plaintext.
A system with this property is said to be perfectly secure.
Perfect Secrecy
A cryptosystem has perfect secrecy iff

p(P = m | C = c) = p(P = m)

for all m ∈ P and c ∈ C.
That is, the probability that the plaintext is m given that the ciphertext c is observed is the same as the probability that the plaintext is m without seeing c.
In other words, knowing c reveals no information about m.
Perfect Secrecy
Recall: Perfect secrecy means p(P = m | C = c) = p(P = m). This is equivalent to

p(C = c | P = m) = p(C = c).

Assume p(C = c) > 0 for all c ∈ C (if not, remove c from C).
For any fixed m we have p(C = c | P = m) = p(C = c) > 0. This means that for all c there must be at least one key k such that e_k(m) = c.
Conclusion: #K ≥ #C.
Note: We always have #C ≥ #P since encryption with a given key k is injective.
Shannon’s Theorem
Shannon's Theorem is the most important theorem in the information theoretic study of cryptography.

Shannon's Theorem: Suppose (P, C, K, e_k(·), d_k(·)) is a cryptosystem with #P = #C = #K. This cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/#K and, for each m ∈ P and c ∈ C, there is a unique key k such that e_k(m) = c.

Note the statement is "if and only if", hence we need to prove it in both directions.
Proof
Suppose the system provides perfect secrecy.
Then, for any fixed m ∈ P, we know that for all c ∈ C there is at least one key k such that e_k(m) = c.
Since #C = #K we have

#C = #{e_k(m) : k ∈ K} = #K,

i.e. there do not exist two keys k1 and k2 such that e_{k1}(m) = e_{k2}(m) = c.
So for all m ∈ P and c ∈ C there is a unique k ∈ K such that e_k(m) = c.
We need to show that every key is used with equal probability, i.e.

p(K = k) = 1/#K for all k ∈ K.
Proof
Let n = #K and P = {m_i : 1 ≤ i ≤ n} and fix c ∈ C.
Label the keys k_1, ..., k_n such that

e_{k_i}(m_i) = c for 1 ≤ i ≤ n.

Due to perfect secrecy we have p(P = m_i | C = c) = p(P = m_i) and thus

p(P = m_i) = p(P = m_i | C = c)
           = p(C = c | P = m_i) · p(P = m_i) / p(C = c)
           = p(K = k_i) · p(P = m_i) / p(C = c).
Proof
Hence we obtain that for all 1 ≤ i ≤ n,

p(C = c) = p(K = k_i).

Since Σ_{i=1}^{n} p(K = k_i) = 1 we have

n · p(C = c) = 1  ⇒  p(C = c) = 1/n,

thus all keys are used with equal probability.
Conclusion: p(K = k) = 1/#K for all k ∈ K.
Proof
Now we need to prove the result in the other direction.
Suppose that
- #K = #C = #P;
- every key is used with equal probability 1/#K; and
- for each m ∈ P and c ∈ C there is a unique key k with e_k(m) = c.
Then we need to show that the system is perfectly secure, i.e.
p(P = m | C = c) = p(P = m).
Proof
Since each key is used with equal probability, we have, for fixed c,

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c))
         = (1/#K) Σ_{k : c ∈ C(k)} p(P = d_k(c)).

Since for each m and c there is a unique key k with e_k(m) = c we have

Σ_{k : c ∈ C(k)} p(P = d_k(c)) = Σ_{m ∈ P} p(P = m) = 1.

Conclusion: p(C = c) = 1/#K.
Proof
In addition, if c = e_k(m) then

p(C = c | P = m) = p(K = k) = 1/#K.

Then using Bayes' Theorem we have

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = (1/#K) · p(P = m) / (1/#K)
                 = p(P = m).

Q.E.D.
Example - Shift Cipher
For the Shift Cipher we had P = K = C = Z_26 and e_k(m) = m + k mod 26.
Shannon's Theorem implies perfect secrecy if we encrypt one letter.
We extend to plaintexts of length n by using the Shift Cipher with a new key for each letter.
For this system we clearly have

P = K = C = (Z_26)^n,  p(K = k) = 1/26^n.

Furthermore, for each m and c there is a unique k such that e_k(m) = c.
Shannon's Theorem then implies perfect secrecy.
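Perfect secrecy of the one-letter shift cipher can be checked exhaustively. The Python sketch below is illustrative: it uses Z_5 rather than Z_26 purely to keep things small, and a made-up non-uniform plaintext distribution, and verifies that the posterior p(P = m | C = c) never moves from the prior:

```python
from fractions import Fraction as F

# Shift cipher over Z_5 (Z_26 works the same way): e_k(m) = (m + k) mod N,
# with uniformly random keys. Shannon's Theorem says the posterior equals the
# prior whatever the (possibly biased) message distribution is.
N = 5
p_P = {0: F(1, 2), 1: F(1, 4), 2: F(1, 8), 3: F(1, 16), 4: F(1, 16)}
p_K = {k: F(1, N) for k in range(N)}

def posterior(m, c):
    p_c = sum(p_K[k] * p_P[(c - k) % N] for k in range(N))   # p(C = c)
    lik = sum(p_K[k] for k in range(N) if (m + k) % N == c)  # p(C = c | P = m)
    return lik * p_P[m] / p_c

perfect = all(posterior(m, c) == p_P[m] for m in range(N) for c in range(N))
print(perfect)  # True
```

Exact fractions avoid any floating-point doubt: every posterior equals its prior exactly.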
Example - Vernam One-Time Pad
The Vernam Cipher uses the Shift Cipher but modulo 2 instead of modulo 26.
Binary addition, or XOR, is defined as

⊕ | 0 1
0 | 0 1
1 | 1 0

Vernam One-Time Pad
- Gilbert Vernam patented this cipher in 1917 for encryption and decryption of telegraph messages.
- To send a binary string you need a key as long as the message.
- Each key can be used only once, hence One-Time Pad.
Example - Vernam One-Time Pad
Clearly we cannot use the same key twice owing to the following chosen plaintext attack.
- Eve generates m and asks Alice to encrypt it.
- Eve receives c = m ⊕ k from Alice.
- Eve can now compute the key k = c ⊕ m.
- Eve can decrypt all messages encrypted with k.
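The attack is a one-liner once XOR on byte strings is available. A minimal Python sketch (the messages are hypothetical examples):

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# One-time pad: the key is uniformly random and as long as the message.
m1 = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(m1))
c1 = xor(m1, key)           # Alice's ciphertext

# Chosen-plaintext attack on key reuse: Eve knows the pair (m1, c1),
# so she recovers the key ...
recovered = xor(c1, m1)     # k = c XOR m

# ... and if Alice fatally reuses the key, Eve reads the next message.
m2 = b"RETREAT AT TEN"
c2 = xor(m2, key)
print(xor(c2, recovered))   # Eve recovers m2
```

Note that a single use remains perfectly secure; it is only the reuse that hands Eve the key.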
One time pad is used in some military and diplomatic contexts.
Key Distribution
Perfect secrecy ⇒ the length of the key is at least the length of the plaintext.
- Key distribution becomes the major problem.
- Solution in fourth year course COMS40213: Information Security.

The aim of modern cryptography is to design systems where
- one key can be used many times; and
- a short key can encrypt a long message.
Such systems will not be unconditionally secure, but should be at least computationally secure.

We need to use some Information Theory to analyse the situation where the same key is used for multiple encryptions.
Again, the main results are due to Shannon in the late 1940s.
Uncertainty
Consider the following examples.
- The outcome of a throw of two dice is more uncertain than the outcome of the throw of one die.
- The outcome of the throw of a fair die is more uncertain than the outcome of the throw of a biased die.
- The uncertainty of the random variable X with p(X = 0) = p and p(X = 1) = 1 - p is the same as the uncertainty of the random variable Y with p(Y = a) = p and p(Y = d) = 1 - p.

If we want to define the uncertainty H(X) of some random variable X then H(X) should be a function of the probability distribution of X only.
Uncertainty - Shannon’s Axioms
In 1948 Shannon proposed 8 requirements which a sensible definition of uncertainty H(X) should satisfy.
Let X be a random variable with values x_1, ..., x_n and probabilities p_i = p(X = x_i) for i = 1, ..., n. Then the most important requirements are:
- H(p_1, ..., p_n) is maximal when p_i = 1/n for all i;
- H(p_1, ..., p_n) ≥ 0 and equals zero only when p_i = 1 for some i;
- H(1/n, ..., 1/n) ≤ H(1/(n+1), ..., 1/(n+1)) for n ∈ N (a two-horse race is less uncertain than a three-horse race); and
- H(1/(mn), ..., 1/(mn)) = H(1/m, ..., 1/m) + H(1/n, ..., 1/n) for m, n ∈ N (linearity condition: the uncertainty when throwing an m-sided die followed by an n-sided die is the sum of the individual uncertainties).
Entropy = Uncertainty
From the requirements on the previous slide one can prove that the only possible definition for H(X) is the following.
Given a random variable X that takes on a finite set of values with probabilities p_1, ..., p_n, the uncertainty or entropy is

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i.

Note that if p_i = 0 we remove that term from the above sum.
Entropy - Examples
Let X be the throw of a fair die, i.e. p(X = i) = 1/6 for i = 1, ..., 6. Then

H(X) = - Σ_{i=1}^{6} (1/6) log_2 (1/6) = - log_2 (1/6) = log_2 6.

More generally, if X takes on n values with equal probability then

H(X) = - Σ_{i=1}^{n} (1/n) log_2 (1/n) = log_2 n.

Suppose X is the answer to a question with values either Yes or No.
- If I always answer Yes, then there is no uncertainty, i.e. H(X) = 0.
- If Yes and No are equally probable then H(X) = 1.
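The definition and both examples are easy to check numerically; a minimal Python sketch (illustrative, not from the slides) of the entropy function:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p_i * log2(p_i), skipping p_i = 0 terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # fair die: log2(6) ≈ 2.585 bits
print(entropy([1.0, 0.0]))  # always Yes: 0 bits
print(entropy([0.5, 0.5]))  # fair Yes/No: 1 bit
```

The `if p > 0` guard implements the convention from the slide that zero-probability terms are removed from the sum.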
Information
Consider the following examples.
- Suppose I toss a fair coin and tell you the outcome of the experiment; then I have given you 1 bit of information.
- Suppose I toss a fair coin n times; then the information of the outcome of the experiment clearly is n bits.
- Suppose I answer Yes to a question with probability 0.99, and No with probability 0.01; then the answer Yes provides considerably less information than the answer No, since one already expected the answer Yes.

These examples suggest that the information I(E) of an event E which occurs with probability p should be defined as

I(E) = - log_2 p.
Entropy and Information
Let X be a random variable that takes on the values x_1, ..., x_n with p_i = p(X = x_i). Then the information content of the event X = x_i is

I(X = x_i) = - log_2 p_i.

Recall that the entropy of a random variable X was defined as

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i.

This is the mean value of the information content of the events X = x_i.
Therefore, entropy measures the average information content of an observation of X.
Conclusion: Loss of entropy is gain of information!
Example
Let us return to our example cryptosystem from earlier.
The possible plaintexts, keys and ciphertexts were
- P = {a, b},
- K = {k1, k2, k3},
- C = {1, 2, 3, 4}.
We had the following probabilities:
- p(P = a) = 1/4 and p(P = b) = 3/4.
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
- p(C = 1) = 1/8, p(C = 2) = 7/16, p(C = 3) = 1/4 and p(C = 4) = 3/16.
Example
Then we have

H(P) = -(1/4) log_2 (1/4) - (3/4) log_2 (3/4) ≈ 0.81,
H(K) = -(1/2) log_2 (1/2) - 2 · (1/4) log_2 (1/4) = 1.5,
H(C) = -(1/8) log_2 (1/8) - (7/16) log_2 (7/16) - (1/4) log_2 (1/4) - (3/16) log_2 (3/16) ≈ 1.85.

Note that the uncertainty or entropy H(C) of the ciphertext is smaller than the sum of the entropies of the plaintext H(P) and the key H(K).
Later we will see that the difference is the remaining uncertainty about the key given the ciphertext.
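These three values can be recomputed directly from the definition; a short illustrative Python sketch:

```python
import math

def entropy(probs):
    """Shannon entropy, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_P = entropy([1/4, 3/4])               # plaintext distribution
H_K = entropy([1/2, 1/4, 1/4])          # key distribution
H_C = entropy([1/8, 7/16, 1/4, 3/16])   # ciphertext distribution
print(round(H_P, 2), round(H_K, 2), round(H_C, 2))  # 0.81 1.5 1.85
print(H_C < H_P + H_K)                               # True
```

The final check confirms the observation above: H(C) is strictly smaller than H(P) + H(K).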
A Fact About Logarithms
The following is a special case of Jensen's inequality, which we will need to discuss entropy in more depth.

Suppose a_i > 0 for i = 1, ..., n and Σ_{i=1}^{n} a_i = 1.
Then, if x_i > 0 for i = 1, ..., n, we have

Σ_{i=1}^{n} a_i log_2 (x_i) ≤ log_2 ( Σ_{i=1}^{n} a_i x_i ).

Furthermore, equality occurs if and only if x_1 = x_2 = ... = x_n.
Upper Bound on Entropy
Suppose X is a random variable that takes on values x_1, ..., x_n with probability distribution p_i = p(X = x_i) for i = 1, ..., n. Then

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i
     = Σ_{i=1}^{n} p_i log_2 (1/p_i)
     ≤ log_2 ( Σ_{i=1}^{n} p_i · (1/p_i) )    (by Jensen's inequality)
     = log_2 n.

Conclusion: For a random variable X with n possible values we have H(X) ≤ log_2 n, with equality if and only if p_i = 1/n for all i.
Joint Entropy
Let X and Y be random variables with values x_1, ..., x_n and y_1, ..., y_m and joint probabilities

r_ij = p(X = x_i, Y = y_j)

for i = 1, ..., n and j = 1, ..., m.
The joint entropy is defined as

H(X, Y) = - Σ_{i=1}^{n} Σ_{j=1}^{m} r_ij log_2 r_ij.

The joint entropy H(X, Y) is the uncertainty of the random variables X and Y together.
It measures the average information content of an observation of X and Y together.
Joint Entropy
Let X and Y be random variables; then we have the inequality

H(X, Y) ≤ H(X) + H(Y),

with equality if and only if X and Y are independent.

Reminder: X and Y are independent means that for all i and j

p(X = x_i, Y = y_j) = p(X = x_i) · p(Y = y_j).

The proof can be found in
- Stinson - Cryptography: Theory and Practice, Theorem 2.7, p. 57, and
- Welsh - Codes and Cryptography, Theorem 2, p. 6.
Conditional Entropy
Conditional entropy measures the average uncertainty of a random variable X given an observation of a random variable Y.

Reminder: If X and Y are random variables with values x_1, ..., x_n and y_1, ..., y_m then the conditional probability p(X = x_i | Y = y_j) is the probability that the value of X will be x_i given that the value of Y is y_j.

The conditional entropy of X given Y = y_j is defined as

H(X | Y = y_j) = - Σ_{i=1}^{n} p(X = x_i | Y = y_j) · log_2 p(X = x_i | Y = y_j).
Conditional Entropy
The conditional entropy of X given Y is defined as the weighted average of the entropies H(X | Y = y_j) for j = 1, ..., m, i.e.

H(X | Y) = Σ_{j=1}^{m} p(Y = y_j) · H(X | Y = y_j)
         = - Σ_{j=1}^{m} Σ_{i=1}^{n} p(Y = y_j) · p(X = x_i | Y = y_j) · log_2 p(X = x_i | Y = y_j).

Conditional entropy measures the average uncertainty of a random variable X given observations of a random variable Y, averaged over all values that Y can take.
Conditional and Joint Entropy

Conditional and joint entropy are linked by the following formula:

H(X, Y) = H(Y) + H(X | Y).

Proof: Welsh - Codes and Cryptography, Theorem 1, p. 8.

As an immediate consequence, we have the following upper bound:

H(X | Y) ≤ H(X),

with equality if and only if X and Y are independent.

Proof: Welsh - Codes and Cryptography, Corollary, p. 9.
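These identities can be checked numerically. The Python sketch below is illustrative: it uses an arbitrary made-up joint distribution (the numbers in `r` are hypothetical, not from the slides) and verifies the chain rule, the joint-entropy bound, and the symmetry I(X|Y) = I(Y|X):

```python
import math

def H(probs):
    """Shannon entropy, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution r[i][j] = p(X = x_i, Y = y_j); entries sum to 1.
r = [[1/8, 1/16, 1/16],
     [1/4, 1/8,  3/8]]

pX = [sum(row) for row in r]        # marginal distribution of X
pY = [sum(col) for col in zip(*r)]  # marginal distribution of Y

H_XY = H([p for row in r for p in row])

# H(X|Y): weighted average over y_j of the entropy of the renormalised column.
H_X_given_Y = sum(pY[j] * H([r[i][j] / pY[j] for i in range(len(r))])
                  for j in range(len(pY)))
H_Y_given_X = sum(pX[i] * H([r[i][j] / pX[i] for j in range(len(r[0]))])
                  for i in range(len(pX)))

print(abs(H_XY - (H(pY) + H_X_given_Y)) < 1e-12)  # chain rule holds
print(H_XY <= H(pX) + H(pY))                      # joint entropy bound
print(abs((H(pX) - H_X_given_Y) - (H(pY) - H_Y_given_X)) < 1e-12)  # I(X|Y) = I(Y|X)
```

Since this `r` is not a product distribution, the bound H(X, Y) ≤ H(X) + H(Y) holds strictly here.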
Information and Entropy
Reminder: Loss of uncertainty is gain of information.
Let X and Y be two random variables; then the information about X conveyed by Y is defined as

I(X | Y) = H(X) - H(X | Y).

Clearly I(X | Y) = 0 if and only if X and Y are independent.

Remark:
- Strangely enough, we have I(X | Y) = I(Y | X).
- Proof: Welsh - Codes and Cryptography, p. 11.
Conditional Entropy and Cryptography
Let P, K, C be the sets of possible messages, keys and ciphertexts with associated random variables P, K, C.

H(P | K, C) = 0
- Given the ciphertext and the key, you know the plaintext, since it is the decryption of the given ciphertext under the given key.

H(C | P, K) = 0
- Given the plaintext and the key, you know the ciphertext, since it is the encryption of the given plaintext under the given key.
- Note: Modern public key encryption schemes do not have this last property when used correctly.
Key Equivocation
The conditional entropy H(K | C) is called the key equivocation and measures the average uncertainty remaining about the key when a ciphertext has been observed.

Suppose that an adversary wants to determine the key of a non-perfect cipher. The smaller H(K | C) is, the easier it will be to recover the key.

The information revealed about the key by the ciphertext is the loss of uncertainty about the key when a ciphertext has been observed, i.e.

I(K | C) = H(K) - H(K | C).
Key Equivocation
For a cryptosystem (P, C, K, e_k(·), d_k(·)) we have

H(K | C) = H(K) + H(P) - H(C).

In words: the remaining uncertainty about the key when a ciphertext has been observed is equal to the sum of the uncertainties about the key and the plaintext minus the uncertainty about the ciphertext.

The proof can be found in
- Stinson - Cryptography: Theory and Practice, Theorem 2.10, p. 59.

As a consequence of the last two equations, the information revealed about the key by the ciphertext is equal to

I(K | C) = H(C) - H(P).
Example - Key Equivocation
Returning to our example cryptosystem from earlier:

H(P) ≈ 0.81, H(K) ≈ 1.5 and H(C) ≈ 1.85.

Using the formula for H(K | C) we get

H(K | C) = H(K) + H(P) - H(C) ≈ 1.5 + 0.81 - 1.85 ≈ 0.46.

So the remaining uncertainty about the key is less than half a bit.
And the information revealed about the key by the ciphertext is

I(K | C) = H(C) - H(P) ≈ 1.85 - 0.81 ≈ 1.04.

Thus the ciphertext leaks more than 1 bit of information about the key.
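The formula can be cross-checked by computing H(K | C) directly from its definition on the toy cryptosystem from the earlier slides; an illustrative Python sketch:

```python
import math
from fractions import Fraction as F

def H(probs):
    """Shannon entropy, skipping zero-probability terms."""
    probs = [float(p) for p in probs]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The toy cryptosystem from the earlier slides.
p_P = {'a': F(1, 4), 'b': F(3, 4)}
p_K = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}
enc = {('k1', 'a'): 1, ('k1', 'b'): 2, ('k2', 'a'): 2,
       ('k2', 'b'): 3, ('k3', 'a'): 3, ('k3', 'b'): 4}

# Joint distribution p(K = k, C = c) and marginal p(C = c).
joint, p_C = {}, {}
for (k, m), c in enc.items():
    joint[(k, c)] = joint.get((k, c), F(0)) + p_K[k] * p_P[m]
for (k, c), p in joint.items():
    p_C[c] = p_C.get(c, F(0)) + p

# H(K|C) directly: weighted average over c of H(K | C = c).
H_K_given_C = sum(float(p_C[c]) * H([joint.get((k, c), F(0)) / p_C[c] for k in p_K])
                  for c in p_C)

# ... and via the slide's formula H(K|C) = H(K) + H(P) - H(C).
via_formula = H(p_K.values()) + H(p_P.values()) - H(p_C.values())
print(round(H_K_given_C, 2))                   # ≈ 0.46
print(abs(H_K_given_C - via_formula) < 1e-12)  # True
```

Both routes agree, confirming the half-bit of remaining key uncertainty quoted above.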
Spurious Keys
If you know that the plaintext is taken from a 'natural' language, then knowing the ciphertext rules out a certain subset of the keys.
Of the remaining possible keys, only one is correct. The remaining possible, but incorrect, keys are called the spurious keys.

Consider the Shift Cipher with the same key for each letter.
- Suppose the ciphertext is WNAJW.
- The plaintext is known to be an English word.
- The only 'meaningful' plaintexts are RIVER and ARENA.
- We have two possible keys, E and W.
- One is correct and one is spurious.
Natural Language
To prove a bound on the number of spurious keys, we need to define what we mean by the entropy per letter H_L of a natural language L.

Ideally we would like H_L to be defined such that the number of meaningful strings of length n, which we denote T(n), satisfies, for n >> 0,

T(n) ≈ 2^{n H_L}.

In a natural language there are very few meaningful strings, so the entropy per letter H_L will be lower than the entropy of a random string:

H_L ≤ log_2 26 ≈ 4.7.
Natural Language
We get a better approximation if we use the probabilities with which letters occur in English: if P is the random variable representing the letters in the English language, then

p(P = a) = 0.082, p(P = b) = 0.015, ..., p(P = z) = 0.001.

This gives us the upper bound

H_L ≤ H(P) ≈ 4.19.

However, successive letters are clearly not independent, which will further reduce the entropy per letter.
An even better approximation is to use P^2, i.e. the random variable of bigrams in English, which leads to the bound

H_L ≤ H(P^2)/2 ≈ 3.90.
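The bound H(P) ≈ 4.19 can be approximated from a standard single-letter frequency table. The sketch below uses frequencies as commonly tabulated (e.g. in Stinson); since they carry rounding error and sum only approximately to 1, the computed value is close to, but not exactly, the quoted 4.19:

```python
import math

# Standard English single-letter frequencies (commonly tabulated values).
freq = {'a': .082, 'b': .015, 'c': .028, 'd': .043, 'e': .127, 'f': .022,
        'g': .020, 'h': .061, 'i': .070, 'j': .002, 'k': .008, 'l': .040,
        'm': .024, 'n': .067, 'o': .075, 'p': .019, 'q': .001, 'r': .060,
        's': .063, 't': .091, 'u': .028, 'v': .010, 'w': .023, 'x': .001,
        'y': .020, 'z': .001}

# Normalise (the rounded table sums to ~1.001) and compute the entropy.
total = sum(freq.values())
probs = [p / total for p in freq.values()]
H_P = -sum(p * math.log2(p) for p in probs)

print(round(H_P, 2))         # close to the 4.19 quoted on the slide
print(H_P < math.log2(26))   # True: bias pushes entropy below the uniform 4.7
```

The second check is the point of the slide: any bias in letter frequencies strictly lowers the entropy per letter below log_2 26.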
Natural Language
Continuing this process, we are led to the following definition.
The entropy per letter H_L of a natural language L is defined as

H_L = lim_{n→∞} H(P^n)/n,

where P^n is the random variable for n-grams.

This is hard to compute exactly, but we can approximate it, and various experiments yield the empirical result

1.0 ≤ H_L ≤ 1.5.

So each letter in English
- requires 5 = ⌈log_2 26⌉ bits of data to represent it, but
- Huffman encoding would only use 1.5 bits per letter.
Redundancy
For a language L with entropy H_L and alphabet P, we need about n log_2 #P bits to represent a string of length n. However, a compact encoding only needs about n H_L bits.

The redundancy R_L of a language is defined as the relative difference between both encodings, i.e.

R_L = (n log_2 #P - n H_L) / (n log_2 #P) = 1 - H_L / log_2 #P.

If we take H_L ≈ 1.25 then the redundancy of English is

R_L = 1 - 1.25/log_2 26 ≈ 0.75.

So we can compress an English text file of 10 MB down to 2.5 MB.
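The redundancy calculation is one line of Python; note that with H_L = 1.25 the exact value is about 0.73, which the slides round up to 0.75 (an illustrative sketch):

```python
import math

def redundancy(H_L, alphabet_size):
    """R_L = 1 - H_L / log2(#P): the fraction of each symbol that is redundant."""
    return 1 - H_L / math.log2(alphabet_size)

R_L = redundancy(1.25, 26)
print(round(R_L, 2))             # ≈ 0.73; the slides round this to 0.75
print(round(10 * (1 - R_L), 1))  # MB left after ideally compressing 10 MB
```

With the rounded figure R_L = 0.75, the 10 MB file compresses to the 2.5 MB quoted above.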
Spurious Keys
Let P^n and C^n be the sets of n-grams of plaintext and ciphertext, with associated random variables P^n and C^n.
Suppose we use the same key k ∈ K, with associated random variable K, to encrypt each letter. Then

K(c) = {k ∈ K : ∃ m ∈ P^n with p(P^n = m) > 0 and e_k(m) = c}

is the set of possible keys for which c is the encryption of a meaningful message of length n.
Therefore, given the ciphertext c, the number of spurious keys is

#K(c) - 1,

since there is only one correct key.
Spurious Keys
The average number of spurious keys over all possible ciphertexts of length n is denoted by s_n and equals

s_n = Σ_{c ∈ C^n} p(C^n = c) · (#K(c) - 1)
    = Σ_{c ∈ C^n} p(C^n = c) · #K(c) - Σ_{c ∈ C^n} p(C^n = c)
    = Σ_{c ∈ C^n} p(C^n = c) · #K(c) - 1.

We will now relate s_n to the key equivocation H(K | C^n).
Key Equivocation and Spurious Keys
Recall that H(K | C^n) is the average of H(K | C^n = c) over all possible ciphertexts, and thus

H(K | C^n) = Σ_{c ∈ C^n} p(C^n = c) · H(K | C^n = c)
           ≤ Σ_{c ∈ C^n} p(C^n = c) · log_2 #K(c)    (entropy is largest when all keys are equally likely)
           ≤ log_2 ( Σ_{c ∈ C^n} p(C^n = c) · #K(c) )    (by Jensen's inequality)
           = log_2 (s_n + 1).    (from the last slide)

Conclusion: H(K | C^n) ≤ log_2 (s_n + 1).
Key Equivocation and Spurious Keys
Recall that the key equivocation H(K | C^n) can be expressed as

H(K | C^n) = H(K) + H(P^n) - H(C^n).

For a language L with entropy H_L we can use the estimate

H(P^n) ≈ n H_L = n (1 - R_L) log_2 #P,

provided that n is reasonably large.
Since the entropy is always bounded by log_2 of the number of values,

H(C^n) ≤ n log_2 #C.

Conclusion: If #P = #C then, putting all this together, we have the inequality

H(K | C^n) ≥ H(K) - n R_L log_2 #P.
Bound on Number of Spurious Keys
Combining the results of the two previous slides, we get the bound

log_2 (s_n + 1) ≥ H(K) - n R_L log_2 #P.

Theorem: Suppose that (P, C, K, e_k(·), d_k(·)) is a cryptosystem with #P = #C such that keys are chosen equiprobably. If R_L is the redundancy of the underlying language, then given a ciphertext of length n, the expected number of spurious keys s_n satisfies

s_n ≥ #K / (#P)^{n R_L} - 1.

Example: For a substitution cipher we have #P = 26, #K = 26! ≈ 2^{88.4}, and taking R_L = 0.75, we get

s_n ≥ 2^{88.4 - 3.5n} - 1.
Unicity Distance
The unicity distance n_0 of a cryptosystem is the value of n at which the expected number of spurious keys becomes zero.
Alternatively, it is the average amount of ciphertext required for an adversary to be able to uniquely determine the key, given enough computing time.

For a perfectly secure cipher we have n_0 = ∞.

We set s_n = 0 in

s_n ≥ #K / (#P)^{n R_L} - 1

to obtain an estimate of the unicity distance n_0:

n_0 ≈ log_2 #K / (R_L log_2 #P).
Substitution Cipher
We now show why it was easy to break the substitution cipher.
- #P = 26
- #K = 26! ≈ 2^{88.4}
- R_L = 0.75 for English

We get an estimate for the unicity distance of

n_0 ≈ 88.4 / (0.75 × 4.7) ≈ 25.

So we require on average only 25 ciphertext characters before we can break the substitution cipher, given enough computing time.
After 25 characters we expect a unique valid decryption.
Modern Ciphers
Given a cipher which encrypts bit strings using keys of bit length m:
- #P = 2
- #K = 2^m
- R_L = 0.75 for English (an underestimate, since we are using ASCII)

Then we get an estimate for the unicity distance of

n_0 ≈ log_2 #K / (R_L log_2 #P) = log_2 (2^m) / (0.75 log_2 2) = m/0.75 = 4m/3.
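Both unicity-distance estimates come from the single formula above; an illustrative Python sketch (the key length m = 128 is just an example):

```python
import math

def unicity_distance(log2_num_keys, redundancy, alphabet_size):
    """n0 ≈ log2(#K) / (R_L * log2(#P)): ciphertext needed to pin down the key."""
    return log2_num_keys / (redundancy * math.log2(alphabet_size))

# Substitution cipher: #K = 26! ≈ 2^88.4, #P = 26, R_L = 0.75.
n0_subst = unicity_distance(math.log2(math.factorial(26)), 0.75, 26)
print(round(n0_subst))   # ≈ 25 characters, as on the previous slide

# Modern cipher on bit strings: #K = 2^m, #P = 2, giving n0 = 4m/3.
m = 128
n0_modern = unicity_distance(m, 0.75, 2)
print(n0_modern)         # 4m/3 ≈ 170.7 bits for a 128-bit key
```

Remember this is only an estimate of how much ciphertext determines the key uniquely; it says nothing about the computational effort of actually finding it.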