
COMS30124 : Crypto and Information Theory

Elisabeth Oswald and Nigel Smart

Department of Computer Science, University of Bristol,
Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, United Kingdom.

11th October 2006


Outline

Computational Security

Recap on Probability Theory

Probability and Ciphers

Shannon’s Theorem

Entropy and Uncertainty

Entropy and Cryptography


Information Theory and Cryptography

Information Theory is one of the foundations of computer science.

Here we will examine its relationship to cryptography.

We will be following:

- Chapter 4 of Smart - Cryptography: An Introduction

Other books are:

- Chapter 2 of Stinson - Cryptography: Theory and Practice
- Chapter 1 of Welsh - Codes and Cryptography


Computational Security

A system is computationally secure if the best algorithm for breaking it requires N operations.

- Where N is a very big number.
- No practical system can be proved secure under this definition.

In practice we say a system is computationally secure if the best known algorithm for breaking it requires an unreasonably large amount of computer time.


Computational Security

Another, practical, approach is to reduce a well-studied hard problem to the problem of breaking the system.

- E.g.: the system is secure if a given integer n cannot be factored.

Systems of this form are often called provably secure.

- However, we only have a proof relative to some hard problem.
- It is not an absolute proof.

Essentially we are bounding the computational power of the adversary.

- Even if the adversary has limited (but large) resources, she still will not break the system.


Computational Security

When considering schemes which are computationally secure

- We need to be careful about the key sizes etc.
- We need to keep abreast of current algorithmic developments.
- At some point in the future we should expect our system to be broken (maybe many millennia hence, though).

Most schemes in use today are computationally secure.


Unconditional Security

For unconditional security we place no bound on the computational power of the adversary.

In other words, a system is unconditionally secure if it cannot be broken even with infinite computing power.

- Some systems are unconditionally secure.

Other names for unconditionally secure are:

- Perfectly secure
- Information-theoretically secure


Examples

Of the ciphers we have seen, or of those we are to see later on, the following are not computationally secure:

- Caesar cipher
- Substitution cipher
- Vigenère cipher

The following are computationally secure but not unconditionally secure:

- DES (?) - AES
- RSA

The one-time pad is unconditionally secure if used correctly.


Probability Diversion

To study perfect security we need to look a little at probability.

A random variable X is a variable which takes certain values with certain probabilities.

Examples:

- Let X be the random variable representing tosses of a fair coin:
  - p(X = heads) = 1/2
  - p(X = tails) = 1/2
- Let X be the random variable representing letters in English text:
  - p(X = a) = 0.082, p(X = e) = 0.127, p(X = z) = 0.001


Probability Diversion

Let X and Y be random variables.

- p(X = x) is the probability that X takes the value x.
- p(Y = y) is the probability that Y takes the value y.

The joint probability is defined as follows:

- p(X = x, Y = y) is the probability that X takes the value x and Y takes the value y.

X and Y are independent iff:

- p(X = x, Y = y) = p(X = x) · p(Y = y) for all values of x and y.


Conditional Probability

The conditional probability is defined as follows:

- p(X = x | Y = y) is the probability that X takes the value x given that Y takes the value y.

We have

p(X = x, Y = y) = p(X = x | Y = y) · p(Y = y)

p(X = x, Y = y) = p(Y = y | X = x) · p(X = x)


Bayes’ Theorem

The following is one of the most crucial statements in probability.

Bayes’ Theorem: If p(Y = y) > 0 then

p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y)
                 = p(Y = y | X = x) · p(X = x) / p(Y = y)

X and Y are independent iff p(X = x | Y = y) = p(X = x)

- i.e. the value of X does not depend on the value of Y.


Probability and Ciphers

Let P denote the set of possible plaintexts.
Let K denote the set of possible keys.
Let C denote the set of possible ciphertexts.
Let P, K, C be associated random variables with probabilities

p(P = m), p(K = k), p(C = c).

We make the reasonable assumption that P and K are independent.

The set of ciphertexts under the key k is defined by

C(k) = {ek (m) : m ∈ P}.


Probability and Ciphers

We have the relationship

p(C = c) = ∑_{k : c ∈ C(k)} p(K = k) · p(P = dk(c)).

For c ∈ C and m ∈ P we can compute the probability p(C = c | P = m).

This is the probability that c is the ciphertext given that m is the plaintext:

p(C = c | P = m) = ∑_{k : m = dk(c)} p(K = k).

To break a cipher we want to know the probabilities of the plaintextgiven a certain ciphertext.


Probability and Ciphers

We can compute the probability of m being the plaintext given that c is the ciphertext:

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = p(P = m) · ∑_{k : m = dk(c)} p(K = k) / ( ∑_{k : c ∈ C(k)} p(K = k) · p(P = dk(c)) ).

This can be computed by anyone who knows the probability distributions of K and P, and the encryption function.

Using these probabilities one may be able to deduce some information about the plaintext once one has seen the ciphertext.


Example

Suppose we have P = {a, b}, i.e. only two possible messages:

- p(P = a) = 1/4 and p(P = b) = 3/4.

Suppose we have K = {k1, k2, k3}, i.e. three possible keys:

- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.

Suppose we have C = {1, 2, 3, 4} with encryption given by

ek(m)  a  b
k1     1  2
k2     2  3
k3     3  4

We can then compute

p(C = 1) = 1/8,
p(C = 2) = 7/16,
p(C = 3) = 1/4,
p(C = 4) = 3/16.


Example

We can now compute the conditional probabilities

p(P = a | C = 1) = 1      p(P = b | C = 1) = 0
p(P = a | C = 2) = 1/7    p(P = b | C = 2) = 6/7
p(P = a | C = 3) = 1/4    p(P = b | C = 3) = 3/4
p(P = a | C = 4) = 0      p(P = b | C = 4) = 1

Hence:

- If we see the ciphertext 1 we know the message is a.
- If we see the ciphertext 4 we know the message is b.
- If we see the ciphertext 3 we guess the message is b.
- If we see the ciphertext 2 we guess the message is b.
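These calculations can be reproduced with a short script (a sketch; the dictionaries simply encode the tables from this example, and exact rationals avoid rounding):

```python
from fractions import Fraction as F

# Prior distributions from the example
pP = {'a': F(1, 4), 'b': F(3, 4)}
pK = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}

# Encryption table e_k(m)
e = {('k1', 'a'): 1, ('k1', 'b'): 2,
     ('k2', 'a'): 2, ('k2', 'b'): 3,
     ('k3', 'a'): 3, ('k3', 'b'): 4}

# p(C = c) = sum over (k, m) with e_k(m) = c of p(K = k) * p(P = m)
pC = {}
for (k, m), c in e.items():
    pC[c] = pC.get(c, F(0)) + pK[k] * pP[m]

def posterior(m, c):
    """p(P = m | C = c) via Bayes' Theorem."""
    num = sum(pK[k] for k in pK if e[(k, m)] == c)
    return num * pP[m] / pC[c]

print(dict(pC))            # c = 1, 2, 3, 4 with probabilities 1/8, 7/16, 1/4, 3/16
print(posterior('a', 2))   # 1/7
print(posterior('b', 3))   # 3/4
```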


Example - Conclusion

So in the previous example the ciphertext does reveal information about the plaintext.

This is exactly what we wish to avoid.

We want the ciphertext to give no information about the plaintext.

A system with this property is said to be perfectly secure.


Perfect Secrecy

A cryptosystem has perfect secrecy iff

p(P = m | C = c) = p(P = m)

for all m ∈ P and c ∈ C.

That is, the probability that the plaintext is m given that the ciphertext c is observed is the same as the probability that the plaintext is m without seeing c.

In other words, knowing c reveals no information about m.


Perfect Secrecy

Recall: perfect secrecy means p(P = m | C = c) = p(P = m). This is equivalent to

p(C = c | P = m) = p(C = c).

Assume p(C = c) > 0 for all c ∈ C (if not, remove c from C).

For any fixed m we have p(C = c | P = m) = p(C = c) > 0. This means that for all c there must be at least one key k such that

ek(m) = c.

Conclusion: #K ≥ #C.

Note: we always have #C ≥ #P, since encryption with a given key k is injective.


Shannon’s Theorem

Shannon’s Theorem is the most important theorem in the information-theoretic study of cryptography.

Shannon’s Theorem: Suppose (P, C, K, ek(·), dk(·)) is a cryptosystem with #P = #C = #K. This cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/#K and, for each m ∈ P and c ∈ C, there is a unique key k such that ek(m) = c.

Note the statement is if and only if, hence we need to prove it in both directions.


Proof

Suppose the system provides perfect secrecy.

Then, for any fixed m ∈ P, we know that for all c ∈ C there is at least one key k such that ek(m) = c.

Since #C = #K we have

#C = #{ek (m) : k ∈ K} = #K

i.e. there do not exist two keys k1 and k2 such that

ek1(m) = ek2(m) = c.

So for all m ∈ P and c ∈ C there is a unique k ∈ K such that ek(m) = c.

We need to show that every key is used with equal probability, i.e.

p(K = k) = 1/#K for all k ∈ K.


Proof

Let n = #K and P = {mi : 1 ≤ i ≤ n}, and fix c ∈ C.

Label the keys k1, . . . , kn such that

eki (mi) = c for 1 ≤ i ≤ n.

Due to perfect secrecy we have p(P = mi | C = c) = p(P = mi) and thus

p(P = mi) = p(P = mi | C = c)
          = p(C = c | P = mi) · p(P = mi) / p(C = c)
          = p(K = ki) · p(P = mi) / p(C = c).


Proof

Hence we obtain that, for all 1 ≤ i ≤ n,

p(C = c) = p(K = ki).

Since ∑_{i=1}^{n} p(K = ki) = 1 we have

n · p(C = c) = 1  ⇒  p(C = c) = 1/n,

thus all keys are used with equal probability.

Conclusion: p(K = k) = 1/#K for all k ∈ K.


Proof

Now we need to prove the result in the other direction.

Suppose that

- #K = #C = #P;
- every key is used with equal probability 1/#K; and
- for each m ∈ P and c ∈ C there is a unique key k with ek(m) = c.

Then we need to show that the system is perfectly secure, i.e.

p(P = m | C = c) = p(P = m).


Proof

Since each key is used with equal probability, we have, for fixed c,

p(C = c) = ∑_{k : c ∈ C(k)} p(K = k) · p(P = dk(c))
         = (1/#K) · ∑_{k : c ∈ C(k)} p(P = dk(c)).

Since for each m and c there is a unique key k with ek(m) = c, we have

∑_{k : c ∈ C(k)} p(P = dk(c)) = ∑_{m ∈ P} p(P = m) = 1.

Conclusion: p(C = c) = 1/#K.


Proof

In addition, if c = ek(m) then

p(C = c | P = m) = p(K = k) = 1/#K.

Then using Bayes’ Theorem we have

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = (1/#K) · p(P = m) / (1/#K)
                 = p(P = m).

Q.E.D.


Example - Shift Cipher

For the Shift Cipher we had P = K = C = Z26 and ek(m) = m + k mod 26.

Shannon’s Theorem implies perfect secrecy if we encrypt 1 letter.

We extend to plaintexts of length n by using the Shift Cipher with a new key for each letter. For this system we clearly have

P = K = C = (Z26)^n and p(K = k) = 1/26^n.

Furthermore, for each m and c there is a unique k such that ek(m) = c. Shannon’s Theorem then implies perfect secrecy.


Example - Vernam One-Time Pad

The Vernam Cipher uses the Shift Cipher, but modulo 2 instead of modulo 26.

Binary arithmetic, or XOR, is defined as

⊕  0  1
0  0  1
1  1  0

Vernam One-Time Pad:

- Gilbert Vernam patented this cipher in 1917 for encryption and decryption of telegraph messages.
- To send a binary string you need a key as long as the message.
- Each key can be used only once, hence One-Time Pad.


Example - Vernam One-Time Pad

Clearly we cannot use the same key twice, owing to the following chosen plaintext attack:

- Eve generates m and asks Alice to encrypt it.
- Eve receives c = m ⊕ k from Alice.
- Eve can now compute the key k = c ⊕ m.
- Eve can decrypt all messages encrypted with k.
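The attack is easy to demonstrate at the byte level (a sketch; the messages and the 16-byte key are illustrative):

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

key = os.urandom(16)              # one-time pad key, as long as the message

# Chosen-plaintext attack: Eve asks Alice to encrypt a message Eve knows
m_eve = b"chosen plaintext"
c_eve = xor(m_eve, key)

recovered_key = xor(c_eve, m_eve)  # c XOR m = (m XOR k) XOR m = k
assert recovered_key == key

# If Alice reuses the key, Eve decrypts everything encrypted under it
m_secret = b"attack at dawn!!"
c_secret = xor(m_secret, key)
assert xor(c_secret, recovered_key) == m_secret
```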

The one-time pad is used in some military and diplomatic contexts.


Key Distribution

Perfect secrecy ⇒ the length of the key is at least the length of the plaintext.

- Key distribution becomes the major problem.
- A solution is given in the fourth-year course COMS40213: Information Security.

The aim of modern cryptography is to design systems where

- one key can be used many times; and
- a short key can encrypt a long message.

Such systems will not be unconditionally secure, but should be at least computationally secure.

We need to use some Information Theory to analyse the situation where the same key is used for multiple encryptions. Again, the main results are due to Shannon in the late 1940s.


Uncertainty

Consider the following examples.

- The outcome of a throw of two dice is more uncertain than the outcome of the throw of one die.
- The outcome of the throw of a fair die is more uncertain than the outcome of the throw of a biased die.
- The uncertainty of the random variable X with p(X = 0) = p and p(X = 1) = 1 − p is the same as the uncertainty of the random variable Y with p(Y = a) = p and p(Y = d) = 1 − p.

If we want to define the uncertainty H(X) of some random variable X, then H(X) should be a function of the probability distribution of X only.


Uncertainty - Shannon’s Axioms

In 1948 Shannon proposed 8 requirements which a sensible definition of uncertainty H(X) should satisfy. Let X be a random variable with values x1, . . . , xn and probabilities pi = p(X = xi) for i = 1, . . . , n. The most important requirements are:

- H(p1, . . . , pn) is maximal when pi = 1/n for all i;
- H(p1, . . . , pn) ≥ 0, and it equals zero only when pi = 1 for some i;
- H(1/n, . . . , 1/n) ≤ H(1/(n+1), . . . , 1/(n+1)) for n ∈ N (a two-horse race is less uncertain than a three-horse race); and
- H(1/mn, . . . , 1/mn) = H(1/m, . . . , 1/m) + H(1/n, . . . , 1/n) for m, n ∈ N (linearity condition: the uncertainty when throwing an m-sided die followed by an n-sided die is the sum of the individual uncertainties).


Entropy = Uncertainty

From the requirements on the previous slide one can prove that the only possible definition for H(X) is the following.

Given a random variable X that takes on a finite set of values with probabilities p1, . . . , pn, the uncertainty or entropy is

H(X) = − ∑_{i=1}^{n} pi log2 pi.

Note that if pi = 0 we remove that term from the above sum.
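The definition translates directly into code (a minimal sketch; `entropy` is a hypothetical helper name):

```python
from math import log2

def entropy(probs):
    """H(X) = -sum p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (fair coin)
print(entropy([1/6] * 6))    # log2(6), about 2.585  (fair die)
```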


Entropy - Examples

Let X be the throw of a fair die, i.e. p(X = i) = 1/6 for i = 1, . . . , 6. Then

H(X) = − ∑_{i=1}^{6} (1/6) log2(1/6) = − log2(1/6) = log2 6.

More generally, if X takes on n values with equal probability then

H(X) = − ∑_{i=1}^{n} (1/n) log2(1/n) = log2 n.

Suppose X is the answer to a question with values either Yes or No.

- If I always answer Yes, then there is no uncertainty, i.e. H(X) = 0.
- If Yes and No are equally probable then H(X) = 1.


Information

Consider the following examples.

- Suppose I toss a fair coin and tell you the outcome of the experiment; then I have given you 1 bit of information.
- Suppose I toss a fair coin n times; then the information of the outcome of the experiment is clearly n bits.
- Suppose I answer Yes to a question with probability 0.99, and No with probability 0.01; then the answer Yes provides considerably less information than the answer No, since one already expected the answer Yes.

These examples suggest that the information I of an event E which occurs with probability p should be defined as

I(E) = − log2 p.


Entropy and Information

Let X be a random variable that takes on the values x1, . . . , xn with pi = p(X = xi). Then the information content of the event X = xi is

I(X = xi) = − log2 pi .

Recall that the entropy of a random variable X was defined as

H(X) = − ∑_{i=1}^{n} pi log2 pi.

This is the mean value of the information content of the events X = xi.

Therefore, entropy measures the average information content of an observation of X.

Conclusion: Loss of entropy is gain of information!

Example

Let us return to our example cryptosystem from earlier.

The possible plaintexts, keys and ciphertexts were

- P = {a, b},
- K = {k1, k2, k3},
- C = {1, 2, 3, 4}.

We had the following probabilities:

- p(P = a) = 1/4 and p(P = b) = 3/4.
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
- p(C = 1) = 1/8, p(C = 2) = 7/16, p(C = 3) = 1/4 and p(C = 4) = 3/16.


Example

Then we have

H(P) = −(1/4) log2(1/4) − (3/4) log2(3/4) ≈ 0.81,

H(K) = −(1/2) log2(1/2) − 2 · (1/4) log2(1/4) = 1.5,

H(C) = −(1/8) log2(1/8) − (7/16) log2(7/16) − (1/4) log2(1/4) − (3/16) log2(3/16) ≈ 1.85.

Note that the uncertainty or entropy H(C) of the ciphertext is smaller than the sum of the entropies of the plaintext H(P) and the key H(K).

Later we will see that the difference is the remaining uncertainty about the key given the ciphertext.
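These values are easy to reproduce (a sketch reusing the distributions of the running example; `H` is a hypothetical helper name):

```python
from math import log2

def H(probs):
    """Entropy of a distribution, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

HP = H([1/4, 3/4])                    # entropy of the plaintext
HK = H([1/2, 1/4, 1/4])               # entropy of the key
HC = H([1/8, 7/16, 1/4, 3/16])        # entropy of the ciphertext

print(round(HP, 2), round(HK, 2), round(HC, 2))   # 0.81 1.5 1.85
print(round(HP + HK - HC, 2))                     # the difference: 0.46 bits
```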


A Fact About Logarithms

The following is a special case of Jensen’s inequality, which we will need to discuss entropy in more depth.

Suppose ai > 0 for i = 1, . . . , n and ∑_{i=1}^{n} ai = 1. Then, if xi > 0 for i = 1, . . . , n, we have

∑_{i=1}^{n} ai log2(xi) ≤ log2( ∑_{i=1}^{n} ai xi ).

Furthermore, equality occurs if and only if x1 = x2 = . . . = xn.


Upper Bound on Entropy

Suppose X is a random variable that takes on values x1, . . . , xn with probability distribution pi = p(X = xi) for i = 1, . . . , n. Then

H(X) = − ∑_{i=1}^{n} pi log2 pi = ∑_{i=1}^{n} pi log2(1/pi)
     ≤ log2( ∑_{i=1}^{n} pi · (1/pi) )   (by Jensen’s Inequality)
     = log2 n.

Conclusion: for a random variable X with n possible values we have H(X) ≤ log2 n, and we obtain equality if and only if pi = 1/n for all i.


Joint Entropy

Let X and Y be random variables with values x1, . . . , xn and y1, . . . , ym and joint probabilities

rij = p(X = xi, Y = yj)

for i = 1, . . . , n and j = 1, . . . , m.

The joint entropy is defined as

H(X, Y) = − ∑_{i=1}^{n} ∑_{j=1}^{m} rij log2 rij.

The joint entropy H(X, Y) is the uncertainty of the random variables X and Y together.

The joint entropy H(X, Y) measures the average information content of an observation of X and Y together.


Joint Entropy

Let X and Y be random variables; then we have the inequality

H(X, Y) ≤ H(X) + H(Y),

with equality if and only if X and Y are independent.

Reminder: X and Y are independent means that for all i and j

p(X = xi, Y = yj) = p(X = xi) · p(Y = yj).

Proofs can be found in

- Stinson - Cryptography: Theory and Practice, Theorem 2.7, p. 57, and
- Welsh - Codes and Cryptography, Theorem 2, p. 6.


Conditional Entropy

Conditional entropy measures the average uncertainty of a random variable X given an observation of a random variable Y.

Reminder: If X and Y are random variables with values x1, . . . , xn and y1, . . . , ym, then the conditional probability p(X = xi | Y = yj) is the probability that the value of X will be xi given that the value of Y is yj.

The conditional entropy of X given Y = yj is defined as

H(X | Y = yj) = − ∑_{i=1}^{n} p(X = xi | Y = yj) · log2 p(X = xi | Y = yj).


Conditional Entropy

The conditional entropy of X given Y is defined as the weighted average of the entropies H(X | Y = yj) for j = 1, . . . , m, i.e.

H(X | Y) = ∑_{j=1}^{m} p(Y = yj) · H(X | Y = yj)
         = − ∑_{j=1}^{m} ∑_{i=1}^{n} p(Y = yj) · p(X = xi | Y = yj) · log2 p(X = xi | Y = yj).

Conditional entropy measures the average uncertainty of a random variable X given observations of a random variable Y, averaged over all values that Y can take.


Conditional and Joint Entropy

Conditional and joint entropy are linked by the following formula

H(X , Y ) = H(Y ) + H(X |Y ).

Proof: Welsh - Codes and Cryptography, Theorem 1, p. 8.

As an immediate consequence, we have the following upper bound

H(X |Y ) ≤ H(X )

with equality if and only if X and Y are independent.

Proof: Welsh - Codes and Cryptography, Corollary, p. 9.
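Both identities can be checked numerically on a small joint distribution (a sketch; the 2×2 matrix below is an arbitrary example, not taken from these slides):

```python
from math import log2

# Joint distribution r[i][j] = p(X = x_i, Y = y_j)
r = [[1/8, 1/4],
     [1/2, 1/8]]

pX = [sum(row) for row in r]            # marginal distribution of X
pY = [sum(col) for col in zip(*r)]      # marginal distribution of Y

def H(probs):
    """Entropy, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

HXY = H([p for row in r for p in row])  # joint entropy H(X, Y)

# H(X | Y): weighted average of H(X | Y = y_j) over j
HX_given_Y = sum(pY[j] * H([r[i][j] / pY[j] for i in range(2)])
                 for j in range(2))

assert abs(HXY - (H(pY) + HX_given_Y)) < 1e-12  # H(X, Y) = H(Y) + H(X | Y)
assert HX_given_Y <= H(pX)                      # H(X | Y) <= H(X)
```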


Information and Entropy

Reminder: Loss of uncertainty is gain of information.

Let X and Y be two random variables; then the information about X conveyed by Y is defined as

I(X |Y ) = H(X )− H(X |Y ).

Clearly I(X |Y ) = 0 if and only if X and Y are independent.

Remark:

- Strangely enough, we have I(X | Y) = I(Y | X).
- Proof: Welsh - Codes and Cryptography, p. 11.


Conditional Entropy and Cryptography

Let P, K, C be the sets of possible messages, keys and ciphertexts, with associated random variables P, K, C.

H(P|K, C) = 0
- Given the ciphertext and the key, you know the plaintext, since it is the decryption of the given ciphertext under the given key.

H(C|P, K) = 0
- Given the plaintext and the key, you know the ciphertext, since it is the encryption of the given plaintext under the given key.
- Note: Modern public key encryption schemes do not have this last property when used correctly.


Key Equivocation

The conditional entropy H(K|C) is called the key equivocation and measures the average uncertainty remaining about the key when a ciphertext has been observed.

Suppose that an adversary wants to determine the key of a non-perfect cipher. The smaller H(K|C) is, the easier it will be to recover the key.

The information revealed about the key by the ciphertext is the loss of uncertainty about the key when a ciphertext has been observed, i.e.

I(K|C) = H(K) − H(K|C).


Key Equivocation

For a cryptosystem (P, C, K, ek(·), dk(·)) we have

H(K|C) = H(K) + H(P) − H(C).

In words: the remaining uncertainty about the key when a ciphertext has been observed is equal to the sum of the uncertainties about the key and the plaintext minus the uncertainty about the ciphertext.

Proof can be found in
- Stinson - Cryptography: Theory and Practice, Theorem 2.10, p. 59.

As a consequence of the last two equations, the information revealed about the key by the ciphertext is equal to

I(K|C) = H(C) − H(P).


Example - Key Equivocation

Returning to our example cryptosystem from earlier

H(P) ≈ 0.81, H(K) ≈ 1.5 and H(C) ≈ 1.85.

Using the formula for H(K |C) we get

H(K|C) = H(K) + H(P) − H(C) ≈ 1.5 + 0.81 − 1.85 ≈ 0.46.

So the remaining uncertainty about the key is less than half a bit.

And the information revealed about the key by the ciphertext is

I(K|C) = H(C) − H(P) ≈ 1.85 − 0.81 ≈ 1.04.

Thus the ciphertext leaks more than 1 bit of information about the key.
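The "example cryptosystem from earlier" is not part of this excerpt; the figures match the standard toy cryptosystem from Stinson (two plaintexts, three keys, four ciphertexts), which is assumed in the sketch below. The script reproduces all the entropies:

```python
import math

def H(probs):
    """Shannon entropy (base 2) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed toy cryptosystem (Stinson's running example):
pP = {'a': 1/4, 'b': 3/4}                  # plaintext distribution
pK = {'k1': 1/2, 'k2': 1/4, 'k3': 1/4}     # key distribution
enc = {('k1', 'a'): 1, ('k1', 'b'): 2,     # encryption table e_k(m)
       ('k2', 'a'): 2, ('k2', 'b'): 3,
       ('k3', 'a'): 3, ('k3', 'b'): 4}

# Induced ciphertext distribution: p(C = c) = sum of p(K = k)·p(P = m) over e_k(m) = c.
pC = {}
for (k, m), c in enc.items():
    pC[c] = pC.get(c, 0.0) + pK[k] * pP[m]

HP, HK, HC = H(pP.values()), H(pK.values()), H(pC.values())
HK_C = HK + HP - HC   # key equivocation H(K|C), about 0.46
IK_C = HC - HP        # information revealed about the key, about 1.04
```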


Spurious Keys

If you know that the plaintext is taken from a 'natural' language, then knowing the ciphertext rules out a certain subset of the keys.

Of the remaining possible keys, only one is correct. The remaining possible, but incorrect, keys are called the spurious keys.

Consider the Shift Cipher with the same key for each letter.
- Suppose the ciphertext is WNAJW.
- The plaintext is known to be an English word.
- The only 'meaningful' plaintexts are RIVER and ARENA.
- We have two possible keys, F and W.
- One is correct and one is spurious.
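The exhaustive search behind this example is easy to carry out. Here is a minimal sketch, using a two-word stand-in for a full English dictionary:

```python
# Brute-force the shift cipher: try all 26 keys on the ciphertext WNAJW and
# keep the shifts whose decryption is a dictionary word. The two-word set
# below is a stand-in for a full English word list.
WORDS = {"RIVER", "ARENA"}

def shift_decrypt(ciphertext, key):
    """Decrypt a shift-cipher ciphertext (A = 0, ..., Z = 25) under the given key."""
    return ''.join(chr((ord(c) - ord('A') - key) % 26 + ord('A'))
                   for c in ciphertext)

candidates = {}
for key in range(26):
    plaintext = shift_decrypt("WNAJW", key)
    if plaintext in WORDS:
        candidates[key] = plaintext

# candidates == {5: 'RIVER', 22: 'ARENA'}, i.e. the keys F (= 5) and W (= 22)
```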


Natural Language

To prove a bound on the number of spurious keys, we need to define what we mean by the entropy per letter HL of a natural language L.

Ideally we would like HL to be defined such that, for n ≫ 0, the number of meaningful strings of length n, which we denote T(n), satisfies

T(n) ≈ 2^(n·HL).

In a natural language there are very few meaningful strings, so the entropy per letter HL will be lower than the entropy of a random string,

HL ≤ log2 26 ≈ 4.7.


Natural Language

We get a better approximation if we use the probabilities with which letters occur in English: if P is the random variable representing the letters in the English language, then

p(P = a) = 0.082, p(P = b) = 0.015, . . . , p(P = z) = 0.001.

This gives us the upper bound

HL ≤ H(P) ≈ 4.19.

However, successive letters are clearly not independent, which will further reduce the entropy per letter.

An even better approximation is to use P^2, i.e. the random variable of bigrams in English, which leads to the bound

HL ≤ H(P^2)/2 ≈ 3.90.
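The single-letter bound H(P) ≈ 4.19 can be reproduced from a published frequency table. The figures below are Stinson's single-letter frequencies; the values elided on this slide are assumed to match:

```python
import math

# English letter frequencies (Stinson's table; assumed to match the
# elided list on this slide).
freq = {'a': .082, 'b': .015, 'c': .028, 'd': .043, 'e': .127, 'f': .022,
        'g': .020, 'h': .061, 'i': .070, 'j': .002, 'k': .008, 'l': .040,
        'm': .024, 'n': .067, 'o': .075, 'p': .019, 'q': .001, 'r': .060,
        's': .063, 't': .091, 'u': .028, 'v': .010, 'w': .023, 'x': .001,
        'y': .020, 'z': .001}

HP = -sum(p * math.log2(p) for p in freq.values())
# HP is roughly 4.19, noticeably below log2(26) ≈ 4.70
```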


Natural Language

Continuing this process, we are led to the following definition.

The entropy per letter HL of a natural language L is defined as

HL = lim_{n→∞} H(P^n)/n,

where P^n is the random variable for n-grams.

This is hard to compute exactly, but we can approximate it, and various experiments yield the empirical result

1.0 ≤ HL ≤ 1.5.

So each letter in English
- requires 5 = ⌈log2 26⌉ bits of data to represent it, but
- Huffman encoding would only use 1.5 bits per letter.


Redundancy

For a language L with entropy HL and alphabet P, we need about n·log2 #P bits to represent a string of length n. However, a compact encoding only needs about n·HL bits.

The redundancy RL of a language is defined as the relative difference between the two encodings, i.e.

RL = (n·log2 #P − n·HL) / (n·log2 #P) = 1 − HL/log2 #P.

If we take HL ≈ 1.25 then the redundancy of English is

RL = 1 − 1.25/log2 26 ≈ 0.75.

So we can compress an English text file of 10 MB down to 2.5 MB.


Spurious Keys

Let P^n and C^n be the sets of n-grams of plaintext and ciphertext, with associated random variables P^n and C^n.

Suppose we use the same key k ∈ K, with associated random variable K, to encrypt each letter. Then

K(c) = {k ∈ K : ∃m ∈ P^n, p(P^n = m) > 0, ek(m) = c}

is the set of possible keys for which c is the encryption of a meaningful message of length n.

Therefore, given the ciphertext c, the number of spurious keys is

#K (c)− 1,

since there is only 1 correct key.


Spurious Keys

The average number of spurious keys over all possible ciphertexts of length n is denoted by sn and equals

sn = ∑_{c∈C^n} p(C^n = c) · (#K(c) − 1)
   = ∑_{c∈C^n} p(C^n = c) · #K(c) − ∑_{c∈C^n} p(C^n = c)
   = ∑_{c∈C^n} p(C^n = c) · #K(c) − 1.

We will now relate sn to the key equivocation H(K|C^n).


Key Equivocation and Spurious Keys

Recall that H(K|C^n) is the average of H(K|C^n = c) over all possible ciphertexts, and thus

H(K|C^n) = ∑_{c∈C^n} p(C^n = c) · H(K|C^n = c)
         ≤ ∑_{c∈C^n} p(C^n = c) · log2 #K(c)        (uncertainty is largest when all keys in K(c) are equally likely)
         ≤ log2 ( ∑_{c∈C^n} p(C^n = c) · #K(c) )    (by Jensen's inequality)
         = log2(sn + 1).                             (from the previous slide)

Conclusion: H(K|C^n) ≤ log2(sn + 1).


Key Equivocation and Spurious Keys

Recall that the key equivocation H(K|C^n) can be expressed as

H(K|C^n) = H(K) + H(P^n) − H(C^n).

For a language L with entropy HL we can use the estimate

H(P^n) ≈ n·HL = n(1 − RL)·log2 #P,

provided that n is reasonably large.

Since the entropy is always bounded by the log2 of the number of possible values,

H(C^n) ≤ n·log2 #C.

Conclusion: If #P = #C then, putting all this together, we have the inequality

H(K|C^n) ≥ H(K) − n·RL·log2 #P.


Bound on Number of Spurious Keys

Combining the results of the two previous slides, we get the bound

log2(sn + 1) ≥ H(K) − n·RL·log2 #P.

Theorem: Suppose that (P, C, K, ek(·), dk(·)) is a cryptosystem with #P = #C such that keys are chosen equiprobably. If RL is the redundancy of the underlying language, then, given a ciphertext of length n, the expected number of spurious keys sn satisfies

sn ≥ #K / (#P)^(n·RL) − 1.

Example: For a substitution cipher we have #P = 26, #K = 26! ≈ 2^88.4; taking RL = 0.75, we get

sn ≥ 2^(88.4 − 3.5n) − 1.
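A quick computation (a sketch, not from the slides) confirms the exponent 88.4 − 3.5n and locates the point where the bound becomes vacuous:

```python
import math

# Lower bound s_n >= #K / (#P)^(n * R_L) - 1 for the substitution cipher,
# with #P = 26, #K = 26! and R_L = 0.75.
log2_K = math.log2(math.factorial(26))   # about 88.4
R_L = 0.75

def spurious_lower_bound(n):
    """Evaluate 2^(log2 #K - n * R_L * log2 #P) - 1 for ciphertext length n."""
    return 2 ** (log2_K - n * R_L * math.log2(26)) - 1

# The bound stays positive up to n = 25 and drops below zero at n = 26,
# consistent with a unicity distance of about 25.
```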


Unicity Distance

The unicity distance n0 of a cryptosystem is the value of n at which the expected number of spurious keys becomes zero.

Alternatively, it is the average amount of ciphertext required for an adversary to be able to uniquely determine the key, given enough computing time.

For a perfectly secure cipher we have n0 = ∞.

We set sn = 0 in the bound

sn ≥ #K / (#P)^(n·RL) − 1

to obtain an estimate of the unicity distance n0

n0 ≈ log2 #K / (RL · log2 #P).


Substitution Cipher

We now show why it was easy to break the substitution cipher.
- #P = 26
- #K = 26! ≈ 2^88.4
- RL = 0.75 for English

We get an estimate for the unicity distance of

n0 ≈ 88.4 / (0.75 × 4.7) ≈ 25.

So we require on average only 25 ciphertext characters before we can break the substitution cipher, given enough computing time.

After 25 characters we expect a unique valid decryption.


Modern Ciphers

Consider a cipher which encrypts bit strings using keys of bit length m.
- #P = 2
- #K = 2^m
- RL = 0.75 for English (an underestimate, since the letters are encoded in ASCII)

Then we get an estimate for the unicity distance of

n0 ≈ log2 #K / (RL · log2 #P) = log2(2^m) / (0.75 · log2 2) = m/0.75 = 4m/3.
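Both estimates can be checked in a few lines. This is a sketch of the approximation n0 ≈ log2 #K / (RL · log2 #P), instantiated for the substitution cipher and for a hypothetical 128-bit key:

```python
import math

def unicity_distance(log2_num_keys, alphabet_size, R_L=0.75):
    """Estimate n0 = log2(#K) / (R_L * log2(#P))."""
    return log2_num_keys / (R_L * math.log2(alphabet_size))

n0_substitution = unicity_distance(math.log2(math.factorial(26)), 26)  # about 25
n0_m_bit = unicity_distance(128, 2)   # m = 128 gives 4m/3, i.e. about 171 ciphertext bits
```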
