COMS30124 : Crypto and Information Theory
Elisabeth Oswald and Nigel Smart
Department of Computer Science, University of Bristol,
Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, United Kingdom.
11th October 2006
Elisabeth Oswald and Nigel Smart
COMS30124 : Crypto and Information Theory Slide 1
Outline
Computational Security
Recap on Probability Theory
Probability and Ciphers
Shannon’s Theorem
Entropy and Uncertainty
Entropy and Cryptography
Information Theory and Cryptography
Information Theory is one of the foundations of computer science.
Here we will examine its relationship to cryptography.
We will be following
- Chapter 4 of Smart - Cryptography, an Introduction
Other books are
- Chapter 2 of Stinson - Cryptography: Theory and Practice
- Chapter 1 of Welsh - Codes and Cryptography
Computational Security
A system is computationally secure if the best algorithm for breaking it requires N operations.
- Where N is a very big number.
- No practical system can be proved secure under this definition.
In practice we say a system is computationally secure if the best known algorithm for breaking it requires an unreasonably large amount of computer time.
Computational Security
Another, practical, approach is to reduce a well-studied hard problem to the problem of breaking the system.
- E.g.: the system is secure if a given integer n cannot be factored.
Systems of this form are often called provably secure.
- However, we only have a proof relative to some hard problem.
- Not an absolute proof.
Essentially we are bounding the computational power of the adversary.
- Even if the adversary has limited (but large) resources she still will not break the system.
Computational Security
When considering schemes which are computationally secure
- We need to be careful about the key sizes etc.
- We need to keep abreast of current algorithmic developments.
- At some point in the future we should expect our system to be broken (maybe many millennia hence though).
Most schemes in use today are computationally secure.
Unconditional Security
For unconditional security we place no bound on the computational power of the adversary.
In other words, a system is unconditionally secure if it cannot be broken even with infinite computing power.
- Some systems are unconditionally secure.
Other names for unconditionally secure are
- Perfectly secure
- Information theoretically secure
Examples
Of the ciphers we have seen, or of those we are to see later on, the following are not computationally secure
- Caesar cipher
- Substitution cipher
- Vigenère cipher
The following are computationally secure but not unconditionally secure.
- DES (?)
- AES
- RSA
One-time pad is unconditionally secure if used correctly.
Probability Diversion
To study perfect security we need to look a little at probability.
A random variable X is a variable which takes certain values with certain probabilities.
Examples:
- Let X be the random variable representing tosses of a fair coin
  - p(X = heads) = 1/2
  - p(X = tails) = 1/2
- Let X be the random variable representing letters in English text
  - p(X = a) = 0.082, p(X = e) = 0.127, p(X = z) = 0.001
Probability Diversion
Let X and Y be random variables.
- p(X = x) is the probability that X takes the value x.
- p(Y = y) is the probability that Y takes the value y.
The joint probability is defined as follows:
- p(X = x, Y = y) is the probability that X takes the value x and Y takes the value y.
X and Y are independent iff
- p(X = x, Y = y) = p(X = x) · p(Y = y) for all values of x and y.
Conditional Probability
The conditional probability is defined as follows:
- p(X = x | Y = y) is the probability that X takes the value x given that Y takes the value y.
We have
p(X = x , Y = y) = p(X = x | Y = y) · p(Y = y)
p(X = x , Y = y) = p(Y = y | X = x) · p(X = x)
Bayes’ Theorem
The following is one of the most crucial statements in probability.
Bayes' Theorem: If p(Y = y) > 0 then

p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y)
                 = p(Y = y | X = x) · p(X = x) / p(Y = y).

X and Y are independent iff p(X = x | Y = y) = p(X = x)
- i.e. the value of X does not depend on the value of Y.
Probability and Ciphers
Let P denote the set of possible plaintexts.
Let K denote the set of possible keys.
Let C denote the set of possible ciphertexts.
Let P, K, C be associated random variables with probabilities

p(P = m), p(K = k), p(C = c).

We make the reasonable assumption that P and K are independent.
The set of ciphertexts under the key k is defined by

C(k) = {e_k(m) : m ∈ P}.
Probability and Ciphers
We have the relationship

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c)).

For c ∈ C and m ∈ P we can compute the probability p(C = c | P = m).
This is the probability that c is the ciphertext given that m is the plaintext:

p(C = c | P = m) = Σ_{k : m = d_k(c)} p(K = k).

To break a cipher we want to know the probabilities of the plaintext given a certain ciphertext.
Probability and Ciphers
We can compute the probability of m being the plaintext given that c is the ciphertext:

p(P = m | C = c) = [p(C = c | P = m) · p(P = m)] / p(C = c)
                 = [p(P = m) · Σ_{k : m = d_k(c)} p(K = k)] / [Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c))].

This can be computed by anyone who knows the probability distributions of K, P and the encryption function.
Using these probabilities one may be able to deduce some information about the plaintext once one has seen the ciphertext.
Example
Suppose we have P = {a, b}, i.e. only two possible messages
- p(P = a) = 1/4 and p(P = b) = 3/4.
Suppose we have K = {k1, k2, k3}, i.e. three possible keys
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
Suppose we have C = {1, 2, 3, 4} with encryption given by

e_k(m) | a  b
k1     | 1  2
k2     | 2  3
k3     | 3  4
We can then compute
p(C = 1) = 1/8,
p(C = 2) = 7/16,
p(C = 3) = 1/4,
p(C = 4) = 3/16.
Example
We can now compute the conditional probabilities
p(P = a | C = 1) = 1      p(P = b | C = 1) = 0
p(P = a | C = 2) = 1/7    p(P = b | C = 2) = 6/7
p(P = a | C = 3) = 1/4    p(P = b | C = 3) = 3/4
p(P = a | C = 4) = 0      p(P = b | C = 4) = 1

Hence
- If we see the ciphertext 1 we know the message is a.
- If we see the ciphertext 4 we know the message is b.
- If we see the ciphertext 3 we guess the message is b.
- If we see the ciphertext 2 we guess the message is b.
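These tables can be reproduced mechanically from the formulas on the previous slides. The following Python sketch (illustrative code, not from the slides) recomputes p(C = c) and the posteriors p(P = m | C = c) for this toy cryptosystem using exact fractions:

```python
from fractions import Fraction as F

# Toy cipher from the slides: P = {a, b}, K = {k1, k2, k3}, C = {1, 2, 3, 4}.
p_P = {'a': F(1, 4), 'b': F(3, 4)}
p_K = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}
enc = {('k1', 'a'): 1, ('k1', 'b'): 2,
       ('k2', 'a'): 2, ('k2', 'b'): 3,
       ('k3', 'a'): 3, ('k3', 'b'): 4}

# p(C = c) = sum over pairs (k, m) with e_k(m) = c of p(K = k) * p(P = m).
p_C = {}
for (k, m), c in enc.items():
    p_C[c] = p_C.get(c, F(0)) + p_K[k] * p_P[m]

# Bayes: p(P = m | C = c) = p(C = c | P = m) * p(P = m) / p(C = c),
# where p(C = c | P = m) = sum of p(K = k) over keys with e_k(m) = c.
def posterior(m, c):
    lik = sum(p_K[k] for k in p_K if enc[(k, m)] == c)
    return lik * p_P[m] / p_C[c]
```

The `posterior` function is just Bayes' Theorem specialised to this cipher; for ciphertexts 1 and 4 it returns probability 1, matching the table above.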
Example - Conclusion
So in the previous example the ciphertext does reveal information about the plaintext.
This is exactly what we wish to avoid.
We want the ciphertext to give no information about the plaintext.
A system with this property is said to be perfectly secure.
Perfect Secrecy
A cryptosystem has perfect secrecy iff

p(P = m | C = c) = p(P = m)

for all m ∈ P and c ∈ C.
That is, the probability that the plaintext is m given that the ciphertext c is observed is the same as the probability that the plaintext is m without seeing c.
In other words, knowing c reveals no information about m.
Perfect Secrecy
Recall: Perfect secrecy means p(P = m | C = c) = p(P = m). This is equivalent to

p(C = c | P = m) = p(C = c).

Assume p(C = c) > 0 for all c ∈ C (if not, remove c from C).
For any fixed m we have p(C = c | P = m) = p(C = c) > 0. This means that for all c there must be at least one key k such that e_k(m) = c.
Conclusion: #K ≥ #C.
Note: We always have #C ≥ #P since encryption with a given key k is injective.
Shannon’s Theorem
Shannon's Theorem is the most important theorem in the information theoretic study of cryptography.

Shannon's Theorem: Suppose (P, C, K, e_k(·), d_k(·)) is a cryptosystem with #P = #C = #K. This cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/#K and, for each m ∈ P and c ∈ C, there is a unique key k such that e_k(m) = c.

Note the statement is "if and only if", hence we need to prove it in both directions.
Proof
Suppose the system provides perfect secrecy.
Then, for any fixed m ∈ P, we know that for all c ∈ C there is at least one key k such that e_k(m) = c.
Since #C = #K we have

#C = #{e_k(m) : k ∈ K} = #K,

i.e. there do not exist two keys k1 and k2 such that e_{k1}(m) = e_{k2}(m) = c.
So for all m ∈ P and c ∈ C there is a unique k ∈ K such that e_k(m) = c.
We need to show that every key is used with equal probability, i.e.

p(K = k) = 1/#K for all k ∈ K.
Proof
Let n = #K and P = {m_i : 1 ≤ i ≤ n} and fix c ∈ C.
Label the keys k_1, ..., k_n such that

e_{k_i}(m_i) = c for 1 ≤ i ≤ n.

Due to perfect secrecy we have p(P = m_i | C = c) = p(P = m_i) and thus

p(P = m_i) = p(P = m_i | C = c)
           = p(C = c | P = m_i) · p(P = m_i) / p(C = c)
           = p(K = k_i) · p(P = m_i) / p(C = c).
Proof
Hence we obtain that for all 1 ≤ i ≤ n,

p(C = c) = p(K = k_i).

Since Σ_{i=1}^{n} p(K = k_i) = 1 we have

n · p(C = c) = 1  ⇒  p(C = c) = 1/n,

thus all keys are used with equal probability.
Conclusion: p(K = k) = 1/#K for all k ∈ K.
Proof
Now we need to prove the result in the other direction.
Suppose that
- #K = #C = #P;
- every key is used with equal probability 1/#K; and
- for each m ∈ P and c ∈ C there is a unique key k with e_k(m) = c.
Then we need to show that the system is perfectly secure, i.e.
p(P = m | C = c) = p(P = m).
Proof
Since each key is used with equal probability, we have, for fixed c,

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c))
         = (1/#K) Σ_{k : c ∈ C(k)} p(P = d_k(c)).

Since for each m and c there is a unique key k with e_k(m) = c we have

Σ_{k : c ∈ C(k)} p(P = d_k(c)) = Σ_{m ∈ P} p(P = m) = 1.

Conclusion: p(C = c) = 1/#K.
Proof
In addition, if c = e_k(m) then

p(C = c | P = m) = p(K = k) = 1/#K.

Then using Bayes' Theorem we have

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = (1/#K) · p(P = m) / (1/#K)
                 = p(P = m).

Q.E.D.
Example - Shift Cipher
For the Shift Cipher we had P = K = C = Z_26 and e_k(m) = m + k mod 26.
Shannon's Theorem implies perfect secrecy if we encrypt one letter.
We extend to plaintexts of length n by using the Shift Cipher with a new key for each letter.
For this system we clearly have

P = K = C = (Z_26)^n,  p(K = k) = 1/26^n.

Furthermore, for each m and c there is a unique k such that e_k(m) = c.
Shannon's Theorem then implies perfect secrecy.
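Perfect secrecy of the one-letter shift cipher can be checked exhaustively. The Python sketch below is illustrative: it uses Z_5 rather than Z_26 purely to keep things small, and a made-up non-uniform plaintext distribution, and verifies that the posterior p(P = m | C = c) never moves from the prior:

```python
from fractions import Fraction as F

# Shift cipher over Z_5 (Z_26 works the same way): e_k(m) = (m + k) mod N,
# with uniformly random keys. Shannon's Theorem says the posterior equals the
# prior whatever the (possibly biased) message distribution is.
N = 5
p_P = {0: F(1, 2), 1: F(1, 4), 2: F(1, 8), 3: F(1, 16), 4: F(1, 16)}
p_K = {k: F(1, N) for k in range(N)}

def posterior(m, c):
    p_c = sum(p_K[k] * p_P[(c - k) % N] for k in range(N))   # p(C = c)
    lik = sum(p_K[k] for k in range(N) if (m + k) % N == c)  # p(C = c | P = m)
    return lik * p_P[m] / p_c

perfect = all(posterior(m, c) == p_P[m] for m in range(N) for c in range(N))
print(perfect)  # True
```

Exact fractions avoid any floating-point doubt: every posterior equals its prior exactly.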
Example - Vernam One-Time Pad
The Vernam Cipher uses the Shift Cipher but modulo 2 instead of modulo 26.
Binary addition, or XOR, is defined as

⊕ | 0 1
0 | 0 1
1 | 1 0

Vernam One-Time Pad
- Gilbert Vernam patented this cipher in 1917 for encryption and decryption of telegraph messages.
- To send a binary string you need a key as long as the message.
- Each key can be used only once, hence One-Time Pad.
Example - Vernam One-Time Pad
Clearly we cannot use the same key twice owing to the following chosen plaintext attack.
- Eve generates m and asks Alice to encrypt it.
- Eve receives c = m ⊕ k from Alice.
- Eve can now compute the key k = c ⊕ m.
- Eve can decrypt all messages encrypted with k.
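The attack is a one-liner once XOR on byte strings is available. A minimal Python sketch (the messages are hypothetical examples):

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# One-time pad: the key is uniformly random and as long as the message.
m1 = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(m1))
c1 = xor(m1, key)           # Alice's ciphertext

# Chosen-plaintext attack on key reuse: Eve knows the pair (m1, c1),
# so she recovers the key ...
recovered = xor(c1, m1)     # k = c XOR m

# ... and if Alice fatally reuses the key, Eve reads the next message.
m2 = b"RETREAT AT TEN"
c2 = xor(m2, key)
print(xor(c2, recovered))   # Eve recovers m2
```

Note that a single use remains perfectly secure; it is only the reuse that hands Eve the key.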
One time pad is used in some military and diplomatic contexts.
Key Distribution
Perfect secrecy ⇒ the length of the key is at least the length of the plaintext.
- Key distribution becomes the major problem.
- Solution in fourth year course COMS40213: Information Security.

The aim of modern cryptography is to design systems where
- one key can be used many times; and
- a short key can encrypt a long message.
Such systems will not be unconditionally secure, but should be at least computationally secure.

We need to use some Information Theory to analyse the situation where the same key is used for multiple encryptions.
Again, the main results are due to Shannon in the late 1940s.
Uncertainty
Consider the following examples.
- The outcome of a throw of two dice is more uncertain than the outcome of the throw of one die.
- The outcome of the throw of a fair die is more uncertain than the outcome of the throw of a biased die.
- The uncertainty of the random variable X with p(X = 0) = p and p(X = 1) = 1 - p is the same as the uncertainty of the random variable Y with p(Y = a) = p and p(Y = d) = 1 - p.

If we want to define the uncertainty H(X) of some random variable X then H(X) should be a function of the probability distribution of X only.
Uncertainty - Shannon’s Axioms
In 1948 Shannon proposed 8 requirements which a sensible definition of uncertainty H(X) should satisfy.
Let X be a random variable with values x_1, ..., x_n and probabilities p_i = p(X = x_i) for i = 1, ..., n. Then the most important requirements are:
- H(p_1, ..., p_n) is maximal when p_i = 1/n for all i;
- H(p_1, ..., p_n) ≥ 0 and equals zero only when p_i = 1 for some i;
- H(1/n, ..., 1/n) ≤ H(1/(n+1), ..., 1/(n+1)) for n ∈ N (a two-horse race is less uncertain than a three-horse race); and
- H(1/(mn), ..., 1/(mn)) = H(1/m, ..., 1/m) + H(1/n, ..., 1/n) for m, n ∈ N (linearity condition: the uncertainty when throwing an m-sided die followed by an n-sided die is the sum of the individual uncertainties).
Entropy = Uncertainty
From the requirements on the previous slide one can prove that the only possible definition for H(X) is the following.
Given a random variable X that takes on a finite set of values with probabilities p_1, ..., p_n, the uncertainty or entropy is

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i.

Note that if p_i = 0 we remove that term from the above sum.
Entropy - Examples
Let X be the throw of a fair die, i.e. p(X = i) = 1/6 for i = 1, ..., 6. Then

H(X) = - Σ_{i=1}^{6} (1/6) log_2 (1/6) = - log_2 (1/6) = log_2 6.

More generally, if X takes on n values with equal probability then

H(X) = - Σ_{i=1}^{n} (1/n) log_2 (1/n) = log_2 n.

Suppose X is the answer to a question with values either Yes or No.
- If I always answer Yes, then there is no uncertainty, i.e. H(X) = 0.
- If Yes and No are equally probable then H(X) = 1.
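The definition and both examples are easy to check numerically; a minimal Python sketch (illustrative, not from the slides) of the entropy function:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p_i * log2(p_i), skipping p_i = 0 terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # fair die: log2(6) ≈ 2.585 bits
print(entropy([1.0, 0.0]))  # always Yes: 0 bits
print(entropy([0.5, 0.5]))  # fair Yes/No: 1 bit
```

The `if p > 0` guard implements the convention from the slide that zero-probability terms are removed from the sum.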
Information
Consider the following examples.
- Suppose I toss a fair coin and tell you the outcome of the experiment; then I have given you 1 bit of information.
- Suppose I toss a fair coin n times; then the information of the outcome of the experiment clearly is n bits.
- Suppose I answer Yes to a question with probability 0.99, and No with probability 0.01; then the answer Yes provides considerably less information than the answer No, since one already expected the answer Yes.

These examples suggest that the information I(E) of an event E which occurs with probability p should be defined as

I(E) = - log_2 p.
Entropy and Information
Let X be a random variable that takes on the values x_1, ..., x_n with p_i = p(X = x_i). Then the information content of the event X = x_i is

I(X = x_i) = - log_2 p_i.

Recall that the entropy of a random variable X was defined as

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i.

This is the mean value of the information content of the events X = x_i.
Therefore, entropy measures the average information content of an observation of X.
Conclusion: Loss of entropy is gain of information!
Example
Let us return to our example cryptosystem from earlier.
The possible plaintexts, keys and ciphertexts were
- P = {a, b},
- K = {k1, k2, k3},
- C = {1, 2, 3, 4}.
We had the following probabilities:
- p(P = a) = 1/4 and p(P = b) = 3/4.
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
- p(C = 1) = 1/8, p(C = 2) = 7/16, p(C = 3) = 1/4 and p(C = 4) = 3/16.
Example
Then we have

H(P) = -(1/4) log_2 (1/4) - (3/4) log_2 (3/4) ≈ 0.81,
H(K) = -(1/2) log_2 (1/2) - 2 · (1/4) log_2 (1/4) = 1.5,
H(C) = -(1/8) log_2 (1/8) - (7/16) log_2 (7/16) - (1/4) log_2 (1/4) - (3/16) log_2 (3/16) ≈ 1.85.

Note that the uncertainty or entropy H(C) of the ciphertext is smaller than the sum of the entropies of the plaintext H(P) and the key H(K).
Later we will see that the difference is the remaining uncertainty about the key given the ciphertext.
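These three values can be recomputed directly from the definition; a short illustrative Python sketch:

```python
import math

def entropy(probs):
    """Shannon entropy, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_P = entropy([1/4, 3/4])               # plaintext distribution
H_K = entropy([1/2, 1/4, 1/4])          # key distribution
H_C = entropy([1/8, 7/16, 1/4, 3/16])   # ciphertext distribution
print(round(H_P, 2), round(H_K, 2), round(H_C, 2))  # 0.81 1.5 1.85
print(H_C < H_P + H_K)                               # True
```

The final check confirms the observation above: H(C) is strictly smaller than H(P) + H(K).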
A Fact About Logarithms
The following is a special case of Jensen's inequality, which we will need to discuss entropy in more depth.

Suppose a_i > 0 for i = 1, ..., n and Σ_{i=1}^{n} a_i = 1.
Then, if x_i > 0 for i = 1, ..., n, we have

Σ_{i=1}^{n} a_i log_2 (x_i) ≤ log_2 ( Σ_{i=1}^{n} a_i x_i ).

Furthermore, equality occurs if and only if x_1 = x_2 = ... = x_n.
Upper Bound on Entropy
Suppose X is a random variable that takes on values x_1, ..., x_n with probability distribution p_i = p(X = x_i) for i = 1, ..., n. Then

H(X) = - Σ_{i=1}^{n} p_i log_2 p_i
     = Σ_{i=1}^{n} p_i log_2 (1/p_i)
     ≤ log_2 ( Σ_{i=1}^{n} p_i · (1/p_i) )    (by Jensen's inequality)
     = log_2 n.

Conclusion: For a random variable X with n possible values we have H(X) ≤ log_2 n, with equality if and only if p_i = 1/n for all i.
Joint Entropy
Let X and Y be random variables with values x_1, ..., x_n and y_1, ..., y_m and joint probabilities

r_ij = p(X = x_i, Y = y_j)

for i = 1, ..., n and j = 1, ..., m.
The joint entropy is defined as

H(X, Y) = - Σ_{i=1}^{n} Σ_{j=1}^{m} r_ij log_2 r_ij.

The joint entropy H(X, Y) is the uncertainty of the random variables X and Y together.
It measures the average information content of an observation of X and Y together.
Joint Entropy
Let X and Y be random variables; then we have the inequality

H(X, Y) ≤ H(X) + H(Y),

with equality if and only if X and Y are independent.

Reminder: X and Y are independent means that for all i and j

p(X = x_i, Y = y_j) = p(X = x_i) · p(Y = y_j).

The proof can be found in
- Stinson - Cryptography: Theory and Practice, Theorem 2.7, p. 57, and
- Welsh - Codes and Cryptography, Theorem 2, p. 6.
Conditional Entropy
Conditional entropy measures the average uncertainty of a random variable X given an observation of a random variable Y.

Reminder: If X and Y are random variables with values x_1, ..., x_n and y_1, ..., y_m then the conditional probability p(X = x_i | Y = y_j) is the probability that the value of X will be x_i given that the value of Y is y_j.

The conditional entropy of X given Y = y_j is defined as

H(X | Y = y_j) = - Σ_{i=1}^{n} p(X = x_i | Y = y_j) · log_2 p(X = x_i | Y = y_j).
Conditional Entropy
The conditional entropy of X given Y is defined as the weighted average of the entropies H(X | Y = y_j) for j = 1, ..., m, i.e.

H(X | Y) = Σ_{j=1}^{m} p(Y = y_j) · H(X | Y = y_j)
         = - Σ_{j=1}^{m} Σ_{i=1}^{n} p(Y = y_j) · p(X = x_i | Y = y_j) · log_2 p(X = x_i | Y = y_j).

Conditional entropy measures the average uncertainty of a random variable X given observations of a random variable Y, averaged over all values that Y can take.
Conditional and Joint Entropy

Conditional and joint entropy are linked by the following formula:

H(X, Y) = H(Y) + H(X | Y).

Proof: Welsh - Codes and Cryptography, Theorem 1, p. 8.

As an immediate consequence, we have the following upper bound:

H(X | Y) ≤ H(X),

with equality if and only if X and Y are independent.

Proof: Welsh - Codes and Cryptography, Corollary, p. 9.
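These identities can be checked numerically. The Python sketch below is illustrative: it uses an arbitrary made-up joint distribution (the numbers in `r` are hypothetical, not from the slides) and verifies the chain rule, the joint-entropy bound, and the symmetry I(X|Y) = I(Y|X):

```python
import math

def H(probs):
    """Shannon entropy, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution r[i][j] = p(X = x_i, Y = y_j); entries sum to 1.
r = [[1/8, 1/16, 1/16],
     [1/4, 1/8,  3/8]]

pX = [sum(row) for row in r]        # marginal distribution of X
pY = [sum(col) for col in zip(*r)]  # marginal distribution of Y

H_XY = H([p for row in r for p in row])

# H(X|Y): weighted average over y_j of the entropy of the renormalised column.
H_X_given_Y = sum(pY[j] * H([r[i][j] / pY[j] for i in range(len(r))])
                  for j in range(len(pY)))
H_Y_given_X = sum(pX[i] * H([r[i][j] / pX[i] for j in range(len(r[0]))])
                  for i in range(len(pX)))

print(abs(H_XY - (H(pY) + H_X_given_Y)) < 1e-12)  # chain rule holds
print(H_XY <= H(pX) + H(pY))                      # joint entropy bound
print(abs((H(pX) - H_X_given_Y) - (H(pY) - H_Y_given_X)) < 1e-12)  # I(X|Y) = I(Y|X)
```

Since this `r` is not a product distribution, the bound H(X, Y) ≤ H(X) + H(Y) holds strictly here.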
Information and Entropy
Reminder: Loss of uncertainty is gain of information.
Let X and Y be two random variables; then the information about X conveyed by Y is defined as

I(X | Y) = H(X) - H(X | Y).

Clearly I(X | Y) = 0 if and only if X and Y are independent.

Remark:
- Strangely enough, we have I(X | Y) = I(Y | X).
- Proof: Welsh - Codes and Cryptography, p. 11.
Conditional Entropy and Cryptography
Let P, K, C be the sets of possible messages, keys and ciphertexts with associated random variables P, K, C.

H(P | K, C) = 0
- Given the ciphertext and the key, you know the plaintext, since it is the decryption of the given ciphertext under the given key.

H(C | P, K) = 0
- Given the plaintext and the key, you know the ciphertext, since it is the encryption of the given plaintext under the given key.
- Note: Modern public key encryption schemes do not have this last property when used correctly.
Key Equivocation
The conditional entropy H(K | C) is called the key equivocation and measures the average uncertainty remaining about the key when a ciphertext has been observed.

Suppose that an adversary wants to determine the key of a non-perfect cipher. The smaller H(K | C) is, the easier it will be to recover the key.

The information revealed about the key by the ciphertext is the loss of uncertainty about the key when a ciphertext has been observed, i.e.

I(K | C) = H(K) - H(K | C).
Key Equivocation
For a cryptosystem (P, C, K, e_k(·), d_k(·)) we have

H(K | C) = H(K) + H(P) - H(C).

In words: the remaining uncertainty about the key when a ciphertext has been observed is equal to the sum of the uncertainties about the key and the plaintext minus the uncertainty about the ciphertext.

The proof can be found in
- Stinson - Cryptography: Theory and Practice, Theorem 2.10, p. 59.

As a consequence of the last two equations, the information revealed about the key by the ciphertext is equal to

I(K | C) = H(C) - H(P).
Example - Key Equivocation
Returning to our example cryptosystem from earlier:

H(P) ≈ 0.81, H(K) ≈ 1.5 and H(C) ≈ 1.85.

Using the formula for H(K | C) we get

H(K | C) = H(K) + H(P) - H(C) ≈ 1.5 + 0.81 - 1.85 ≈ 0.46.

So the remaining uncertainty about the key is less than half a bit.
And the information revealed about the key by the ciphertext is

I(K | C) = H(C) - H(P) ≈ 1.85 - 0.81 ≈ 1.04.

Thus the ciphertext leaks more than 1 bit of information about the key.
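The formula can be cross-checked by computing H(K | C) directly from its definition on the toy cryptosystem from the earlier slides; an illustrative Python sketch:

```python
import math
from fractions import Fraction as F

def H(probs):
    """Shannon entropy, skipping zero-probability terms."""
    probs = [float(p) for p in probs]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The toy cryptosystem from the earlier slides.
p_P = {'a': F(1, 4), 'b': F(3, 4)}
p_K = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}
enc = {('k1', 'a'): 1, ('k1', 'b'): 2, ('k2', 'a'): 2,
       ('k2', 'b'): 3, ('k3', 'a'): 3, ('k3', 'b'): 4}

# Joint distribution p(K = k, C = c) and marginal p(C = c).
joint, p_C = {}, {}
for (k, m), c in enc.items():
    joint[(k, c)] = joint.get((k, c), F(0)) + p_K[k] * p_P[m]
for (k, c), p in joint.items():
    p_C[c] = p_C.get(c, F(0)) + p

# H(K|C) directly: weighted average over c of H(K | C = c).
H_K_given_C = sum(float(p_C[c]) * H([joint.get((k, c), F(0)) / p_C[c] for k in p_K])
                  for c in p_C)

# ... and via the slide's formula H(K|C) = H(K) + H(P) - H(C).
via_formula = H(p_K.values()) + H(p_P.values()) - H(p_C.values())
print(round(H_K_given_C, 2))                   # ≈ 0.46
print(abs(H_K_given_C - via_formula) < 1e-12)  # True
```

Both routes agree, confirming the half-bit of remaining key uncertainty quoted above.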
Spurious Keys
If you know that the plaintext is taken from a 'natural' language, then knowing the ciphertext rules out a certain subset of the keys.
Of the remaining possible keys, only one is correct. The remaining possible, but incorrect, keys are called the spurious keys.

Consider the Shift Cipher with the same key for each letter.
- Suppose the ciphertext is WNAJW.
- The plaintext is known to be an English word.
- The only 'meaningful' plaintexts are RIVER and ARENA.
- We have two possible keys, E and W.
- One is correct and one is spurious.
Natural Language
To prove a bound on the number of spurious keys, we need to define what we mean by the entropy per letter H_L of a natural language L.

Ideally we would like H_L to be defined such that the number of meaningful strings of length n, which we denote T(n), satisfies, for n >> 0,

T(n) ≈ 2^{n H_L}.

In a natural language there are very few meaningful strings, so the entropy per letter H_L will be lower than the entropy of a random string:

H_L ≤ log_2 26 ≈ 4.7.
Natural Language
We get a better approximation if we use the probabilities with which letters occur in English: if P is the random variable representing the letters in the English language, then

p(P = a) = 0.082, p(P = b) = 0.015, ..., p(P = z) = 0.001.

This gives us the upper bound

H_L ≤ H(P) ≈ 4.19.

However, successive letters are clearly not independent, which will further reduce the entropy per letter.
An even better approximation is to use P^2, i.e. the random variable of bigrams in English, which leads to the bound

H_L ≤ H(P^2)/2 ≈ 3.90.
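The bound H(P) ≈ 4.19 can be approximated from a standard single-letter frequency table. The sketch below uses frequencies as commonly tabulated (e.g. in Stinson); since they carry rounding error and sum only approximately to 1, the computed value is close to, but not exactly, the quoted 4.19:

```python
import math

# Standard English single-letter frequencies (commonly tabulated values).
freq = {'a': .082, 'b': .015, 'c': .028, 'd': .043, 'e': .127, 'f': .022,
        'g': .020, 'h': .061, 'i': .070, 'j': .002, 'k': .008, 'l': .040,
        'm': .024, 'n': .067, 'o': .075, 'p': .019, 'q': .001, 'r': .060,
        's': .063, 't': .091, 'u': .028, 'v': .010, 'w': .023, 'x': .001,
        'y': .020, 'z': .001}

# Normalise (the rounded table sums to ~1.001) and compute the entropy.
total = sum(freq.values())
probs = [p / total for p in freq.values()]
H_P = -sum(p * math.log2(p) for p in probs)

print(round(H_P, 2))         # close to the 4.19 quoted on the slide
print(H_P < math.log2(26))   # True: bias pushes entropy below the uniform 4.7
```

The second check is the point of the slide: any bias in letter frequencies strictly lowers the entropy per letter below log_2 26.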
Natural Language
Continuing this process, we are led to the following definition.
The entropy per letter H_L of a natural language L is defined as

H_L = lim_{n→∞} H(P^n)/n,

where P^n is the random variable for n-grams.

This is hard to compute exactly, but we can approximate it, and various experiments yield the empirical result

1.0 ≤ H_L ≤ 1.5.

So each letter in English
- requires 5 = ⌈log_2 26⌉ bits of data to represent it, but
- Huffman encoding would only use 1.5 bits per letter.
Redundancy
For a language L with entropy H_L and alphabet P, we need about n log_2 #P bits to represent a string of length n. However, a compact encoding only needs about n H_L bits.

The redundancy R_L of a language is defined as the relative difference between both encodings, i.e.

R_L = (n log_2 #P - n H_L) / (n log_2 #P) = 1 - H_L / log_2 #P.

If we take H_L ≈ 1.25 then the redundancy of English is

R_L = 1 - 1.25/log_2 26 ≈ 0.75.

So we can compress an English text file of 10 MB down to 2.5 MB.
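The redundancy calculation is one line of Python; note that with H_L = 1.25 the exact value is about 0.73, which the slides round up to 0.75 (an illustrative sketch):

```python
import math

def redundancy(H_L, alphabet_size):
    """R_L = 1 - H_L / log2(#P): the fraction of each symbol that is redundant."""
    return 1 - H_L / math.log2(alphabet_size)

R_L = redundancy(1.25, 26)
print(round(R_L, 2))             # ≈ 0.73; the slides round this to 0.75
print(round(10 * (1 - R_L), 1))  # MB left after ideally compressing 10 MB
```

With the rounded figure R_L = 0.75, the 10 MB file compresses to the 2.5 MB quoted above.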
Spurious Keys
Let P^n and C^n be the sets of n-grams of plaintext and ciphertext, with associated random variables P^n and C^n.
Suppose we use the same key k ∈ K, with associated random variable K, to encrypt each letter. Then

K(c) = {k ∈ K : ∃ m ∈ P^n with p(P^n = m) > 0 and e_k(m) = c}

is the set of possible keys for which c is the encryption of a meaningful message of length n.
Therefore, given the ciphertext c, the number of spurious keys is

#K(c) - 1,

since there is only one correct key.
Spurious Keys
The average number of spurious keys over all possible ciphertexts of length n is denoted by s_n and equals

s_n = Σ_{c ∈ C^n} p(C^n = c) · (#K(c) - 1)
    = Σ_{c ∈ C^n} p(C^n = c) · #K(c) - Σ_{c ∈ C^n} p(C^n = c)
    = Σ_{c ∈ C^n} p(C^n = c) · #K(c) - 1.

We will now relate s_n to the key equivocation H(K | C^n).
Key Equivocation and Spurious Keys
Recall that H(K | C^n) is the average of H(K | C^n = c) over all possible ciphertexts, and thus

H(K | C^n) = Σ_{c ∈ C^n} p(C^n = c) · H(K | C^n = c)
           ≤ Σ_{c ∈ C^n} p(C^n = c) · log_2 #K(c)    (entropy is largest when all keys are equally likely)
           ≤ log_2 ( Σ_{c ∈ C^n} p(C^n = c) · #K(c) )    (by Jensen's inequality)
           = log_2 (s_n + 1).    (from the last slide)

Conclusion: H(K | C^n) ≤ log_2 (s_n + 1).
Key Equivocation and Spurious Keys
Recall that the key equivocation H(K | C^n) can be expressed as

H(K | C^n) = H(K) + H(P^n) - H(C^n).

For a language L with entropy H_L we can use the estimate

H(P^n) ≈ n H_L = n (1 - R_L) log_2 #P,

provided that n is reasonably large.
Since the entropy is always bounded by log_2 of the number of values,

H(C^n) ≤ n log_2 #C.

Conclusion: If #P = #C then, putting all this together, we have the inequality

H(K | C^n) ≥ H(K) - n R_L log_2 #P.
Bound on Number of Spurious Keys
Combining the results of the two previous slides, we get the bound

log_2 (s_n + 1) ≥ H(K) - n R_L log_2 #P.

Theorem: Suppose that (P, C, K, e_k(·), d_k(·)) is a cryptosystem with #P = #C such that keys are chosen equiprobably. If R_L is the redundancy of the underlying language, then given a ciphertext of length n, the expected number of spurious keys s_n satisfies

s_n ≥ #K / (#P)^{n R_L} - 1.

Example: For a substitution cipher we have #P = 26, #K = 26! ≈ 2^{88.4}, and taking R_L = 0.75, we get

s_n ≥ 2^{88.4 - 3.5n} - 1.
Unicity Distance
The unicity distance n_0 of a cryptosystem is the value of n at which the expected number of spurious keys becomes zero.
Alternatively, it is the average amount of ciphertext required for an adversary to be able to uniquely determine the key, given enough computing time.

For a perfectly secure cipher we have n_0 = ∞.

We set s_n = 0 in

s_n ≥ #K / (#P)^{n R_L} - 1

to obtain an estimate of the unicity distance n_0:

n_0 ≈ log_2 #K / (R_L log_2 #P).
Substitution Cipher
We now show why it was easy to break the substitution cipher.
- #P = 26
- #K = 26! ≈ 2^{88.4}
- R_L = 0.75 for English

We get an estimate for the unicity distance of

n_0 ≈ 88.4 / (0.75 × 4.7) ≈ 25.

So we require on average only 25 ciphertext characters before we can break the substitution cipher, given enough computing time.
After 25 characters we expect a unique valid decryption.
Modern Ciphers
Given a cipher which encrypts bit strings using keys of bit length m:
- #P = 2
- #K = 2^m
- R_L = 0.75 for English (an underestimate, since we are using ASCII)

Then we get an estimate for the unicity distance of

n_0 ≈ log_2 #K / (R_L log_2 #P) = log_2 (2^m) / (0.75 log_2 2) = m/0.75 = 4m/3.
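Both unicity-distance estimates come from the single formula above; an illustrative Python sketch (the key length m = 128 is just an example):

```python
import math

def unicity_distance(log2_num_keys, redundancy, alphabet_size):
    """n0 ≈ log2(#K) / (R_L * log2(#P)): ciphertext needed to pin down the key."""
    return log2_num_keys / (redundancy * math.log2(alphabet_size))

# Substitution cipher: #K = 26! ≈ 2^88.4, #P = 26, R_L = 0.75.
n0_subst = unicity_distance(math.log2(math.factorial(26)), 0.75, 26)
print(round(n0_subst))   # ≈ 25 characters, as on the previous slide

# Modern cipher on bit strings: #K = 2^m, #P = 2, giving n0 = 4m/3.
m = 128
n0_modern = unicity_distance(m, 0.75, 2)
print(n0_modern)         # 4m/3 ≈ 170.7 bits for a 128-bit key
```

Remember this is only an estimate of how much ciphertext determines the key uniquely; it says nothing about the computational effort of actually finding it.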