Shannon's Theory
TRANSCRIPT
8/7/2019
Claude Shannon, one of the greatest scientists of the 20th century, was a key figure in the development of information science. He is the creator of modern information theory, and an early and important contributor to the theory of computing.
1. SECRECY SYSTEMS
As a first step in the mathematical analysis of cryptography, it is necessary to idealize the situation suitably, and to define in a mathematically acceptable way what we shall mean by a secrecy system. A schematic diagram of a general secrecy system is shown in Fig. 1. At the transmitting end there are two information sources: a message source and a key source. The key source produces a particular key from among those which are possible in the system. This key is transmitted by some means, supposedly not interceptible, for example by messenger, to the receiving end. The message source produces a message (the "clear") which is enciphered, and the resulting cryptogram sent to the receiving end by a possibly interceptible means, for example radio. At the receiving end the cryptogram and key are combined in the decipherer to recover the message.
Fig. 1. Schematic of a general secrecy system
Evidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we have

E = f(M, K)
that is, E is a function of M and K. It is preferable to think of this, however, not as a function of two variables but as a (one parameter) family of operations or transformations, and to write it

E = TiM.

The transformation Ti applied to message M produces cryptogram E. The index i corresponds to the particular key being used. We will assume, in general, that there are only a finite number of possible keys, and that each has an associated probability pi. Thus the key source is represented by a statistical process or device which chooses one from the set of transformations T1, T2, ..., Tm with the respective probabilities p1, p2, ..., pm. Similarly we will generally assume a finite number of possible messages M1, M2, ..., Mn with associated a priori probabilities q1, q2, ..., qn. The possible messages, for example, might be the possible sequences of English letters all of length N, and the associated probabilities are then the relative frequencies of occurrence of these sequences in normal English text.
At the receiving end it must be possible to recover M, knowing E and K. Thus the transformations Ti in the family must have unique inverses Ti⁻¹ such that Ti Ti⁻¹ = I, the identity transformation. Thus:

M = Ti⁻¹E.
At any rate this inverse must exist uniquely for every E which can be obtained from an M with key i. Hence we arrive at the definition: A secrecy system is a family of uniquely reversible transformations Ti of a set of possible messages into a set of cryptograms, the transformation Ti having an associated probability pi. Conversely, any set of entities of this type will be called a secrecy system. The set of possible messages will be called, for convenience, the message space and the set of possible cryptograms the cryptogram space.

Two secrecy systems will be the same if they consist of the same set of transformations Ti, with the same message and cryptogram space (range and domain) and the same probabilities for the keys.
A secrecy system can be visualized mechanically as a machine with one or more controls on it. A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particular setting of the controls corresponds to the particular key being used. Some statistical method must be prescribed for choosing the key from all the possible ones.
REPRESENTATION OF SYSTEMS
A secrecy system as defined above can be represented in various ways. One which is convenient for illustrative purposes is a line diagram. The possible messages are represented by points at the left and the possible cryptograms by points at the right. If a certain key, say key 1, transforms message M2 into cryptogram E4 then M2 and E4 are connected by a line labeled 1, etc. From each possible message there must be exactly one line emerging for each different key. If the same is true for each cryptogram, we will say that the system is closed.
A more common way of describing a system is by stating the operation one
performs on the message for an arbitrary key to obtain the cryptogram. Similarly, one
defines implicitly the probabilities for various keys by describing how a key is chosen or what we know of the enemy's habits of key choice. The probabilities for messages are implicitly determined by stating our a priori knowledge of the enemy's language habits, the tactical situation (which will influence the probable content of the message) and any special information we may have regarding the cryptogram.
Fig. 2. Line drawings for simple systems (closed system; not closed)
EXAMPLES OF SECRECY SYSTEMS
Simple Substitution Cipher
In this cipher each letter of the message is replaced by a fixed substitute, usually also a letter. Thus the message

M = m1 m2 m3 m4 ...

where m1, m2, ... are the successive letters, becomes

E = e1 e2 e3 e4 ... = f(m1) f(m2) f(m3) f(m4) ...

where the function f(m) is a function with an inverse. The key is a permutation of the alphabet (when the substitutes are letters), e.g.

X G U A C D T B F H R S L M Q V Y Z W I E J O K N P.

The first letter X is the substitute for A, G is the substitute for B, etc.
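A minimal sketch of this cipher in Python, using the permuted alphabet above as the key (the function names are our own):

```python
# Simple substitution cipher using the example key from the text:
# X substitutes for A, G for B, etc.
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
KEY   = "XGUACDTBFHRSLMQVYZWIEJOKNP"

def encipher(message):
    # Replace each letter by its fixed substitute f(m)
    return "".join(KEY[PLAIN.index(m)] for m in message)

def decipher(cryptogram):
    # Apply the inverse permutation f^-1
    return "".join(PLAIN[KEY.index(e)] for e in cryptogram)

print(encipher("HELLO"))            # -> "BCSSQ"
print(decipher(encipher("HELLO")))  # -> "HELLO"
```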
Transposition (Fixed Period d)
The message is divided into groups of length d and a permutation applied to the first group, the same permutation to the second group, etc. The permutation is the key and can be represented by a permutation of the first d integers. Thus for d = 5, we might have 2 3 1 5 4 as the permutation. This means that:
m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 becomes
m2 m3 m1 m5 m4 m7 m8 m6 m10 m9.

Sequential application of two or more transpositions will be called compound transposition. If the periods are d1, d2, ..., dn it is clear that the result is a transposition of period d, where d is the least common multiple of d1, d2, ..., dn.
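A sketch of a fixed-period transposition in Python, using the permutation 2 3 1 5 4 from the text (the function name is our own, and the message length is assumed to be a multiple of d); `math.lcm` illustrates the compound-period rule:

```python
from math import lcm

# Fixed-period transposition: apply the same permutation to each group of d letters.
def transpose(message, perm):
    d = len(perm)
    out = []
    for i in range(0, len(message), d):
        group = message[i:i + d]
        out.extend(group[p - 1] for p in perm)  # perm uses 1-based positions
    return "".join(out)

print(transpose("ABCDEFGHIJ", [2, 3, 1, 5, 4]))  # -> "BCAEDGHFJI"

# Compound transposition of periods 3 and 5 is a transposition of period lcm(3, 5).
print(lcm(3, 5))  # -> 15
```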
Vigenère and Variations
In the Vigenère cipher the key consists of a series of d letters. These are written repeatedly below the message and the two added modulo 26 (considering the alphabet numbered from A = 0 to Z = 25). Thus

ei = mi + ki (mod 26),

where ki is of period d in the index i. For example, with the key G A H, we obtain

message       N O W I S T H E
repeated key  G A H G A H G A
cryptogram    T O D O S A N E
The Vigenère of period 1 is called the Caesar cipher. It is a simple substitution in which each letter of M is advanced a fixed amount in the alphabet. This amount is the key, which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort are similar to the Vigenère, and encipher by the equations

ei = ki - mi (mod 26)
ei = mi - ki (mod 26)

respectively. The Beaufort of period one is called the reversed Caesar cipher. The application of two or more Vigenères in sequence will be called the compound Vigenère. It has the equation

ei = mi + ki + li + ... + si (mod 26)

where ki, li, ..., si in general have different periods. The period of their sum,

ki + li + ... + si,

as in compound transposition, is the least common multiple of the individual periods.
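The Vigenère addition rule translates directly into code. In the sketch below (our own helper; the `sign` parameter is an assumption, added so the same routine can subtract the key to decipher):

```python
# Vigenère cipher with alphabet numbered A = 0, ..., Z = 25:
# e_i = m_i + k_i (mod 26), where the key repeats with period d.
def vigenere(message, key, sign=+1):
    A = ord("A")
    return "".join(
        chr((ord(m) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, m in enumerate(message)
    )

print(vigenere("NOWISTHE", "GAH"))      # -> "TODOSANE", as in the text
print(vigenere("TODOSANE", "GAH", -1))  # subtracting the key recovers "NOWISTHE"
```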
Digram, Trigram, and N-gram Substitution

Rather than substitute for letters one can substitute for digrams, trigrams, etc. General digram substitution requires a key consisting of a permutation of the 26² digrams. It can be represented by a table in which the row corresponds to the first letter of the digram and the column to the second letter, entries in the table being the substitutions (usually also digrams).
VALUATIONS OF SECRECY SYSTEMS

There are a number of different criteria that should be applied in estimating the value of a proposed secrecy system. The most important of these are:
Amount of Secrecy
There are some systems that are perfect: the enemy is no better off after intercepting any amount of material than before. Other systems, although giving him some information, do not yield a unique "solution" to intercepted cryptograms. Among the uniquely solvable systems, there are wide variations in the amount of labor required to effect this solution and in the amount of material that must be intercepted to make the solution unique.
Size of Key
The key must be transmitted by non-interceptible means from transmitting to
receiving points. Sometimes it must be memorized. It is therefore desirable to have the
key as small as possible.
Complexity of Enciphering and Deciphering Operations
Enciphering and deciphering should, of course, be as simple as possible. If they are done manually, complexity leads to loss of time, errors, etc. If done mechanically, complexity leads to large expensive machines.
Propagation of Errors
In certain types of ciphers an error of one letter in enciphering or transmission leads to a large number of errors in the deciphered text. The errors are spread out by the deciphering operation, causing the loss of much information and frequent need for repetition of the cryptogram. It is naturally desirable to minimize this error expansion.
Expansion of Message
In some types of secrecy systems the size of the message is increased by the enciphering process. This undesirable effect may be seen in systems where one attempts to swamp out message statistics by the addition of many nulls, or where multiple substitutes are used. It also occurs in many "concealment" types of systems (which are not usually secrecy systems in the sense of our definition).
PERFECT SECRECY
Let us suppose the possible messages are finite in number M1, ..., Mn and have a priori probabilities P(M1), ..., P(Mn), and that these are enciphered into the possible cryptograms E1, ..., Em by

E = TiM.

The cryptanalyst intercepts a particular E and can then calculate, in principle at least, the a posteriori probabilities for the various messages, PE(M). It is natural to define perfect secrecy by the condition that, for all E, the a posteriori probabilities are equal to the a priori probabilities independently of the values of these. In this case, intercepting the message has given the cryptanalyst no information. Any action of his which depends
on the information contained in the cryptogram cannot be altered, for all of his probabilities as to what the cryptogram contains remain unchanged. On the other hand, if the condition is not satisfied there will exist situations in which the enemy has certain a priori probabilities, and certain key and message choices may occur for which the enemy's probabilities do change. This in turn may affect his actions, and thus perfect secrecy has not been obtained. Hence the definition given is necessarily required by our intuitive ideas of what perfect secrecy should mean.
A necessary and sufficient condition for perfect secrecy can be found as follows: we have, by Bayes' theorem,

PE(M) = P(M) PM(E) / P(E)
in which:
P(M) = a priori probability of message M.
PM(E) = conditional probability of cryptogram E if message M is chosen, i.e. the sum of the probabilities of all keys which produce cryptogram E from message M.
P(E) = probability of obtaining cryptogram E from any cause.
PE(M) = a posteriori probability of message M if cryptogram E is intercepted.
For perfect secrecy PE(M) must equal P(M) for all E and all M. Hence either P(M) = 0, a solution that must be excluded since we demand the equality independent of the values of P(M), or

PM(E) = P(E)

for every M and E. Conversely, if PM(E) = P(E) then PE(M) = P(M), and we have perfect secrecy. Thus we have the result:

Theorem. A necessary and sufficient condition for perfect secrecy is that PM(E) = P(E) for all M and E. That is, PM(E) must be independent of M.
Stated another way, the total probability of all keys that transform Mi into a given cryptogram E is equal to that of all keys transforming Mj into the same E, for all Mi, Mj and E.
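The theorem's condition can be checked numerically on a toy system. The sketch below is our own construction (it anticipates the shift-mod-n example discussed shortly): n messages with arbitrary priors, n equiprobable keys, and Ti mapping Mj to cryptogram (i + j) mod n. Exact arithmetic with `Fraction` confirms PM(E) = P(E) for every pair:

```python
from fractions import Fraction

n = 5
# Arbitrary a priori message probabilities q_j (they sum to 1).
q = [Fraction(w, 15) for w in (1, 2, 3, 4, 5)]
p_key = Fraction(1, n)  # assumption: keys are equiprobable

# P_M(E): sum of probabilities of all keys i taking message j to cryptogram E
def P_M(E, j):
    return sum(p_key for i in range(n) if (i + j) % n == E)

# P(E): total probability of cryptogram E from any cause
def P(E):
    return sum(q[j] * P_M(E, j) for j in range(n))

for E in range(n):
    for j in range(n):
        assert P_M(E, j) == P(E) == Fraction(1, n)
print("perfect secrecy verified: P_M(E) = P(E) = 1/n for all M, E")
```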
Now there must be as many E's as there are M's since, for a fixed i, Ti gives a one-to-one correspondence between all the M's and some of the E's. For perfect secrecy PM(E) = P(E) ≠ 0 for any of these E's and any M. Hence there is at least one key transforming any M into any of these E's. But all the keys from a fixed M to different E's must be different, and therefore the number of different keys is at least as great as the number of M's. It is possible to obtain perfect secrecy with only this number of keys, as
Fig. 3. Perfect system
one shows by the following example: Let the Mi be numbered 1 to n and the Ei the same, and using n keys let

TiMj = Es

where s = i + j (mod n). In this case we see that

PE(M) = 1/n = P(E)
and we have perfect secrecy. An example is shown in Fig. 3 with s = i + j - 1 (mod 5). Perfect systems in which the number of cryptograms, the number of messages, and the number of keys are all equal are characterized by the properties that (1) each M is connected to each E by exactly one line, and (2) all keys are equally likely. Thus the matrix representation of the system is a Latin square.

In MTC it was shown that information may be conveniently measured by means of entropy. If we have a set of possibilities with probabilities p1, p2, ..., pn, the entropy H is given by:

H = -Σ pi log pi.
In a secrecy system there are two statistical choices involved, that of the message and of
the key. We may measure the amount of information produced when a message is chosen
by H(M):
H(M) = -Σ P(M) log P(M),

the summation being over all possible messages. Similarly, there is an uncertainty associated with the choice of key given by:

H(K) = -Σ P(K) log P(K).
In perfect systems of the type described above, the amount of information in the message is at most log n (occurring when all messages are equiprobable). This information can be concealed completely only if the key uncertainty is at least log n. This is the first example of a general principle which will appear frequently: that there is a
limit to what we can obtain with a given uncertainty in key: the amount of uncertainty we can introduce into the solution cannot be greater than the key uncertainty.
The situation is somewhat more complicated if the number of messages is infinite.Suppose, for example, that they are generated as infinite sequences of letters by a suitable
Markoff process. It is clear that no finite key will give perfect secrecy. We suppose, then,
that the key source generates key in the same manner, that is, as an infinite sequence of symbols. Suppose further that only a certain length of key LK is needed to encipher and decipher a length LM of message. Let the logarithm of the number of letters in the message alphabet be RM and that for the key alphabet be RK. Then, from the finite case, it is evident that perfect secrecy requires

RM LM ≤ RK LK.
This type of perfect secrecy is realized by the Vernam system.
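A minimal sketch of a Vernam system over the 26-letter alphabet: the random key is as long as the message, so RM LM = RK LK holds with equality (the helper names and the `sign` parameter for deciphering are our own):

```python
import secrets

# Vernam (one-time pad) over letters: e_i = m_i + k_i (mod 26),
# with a truly random key exactly as long as the message.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def vernam(message, key, sign=+1):
    A = ord("A")
    return "".join(
        chr((ord(m) - A + sign * (ord(k) - A)) % 26 + A)
        for m, k in zip(message, key)
    )

msg = "ATTACKATDAWN"
key = "".join(secrets.choice(ALPHABET) for _ in msg)  # key length = message length
ct = vernam(msg, key)
assert vernam(ct, key, -1) == msg  # subtracting the key recovers the message
print(ct)
```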
These results have been deduced on the basis of unknown or arbitrary a priori
probabilities of the messages. The key required for perfect secrecy depends then on the
total number of possible messages.

One would expect that, if the message space has fixed known statistics, so that it has a definite mean rate R of generating information, in the sense of MTC, then the amount of key needed could be reduced on the average in just this ratio R/RM, and this is indeed true. In fact the message can be passed through a transducer which eliminates the redundancy and reduces the expected length in just this ratio, and then a Vernam system may be applied to the result. Evidently the amount of key used per letter of message is statistically reduced by a factor R/RM, and in this case the key source and information source are just matched: a bit of key completely conceals a bit of message information.
It is easily shown also, by the methods used in MTC, that this is the best that can be done. Perfect secrecy systems have a place in the practical picture: they may be used either where the greatest importance is attached to complete secrecy, e.g., correspondence between the highest levels of command, or in cases where the number of possible messages is small. Thus, to take an extreme example, if only two messages, "yes" and "no", were anticipated, a perfect system would be in order, with perhaps a simple transformation table.
The disadvantage of perfect systems for large correspondence systems is, of course, the equivalent amount of key that must be sent. In succeeding sections we consider what can be achieved with smaller key size, in particular with finite keys.
2. ENTROPY
The Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable. It quantifies the information contained in a message, usually in bits or bits/symbol. It is the minimum message length necessary to communicate information.

This also represents an absolute limit on the best possible lossless compression of any communication: treating a message as a series of symbols, the shortest possible representation to transmit the message is the Shannon entropy in bits/symbol multiplied by the number of symbols in the original message.
Definition: The information entropy of a discrete random variable X, which can take on possible values {x1, ..., xn}, is

H(X) = E[I(X)] = -Σ p(xi) log p(xi)

where

I(X) is the information content or self-information of X, which is itself a random variable;
p(xi) = Pr(X = xi) is the probability mass function of X; and
0 log 0 is taken to be 0.
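The definition translates directly into code; a minimal sketch, with the 0 log 0 = 0 convention handled by skipping zero probabilities:

```python
from math import log2

# Information entropy in bits: H = -sum p log2 p, with 0 log 0 taken as 0.
def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin -> 1.0 bit
print(entropy([1.0]))         # certain outcome -> 0.0
print(entropy([0.25] * 4))    # four equiprobable outcomes -> 2.0 bits
```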
Characterization
Information entropy is characterised by the following desiderata:
Continuity
The measure should be continuous, i.e., changing the value of one of the probabilities by a very small amount should only change the entropy by a small amount.
Symmetry
The measure should be unchanged if the outcomes xi are re-ordered.
Maximum

The measure should be maximal if all the outcomes are equally likely (uncertainty is highest when all possible events are equiprobable).
For equiprobable events the entropy should increase with the number of
outcomes.
Additivity
The amount of entropy should be independent of how the process is regarded as being divided into parts.

This last functional relationship characterizes the entropy of a system with sub-systems. It demands that the entropy of a system can be calculated from the entropy of its sub-systems if we know how the sub-systems interact with each other.
Given an ensemble of n uniformly distributed elements that are divided into k boxes (sub-systems) with b1, b2, ..., bk elements, the entropy of the whole ensemble should be equal to the sum of the entropy of the system of boxes and the individual entropies of the boxes, each weighted with the probability of being in that particular box.
For positive integers bi where b1 + ... + bk = n,

H(1/n, ..., 1/n) = H(b1/n, ..., bk/n) + Σ (bi/n) H(1/bi, ..., 1/bi).

Choosing k = n, b1 = ... = bn = 1, this implies that the entropy of a certain outcome is zero:

H(1) = 0.

It can be shown that any definition of entropy satisfying these assumptions has the form

H = -K Σ pi log pi,

where K is a constant corresponding to a choice of measurement units.
Information entropy explained
For a random variable X with outcomes {x1, ..., xn}, the Shannon information entropy, a measure of uncertainty (see further below) and denoted by H(X), is defined as

H(X) = -Σ p(xi) logb p(xi)    (1)

where p(xi) is the probability mass function of outcome xi, and b is the base of the logarithm used. Common values of b are 2, e, and 10. The unit of the information entropy is the bit for b = 2, the nat for b = e, and the dit (or digit) for b = 10.
To understand the meaning of Eq. (1), let's first consider a set of n possible outcomes (events) {x1, ..., xn}, with equal probability 1/n. An example would be a fair die with n = 6 values, from 1 to 6. The uncertainty for such a set of outcomes is defined by

u = logb(n).    (2)

The logarithm is used so as to provide the additivity characteristic for independent uncertainty. For example, consider appending to each value of the first die the value of a second die, which has m possible outcomes {y1, ..., ym}. There are thus mn possible outcomes {xi yj}. The uncertainty for such a set of outcomes is then

u = logb(nm) = logb(n) + logb(m).    (3)

Thus the uncertainty of playing with two dice is obtained by adding the uncertainty of the second die, logb(m), to the uncertainty of the first die, logb(n).

Now return to the case of playing with one die only (the first one); since the probability of each event is 1/n, we can write

u = logb(n) = -logb(1/n) = -logb p(xi).

In the case of a non-uniform probability mass function (or distribution, in the case of a continuous random variable), we let

ui = -logb p(xi),    (4)

which is also called a surprisal; the lower the probability p(xi), the higher the uncertainty or the surprise ui for the outcome xi.

The average uncertainty ⟨u⟩, with ⟨·⟩ being the average operator, is obtained by

H = ⟨u⟩ = Σ p(xi) ui = -Σ p(xi) logb p(xi),    (5)

and is used as the definition of the information entropy in Eq. (1). The above also explains why information entropy and information uncertainty can be used interchangeably.
Example
As an example, consider a fair coin. The probability of a head or a tail is 0.5, so I(head) = I(tail) = -log2(0.5) = 1, and H = 1 × 0.5 + 1 × 0.5 = 1. So the messages each contain one bit and the average information per message is one bit. This is what we would expect, since each coin toss generates a single bit of information.

Now consider a biased coin, p(head) = 2/3, p(tail) = 1/3. We have I(head) = -log2(2/3) = 0.58 and I(tail) = -log2(1/3) = 1.58. (Note: to find the log base 2 of a number on a standard calculator, find the log base 10 and then divide by the log base 10 of 2.) The entropy for this system is then H = 0.58 × 2/3 + 1.58 × 1/3 = 0.92. This is telling us that each message (head or tail) carries only 0.92 bits of information. The reason is that the bias means we expect to see more heads than tails, so when this happens we are not seeing anything unexpected. Full information is conveyed only when we are told something we couldn't have made any useful attempt to predict.
The entropy of a system is important because it tells us how much we can hope to compress streams of messages in the system. In principle, a perfect compression technique would encode a stream using just its entropy in bits per message; in practice, we will usually not achieve better than about 99% efficiency. Shannon calculated that English text has an entropy of about 2.3 bits per character. Modern analysis has suggested that it is actually closer to 1.1 to 1.6 bits per character, depending on the kind of text.
Further properties
The Shannon entropy satisfies the following properties:
Adding or removing an event with probability zero does not contribute to the entropy:

H(p1, ..., pn, 0) = H(p1, ..., pn).

It can be confirmed using the Jensen inequality that

H(X) ≤ log2(n).

This maximal entropy of log2(n) is effectively attained by a source alphabet having a uniform probability distribution: uncertainty is maximal when all possible events are equiprobable.

Theorem: Suppose X is a random variable having probability distribution p1, p2, ..., pn, where pi > 0, 1 ≤ i ≤ n. Then H(X) ≤ log2 n, with equality if and only if pi = 1/n, 1 ≤ i ≤ n.
PROOF:

Applying Jensen's Inequality, we have the following:

H(X) = -Σ(i=1..n) pi log2 pi
     = Σ(i=1..n) pi log2(1/pi)
     ≤ log2( Σ(i=1..n) pi · (1/pi) )
     = log2 n.

Further, equality occurs if and only if pi = 1/n, 1 ≤ i ≤ n.
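The bound can also be checked numerically. The sketch below (our own) draws random distributions and confirms H(X) ≤ log2 n, with equality for the uniform case:

```python
import random
from math import log2

# Entropy in bits, with 0 log 0 = 0.
def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

n = 6
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    total = sum(w)
    p = [x / total for x in w]                 # random probability distribution
    assert entropy(p) <= log2(n) + 1e-9        # Jensen bound H(X) <= log2 n

assert abs(entropy([1/n] * n) - log2(n)) < 1e-9  # equality for uniform p_i = 1/n
print("H(X) <= log2(n) confirmed on random distributions")
```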
3. Product Cryptosystems
Another innovation introduced by Shannon in his 1949 paper was the idea of combining cryptosystems by forming their product. This idea has been of fundamental importance in the design of present-day cryptosystems such as the Data Encryption Standard, which we study in the next chapter.
For simplicity, we will confine our attention in this section to cryptosystems in which C = P: cryptosystems of this type are called endomorphic. Suppose S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) are two endomorphic cryptosystems which have the same plaintext (and ciphertext) spaces. Then the product of S1 and S2, denoted by S1 × S2, is defined to be the cryptosystem

(P, P, K1 × K2, E, D).

A key of the product cryptosystem has the form K = (K1, K2), where K1 ∈ K1 and K2 ∈ K2. The encryption and decryption rules of the product cryptosystem are defined as follows: For each K = (K1, K2), we have an encryption rule eK defined by the formula

e(K1,K2)(x) = eK2(eK1(x)),

and a decryption rule defined by the formula

d(K1,K2)(y) = dK1(dK2(y)).
That is, we first encrypt x with eK1, and then re-encrypt the resulting ciphertext with eK2. Decrypting is similar, but it must be done in the reverse order:

d(K1,K2)(e(K1,K2)(x)) = d(K1,K2)(eK2(eK1(x)))
                      = dK1(dK2(eK2(eK1(x))))
                      = dK1(eK1(x))
                      = x.
Recall also that cryptosystems have probability distributions associated with their keyspaces. Thus we need to define the probability distribution for the keyspace K of the product cryptosystem. We do this in a very natural way:

pK(K1, K2) = pK1(K1) × pK2(K2).

In other words, choose K1 using the distribution pK1, and then independently choose K2 using the distribution pK2.
Figure 4. Multiplicative Cipher
Suppose we define the Multiplicative Cipher as in Figure 4.

Suppose M is the Multiplicative Cipher (with keys chosen equiprobably) and S is the Shift Cipher (with keys chosen equiprobably). Then it is very easy to see that M × S is nothing more than the Affine Cipher (again, with keys chosen equiprobably). It is slightly more difficult to show that S × M is also the Affine Cipher with equiprobable keys.
Let's prove these assertions. A key in the Shift Cipher is an element K ∈ Z26, and the corresponding encryption rule is eK(x) = x + K mod 26. A key in the Multiplicative Cipher is an element a ∈ Z26 such that gcd(a, 26) = 1; the corresponding encryption rule is ea(x) = ax mod 26. Hence, a key in the product cipher M × S has the form (a, K), where

e(a,K)(x) = ax + K mod 26.
But this is precisely the definition of a key in the Affine Cipher. Further, the probability of a key in the Affine Cipher is 1/312 = 1/12 × 1/26, which is the product of the probabilities of the keys a and K, respectively. Thus M × S is the Affine Cipher.

Now let's consider S × M. A key in this cipher has the form (K, a), where

e(K,a)(x) = a(x + K) = ax + aK mod 26.

Thus the key (K, a) of the product cipher S × M is identical to the key (a, aK) of the Affine Cipher. It remains to show that each key of the Affine Cipher arises with the same probability 1/312 in the product cipher S × M. Observe that aK = K1 if and only if K = a⁻¹K1 (recall that gcd(a, 26) = 1, so a has a multiplicative inverse). In other words, the key (a, K1) of the Affine Cipher is equivalent to the key (a⁻¹K1, a) of the product cipher S × M. We thus have a bijection between the two key spaces. Since each key is equiprobable, we conclude that S × M is indeed the Affine Cipher.
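Both assertions can be verified exhaustively over all 312 key pairs. In the sketch below (the cipher function names are our own shorthand), M × S agrees with the Affine Cipher directly, and S × M agrees with the Affine Cipher under the key (a, aK mod 26):

```python
from math import gcd

# Component ciphers over Z_26 and the Affine Cipher e(x) = ax + K mod 26.
def shift(x, K):     return (x + K) % 26
def mult(x, a):      return (a * x) % 26
def affine(x, a, K): return (a * x + K) % 26

units = [a for a in range(26) if gcd(a, 26) == 1]  # 12 valid multipliers
assert len(units) == 12                            # so 12 * 26 = 312 affine keys

for a in units:
    for K in range(26):
        for x in range(26):
            # M x S: Multiplicative Cipher first, then Shift Cipher
            assert shift(mult(x, a), K) == affine(x, a, K)
            # S x M: Shift first, then Multiplicative -> affine key (a, aK mod 26)
            assert mult(shift(x, K), a) == affine(x, a, (a * K) % 26)
print("M x S and S x M both realize the Affine Cipher")
```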
We have shown that M × S = S × M. Thus we would say that the two cryptosystems commute. But not all pairs of cryptosystems commute; it is easy to find counterexamples. On the other hand, the product operation is always associative: (S1 × S2) × S3 = S1 × (S2 × S3).

If we take the product of an (endomorphic) cryptosystem S with itself, we obtain the cryptosystem S × S, which we denote by S². If we take the n-fold product, the resulting cryptosystem is denoted by Sⁿ. We call Sⁿ an iterated cryptosystem.

A cryptosystem S is defined to be idempotent if S² = S. Many of the cryptosystems we studied in Chapter 1 are idempotent. For example, the Shift, Substitution, Affine, Hill, Vigenère and Permutation Ciphers are all idempotent. Of course, if a cryptosystem S is idempotent, then there is no point in using the product system S², as it requires an extra key but provides no more security.

If a cryptosystem is not idempotent, then there is a potential increase in security by iterating several times. This idea is used in the Data Encryption Standard, which consists of 16 iterations. But, of course, this approach requires a non-idempotent cryptosystem to start with. One way in which simple non-idempotent cryptosystems can sometimes be constructed is to take the product of two different (simple) cryptosystems.
BIBLIOGRAPHY:

C. E. Shannon: Communication Theory of Secrecy Systems.
Douglas Stinson: Cryptography: Theory and Practice.
http://encyclopedia.thefreedictionary.com