Shannon's Theory

Claude Shannon, one of the greatest scientists of the 20th century, was a key

    figure in the development of information science. He is the creator of modern information

    theory, and an early and important contributor to the theory of computing.

    1. SECRECY SYSTEMS

As a first step in the mathematical analysis of cryptography, it is necessary to idealize the situation suitably, and to define in a mathematically acceptable way what we shall mean by a secrecy system. A schematic diagram of a general secrecy system is shown in Fig. 1. At the transmitting end there are two information sources: a message source and a key source. The key source produces a particular key from among those which are possible in the system. This key is transmitted by some means, supposedly not interceptible, for example by messenger, to the receiving end. The message source produces a message (the "clear") which is enciphered and the resulting cryptogram sent to the receiving end by a possibly interceptible means, for example radio. At the receiving end the cryptogram and key are combined in the decipherer to recover the message.

    Fig. 1. Schematic of a general secrecy system

Evidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we have

    E = f(M,K)


that is, E is a function of M and K. It is preferable to think of this, however, not as a function of two variables but as a (one parameter) family of operations or transformations, and to write it

E = TiM.

The transformation Ti applied to message M produces cryptogram E. The index i corresponds to the particular key being used. We will assume, in general, that there are only a finite number of possible keys, and that each has an associated probability pi. Thus the key source is represented by a statistical process or device which chooses one from the set of transformations T1, T2, ..., Tm with the respective probabilities p1, p2, ..., pm. Similarly we will generally assume a finite number of possible messages M1, M2, ..., Mn with associated a priori probabilities q1, q2, ..., qn. The possible messages, for example, might be the possible sequences of English letters all of length N, and the associated probabilities are then the relative frequencies of occurrence of these sequences in normal English text.

At the receiving end it must be possible to recover M, knowing E and K. Thus the transformations Ti in the family must have unique inverses Ti^-1 such that Ti Ti^-1 = I, the identity transformation. Thus:

M = Ti^-1 E.

At any rate this inverse must exist uniquely for every E which can be obtained from an M with key i. Hence we arrive at the definition: A secrecy system is a family of uniquely reversible transformations Ti of a set of possible messages into a set of cryptograms, the transformation Ti having an associated probability pi. Conversely any set of entities of this type will be called a secrecy system. The set of possible messages

    will be called, for convenience, the message space and the set of possible cryptograms

    the cryptogram space.

Two secrecy systems will be the same if they consist of the same set of transformations Ti, with the same messages and cryptogram space (range and domain) and the same probabilities for the keys.

A secrecy system can be visualized mechanically as a machine with one or more controls on it. A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particular setting of the controls corresponds to the particular key being used. Some statistical method must be prescribed for choosing the key from all the possible ones.

    REPRESENTATION OF SYSTEMS

A secrecy system as defined above can be represented in various ways. One which is convenient for illustrative purposes is a line diagram. The possible messages are represented by points at the left and the possible cryptograms by points at the right. If a certain key, say key 1, transforms message M2 into cryptogram E4 then M2 and E4 are connected by a line labeled 1, etc. From each possible message there must be exactly one line emerging for each different key. If the same is true for each cryptogram, we will say

    that the system is closed.


    A more common way of describing a system is by stating the operation one

    performs on the message for an arbitrary key to obtain the cryptogram. Similarly, one

defines implicitly the probabilities for various keys by describing how a key is chosen or what we know of the enemy's habits of key choice. The probabilities for messages are implicitly determined by stating our a priori knowledge of the enemy's language habits, the tactical situation (which will influence the probable content of the message) and any special information we may have regarding the cryptogram.

Fig. 2. Line drawings for simple systems (closed system; not closed)

    EXAMPLES OF SECRECY SYSTEMS

    Simple Substitution Cipher

    In this cipher each letter of the message is replaced by a fixed substitute, usually

also a letter. Thus the message

M = m1 m2 m3 m4 ...

where m1, m2, ... are the successive letters, becomes:

E = e1 e2 e3 e4 ... = f(m1) f(m2) f(m3) f(m4) ...

where the function f(m) is a function with an inverse. The key is a permutation of the alphabet (when the substitutes are letters), e.g. X G U A C D T B F H R S L M Q V Y Z W I E J O K N P. The first letter X is the substitute for A, G is the substitute for B,

    etc.
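As a concrete illustration, here is a minimal Python sketch of simple substitution, using the example permutation quoted above as the key; it assumes the message consists only of upper-case letters A-Z.

import string

ALPHABET = string.ascii_uppercase
# The example key from the text: a permutation of the 26 letters.
KEY = "XGUACDTBFHRSLMQVYZWIEJOKNP"

ENC = {p: c for p, c in zip(ALPHABET, KEY)}   # A -> X, B -> G, ...
DEC = {c: p for p, c in ENC.items()}          # the inverse permutation

def encipher(message: str) -> str:
    """Replace each letter by its fixed substitute f(m)."""
    return "".join(ENC[m] for m in message)

def decipher(cryptogram: str) -> str:
    return "".join(DEC[e] for e in cryptogram)

assert decipher(encipher("ATTACK")) == "ATTACK"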

Transposition (Fixed Period d)

The message is divided into groups of length d and a permutation applied to the first group, the same permutation to the second group, etc. The permutation is the key and can be represented by a permutation of the first d integers. Thus for d = 5, we might have 2 3 1 5 4 as the permutation. This means that:


m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 becomes m2 m3 m1 m5 m4 m7 m8 m6 m10 m9.

Sequential application of two or more transpositions will be called compound transposition. If the periods are d1, d2, ..., dn it is clear that the result is a transposition of period d, where d is the least common multiple of d1, d2, ..., dn.
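A short Python sketch of fixed-period transposition follows, using the permutation 2 3 1 5 4 from the example; for simplicity it assumes the message length is a multiple of d (padding is not handled).

def transpose(message: str, perm: list[int]) -> str:
    """Fixed-period transposition: apply the same permutation to each group.

    perm uses 1-based positions as in the text, e.g. [2, 3, 1, 5, 4] for d = 5.
    Assumes len(message) is a multiple of d.
    """
    d = len(perm)
    out = []
    for start in range(0, len(message), d):
        group = message[start:start + d]
        out.append("".join(group[p - 1] for p in perm))
    return "".join(out)

# m1..m10 becomes m2 m3 m1 m5 m4 m7 m8 m6 m10 m9, as in the example.
print(transpose("abcdefghij", [2, 3, 1, 5, 4]))  # -> "bcaedghfji"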

    Vigenere, and Variations

In the Vigenere cipher the key consists of a series of d letters. These are written repeatedly below the message and the two added modulo 26 (considering the alphabet numbered from A = 0 to Z = 25). Thus

ei = mi + ki (mod 26)

where ki is of period d in the index i. For example, with the key G A H, we obtain

message       N O W I S T H E
repeated key  G A H G A H G A
cryptogram    T O D O S A N E

The Vigenere of period 1 is called the Caesar cipher. It is a simple substitution in which each letter of M is advanced a fixed amount in the alphabet. This amount is the key, which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort are similar to the Vigenere, and encipher by the equations

ei = ki - mi (mod 26)
ei = mi - ki (mod 26)

respectively. The Beaufort of period one is called the reversed Caesar cipher. The application of two or more Vigeneres in sequence will be called the compound Vigenere. It has the equation

ei = mi + ki + li + ... + si (mod 26)

where ki, li, ..., si in general have different periods. The period of their sum, ki + li + ... + si, as in compound transposition, is the least common multiple of the individual periods.
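The Vigenere equation and its Beaufort variants can be sketched in a few lines of Python; the functions below assume upper-case messages and keys, and reproduce the N O W I S T H E example above.

def to_num(c): return ord(c) - ord('A')
def to_chr(n): return chr(n % 26 + ord('A'))

def vigenere(message: str, key: str) -> str:
    """e_i = m_i + k_i (mod 26), with the key repeated below the message."""
    return "".join(to_chr(to_num(m) + to_num(key[i % len(key)]))
                   for i, m in enumerate(message))

def beaufort(message: str, key: str) -> str:
    """e_i = k_i - m_i (mod 26)."""
    return "".join(to_chr(to_num(key[i % len(key)]) - to_num(m))
                   for i, m in enumerate(message))

def variant_beaufort(message: str, key: str) -> str:
    """e_i = m_i - k_i (mod 26)."""
    return "".join(to_chr(to_num(m) - to_num(key[i % len(key)]))
                   for i, m in enumerate(message))

# The example from the text: message NOWISTHE, key GAH -> cryptogram TODOSANE.
assert vigenere("NOWISTHE", "GAH") == "TODOSANE"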

Digram, Trigram, and N-gram Substitution

Rather than substitute for letters one can substitute for digrams, trigrams, etc. General digram substitution requires a key consisting of a permutation of the 26^2 digrams. It can be represented by a table in which the row corresponds to the first letter of the digram and the column to the second letter, entries in the table being the substitutions (usually also digrams).
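A small Python sketch of general digram substitution along these lines: the key here is built as a random permutation of the 676 digrams (any fixed permutation would serve equally well), and the message length is assumed to be even.

import itertools, random

DIGRAMS = ["".join(p) for p in itertools.product("ABCDEFGHIJKLMNOPQRSTUVWXYZ", repeat=2)]

def make_key(seed: int = 0) -> dict[str, str]:
    """The key: a permutation of the 26^2 = 676 digrams (random here, for illustration)."""
    rng = random.Random(seed)
    shuffled = DIGRAMS[:]
    rng.shuffle(shuffled)
    return dict(zip(DIGRAMS, shuffled))

def encipher(message: str, key: dict[str, str]) -> str:
    """Substitute digram by digram; assumes len(message) is even."""
    return "".join(key[message[i:i + 2]] for i in range(0, len(message), 2))

key = make_key()
inverse = {v: k for k, v in key.items()}
ct = encipher("SECRET", key)
assert "".join(inverse[ct[i:i + 2]] for i in range(0, len(ct), 2)) == "SECRET"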

VALUATIONS OF SECRECY SYSTEMS

There are a number of different criteria that should be applied in estimating the value of a proposed secrecy system. The most important of these are:


    Amount of Secrecy

There are some systems that are perfect: the enemy is no better off after intercepting any amount of material than before. Other systems, although giving him some information, do not yield a unique solution to intercepted cryptograms. Among

    the uniquely solvable systems, there are wide variations in the amount of labor required

    to effect this solution and in the amount of material that must be intercepted to make thesolution unique.

    Size of Key

    The key must be transmitted by non-interceptible means from transmitting to

    receiving points. Sometimes it must be memorized. It is therefore desirable to have the

    key as small as possible.

    Complexity of Enciphering and Deciphering Operations

Enciphering and deciphering should, of course, be as simple as possible. If they are done manually, complexity leads to loss of time, errors, etc. If done mechanically,

    complexity leads to large expensive machines.

    Propagation of Errors

In certain types of ciphers an error of one letter in enciphering or transmission leads to a large number of errors in the deciphered text. The errors are spread out by the

    deciphering operation, causing the loss of much information and frequent need for

    repetition of the cryptogram. It is naturally desirable to minimize this error expansion.

    Expansion of Message

    In some types of secrecy systems the size of the message is increased by the

    enciphering process. This undesirable effect may be seen in systems where one attempts

to swamp out message statistics by the addition of many nulls, or where multiple substitutes are used. It also occurs in many concealment types of systems (which are

    not usually secrecy systems in the sense of our definition).

    PERFECT SECRECY

Let us suppose the possible messages are finite in number M1, ..., Mn and have a priori probabilities P(M1), ..., P(Mn), and that these are enciphered into the possible cryptograms E1, ..., Em by

E = TiM.

The cryptanalyst intercepts a particular E and can then calculate, in principle at least, the a posteriori probabilities for the various messages, PE(M). It is natural to define perfect secrecy by the condition that, for all E, the a posteriori probabilities are equal to the a priori probabilities independently of the values of these. In this case, intercepting the message has given the cryptanalyst no information. Any action of his which depends


    on the information contained in the cryptogram cannot be altered, for all of his

    probabilities as to what the cryptogram contains remain unchanged. On the other hand, if

the condition is not satisfied there will exist situations in which the enemy has certain a priori probabilities, and certain key and message choices may occur for which the enemy's probabilities do change. This in turn may affect his actions and thus perfect secrecy has not been obtained. Hence the definition given is necessarily required by our intuitive ideas of what perfect secrecy should mean.

A necessary and sufficient condition for perfect secrecy can be found as follows: We have, by Bayes' theorem,

PE(M) = P(M) PM(E) / P(E)

    in which:

P(M) = a priori probability of message M.
PM(E) = conditional probability of cryptogram E if message M is chosen, i.e. the sum of the probabilities of all keys which produce cryptogram E from message M.
P(E) = probability of obtaining cryptogram E from any cause.
PE(M) = a posteriori probability of message M if cryptogram E is intercepted.

For perfect secrecy PE(M) must equal P(M) for all E and all M. Hence either P(M) = 0, a solution that must be excluded since we demand the equality independent of the values of P(M), or

PM(E) = P(E)

for every M and E. Conversely if PM(E) = P(E) then PE(M) = P(M) and we have perfect secrecy. Thus we have the result:

Theorem. A necessary and sufficient condition for perfect secrecy is that

PM(E) = P(E)

for all M and E. That is, PM(E) must be independent of M.

Stated another way, the total probability of all keys that transform Mi into a given cryptogram E is equal to that of all keys transforming Mj into the same E, for all Mi, Mj and E.

Now there must be as many E's as there are M's since, for a fixed i, Ti gives a one-to-one correspondence between all the M's and some of the E's. For perfect secrecy PM(E) = P(E) ≠ 0 for any of these E's and any M. Hence there is at least one key transforming any M into any of these E's. But all the keys from a fixed M to different E's must be different, and therefore the number of different keys is at least as great as the number of M's. It is possible to obtain perfect secrecy with only this number of keys, as


    Fig. 3. Perfect system

one shows by the following example: Let the Mi be numbered 1 to n and the Ei the same, and using n keys let

TiMj = Es

where s = i + j (mod n). In this case we see that

PE(M) = 1/n = P(E)

and we have perfect secrecy. An example is shown in Fig. 3 with s = i + j - 1 (mod 5). Perfect systems in which the number of cryptograms, the number of messages, and the number of keys are all equal are characterized by the properties that (1) each M is connected to each E by exactly one line, and (2) all keys are equally likely. Thus the matrix representation of the system is a Latin square.
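The construction TiMj = Es with s = i + j (mod n) and equiprobable keys can be checked numerically. The small Python sketch below uses n = 5 and an arbitrary illustrative a priori message distribution, and verifies via Bayes' rule that the a posteriori probabilities equal the a priori ones.

from fractions import Fraction

n = 5
keys = range(n)                      # key i: T_i(M_j) = E_s with s = i + j (mod n)
p_key = Fraction(1, n)               # keys equiprobable
q = [Fraction(x, 10) for x in (1, 2, 3, 1, 3)]  # illustrative a priori message probabilities

def encrypt(i, j):                   # T_i applied to message j
    return (i + j) % n

for e in range(n):
    # P(E) = sum over (key, message) pairs producing E
    p_e = sum(p_key * q[j] for i in keys for j in range(n) if encrypt(i, j) == e)
    for j in range(n):
        # P_E(M_j) = P(M_j) * P_{M_j}(E) / P(E)   (Bayes)
        p_m_e = sum(p_key for i in keys if encrypt(i, j) == e)   # P_{M_j}(E)
        assert q[j] * p_m_e / p_e == q[j]          # a posteriori equals a priori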

In MTC it was shown that information may be conveniently measured by means of entropy. If we have a set of possibilities with probabilities p1, p2, ..., pn, the entropy H is given by:

H = -Σ pi log pi.

    In a secrecy system there are two statistical choices involved, that of the message and of

    the key. We may measure the amount of information produced when a message is chosen

    by H(M):

H(M) = -Σ P(M) log P(M),

the summation being over all possible messages. Similarly, there is an uncertainty associated with the choice of key given by:

H(K) = -Σ P(K) log P(K).

    In perfect systems of the type described above, the amount of information in the

message is at most log n (occurring when all messages are equiprobable). This information can be concealed completely only if the key uncertainty is at least log n. This is the first example of a general principle which will appear frequently: that there is a


limit to what we can obtain with a given uncertainty in key: the amount of uncertainty

    we can introduce into the solution cannot be greater than the key uncertainty.

The situation is somewhat more complicated if the number of messages is infinite. Suppose, for example, that they are generated as infinite sequences of letters by a suitable Markoff process. It is clear that no finite key will give perfect secrecy. We suppose, then, that the key source generates key in the same manner, that is, as an infinite sequence of symbols. Suppose further that only a certain length of key LK is needed to encipher and decipher a length LM of message. Let the logarithm of the number of letters in the message alphabet be RM and that for the key alphabet be RK. Then, from the finite case, it is evident that perfect secrecy requires

RM LM ≤ RK LK.

    This type of perfect secrecy is realized by the Vernam system.
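A minimal Python sketch of a Vernam-style system over the 26-letter alphabet: the key is a uniformly random letter sequence as long as the message, added to it modulo 26.

import secrets

def vernam_encrypt(message: str) -> tuple[str, str]:
    """One-time pad over A..Z: e_i = m_i + k_i (mod 26), key as long as the message."""
    key = "".join(secrets.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for _ in message)
    cipher = "".join(chr((ord(m) + ord(k) - 2 * ord('A')) % 26 + ord('A'))
                     for m, k in zip(message, key))
    return cipher, key

def vernam_decrypt(cipher: str, key: str) -> str:
    return "".join(chr((ord(c) - ord(k)) % 26 + ord('A')) for c, k in zip(cipher, key))

c, k = vernam_encrypt("ATTACKATDAWN")
assert vernam_decrypt(c, k) == "ATTACKATDAWN"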

    These results have been deduced on the basis of unknown or arbitrary a priori

    probabilities of the messages. The key required for perfect secrecy depends then on the

total number of possible messages.

One would expect that, if the message space has fixed known statistics, so that it has a

definite mean rate R of generating information, in the sense of MTC, then the amount of key needed could be reduced on the average in just this ratio R/RM, and this is indeed true. In fact the message can be passed through a transducer which eliminates the redundancy and reduces the expected length in just this ratio, and then a Vernam system may be applied to the result. Evidently the amount of key used per letter of message is statistically reduced by a factor R/RM and in this case the key source and information source are just matched: a bit of key completely conceals a bit of message information.

It is easily shown also, by the methods used in MTC, that this is the best that can be done. Perfect secrecy systems have a place in the practical picture: they may be used either where the greatest importance is attached to complete secrecy, e.g., correspondence between the highest levels of command, or in cases where the number of possible messages is small. Thus, to take an extreme example, if only two messages, "yes" or "no", were anticipated, a perfect system would be in order, with perhaps the transformation table:

           M1 (yes)   M2 (no)
  key 1       E1         E2
  key 2       E2         E1

The disadvantage of perfect systems for large correspondence systems is, of course, the equivalent amount of key that must be sent. In succeeding sections we consider what can be achieved with smaller key size, in particular with finite keys.

    2. ENTROPY


The Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable. It quantifies the information contained in a message, usually in bits or bits/symbol, and gives the minimum average message length needed to communicate that information.

This also represents an absolute limit on the best possible lossless compression of any communication: treating a message as a series of symbols, the shortest possible representation to transmit the message is the Shannon entropy in bits/symbol multiplied by the number of symbols in the original message.

Definition: The information entropy of a discrete random variable X, that can take on possible values {x1, ..., xn}, is

H(X) = E[I(X)] = -Σ(i=1..n) p(xi) log p(xi)

where

I(X) is the information content or self-information of X, which is itself a random variable;

p(xi) = Pr(X = xi) is the probability mass function of X; and

0 log 0 is taken to be 0.
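The definition translates directly into Python (base-2 logarithms give bits, and terms with p = 0 are skipped so that 0 log 0 counts as 0):

import math

def entropy(probs, base=2):
    """H(X) = -sum p(x) log p(x), with 0 log 0 taken to be 0."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit: a fair coin
print(entropy([1/6] * 6))         # ~2.585 bits: a fair die
print(entropy([1.0, 0.0]))        # 0.0 bits: a certain outcome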

    Characterization

Information entropy is characterised by these desiderata (writing pi = Pr(X = xi) for the probabilities and Hn(p1, ..., pn) for the entropy of the distribution (p1, ..., pn)):

    Continuity

The measure should be continuous, i.e., changing the value of one of the

    probabilities by a very small amount should only change the entropy by a small amount.

    Symmetry

The measure should be unchanged if the outcomes xi are re-ordered, i.e. Hn(p1, p2, ...) = Hn(p2, p1, ...), etc.

Maximum

The measure should be maximal if all the outcomes are equally likely (uncertainty is highest when all possible events are equiprobable).


    For equiprobable events the entropy should increase with the number of

    outcomes.

    Additivity

The amount of entropy should be independent of how the process is regarded as being divided into parts.

This last functional relationship characterizes the entropy of a system with sub-systems. It demands that the entropy of a system can be calculated from the entropy of its sub-systems if we know how the sub-systems interact with each other.

Given an ensemble of n uniformly distributed elements that are divided into k boxes (sub-systems) with b1, b2, ..., bk elements, the entropy of the whole ensemble should be equal to the sum of the entropy of the system of boxes and the individual entropies of the boxes, each weighted with the probability of being in that particular box. For positive integers bi where b1 + ... + bk = n,

Hn(1/n, ..., 1/n) = Hk(b1/n, ..., bk/n) + Σ(i=1..k) (bi/n) Hbi(1/bi, ..., 1/bi).

Choosing k = n, b1 = ... = bn = 1, this implies that the entropy of a certain outcome is zero:

H1(1) = 0.

It can be shown that any definition of entropy satisfying these assumptions has the form

H = -K Σ(i=1..n) pi log pi

where K is a constant corresponding to a choice of measurement units.

    Information entropy explained

For a random variable X with n outcomes {x1, ..., xn}, the Shannon information entropy, a measure of uncertainty (see further below) and denoted by H(X), is defined as

H(X) = -Σ(i=1..n) p(xi) log_b p(xi)     (1)

where p(xi) is the probability mass function of outcome xi, and b is the base of the logarithm used. Common values of b are 2, e, and 10. The unit of the information entropy is bit for b = 2, nat for b = e, and dit (or digit) for b = 10.


To understand the meaning of Eq. (1), let's first consider a set of n possible outcomes (events) x1, ..., xn, each with equal probability 1/n. An example would be a fair die with n = 6 values, from 1 to 6. The uncertainty for such a set of n outcomes is defined by

u = log_b(n).     (2)

The logarithm is used so as to provide the additivity characteristic for independent uncertainty. For example, consider appending to each value of the first die the value of a second die, which has m possible outcomes y1, ..., ym. There are thus m·n possible outcomes (xi, yj). The uncertainty for such a set of outcomes is then

u = log_b(m·n) = log_b(m) + log_b(n).     (3)

Thus the uncertainty of playing with two dice is obtained by adding the uncertainty of the second die, log_b(m), to the uncertainty of the first die, log_b(n).

Now return to the case of playing with one die only (the first one); since the probability of each event is 1/n, we can write

u = log_b(n) = -log_b(1/n) = -log_b(p(xi)).

In the case of a non-uniform probability mass function (or distribution in the case of a continuous random variable), we let

u_i = -log_b(p(xi)),     (4)

which is also called a surprisal; the lower the probability p(xi), i.e. p(xi) → 0, the higher the uncertainty or the surprise, i.e. u_i → ∞, for the outcome xi.

The average uncertainty ⟨u⟩, with ⟨·⟩ being the average operator, is obtained by

H = ⟨u⟩ = Σ(i=1..n) p(xi) u_i = -Σ(i=1..n) p(xi) log_b(p(xi))     (5)

and is used as the definition of the information entropy in Eq. (1). The above also explains why information entropy and information uncertainty can be used interchangeably.


    Example

As an example, consider a fair coin. The probability of a head or a tail is 0.5, so I(head) = I(tail) = -log2(0.5) = 1, and H = 1 * 0.5 + 1 * 0.5 = 1. So the messages each contain one bit and the average information per message is one bit. This is what we would expect, since each coin toss generates a single bit of information.

Now consider a biased coin, p(head) = 2/3, p(tail) = 1/3. We have I(head) = -log2(2/3) = 0.58 and I(tail) = -log2(1/3) = 1.58. (Note: to find the log base 2 of a number on a standard calculator, find log base 10 and then divide this by log 2 in base 10.) The entropy for this system is then H = 0.58 * 2/3 + 1.58 * 1/3 = 0.92. This is telling us that each message (head or tail) is carrying only 0.92 bits of information. The reason is that the bias means we could have expected to see more heads than tails, so when this happens we are not seeing anything unexpected. Perfect information only happens when we are told something we couldn't have made any useful attempt to predict.
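The arithmetic of the coin example can be reproduced with a few lines of Python:

import math

def surprisal(p):            # I(x) = -log2 p(x), in bits
    return -math.log2(p)

# Fair coin: each outcome carries 1 bit, so H = 1 bit/message.
print(surprisal(0.5))                                        # 1.0
# Biased coin with p(head) = 2/3, p(tail) = 1/3.
print(round(surprisal(2/3), 2), round(surprisal(1/3), 2))    # 0.58 1.58
H = (2/3) * surprisal(2/3) + (1/3) * surprisal(1/3)
print(round(H, 2))                                           # 0.92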

The entropy of a system is important because it tells us how much we can hope to compress streams of messages in the system. In principle, a perfect compression technique would let us encode the stream so that each transmitted symbol carries a full bit of information (an entropy of 1 bit/symbol). In practice, we will usually not achieve better than about 99% efficiency.

Shannon calculated that English text has an entropy of about 2.3 bits per character. Modern analysis has suggested that actually it is closer to 1.1-1.6 bits per character, depending on the kind of text.

    Further properties

    The Shannon entropy satisfies the following properties:

Adding or removing an event with probability zero does not contribute to the entropy:

Hn+1(p1, ..., pn, 0) = Hn(p1, ..., pn).

It can be confirmed using the Jensen inequality that

H(X) = E[log2(1/p(X))] ≤ log2(E[1/p(X)]) = log2(n).

This maximal entropy of log2(n) is effectively attained by a source alphabet having a uniform probability distribution: uncertainty is maximal when all possible events are equiprobable.

Theorem: Suppose X is a random variable having probability distribution p1, p2, ..., pn, where pi > 0, 1 ≤ i ≤ n. Then H(X) ≤ log2 n, with equality if and only if pi = 1/n, 1 ≤ i ≤ n.


PROOF: Applying Jensen's Inequality, we have the following:

H(X) = -Σ(i=1..n) pi log2 pi
     = Σ(i=1..n) pi log2(1/pi)
     ≤ log2( Σ(i=1..n) pi (1/pi) )
     = log2 n.

Further, equality occurs if and only if pi = 1/n, 1 ≤ i ≤ n.
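The theorem can also be spot-checked numerically; the Python sketch below compares H(X) against log2 n for the uniform distribution and for randomly generated distributions.

import math, random

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n
assert abs(entropy(uniform) - math.log2(n)) < 1e-12     # equality at p_i = 1/n

random.seed(1)
for _ in range(1000):
    # a random distribution on n outcomes with all p_i > 0
    w = [random.random() + 1e-9 for _ in range(n)]
    p = [x / sum(w) for x in w]
    assert entropy(p) <= math.log2(n) + 1e-12            # H(X) <= log2 n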

    3. Product Cryptosystems

    Another innovation introduced by Shannon in his 1949 paper was the idea of combining

cryptosystems by forming their product. This idea has been of fundamental importance in the design of present-day cryptosystems such as the Data Encryption Standard, which

    we study in the next chapter.

For simplicity, we will confine our attention in this section to cryptosystems in which C = P: cryptosystems of this type are called endomorphic. Suppose S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) are two endomorphic cryptosystems which have the same plaintext (and ciphertext) spaces. Then the product of S1 and S2, denoted by S1 × S2, is defined to be the cryptosystem (P, P, K1 × K2, E, D).

A key of the product cryptosystem has the form K = (K1, K2), where K1 belongs to the keyspace K1 of S1 and K2 to the keyspace K2 of S2. The encryption and decryption rules of the product cryptosystem are defined as follows: for each K = (K1, K2), we have an encryption rule eK defined by the formula

e(K1,K2)(x) = eK2(eK1(x))

and a decryption rule defined by the formula

d(K1,K2)(y) = dK1(dK2(y)).

    13

  • 8/7/2019 Shannon's Theory

    14/15

That is, we first encrypt x with eK1, and then re-encrypt the resulting ciphertext with eK2. Decrypting is similar, but it must be done in the reverse order:

d(K1,K2)(e(K1,K2)(x)) = d(K1,K2)(eK2(eK1(x)))
                      = dK1(dK2(eK2(eK1(x))))
                      = dK1(eK1(x))
                      = x.

Recall also that cryptosystems have probability distributions associated with their keyspaces. Thus we need to define the probability distribution for the keyspace K of the product cryptosystem. We do this in a very natural way:

pK(K1, K2) = pK1(K1) × pK2(K2).

In other words, choose K1 using the distribution pK1, and then independently choose K2 using the distribution pK2.
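A Python sketch of the product construction, under the simplifying assumption that keys are equiprobable: a cryptosystem is represented here simply by its keyspace and its keyed encryption and decryption rules (the class name and representation are illustrative, not from the text), and the product composes them exactly as in the formulas above. The Shift and Multiplicative Ciphers, which the text defines next, are used only to exercise the construction.

import math
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Cryptosystem:
    keys: list                      # keyspace (keys chosen equiprobably in this sketch)
    enc: Callable                   # enc(key, x)
    dec: Callable                   # dec(key, y)

def product(s1: Cryptosystem, s2: Cryptosystem) -> Cryptosystem:
    """S1 x S2: e_(K1,K2)(x) = e_K2(e_K1(x)) and d_(K1,K2)(y) = d_K1(d_K2(y))."""
    return Cryptosystem(
        keys=[(k1, k2) for k1 in s1.keys for k2 in s2.keys],
        enc=lambda k, x: s2.enc(k[1], s1.enc(k[0], x)),
        dec=lambda k, y: s1.dec(k[0], s2.dec(k[1], y)),
    )

# The Shift and Multiplicative Ciphers over Z_26, as defined in the text.
shift = Cryptosystem(list(range(26)),
                     lambda k, x: (x + k) % 26,
                     lambda k, y: (y - k) % 26)
mult = Cryptosystem([a for a in range(26) if math.gcd(a, 26) == 1],
                    lambda a, x: (a * x) % 26,
                    lambda a, y: (pow(a, -1, 26) * y) % 26)

ms = product(mult, shift)                   # M x S
key = random.choice(ms.keys)                # (K1, K2) drawn independently, equiprobably
assert ms.dec(key, ms.enc(key, 17)) == 17   # d_(K1,K2)(e_(K1,K2)(x)) = x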

    Figure 4. Multiplicative Cipher

Suppose we define the Multiplicative Cipher as in Figure 4. Suppose M is the Multiplicative Cipher (with keys chosen equiprobably) and S is the Shift Cipher (with keys chosen equiprobably). Then it is very easy to see that M × S is nothing more than the Affine Cipher (again, with keys chosen equiprobably). It is slightly more difficult to show that S × M is also the Affine Cipher with equiprobable keys.

Let's prove these assertions. A key in the Shift Cipher is an element K ∈ Z26, and the corresponding encryption rule is eK(x) = x + K mod 26. A key in the Multiplicative Cipher is an element a ∈ Z26 such that gcd(a, 26) = 1; the corresponding encryption rule is ea(x) = ax mod 26. Hence, a key in the product cipher M × S has the form (a, K), where

e(a,K)(x) = ax + K mod 26.


But this is precisely the definition of a key in the Affine Cipher. Further, the probability of a key in the Affine Cipher is 1/312 = 1/12 × 1/26, which is the product of the probabilities of the keys a and K, respectively. Thus M × S is the Affine Cipher.

Now let's consider S × M. A key in this cipher has the form (K, a), where

e(K,a)(x) = a(x + K) = ax + aK mod 26.

Thus the key (K, a) of the product cipher S × M is identical to the key (a, aK) of the Affine Cipher. It remains to show that each key of the Affine Cipher arises with the same probability 1/312 in the product cipher S × M. Observe that aK = K1 if and only if K = a^-1 K1 (recall that gcd(a, 26) = 1, so a has a multiplicative inverse). In other words, the key (a, K1) of the Affine Cipher is equivalent to the key (a^-1 K1, a) of the product cipher S × M. We thus have a bijection between the two key spaces. Since each key is equiprobable, we conclude that S × M is indeed the Affine Cipher.
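The argument just given can also be verified by brute force in Python: enumerating all 312 product keys of M × S and of S × M and checking that the induced encryption maps x → ax + b (mod 26) are exactly those of the Affine Cipher, each arising exactly once.

import math

Z26 = range(26)
UNITS = [a for a in Z26 if math.gcd(a, 26) == 1]        # the 12 valid multiplicative keys

# Keys of M x S: multiply by a, then shift by K -> x |-> a*x + K (mod 26).
ms_maps = {(a, K): tuple((a * x + K) % 26 for x in Z26) for a in UNITS for K in Z26}
# Keys of S x M: shift by K, then multiply by a -> x |-> a*x + a*K (mod 26).
sm_maps = {(K, a): tuple((a * (x + K)) % 26 for x in Z26) for a in UNITS for K in Z26}
# Keys of the Affine Cipher: x |-> a*x + b (mod 26).
affine_maps = {(a, b): tuple((a * x + b) % 26 for x in Z26) for a in UNITS for b in Z26}

# Both products induce exactly the 312 affine encryption maps, each exactly once.
assert set(ms_maps.values()) == set(affine_maps.values())
assert set(sm_maps.values()) == set(affine_maps.values())
assert len(set(ms_maps.values())) == len(set(sm_maps.values())) == 12 * 26   # = 312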

We have shown that M × S = S × M. Thus we would say that the two cryptosystems commute. But not all pairs of cryptosystems commute; it is easy to find counterexamples. On the other hand, the product operation is always associative: (S1 × S2) × S3 = S1 × (S2 × S3).

If we take the product of an (endomorphic) cryptosystem S with itself, we obtain the cryptosystem S × S, which we denote by S^2. If we take the n-fold product, the resulting cryptosystem is denoted by S^n. We call S^n an iterated cryptosystem.

A cryptosystem S is defined to be idempotent if S^2 = S. Many of the

    cryptosystems we studied in Chapter 1 are idempotent. For example, the Shift,

    Substitution, Affine, Hill, Vigenere and Permutation Ciphers are all idempotent. Of

    course, if a cryptosystem S is idempotent, then there is no point in using the product

system S^2, as it requires an extra key but provides no more security.

If a cryptosystem is not idempotent, then there is a potential increase in security

    by iterating several times. This idea is used in the Data Encryption Standard, which

    consists of 16 iterations. But, of course, this approach requires a non-idempotent

cryptosystem to start with. One way in which simple non-idempotent cryptosystems can sometimes be constructed is to take the product of two different (simple) cryptosystems.

    BIBLIOGRAPHY:

C. E. Shannon: Communication Theory of Secrecy Systems, Bell System Technical Journal, 1949.

Douglas Stinson: Cryptography: Theory and Practice.

http://encyclopedia.thefreedictionary.com
