alencar chapter 1
TRANSCRIPT
Information Theory

Copyright © Momentum Press®, LLC, 201 .
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other—except for brief quotations, not to exceed 400 words, without the prior permission of the publisher.
First published by Momentum Press®, LLC
222 East 46th Street, New York, NY 10017
www.momentumpress.net
ISBN-13: 978-1-60650-528-1 (print)
ISBN-13: 978-1-60650-529-8 (e-book)
Momentum Press Communications and Signal Processing Collection
DOI: 10.5643/9781606505298
Cover and interior design by Exeter Premedia Services Private Ltd., Chennai, India
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America
ABSTRACT
The book presents the historical evolution of Information Theory, along with the basic concepts linked to information. It discusses the information associated with a certain source and the usual types of source codes, information transmission, joint information, conditional entropy, mutual information, and channel capacity. The hot topic of multiple access systems, for cooperative and noncooperative channels, is discussed, along with code division multiple access (CDMA), the basic block of most cellular and personal communication systems, and the capacity of a CDMA system. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are also presented. The book includes a review of probability theory, solved problems, illustrations, and graphics to help the reader understand the theory.
KEY WORDS
Code division multiple access, coding theory, cryptography, information theory, multiple access systems, network security
CONTENTS
List of Figures xi
List of Tables xv
Preface xvii
Acknowledgments xix
1. Information Theory 1
1.1 Information Measurement 2
1.2 Requirements for an Information Metric 4
2. Sources of Information 11
2.1 Source Coding 11
2.2 Extension of a Memoryless Discrete Source 12
2.3 Prefix Codes 14
2.4 The Information Unit 17
3. Source Coding 19
3.1 Types of Source Codes 19
3.2 Construction of Instantaneous Codes 23
3.3 Kraft Inequality 24
3.4 Huffman Code 27
4. Information Transmission 31
4.1 The Concept of Information Theory 32
4.2 Joint Information Measurement 32
4.3 Conditional Entropy 34
4.4 Model for a Communication Channel 34
4.5 Noiseless Channel 35
4.6 Channel with Independent Output and Input 36
4.7 Relations Between the Entropies 37
4.8 Mutual Information 38
4.9 Channel Capacity 41
5. Multiple Access Systems 49
5.1 Introduction 49
5.2 The Gaussian Multiple Access Channel 51
5.3 The Gaussian Channel with Rayleigh Fading 54
5.4 The Noncooperative Multiple Access Channel 59
5.5 Multiple Access in a Dynamic Environment 62
5.6 Analysis of the Capacity for a Markovian Multiple Access Channel 63
6. Code Division Multiple Access 71
6.1 Introduction 71
6.2 Fundamentals of Spread Spectrum Signals 74
6.3 Performance Analysis of CDMA Systems 76
6.4 Sequence Design 79
7. The Capacity of a CDMA System 87
7.1 Introduction 87
7.2 Analysis of a CDMA System with a Fixed Number of Users and Small SNR 87
7.3 CDMA System with a Fixed Number of Users and High SNR 97
7.4 A Tight Bound on the Capacity of a CDMA System 103
8. Theoretical Cryptography 117
8.1 Introduction 117
8.2 Cryptographic Aspects of Computer Networks 118
8.3 Principles of Cryptography 119
8.4 Information Theoretical Aspects of Cryptography 120
8.5 Mutual Information for Cryptosystems 123
Appendix A Probability Theory 125
A.1 Set Theory and Measure 125
A.2 Basic Probability Theory 131
A.3 Random Variables 133
References 139
About the Author 147
Index 149
LIST OF FIGURES
Figure 1.1. Graph of an information function 9
Figure 2.1. Source encoder 12
Figure 2.2. Decision tree for the code in Table 2.5 16
Figure 3.1. Classes of source codes 23
Figure 3.2. Probabilities in descending order for the Huffman code 28
Figure 3.3. Huffman code. At each phase, the two least probable symbols are combined 28
Figure 3.4. (a) Example of the Huffman coding algorithm to obtain the codewords. (b) Resulting code 29
Figure 4.1. Model for a communication channel 32
Figure 4.2. A probabilistic communication channel 33
Figure 4.3. Venn diagram corresponding to the relation between the entropies 40
Figure 4.4. Memoryless binary symmetric channel 44
Figure 4.5. Graph for the capacity of the memoryless binary symmetric channel 46
Figure 4.6. Binary erasure channel 46
Figure 4.7. Graph of the capacity for the binary erasure channel 47
Figure 5.1. The multiple access channel 52
Figure 5.2. Capacity region for the Gaussian multiple access channel, M = 2 54
Figure 5.3. Average and actual capacity, for γ = 0.5, 1.0, and 2.0 58
Figure 5.4. Capacity region for the noncooperative channel, M = 2 61
Figure 5.5. Markov model for the multiple access channel 64
Figure 5.6. Capacity for the channel with Geometric accessibility, as a function of the signal-to-noise ratio, for different values of ρ 66
Figure 5.7. Capacity for the channel with Geometric accessibility, as a function of the utilization factor, for different values of the signal-to-noise ratio 67
Figure 5.8. Capacity for the channel with Poisson accessibility, as a function of the signal-to-noise ratio, for ρ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 68
Figure 5.9. Capacity for the channel with Poisson accessibility, as a function of the utilization factor, for some values of the signal-to-noise ratio 69
Figure 6.1. Data signal and pseudo-noise sequence 72
Figure 6.2. Direct sequence spread spectrum system 73
Figure 6.3. Frequency hopped spread spectrum system 73
Figure 6.4. Spread spectrum using random time windows 73
Figure 6.5. Spectra of transmitted and received signals 75
Figure 6.6. Pseudo-noise sequence generator 82
Figure 6.7. Gold sequence generator 83
Figure 7.1. Capacity approximations for the channel, as a function of the signal-to-noise ratio, for M = 500 and N = 100 96
Figure 7.2. Bound 1 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 100
Figure 7.3. Bound 2 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 101
Figure 7.4. Bound 3 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 103
Figure 7.5. Capacity for the channel, compared with the lower bound, as a function of the signal-to-noise ratio (Eb/N0), for M = 20 and sequence length N = 100 106
Figure 7.6. Approximate capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0), for M = 20, having N as a parameter 107
Figure 7.7. Capacity for the channel, using the approximate formula, as a function of the sequence length (N), for different values of M 108
Figure 7.8. Capacity for the channel, as a function of the number of users (M), using the approximation for the capacity 109
Figure 7.9. Capacity for the channel, as a function of the number of users (M), including the case M ≫ N 110
Figure 7.10. Capacity for the channel, as a function of the probability of error (Pe) 111
Figure 7.11. Bound 4 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 112
Figure 7.12. Bounds on the capacity, as a function of the signal-to-noise ratio (Eb/N0) 113
Figure 7.13. Comparison between the new and existing bounds, as a function of the signal-to-noise ratio (Eb/N0), for M = 20 and N = 100 114
Figure 8.1. General model for a cryptosystem 119
Figure A.1. A Venn diagram that represents two intersecting sets 127
Figure A.2. A Venn diagram representing disjoint sets 127
Figure A.3. Increasing sequence of sets 128
Figure A.4. Decreasing sequence of sets 128
Figure A.5. Partition of set B by a family of sets {Ai} 133
Figure A.6. Joint probability density function 137
LIST OF TABLES
Table 1.1. Symbol probabilities of a two-symbol source 4
Table 1.2. Identically distributed symbol probabilities 5
Table 1.3. Unequal symbol probabilities 5
Table 1.4. Symbol probabilities of a certain source 9
Table 2.1. A compact code 13
Table 2.2. A compact code for an extension of a source 14
Table 2.3. A prefix code for a given source 14
Table 2.4. A source code that is not prefix 15
Table 2.5. Example of a prefix code 15
Table 3.1. A binary block code 20
Table 3.2. A ternary block code 20
Table 3.3. A nonsingular block code 20
Table 3.4. A nonsingular block code 21
Table 3.5. The second extension of a block code 21
Table 3.6. Uniquely decodable codes. 22
Table 3.7. Another uniquely decodable code 22
Table 3.8. Selected binary codes 25
Table 3.9. Discrete source with five symbols and their probabilities 28
Table 3.10. Four distinct Huffman codes obtained for the source of Table 3.9 30
Table 6.1. Number of maximal sequences 81
Table 6.2. Relative peak cross-correlations of m-sequences, Gold sequences, and Kasami sequences 83
PREFACE
Information Theory is a classic topic in the educational market that evolved from the amalgamation of different areas of Mathematics and Probability, which include set theory, developed by Georg Cantor, and measure theory, fostered by Henri Lebesgue, as well as the axiomatic treatment of probability by Andrei Kolmogorov in 1931, and finally the beautiful development of Communication Theory by Claude Shannon in 1948.
Information Theory is fundamental to several areas of knowledge, including Engineering, Computer Science, Mathematics, Physics, Sciences, Economics, Social Sciences, and Social Communication. It is part of the syllabus for most courses in Computer Science, Mathematics, and Engineering.
For Electrical Engineering courses it is a prerequisite for some disciplines, including communication systems, transmission techniques, error control coding, estimation, and digital signal processing. This book is self-contained; it is a reference and an introduction for graduate students who have not taken information theory before, and it could also be used as an undergraduate textbook. It is addressed to a large audience in Electrical and Computer Engineering, Mathematics, and Applied Physics. The book's target audience is graduate students in these areas who may not have taken basic courses in specific topics, and who can find a quick and concise way to obtain the knowledge they need to succeed in advanced courses.
REASONS FOR WRITING THE BOOK
According to a study by the Institute of Electrical and Electronics Engineers (IEEE), companies, enterprises, and industry need professionals with a solid background in mathematics and sciences, instead of the specialized professional of the previous century. The employment market in this area demands information technology professionals and engineers who can adapt and learn as the market changes. The market needs professionals who can model and design.
Few books have been published covering the subjects needed to understand the very fundamental concepts of Information Theory. Most books that deal with the subject are destined to very specific audiences.

The more mathematically oriented books are seldom used by people with an engineering, economics, or statistics background, because the authors are more interested in theorems and related conditions than in fundamental concepts and applications. The books written for engineers usually lack the required rigour, or skip some important points in favor of simplicity and conciseness.

The idea is to present a seamless connection between the more abstract advanced set theory, the fundamental concepts from measure theory, integration, and probability, filling in the gaps left by previous books and leading to an interesting, robust, and, hopefully, self-contained exposition of Information Theory.
DESCRIPTION OF THE BOOK
The book begins with the historical evolution of Information Theory. Chapter 1 deals with the basic concepts of information theory and how to measure information. The information associated with a certain source is discussed in Chapter 2. The usual types of source codes are presented in Chapter 3. Information transmission, joint information, conditional entropy, mutual information, and channel capacity are the subject of Chapter 4. The hot topic of multiple access systems, for cooperative and noncooperative channels, is discussed in Chapter 5.

Chapter 6 presents code division multiple access (CDMA), the basic block of most cellular and personal communication systems in operation. The capacity of a CDMA system is the subject of Chapter 7. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are presented in Chapter 8. The appendix includes a review of probability theory. Solved problems, illustrations, and graphics help the reader understand the theory.
ACKNOWLEDGMENTS
The author is grateful to all the members of the Communications Research Group, certified by the National Council for Scientific and Technological Development (CNPq), at the Federal University of Campina Grande, for their collaboration in many ways, helpful discussions, and friendship, as well as to our colleagues at the Institute for Advanced Studies in Communications (Iecom).
The author also wishes to acknowledge the contribution of professors Francisco Madeiro, from the State University of Pernambuco, and Waslon T. A. Lopes, from the Federal University of Campina Grande, Brazil, to the chapter on source coding.
The author is also grateful to professor Valdemar Cardoso da Rocha Jr., from the Federal University of Pernambuco, Brazil, for technical communications, long-term cooperation, and useful discussions related to information theory.
The author is indebted to his wife Silvana, sons Thiago and Raphael, and daughter Marcella, for their patience and support during the course of the preparation of this book.
The author is thankful to professor Orlando Baiocchi, from the University of Washington, Tacoma, USA, who strongly supported this project from the beginning and helped with the reviewing process.
Finally, the author registers the support of Shoshanna Goldberg, Destiny Hadley, Charlene Kronstedt, Jyothi, and Millicent Treloar, from Momentum Press, in the book preparation process.
CHAPTER 1
INFORMATION THEORY
Information Theory is a branch of Probability Theory with application to and correlation with many areas, including communication systems, communication theory, Physics, language and meaning, cybernetics, psychology, art, and complexity theory (Pierce 1980). The basis for the theory was established by Harry Theodor Nyqvist (1889–1976) (Nyquist 1924), also known as Harry Nyquist, and Ralph Vinton Lyon Hartley (1888–1970), who invented the Hartley oscillator (1928). They published the first articles on the subject, in which the factors that influence the transmission of information were discussed.
The seminal article by Claude E. Shannon (1916–2001) extended the theory to include new factors, such as the noise effect in the channel and the savings that could be obtained as a function of the statistical structure of the original message and the characteristics of the information receiver (Shannon 1948). Shannon defined the fundamental communication problem as the possibility of reproducing, exactly or approximately, at a certain point, a message that has been chosen at another one.
The main semantic aspects of communication, initially established by Charles Sanders Peirce (1839–1914), a philosopher and creator of Semiotic Theory, are not relevant for the development of Shannon's information theory. What is important is to consider that a particular message is selected from a set of possible messages.
Of course, as mentioned by John Robinson Pierce (1910–2002), quoting the philosopher Alfred Jules Ayer (1910–1989), it is possible to communicate not only information, but also knowledge, errors, opinions, ideas, experiences, desires, commands, emotions, and feelings. Heat and movement can be communicated, as well as force, weakness, and disease (Pierce 1980).
Hartley found several reasons why the natural logarithm should be used to measure information:
• It is a practical metric in Engineering, considering that various parameters, such as time and bandwidth, are proportional to the logarithm of the number of possibilities.
• From a mathematical point of view, it is an adequate measure, because several limit operations are simply stated in terms of logarithms.
• It has an intuitive appeal as an adequate metric because, for instance, two binary symbols have four possibilities of occurrence.
The choice of the logarithm base defines the information unit. If base 2 is used, the unit is the bit, an acronym suggested by John W. Tukey for binary digit, which is a play on words that can also mean a piece of information. Information transmission is informally given in bit(s), but a unit has been proposed to pay tribute to the scientist who developed the concept; it is called the shannon, or [Sh] for short. This has a direct correspondence with the unit for frequency, hertz or [Hz], for cycles per second, which was adopted by the International System of Units (SI).1
Aleksandr Yakovlevich Khinchin (1894–1959) put Information Theory on a solid basis, with a more precise and unified mathematical discussion of the entropy concept, which supported Shannon's intuitive and practical view (Khinchin 1957).
The books by Robert B. Ash (1965) and Amiel Feinstein (1958) give the mathematical reasons for the choice of the logarithm to measure information, and the book by J. Aczél and Z. Daróczy (1975) presents several of Shannon's information measures and their characterization, as well as Alfréd Rényi's (1921–1970) entropy metric.
A discussion on generalized entropies can be found in the book edited by Luigi M. Ricciardi (1990). Lotfi Asker Zadeh introduced the concept of fuzzy set, an efficient tool to represent the behavior of systems that depend on the perception and judgement of human beings, and applied it to information measurement (Zadeh 1965).
1.1 INFORMATION MEASUREMENT
The objective of this section is to establish a measure for the information content of a discrete system, using Probability Theory. Consider a discrete random experiment, such as the occurrence of a symbol, and its associated sample space Ω, in which X is a real random variable (Reza 1961).
The random variable X can assume the values

$$X = \{x_1, x_2, \ldots, x_N\}, \quad \text{in which} \quad \bigcup_{k=1}^{N} x_k = \Omega, \tag{1.1}$$

with probabilities in the set P

$$P = \{p_1, p_2, \ldots, p_N\}, \quad \text{in which} \quad \sum_{k=1}^{N} p_k = 1. \tag{1.2}$$
The information associated with a particular event is given by

$$I(x_i) = \log \left( \frac{1}{p_i} \right), \tag{1.3}$$
because the sure event has probability one and zero information, by a property of the logarithm, and the impossible event has zero probability and infinite information.
Example: suppose the sample space is partitioned into two equally probable spaces. Then

$$I(x_1) = I(x_2) = -\log \frac{1}{2} = 1 \text{ bit}, \tag{1.4}$$
that is, the choice between two equally probable events requires one unit of information, when a base 2 logarithm is used.
Considering the occurrence of 2^N equiprobable symbols, the self-information of each event is given by

$$I(x_k) = -\log p_k = -\log 2^{-N} = N \text{ bits}. \tag{1.5}$$
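Equations 1.3 to 1.5 can be checked numerically; the following Python sketch (the function name is illustrative, not from the book) computes the self-information in bits:

```python
import math

def self_information(p: float) -> float:
    """Self-information I(x) = -log2(p), in bits, of an event with probability p."""
    return -math.log2(p)

# Two equally probable events: one bit each (Equation 1.4).
print(self_information(1 / 2))    # 1.0

# One of 2**N equiprobable symbols carries N bits of information (Equation 1.5).
N = 8
print(self_information(2 ** -N))  # 8.0
```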
It is possible to define the source entropy, H(X), as the average information, obtained by weighting all the occurrences,

$$H(X) = E[I(x_i)] = -\sum_{i=1}^{N} p_i \log p_i. \tag{1.6}$$
Observe that Equation 1.6 is the weighted average of the logarithms of the probabilities, in which the weights are the values of the probabilities of the random variable X, and this indicates that H(X) can be interpreted as the expected value of the random variable that assumes the value −log p_i with probability p_i (Ash 1965).
Table 1.1. Symbol probabilities of a two-symbol source

    Symbol    Probability
    x1        1/4
    x2        3/4
Example: consider a source that emits two symbols, with unequal probabilities, given in Table 1.1.
The source entropy is calculated as

$$H(X) = -\frac{1}{4} \log \frac{1}{4} - \frac{3}{4} \log \frac{3}{4} = 0.81 \text{ bits per symbol}.$$
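The calculation can be verified with a short Python sketch of Equation 1.6; the helper `entropy` is an illustration, not part of the book:

```python
import math

def entropy(probs):
    """Source entropy H(X) = -sum p_i log2 p_i, in bits per symbol (Equation 1.6)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Source of Table 1.1: probabilities 1/4 and 3/4.
print(round(entropy([1 / 4, 3 / 4]), 2))  # 0.81
```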
1.2 REQUIREMENTS FOR AN INFORMATION METRIC
A few fundamental properties are necessary for the entropy, in order to obtain an axiomatic approach on which to base the information measurement (Reza 1961).
• If the event probabilities suffer a small change, the associated measure must change accordingly, in a continuous manner, which provides a physical meaning to the metric:

$$H(p_1, p_2, \ldots, p_N) \text{ is continuous in } p_k, \quad k = 1, 2, \ldots, N, \quad 0 \le p_k \le 1. \tag{1.7}$$
• The information measure must be symmetric in relation to the probability set P. That is, the entropy is invariant to the order of events,

$$H(p_1, p_2, p_3, \ldots, p_N) = H(p_1, p_3, p_2, \ldots, p_N). \tag{1.8}$$
• The maximum of the entropy is obtained when the events are equally probable. That is, when nothing is known about the set of events, or about what message has been produced, the assumption of a uniform distribution gives the highest information quantity, which corresponds to the highest level of uncertainty,

$$\max H(p_1, p_2, \ldots, p_N) = H\left( \frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N} \right). \tag{1.9}$$
Table 1.2. Identically distributed symbol probabilities

    Symbol    Probability
    x1        1/4
    x2        1/4
    x3        1/4
    x4        1/4

Table 1.3. Unequal symbol probabilities

    Symbol    Probability
    x1        1/2
    x2        1/4
    x3        1/8
    x4        1/8
Example: consider two sources that emit four symbols. The first source symbols, shown in Table 1.2, have equal probabilities, and the second source symbols, shown in Table 1.3, are produced with unequal probabilities. The mentioned property indicates that the first source attains the highest level of uncertainty, regardless of the probability values of the second source, as long as they are different.
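A quick numerical comparison of the two sources, using an illustrative `entropy` helper (not from the book), confirms that the uniform distribution of Table 1.2 attains the maximum, log2(4) = 2 bits:

```python
import math

def entropy(probs):
    """Entropy in bits per symbol, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 4] * 4                      # Table 1.2
nonuniform = [1 / 2, 1 / 4, 1 / 8, 1 / 8]  # Table 1.3

print(entropy(uniform))     # 2.0 bits, the maximum, log2(4)
print(entropy(nonuniform))  # 1.75 bits, strictly smaller
```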
• Consider that an adequate measure for the average uncertainty, H(p_1, p_2, ..., p_N), associated with a set of events, has been found. Assume that event {x_N} is divided into M disjoint sets, with probabilities q_k, such that

$$p_N = \sum_{k=1}^{M} q_k, \tag{1.10}$$

and the probabilities associated with the new events can be normalized in such a way that

$$\frac{q_1}{p_N} + \frac{q_2}{p_N} + \cdots + \frac{q_M}{p_N} = 1. \tag{1.11}$$
Then, the creation of new events from the original set modifies the entropy to

$$H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = H(p_1, \ldots, p_{N-1}, p_N) + p_N H\left( \frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N} \right), \tag{1.12}$$

with

$$p_N = \sum_{k=1}^{M} q_k.$$
It is possible to show that the function defined by Equation 1.6 satisfies all requirements. To demonstrate the continuity, it suffices to write (Reza 1961)

$$\begin{aligned} H(p_1, p_2, \ldots, p_N) &= -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_N \log p_N] \\ &= -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_{N-1} \log p_{N-1} \\ &\quad + (1 - p_1 - p_2 - \cdots - p_{N-1}) \log (1 - p_1 - p_2 - \cdots - p_{N-1})]. \end{aligned} \tag{1.13}$$

Notice that the probabilities p_1, p_2, ..., p_{N-1}, and also (1 − p_1 − p_2 − ⋯ − p_{N-1}), are continuous in [0, 1], and that the logarithm of a continuous function is also continuous. The entropy is clearly symmetric.
The maximum value property can be demonstrated by considering that all probabilities are equal, and that the entropy is maximized under that condition,

$$p_1 = p_2 = \cdots = p_N. \tag{1.14}$$

Taking into account that, according to intuition, the uncertainty is maximum for a system of equiprobable states, it is possible to arbitrarily choose a random variable with probability p_N depending on p_k, k = 1, 2, ..., N − 1, and take the derivative of the entropy with respect to each probability,

$$\frac{dH}{dp_k} = \sum_{i=1}^{N} \frac{\partial H}{\partial p_i} \frac{\partial p_i}{\partial p_k} = -\frac{d}{dp_k}(p_k \log p_k) - \frac{d}{dp_N}(p_N \log p_N) \frac{\partial p_N}{\partial p_k}. \tag{1.15}$$

But probability p_N can be written as

$$p_N = 1 - (p_1 + p_2 + \cdots + p_k + \cdots + p_{N-1}). \tag{1.16}$$
Therefore, the derivative of the entropy is

$$\frac{dH}{dp_k} = -(\log_2 e + \log p_k) + (\log_2 e + \log p_N), \tag{1.17}$$

which, using a property of logarithms, simplifies to

$$\frac{dH}{dp_k} = -\log \frac{p_k}{p_N}. \tag{1.18}$$

Setting

$$\frac{dH}{dp_k} = 0 \quad \text{gives} \quad p_k = p_N. \tag{1.19}$$

As p_k was chosen in an arbitrary manner, one concludes that, to obtain a maximum of the entropy function, one must have

$$p_1 = p_2 = \cdots = p_N = \frac{1}{N}. \tag{1.20}$$

The maximum is guaranteed because

$$H(1, 0, 0, \ldots, 0) = 0. \tag{1.21}$$
On the other hand, for equiprobable events, it is possible to verify that the entropy is always positive, for it attains its maximum at (Csiszár and Körner 1981)

$$H\left( \frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N} \right) = \log N > 0. \tag{1.22}$$
To prove additivity, it suffices to use the definition of entropy, computed for a two-set partition, with probabilities {p_1, p_2, ..., p_{N-1}} and {q_1, q_2, ..., q_M},

$$\begin{aligned} H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) &= -\sum_{k=1}^{N-1} p_k \log p_k - \sum_{k=1}^{M} q_k \log q_k \\ &= -\sum_{k=1}^{N} p_k \log p_k + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k \\ &= H(p_1, p_2, \ldots, p_N) + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k. \end{aligned} \tag{1.23}$$
But the second part of the last term can be written in a way that displays the importance of the entropy in the derivation,

$$\begin{aligned} p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k &= p_N \sum_{k=1}^{M} \frac{q_k}{p_N} \log p_N - \sum_{k=1}^{M} q_k \log q_k \\ &= -p_N \sum_{k=1}^{M} \frac{q_k}{p_N} \log \frac{q_k}{p_N} \\ &= p_N H\left( \frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N} \right), \end{aligned} \tag{1.24}$$
and this demonstrates the mentioned property.

The entropy is non-negative, which guarantees that the partitioning of one event into several other events does not reduce the system entropy, as shown in the following:

$$H(p_1, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) \ge H(p_1, \ldots, p_{N-1}, p_N), \tag{1.25}$$

that is, if one splits a symbol into two or more, the entropy always increases, and that is the physical origin of the word entropy.
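The grouping identity of Equations 1.12 and 1.23 to 1.25 can be verified numerically; the distribution below is an arbitrary example, not from the book:

```python
import math

def entropy(probs):
    """Entropy in bits, ignoring zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.5, 0.3, 0.2]   # original distribution, p_N = 0.2
q = [0.12, 0.08]      # split of the last event, so that sum(q) == p_N

lhs = entropy(p[:-1] + q)                                     # H(p1, p2, q1, q2)
rhs = entropy(p) + p[-1] * entropy([qk / p[-1] for qk in q])  # Equation 1.12
print(abs(lhs - rhs) < 1e-12)  # True: splitting adds exactly p_N * H(q/p_N)
```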
Example: consider a binary source, X, that emits symbols 0 and 1 with probabilities p and q = 1 − p. The average information per symbol is given by H(X) = −p log p − q log q, which is known as the entropy function,

$$H(p) = -p \log p - (1 - p) \log (1 - p). \tag{1.26}$$
Example: for the binary source, consider that the symbol probabilities are p = 1/8 and q = 7/8, and compute the entropy of the source. The average information per symbol is given by

$$H(X) = -\frac{1}{8} \log \frac{1}{8} - \frac{7}{8} \log \frac{7}{8},$$

which gives H(X) = 0.544. Note that even though 1 bit is produced for each symbol, the actual average information is 0.544 bits, due to the unequal probabilities.

The entropy function has a maximum, when all symbols are equiprobable, at p = q = 1/2, for which the entropy is 1 bit/symbol. The function attains a minimum at p = 0 or p = 1.
This function plays an essential role in determining the capacity of a binary symmetric channel. Observe that the entropy function is concave, that is,

$$H(p_1) + H(p_2) \le 2 H\left( \frac{p_1 + p_2}{2} \right). \tag{1.27}$$
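The properties of the entropy function, its value at p = 1/8, its maximum at p = 1/2, and the concavity of Equation 1.27, can be checked with this Python sketch (an illustration, not from the book):

```python
import math

def H(p):
    """Binary entropy function of Equation 1.26, in bits."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(H(1 / 8), 3))  # 0.544, the unequal-probability example
print(H(1 / 2))            # 1.0, the maximum
# Concavity (Equation 1.27): the chord lies below the curve.
p1, p2 = 0.1, 0.6
print(H(p1) + H(p2) <= 2 * H((p1 + p2) / 2))  # True
```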
Figure 1.1. Graph of an information function.
The entropy function is illustrated in Figure 1.1, in which it is possible to notice the symmetry, concavity, and the maximum for equiprobable symbols. As a consequence of the symmetry, sample spaces with probability distributions obtained from permutations of a common distribution provide the same information quantity (van der Lubbe 1997).
Example: consider a source that emits symbols from an alphabet X = {x1, x2, x3, x4} with probabilities given in Table 1.4. What is the entropy of this source?
The entropy is computed using Formula 1.6, for N = 4 symbols, as

$$H(X) = -\sum_{i=1}^{4} p_i \log p_i,$$

or

$$H(X) = -\frac{1}{2} \log \frac{1}{2} - \frac{1}{4} \log \frac{1}{4} - \frac{2}{8} \log \frac{1}{8} = 1.75 \text{ bits per symbol}.$$
Table 1.4. Symbol probabilities of a certain source

    Symbol    Probability
    x1        1/2
    x2        1/4
    x3        1/8
    x4        1/8
NOTES
1. The author of this book proposed the adoption of the shannon [Sh] unit during the IEEE International Conference on Communications (ICC 2001), in Helsinki, Finland, shortly after Shannon's death.
INDEX
A
Absolutely secure event, 124
Algebra, 127, 129
    Borel, 130
    closure, 129
Algebraic coding theory, 77
Alzheimer, Aloysius, 32
Appearance
    equivocation, 121
Asymmetric key, 120
Axiom
    choice, 126
    Peano, 126
    specification, 126
Axioms, 126
Ayer, Alfred Jules, 1

B
Belonging, 126
Binary
    Huffman, 27
Bit, 17
Bletchley Park, 117
Borel
    algebra, 130
Borel, Félix Edouard Juston Émile, 13
BSC, 4

C
Cantor, Georg, 125
Capacity
    bound, 107
    channel, 41
    sum, 94
Cardinal
    number, 130
    numbers, 125
Cardinality, 130
Carrier suppression, 80
CDMA, 72, 76, 87
    non-cooperative, 89
    performance analysis, 76, 90
Channel
    binary symmetric, 44
    communication, 34
    discrete, 43
    independent output, 36
    noiseless, 35
    noisy, 43
    non-cooperative, 89
Ciphertext
    entropy, 121
Clausius, Rudolph, 31
Closure, 129
CMOS, 17
Code
    chip rate, 88
    classification, 19
    comma, 21
    decision tree, 14
    decoding tree, 15
    Huffman, 27
    instantaneous, 21, 23
    prefix, 14, 16
        extended, 17
    uniquely decodable, 20
Coder
    codewords, 1
    source, 13
Codes
    block, 19
    non-singular, 19
Codewords, 11
Coding
    efficiency, 12
    source, 11
Communications
    personal, 71
Computer network, 118
Cryptography, 72, 118
    history, 117
    information theory, 121
    model, 119
    principles, 120
    uncertainty, 121
Cryptosystem
    absolute security, 124
    information theory, 123

D
Dedekind, J. W. R., 125
Direct sequence, 72
Disjoint, 126
DPSK, 87
DS, 72
DSSS, 78, 87

E
Efficiency, 43
Empty, 126
Enigma, 117
Entropy
    conditional, 34
    cryptography, 120
    cryptosystem, 122
    extended source, 12
    inequality, 37
    joint, 33, 39
    properties, 6
    relations, 37
Error probability, 109
Extension
    source, 12

F
Fading, 72
Families, 128
Fano, Robert, 27
FDMA, 76
FH, 72
Fractals, 125
Frequency hopping, 72
Fuzzy set, 2

G
Gold sequence, 82

H
Hartley, Ralph Vinton Lyon, 1, 31
Hilbert, David, 125
Huffman
    non-unicity, 29
    procedure, 28
Huffman, David, 27

I
Inclusion, 126
Indexing, 129
Inequality
    Kraft, 24
    Kraft-McMillan, 16
Information
    average mutual, 39
    channel, 34
    entropy, 2, 3
    joint, 32
    measure of, 2
    mutual, 38
    semantics, 1
    theory, 1
Internet
    backbone, 118
    protocol, 118

J
Jammer, 74
    broad band noise, 74
    partial band, 74
    partial time, 74
    repeater, 74
    tone, 74
Jamming, 72
Joint entropy
    cryptosystem, 122
Joint random variables, 136

K
Kasami
    sequence, 84
Key
    entropy, 120
    equivocation, 121
Khinchin, Aleksandr Yakovlevich, 2
Kolmogorov, Andrei Nikolaevich, 31, 131
Kraft
    inequality, 24
Kraft-McMillan
    inequality, 16
Kronecker, Leopold, 125

L
Laser, 18
Lebesgue, Henri Léon, 40

M
MAI, 78
Matched filter, 91
Maximal length
    linear sequence, 80
    sequence, 79
Maximum likelihood receiver, 91
Measure, 126
    Lebesgue, 40
    probability, 126
Message
    equivocation, 121
    uncertainty, 121
Moments, 134
Multipath, 72, 88
Multiple access, 71
    DSSS, 78
    interference, 78

N
Non-cooperative channel, 89
Non-repudiation, 118
Nyquist, Harry, 1, 31

P
Password, 118
PCN, 87
Peano
    axiom, 126
Peirce, Charles Sanders, 1
Personal communications, 71
Photon, 17
Pierce, John Robinson, 1
Plaintext
    entropy, 120
Prefix
    code, 14, 16
Probability, 126
    joint random variables, 136
    moments, 134
    random variables, 133
Probability theory, 133
Pseudo-random, 80
Public key infrastructure, 118

R
Rényi, Alfréd, 2
Radiation
    electromagnetic, 18
Random variables, 133
Rate
    transmission, 43
Redundancy, 12
    absolute, 42
    relative, 43

S
Schröder-Bernstein
    theorem, 126
Sequence
    Gold, 82
    Kasami, 84
    maximal length, 81
    maximal-length, 79, 80
    Mersenne, 81
    Welsh bound, 82
Sequence design, 79
Session
    hijacking, 118
Set, 125
    algebra, 127
    disjoint, 126
    families, 128
    fuzzy, 2
    infinite, 125
    operations, 127
    universal, 126
    universal set, 125
Set theory, 125, 126
Sets
    algebra, 129
    indexing, 129
Shannon
    Claude Elwood, 39, 41
    first theorem, 12
Shannon, Claude Elwood, 1, 31
Signal
    space, 74
Sniffing, 118
Source
    efficiency, 12
    extended, 12
Source coding
    theorem, 12
Spectral
    compression, 75
Spoofing, 118
Spread spectrum, 71
    m-sequence, 79
    carrier suppression, 80
    direct sequence, 72
    DSSS, 87
    frequency hopping, 72
    interference, 77
    performance analysis, 76
    pseudo-random, 80
    sequence design, 79
    time hopping, 73
    time-frequency hopping
Substitution, 120
Sum capacity, 94
Symmetric key, 120

T
TCP/IP, 118
TDMA, 76
TH, 73
Theorem
    Schröder-Bernstein, 126
Time hopping, 73
Transfinite arithmetic, 125
Transformation
    cryptography, 119
Transposition, 120
TTL, 17
Turing, Alan Mathison, 117

U
Universal, 126
University of Halle, 125

V
Vectorial
    space, 74
Venn
    diagram, 126
Venn diagram, 126

W
Welsh bound, 82

Z
Zadeh, Lotfi Asker, 2
Zenon, 125
Zorn
    lemma, 126
Zorn's lemma, 126