
INFORMATION THEORY

MARCELO S. ALENCAR

MOMENTUM PRESS, LLC, NEW YORK

Information Theory. Copyright © Momentum Press®, LLC, 201 .

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other—except for brief quotations, not to exceed 400 words, without the prior permission of the publisher.

First published by Momentum Press®, LLC
222 East 46th Street, New York, NY 10017
www.momentumpress.net

ISBN-13: 978-1-60650-528-1 (print)
ISBN-13: 978-1-60650-529-8 (e-book)

Momentum Press Communications and Signal Processing Collection

DOI: 10.5643/9781606505298

Cover and interior design by Exeter Premedia Services Private Ltd., Chennai, India

10 9 8 7 6 5 4 3 2 1

Printed in the United States of America


This book is dedicated to my family.

ABSTRACT

The book presents the historical evolution of Information Theory, along with the basic concepts linked to information. It discusses the information associated with a given source and the usual types of source codes, information transmission, joint information, conditional entropy, mutual information, and channel capacity. The hot topic of multiple access systems, for cooperative and noncooperative channels, is discussed, along with code division multiple access (CDMA), the basic block of most cellular and personal communication systems, and the capacity of a CDMA system. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are also presented. The book includes a review of probability theory, solved problems, illustrations, and graphics to help the reader understand the theory.

KEY WORDS

Code division multiple access, coding theory, cryptography, information theory, multiple access systems, network security


CONTENTS

List of Figures xi

List of Tables xv

Preface xvii

Acknowledgments xix

1. Information Theory 1

1.1 Information Measurement 2
1.2 Requirements for an Information Metric 4

2. Sources of Information 11

2.1 Source Coding 11
2.2 Extension of a Memoryless Discrete Source 12
2.3 Prefix Codes 14
2.4 The Information Unit 17

3. Source Coding 19

3.1 Types of Source Codes 19
3.2 Construction of Instantaneous Codes 23
3.3 Kraft Inequality 24
3.4 Huffman Code 27

4. Information Transmission 31

4.1 The Concept of Information Theory 32
4.2 Joint Information Measurement 32
4.3 Conditional Entropy 34
4.4 Model for a Communication Channel 34
4.5 Noiseless Channel 35
4.6 Channel with Independent Output and Input 36
4.7 Relations Between the Entropies 37
4.8 Mutual Information 38
4.9 Channel Capacity 41


5. Multiple Access Systems 49

5.1 Introduction 49
5.2 The Gaussian Multiple Access Channel 51
5.3 The Gaussian Channel with Rayleigh Fading 54
5.4 The Noncooperative Multiple Access Channel 59
5.5 Multiple Access in a Dynamic Environment 62
5.6 Analysis of the Capacity for a Markovian Multiple Access Channel 63

6. Code Division Multiple Access 71

6.1 Introduction 71
6.2 Fundamentals of Spread Spectrum Signals 74
6.3 Performance Analysis of CDMA Systems 76
6.4 Sequence Design 79

7. The Capacity of a CDMA System 87

7.1 Introduction 87
7.2 Analysis of a CDMA System with a Fixed Number of Users and Small SNR 87
7.3 CDMA System with a Fixed Number of Users and High SNR 97
7.4 A Tight Bound on the Capacity of a CDMA System 103

8. Theoretical Cryptography 117

8.1 Introduction 117
8.2 Cryptographic Aspects of Computer Networks 118
8.3 Principles of Cryptography 119
8.4 Information Theoretical Aspects of Cryptography 120
8.5 Mutual Information for Cryptosystems 123

Appendix A Probability Theory 125

A.1 Set Theory and Measure 125
A.2 Basic Probability Theory 131
A.3 Random Variables 133

References 139

About the Author 147

Index 149

LIST OF FIGURES

Figure 1.1. Graph of an information function 9

Figure 2.1. Source encoder 12

Figure 2.2. Decision tree for the code in Table 2.5 16

Figure 3.1. Classes of source codes 23

Figure 3.2. Probabilities in descending order for the Huffman code 28

Figure 3.3. Huffman code. At each phase, the two least probable symbols are combined 28

Figure 3.4. (a) Example of the Huffman coding algorithm to obtain the codewords. (b) Resulting code 29

Figure 4.1. Model for a communication channel 32

Figure 4.2. A probabilistic communication channel 33

Figure 4.3. Venn diagram corresponding to the relation between the entropies 40

Figure 4.4. Memoryless binary symmetric channel 44

Figure 4.5. Graph for the capacity of the memoryless binary symmetric channel 46

Figure 4.6. Binary erasure channel 46

Figure 4.7. Graph of the capacity for the binary erasure channel 47

Figure 5.1. The multiple access channel 52

Figure 5.2. Capacity region for the Gaussian multiple access channel, M = 2 54

Figure 5.3. Average and actual capacity, for γ = 0.5, 1.0, and 2.0 58

Figure 5.4. Capacity region for the noncooperative channel, M = 2 61

Figure 5.5. Markov model for the multiple access channel 64


Figure 5.6. Capacity for the channel with Geometric accessibility, as a function of the signal-to-noise ratio, for different values of ρ 66

Figure 5.7. Capacity for the channel with Geometric accessibility, as a function of the utilization factor, for different values of the signal-to-noise ratio 67

Figure 5.8. Capacity for the channel with Poisson accessibility, as a function of the signal-to-noise ratio, for ρ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 68

Figure 5.9. Capacity for the channel with Poisson accessibility, as a function of the utilization factor, for some values of the signal-to-noise ratio 69

Figure 6.1. Data signal and pseudo-noise sequence 72

Figure 6.2. Direct sequence spread spectrum system 73

Figure 6.3. Frequency hopped spread spectrum system 73

Figure 6.4. Spread spectrum using random time windows 73

Figure 6.5. Spectra of transmitted and received signals 75

Figure 6.6. Pseudo-noise sequence generator 82

Figure 6.7. Gold sequence generator 83

Figure 7.1. Capacity approximations for the channel, as a function of the signal-to-noise ratio, for M = 500 and N = 100 96

Figure 7.2. Bound 1 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 100

Figure 7.3. Bound 2 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 101

Figure 7.4. Bound 3 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 103

Figure 7.5. Capacity for the channel, compared with the lower bound, as a function of the signal-to-noise ratio (Eb/N0), for M = 20 and sequence length N = 100 106

Figure 7.6. Approximate capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0), for M = 20, having N as a parameter 107

Figure 7.7. Capacity for the channel, using the approximate formula, as a function of the sequence length (N), for different values of M 108


Figure 7.8. Capacity for the channel, as a function of the number of users (M), using the approximation for the capacity 109

Figure 7.9. Capacity for the channel, as a function of the number of users (M), including the case M ≫ N 110

Figure 7.10. Capacity for the channel, as a function of the probability of error (Pe) 111

Figure 7.11. Bound 4 on the capacity for the channel, as a function of the signal-to-noise ratio (Eb/N0) 112

Figure 7.12. Bounds on the capacity, as a function of the signal-to-noise ratio (Eb/N0) 113

Figure 7.13. Comparison between the new and existing bounds, as a function of the signal-to-noise ratio (Eb/N0), for M = 20 and N = 100 114

Figure 8.1. General model for a cryptosystem 119

Figure A.1. A Venn diagram that represents two intersecting sets 127

Figure A.2. A Venn diagram representing disjoint sets 127

Figure A.3. Increasing sequence of sets 128

Figure A.4. Decreasing sequence of sets 128

Figure A.5. Partition of set B by a family of sets {Ai} 133

Figure A.6. Joint probability density function 137

LIST OF TABLES

Table 1.1. Symbol probabilities of a two-symbol source 4

Table 1.2. Identically distributed symbol probabilities 5

Table 1.3. Unequal symbol probabilities 5

Table 1.4. Symbol probabilities of a certain source 9

Table 2.1. A compact code 13

Table 2.2. A compact code for an extension of a source 14

Table 2.3. A prefix code for a given source 14

Table 2.4. A source code that is not prefix 15

Table 2.5. Example of a prefix code 15

Table 3.1. A binary block code 20

Table 3.2. A ternary block code 20

Table 3.3. A nonsingular block code 20

Table 3.4. A nonsingular block code 21

Table 3.5. The second extension of a block code 21

Table 3.6. Uniquely decodable codes. 22

Table 3.7. Another uniquely decodable code 22

Table 3.8. Selected binary codes 25

Table 3.9. Discrete source with five symbols and their probabilities 28

Table 3.10. Four distinct Huffman codes obtained for the source of Table 3.9 30

Table 6.1. Number of maximal sequences 81

Table 6.2. Relative peak cross-correlations of m-sequences, Gold sequences, and Kasami sequences 83


PREFACE

Information Theory is a classic topic in the educational market that evolved from the amalgamation of different areas of Mathematics and Probability, which include set theory, developed by Georg Cantor, and measure theory, fostered by Henri Lebesgue, as well as the axiomatic treatment of probability by Andrei Kolmogorov in 1933, and finally the beautiful development of Communication Theory by Claude Shannon in 1948.

Information Theory is fundamental to several areas of knowledge, including Engineering, Computer Science, Mathematics, Physics, the Sciences, Economics, Social Sciences, and Social Communication. It is part of the syllabus for most courses in Computer Science, Mathematics, and Engineering.

For Electrical Engineering courses, it is a prerequisite for some disciplines, including communication systems, transmission techniques, error control coding, estimation, and digital signal processing. This book is self-contained; it is a reference and an introduction for graduate students who have not studied information theory before, and it could also be used as an undergraduate textbook. It is addressed to a large audience in Electrical and Computer Engineering, Mathematics, and Applied Physics. The book's target audience is graduate students in these areas who may not have taken basic courses in specific topics and who can find here a quick and concise way to obtain the knowledge they need to succeed in advanced courses.

REASONS FOR WRITING THE BOOK

According to a study by the Institute of Electrical and Electronics Engineers (IEEE), companies, enterprises, and industry need professionals with a solid background in mathematics and sciences, instead of the specialized professionals of the previous century. The employment market in this area demands information technology professionals and engineers who are able to change and learn as the market changes. The market needs professionals who can model and design.

Few books have been published covering the subjects needed to understand the very fundamental concepts of Information Theory. Most books that deal with the subject are aimed at very specific audiences.

The more mathematically oriented books are seldom used by people with engineering, economics, or statistics backgrounds, because the authors are more interested in theorems and related conditions than in fundamental concepts and applications. The books written for engineers usually lack the required rigour, or skip some important points in favor of simplicity and conciseness.

The idea is to present a seamless connection between the more abstract advanced set theory, the fundamental concepts from measure theory, integration, and probability, filling in the gaps left by previous books and leading to an interesting, robust, and, hopefully, self-contained exposition of Information Theory.

DESCRIPTION OF THE BOOK

The book begins with the historical evolution of Information Theory. Chapter 1 deals with the basic concepts of information theory and how to measure information. The information associated with a given source is discussed in Chapter 2. The usual types of source codes are presented in Chapter 3. Information transmission, joint information, conditional entropy, mutual information, and channel capacity are the subject of Chapter 4. The hot topic of multiple access systems, for cooperative and noncooperative channels, is discussed in Chapter 5.

Chapter 6 presents code division multiple access (CDMA), the basic block of most cellular and personal communication systems in operation. The capacity of a CDMA system is the subject of Chapter 7. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are presented in Chapter 8. The appendix includes a review of probability theory. Solved problems, illustrations, and graphics help the reader understand the theory.


ACKNOWLEDGMENTS

The author is grateful to all the members of the Communications Research Group, certified by the National Council for Scientific and Technological Development (CNPq), at the Federal University of Campina Grande, for their collaboration in many ways, helpful discussions, and friendship, as well as to our colleagues at the Institute for Advanced Studies in Communications (Iecom).

The author also wishes to acknowledge the contribution of professors Francisco Madeiro, from the State University of Pernambuco, and Waslon T. A. Lopes, from the Federal University of Campina Grande, Brazil, to the chapter on source coding.

The author is also grateful to professor Valdemar Cardoso da Rocha Jr., from the Federal University of Pernambuco, Brazil, for technical communications, long-term cooperation, and useful discussions related to information theory.

The author is indebted to his wife Silvana, sons Thiago and Raphael, and daughter Marcella, for their patience and support during the course of the preparation of this book.

The author is thankful to professor Orlando Baiocchi, from the University of Washington, Tacoma, USA, who strongly supported this project from the beginning and helped with the reviewing process.

Finally, the author registers the support of Shoshanna Goldberg, Destiny Hadley, Charlene Kronstedt, Jyothi, and Millicent Treloar, from Momentum Press, in the book preparation process.


CHAPTER 1

INFORMATION THEORY

Information Theory is a branch of Probability Theory, with application to and correlation with many areas, including communication systems, communication theory, Physics, language and meaning, cybernetics, psychology, art, and complexity theory (Pierce 1980). The basis for the theory was established by Harry Theodor Nyqvist (1889–1976) (Nyquist 1924), also known as Harry Nyquist, and Ralph Vinton Lyon Hartley (1888–1970) (Hartley 1928), who invented the Hartley oscillator. They published the first articles on the subject, in which the factors that influence the transmission of information were discussed.

The seminal article by Claude E. Shannon (1916–2001) extended the theory to include new factors, such as the effect of noise in the channel and the savings that could be obtained as a function of the statistical structure of the original message and of the characteristics of the information receiver (Shannon 1948). Shannon defined the fundamental communication problem as the possibility of, exactly or approximately, reproducing at a certain point a message that has been chosen at another one.

The main semantic aspects of communication, initially established by Charles Sanders Peirce (1839–1914), a philosopher and the creator of Semiotic Theory, are not relevant for the development of Shannon's information theory. What is important is to consider that a particular message is selected from a set of possible messages.

Of course, as mentioned by John Robinson Pierce (1910–2002), quoting the philosopher Alfred Jules Ayer (1910–1989), it is possible to communicate not only information, but also knowledge, errors, opinions, ideas, experiences, desires, commands, emotions, and feelings. Heat and movement can be communicated, as well as force, weakness, and disease (Pierce 1980).


Hartley found several reasons why the logarithm should be used to measure information:

• It is a practical metric in Engineering, considering that various parameters, such as time and bandwidth, are proportional to the logarithm of the number of possibilities.

• From a mathematical point of view, it is an adequate measure, because several limit operations are simply stated in terms of logarithms.

• It has an intuitive appeal as an adequate metric because, for instance, two binary symbols have four possibilities of occurrence, as the worked example below shows.
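As a brief illustration of the last item, two binary symbols used together have 2 × 2 = 4 possible outcomes, and the logarithmic measure simply adds the contributions of the individual symbols:

$$\log_2 4 = \log_2 (2 \times 2) = \log_2 2 + \log_2 2 = 1 + 1 = 2 \text{ bits}.$$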

The choice of the logarithm base defines the information unit. If base 2 is used, the unit is the bit, an acronym suggested by John W. Tukey for binary digit, which is also a play on words, since a bit can mean a piece of information. Information transmission is informally measured in bits, but a unit has been proposed to pay tribute to the scientist who developed the concept; it is called the shannon, or [Sh] for short. This has a direct correspondence with the unit for frequency, the hertz [Hz], for cycles per second, which was adopted by the International System of Units (SI).¹

Aleksandr Yakovlevich Khinchin (1894–1959) put Information Theory on a solid basis, with a more precise and unified mathematical discussion of the entropy concept, which supported Shannon's intuitive and practical view (Khinchin 1957).

The books by Robert B. Ash (1965) and Amiel Feinstein (1958) give the mathematical reasons for the choice of the logarithm to measure information, and the book by J. Aczél and Z. Daróczy (1975) presents several of Shannon's information measures and their characterization, as well as Alfréd Rényi's (1921–1970) entropy metric.

A discussion of generalized entropies can be found in the book edited by Luigi M. Ricciardi (1990). Lotfi Asker Zadeh introduced the concept of the fuzzy set, an efficient tool to represent the behavior of systems that depend on the perception and judgment of human beings, and applied it to information measurement (Zadeh 1965).

1.1 INFORMATION MEASUREMENT

The objective of this section is to establish a measure for the information content of a discrete system, using Probability Theory. Consider a discrete random experiment, such as the occurrence of a symbol, and its associated sample space Ω, on which X is a real random variable (Reza 1961).


The random variable X can assume the following values

$$X = \{x_1, x_2, \ldots, x_N\}, \quad \text{in which} \quad \bigcup_{k=1}^{N} x_k = \Omega, \qquad (1.1)$$

with probabilities in the set P,

$$P = \{p_1, p_2, \ldots, p_N\}, \quad \text{in which} \quad \sum_{k=1}^{N} p_k = 1. \qquad (1.2)$$

The information associated with a particular event is given by

$$I(x_i) = \log\left(\frac{1}{p_i}\right), \qquad (1.3)$$

because the sure event has probability one and zero information, by a property of the logarithm, and the impossible event has zero probability and infinite information.
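The measure in Equation 1.3 is easy to evaluate numerically. The following minimal Python sketch (the function name and the choice of base 2 are assumptions made only for illustration) returns the self-information of an event from its probability, with the limiting cases treated as described above.

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = log(1 / p) of an event with probability p."""
    if p <= 0:
        return math.inf  # the impossible event carries infinite information
    return math.log(1.0 / p, base)

# The sure event carries no information; rarer events carry more.
print(self_information(1.0))    # 0.0
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits
```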

Example: suppose the sample space is partitioned into two equally probable spaces. Then

$$I(x_1) = I(x_2) = -\log \frac{1}{2} = 1 \text{ bit}, \qquad (1.4)$$

that is, the choice between two equally probable events requires one unit of information, when a base 2 logarithm is used.

Considering the occurrence of $2^N$ equiprobable symbols, the self-information of each event is given by

$$I(x_k) = -\log p_k = -\log 2^{-N} = N \text{ bits}. \qquad (1.5)$$

It is possible to define the source entropy, H(X), as the average information, obtained by weighting all the occurrences:

$$H(X) = E[I(x_i)] = -\sum_{i=1}^{N} p_i \log p_i. \qquad (1.6)$$

Observe that Equation 1.6 is the weighted average of the logarithms of the probabilities, in which the weights are the values of the probabilities of the random variable X. This indicates that H(X) can be interpreted as the expected value of the random variable that assumes the value $-\log p_i$ with probability $p_i$ (Ash 1965).
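Equation 1.6 can thus be read either as a weighted sum or as an expectation. The minimal Python sketch below (the function names are illustrative assumptions, not from the book) computes both forms for an arbitrary distribution and shows that they coincide.

```python
import math

def entropy(probs):
    """Weighted-sum form of Equation 1.6: H(X) = -sum_i p_i log2 p_i."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_as_expectation(probs):
    """The same quantity read as an expectation: the average of -log2 p_i weighted by p_i."""
    return sum(p * -math.log2(p) for p in probs if p > 0)

distribution = [0.5, 0.3, 0.2]   # an arbitrary illustrative distribution
print(entropy(distribution))                 # about 1.485 bits
print(entropy_as_expectation(distribution))  # the same value
```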


Table 1.1. Symbol probabilities of a two-symbol source

Symbol    Probability
x1        1/4
x2        3/4

Example: consider a source that emits two symbols, with the unequal probabilities given in Table 1.1.

The source entropy is calculated as

$$H(X) = -\frac{1}{4}\log\frac{1}{4} - \frac{3}{4}\log\frac{3}{4} = 0.81 \text{ bits per symbol}.$$
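The value above can be checked directly; the two-line computation below assumes base 2 logarithms, which matches the bit as the information unit.

```python
import math

# Entropy of the source in Table 1.1: p(x1) = 1/4 and p(x2) = 3/4.
H = -(1/4) * math.log2(1/4) - (3/4) * math.log2(3/4)
print(round(H, 2))  # 0.81 bits per symbol
```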

1.2 REQUIREMENTS FOR AN INFORMATION METRIC

A few fundamental properties are required of the entropy in order to obtain an axiomatic approach on which to base the information measurement (Reza 1961).

• If the event probabilities suffer a small change, the associated measure must change accordingly, in a continuous manner, which provides a physical meaning to the metric:

$$H(p_1, p_2, \ldots, p_N) \text{ is continuous in } p_k, \quad k = 1, 2, \ldots, N, \quad 0 \le p_k \le 1. \qquad (1.7)$$

• The information measure must be symmetric in relation to the probability set P. That is, the entropy is invariant to the order of the events:

$$H(p_1, p_2, p_3, \ldots, p_N) = H(p_1, p_3, p_2, \ldots, p_N). \qquad (1.8)$$

• The maximum of the entropy is obtained when the events are equally probable. That is, when nothing is known about the set of events, or about what message has been produced, the assumption of a uniform distribution gives the highest information quantity, which corresponds to the highest level of uncertainty:

$$\text{Maximum of } H(p_1, p_2, \ldots, p_N) = H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right). \qquad (1.9)$$


Table 1.2. Identically distributed symbol probabilities

Symbol    Probability
x1        1/4
x2        1/4
x3        1/4
x4        1/4

Table 1.3. Unequal symbol probabilities

Symbol    Probability
x1        1/2
x2        1/4
x3        1/8
x4        1/8

Example: consider two sources that emit four symbols. The first source's symbols, shown in Table 1.2, have equal probabilities, and the second source's symbols, shown in Table 1.3, are produced with unequal probabilities. The mentioned property indicates that the first source attains the highest level of uncertainty, regardless of the probability values of the second source, as long as they are different.

• Consider that an adequate measure for the average uncertainty, H(p1, p2, ..., pN), associated with a set of events has been found. Assume that event {xN} is divided into M disjoint sets, with probabilities qk, such that

$$p_N = \sum_{k=1}^{M} q_k, \qquad (1.10)$$

and the probabilities associated with the new events can be normalized in such a way that

$$\frac{q_1}{p_N} + \frac{q_2}{p_N} + \cdots + \frac{q_M}{p_N} = 1. \qquad (1.11)$$


Then, the creation of new events from the original set modifies the entropy to

$$H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = H(p_1, \ldots, p_{N-1}, p_N) + p_N\, H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right), \qquad (1.12)$$

with

$$p_N = \sum_{k=1}^{M} q_k.$$

A numerical check of this grouping rule is sketched right after this list of requirements.
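As announced above, the grouping rule of Equation 1.12 can be verified numerically. The sketch below is only an illustration: the probability values are arbitrary assumptions, with the last event, of probability p_N, split into two sub-events.

```python
import math

def entropy(probs):
    """H in bits: -sum of p log2 p over the nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three original events; the last one, of probability pN, is split into q1 and q2.
p1, p2, pN = 0.3, 0.2, 0.5
q = [0.35, 0.15]                      # q1 + q2 = pN

left = entropy([p1, p2] + q)          # H(p1, p2, q1, q2)
right = entropy([p1, p2, pN]) + pN * entropy([qk / pN for qk in q])

print(left, right)                    # both sides of Equation 1.12 coincide
```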

It is possible to show that the function defined by Equation 1.6 satisfies all the requirements. To demonstrate the continuity, it suffices to write (Reza 1961)

$$H(p_1, p_2, \ldots, p_N) = -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_N \log p_N]$$
$$= -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_{N-1} \log p_{N-1} + (1 - p_1 - p_2 - \cdots - p_{N-1}) \log (1 - p_1 - p_2 - \cdots - p_{N-1})]. \qquad (1.13)$$

Notice that the probabilities $p_1, p_2, \ldots, p_{N-1}$, and also $(1 - p_1 - p_2 - \cdots - p_{N-1})$, lie in $[0, 1]$, and that the logarithm of a continuous function is also continuous. The entropy is clearly symmetric.

The maximum value property can be demonstrated by considering that all probabilities are equal and that the entropy is maximized by that condition,

$$p_1 = p_2 = \cdots = p_N. \qquad (1.14)$$

Taking into account that, according to intuition, the uncertainty is maximum for a system of equiprobable states, it is possible to arbitrarily choose a random variable with probability $p_N$ depending on $p_k$, $k = 1, 2, \ldots, N - 1$. Taking the derivative of the entropy in terms of each probability,

$$\frac{dH}{dp_k} = \sum_{i=1}^{N} \frac{\partial H}{\partial p_i}\,\frac{\partial p_i}{\partial p_k} = -\frac{d}{dp_k}(p_k \log p_k) - \frac{d}{dp_N}(p_N \log p_N)\,\frac{\partial p_N}{\partial p_k}. \qquad (1.15)$$

But probability $p_N$ can be written as

$$p_N = 1 - (p_1 + p_2 + \cdots + p_k + \cdots + p_{N-1}). \qquad (1.16)$$


Therefore, the derivative of the entropy is

$$\frac{dH}{dp_k} = -(\log_2 e + \log p_k) + (\log_2 e + \log p_N), \qquad (1.17)$$

which, using a property of logarithms, simplifies to

$$\frac{dH}{dp_k} = -\log \frac{p_k}{p_N}. \qquad (1.18)$$

But

$$\frac{dH}{dp_k} = 0, \quad \text{which gives} \quad p_k = p_N. \qquad (1.19)$$

As $p_k$ was chosen in an arbitrary manner, one concludes that, to obtain a maximum for the entropy function, one must have

$$p_1 = p_2 = \cdots = p_N = \frac{1}{N}. \qquad (1.20)$$

The maximum is guaranteed because

$$H(1, 0, 0, \ldots, 0) = 0. \qquad (1.21)$$

On the other hand, for equiprobable events, it is possible to verify that the entropy is always positive, for it attains its maximum at (Csiszár and Körner 1981)

$$H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right) = \log N > 0. \qquad (1.22)$$
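Equations 1.21 and 1.22 are easy to confirm numerically. The minimal sketch below (the helper function name is an assumption) checks that a degenerate distribution has zero entropy, that the uniform distribution over N symbols reaches log2 N bits, and that the unequal distribution of Table 1.3 stays below the 2-bit maximum attained by the uniform distribution of Table 1.2.

```python
import math

def entropy(probs):
    """H in bits: -sum of p log2 p over the nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0, 0.0]))          # 0.0, the degenerate case of Equation 1.21

for N in (2, 4, 8, 16):
    uniform = [1.0 / N] * N
    print(N, entropy(uniform), math.log2(N))  # the entropy equals log2 N (Equation 1.22)

print(entropy([1/4, 1/4, 1/4, 1/4]))          # 2.0 bits: Table 1.2, the maximum for four symbols
print(entropy([1/2, 1/4, 1/8, 1/8]))          # 1.75 bits: Table 1.3, below the maximum
```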

To prove additivity, it suffices to use the definition of entropy, computed for a two-set partition, with probabilities $\{p_1, p_2, \ldots, p_{N-1}\}$ and $\{q_1, q_2, \ldots, q_M\}$,

$$H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = -\sum_{k=1}^{N-1} p_k \log p_k - \sum_{k=1}^{M} q_k \log q_k$$
$$= -\sum_{k=1}^{N} p_k \log p_k + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k$$
$$= H(p_1, p_2, \ldots, p_N) + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k. \qquad (1.23)$$


But the second part of the last term can be written in a way that displays the role of the entropy in the derivation:

$$p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k = p_N \sum_{k=1}^{M} \frac{q_k}{p_N} \log p_N - \sum_{k=1}^{M} q_k \log q_k$$
$$= -p_N \sum_{k=1}^{M} \frac{q_k}{p_N} \log \frac{q_k}{p_N}$$
$$= p_N\, H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right), \qquad (1.24)$$

and this demonstrates the mentioned property.

The entropy is non-negative, which guarantees that the partitioning of one event into several other events does not reduce the system entropy, as shown in the following:

$$H(p_1, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) \ge H(p_1, \ldots, p_{N-1}, p_N), \qquad (1.25)$$

that is, if one splits a symbol into two or more, the entropy does not decrease, and that is the physical origin of the word.
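A quick numerical check of Equation 1.25: the sketch below, with arbitrary illustrative probabilities, splits the last symbol of a source into two sub-symbols and compares the entropies before and after the split.

```python
import math

def entropy(probs):
    """H in bits: -sum of p log2 p over the nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

original = [0.5, 0.3, 0.2]       # the last symbol, of probability 0.2, will be split
split = [0.5, 0.3, 0.12, 0.08]   # 0.12 + 0.08 = 0.2

print(entropy(original))         # about 1.49 bits
print(entropy(split))            # about 1.68 bits: splitting does not decrease the entropy
```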

Example: consider a binary source, X, that emits symbols 0 and 1 with probabilities p and q = 1 − p. The average information per symbol is given by H(X) = −p log p − q log q, which is known as the entropy function,

$$H(p) = -p\log p - (1-p)\log (1-p). \qquad (1.26)$$

Example: for the binary source, consider that the symbol probabilities are p = 1/8 and q = 7/8, and compute the entropy of the source.

The average information per symbol is given by

$$H(X) = -\frac{1}{8}\log\frac{1}{8} - \frac{7}{8}\log\frac{7}{8},$$

which gives H(X) = 0.544. Note that, even though 1 bit is produced for each symbol, the actual average information is 0.544 bits, due to the unequal probabilities.

The entropy function has a maximum, when all symbols are equiprobable, at p = q = 1/2, for which the entropy is 1 bit per symbol. The function attains a minimum at p = 0 or p = 1.
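The entropy function of Equation 1.26 can be evaluated directly; the minimal sketch below (the function name is an assumption) reproduces the 0.544 figure for p = 1/8 and the maximum of 1 bit at p = 1/2.

```python
import math

def binary_entropy(p):
    """Entropy function of Equation 1.26: H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits."""
    if p in (0.0, 1.0):
        return 0.0  # the minimum: a deterministic source carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(binary_entropy(1 / 8), 3))  # 0.544, the value computed above
print(binary_entropy(1 / 2))            # 1.0 bit, the maximum
print(binary_entropy(0.0))              # 0.0, the minimum
```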

This function plays an essential role in determining the capacity of a binary symmetric channel. Observe that the entropy function is concave, that is,

$$H(p_1) + H(p_2) \le 2\,H\left(\frac{p_1 + p_2}{2}\right). \qquad (1.27)$$
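The concavity stated in Equation 1.27 can also be checked pointwise; the sketch below compares the two sides of the inequality for a pair of probabilities chosen only for illustration.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p1, p2 = 0.1, 0.6  # arbitrary illustrative probabilities
lhs = binary_entropy(p1) + binary_entropy(p2)
rhs = 2 * binary_entropy((p1 + p2) / 2)
print(lhs, rhs, lhs <= rhs)  # the concavity inequality of Equation 1.27 holds
```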


Figure 1.1. Graph of an information function. [Plot of the entropy function H(p) versus p, with both axes running from 0 to 1.]

The entropy function is illustrated in Figure 1.1, in which it is possible to notice the symmetry, the concavity, and the maximum for equiprobable symbols. As a consequence of the symmetry, sample spaces with probability distributions obtained from permutations of a common distribution provide the same information quantity (van der Lubbe 1997).

Example: consider a source that emits symbols from an alphabet $X = \{x_1, x_2, x_3, x_4\}$ with the probabilities given in Table 1.4. What is the entropy of this source?

The entropy is computed using Formula 1.6, for N = 4 symbols, as

$$H(X) = -\sum_{i=1}^{4} p_i \log p_i,$$

or

$$H(X) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{2}{8}\log\frac{1}{8} = 1.75 \text{ bits per symbol}.$$

Table 1.4. Symbol probabilities of a certain source

Symbol    Probability
x1        1/2
x2        1/4
x3        1/8
x4        1/8
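The arithmetic of this example can be broken down term by term; the short sketch below lists the contribution of each symbol of Table 1.4 and sums them to 1.75 bits.

```python
import math

# Symbol probabilities of Table 1.4.
probabilities = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}

total = 0.0
for symbol, p in probabilities.items():
    contribution = -p * math.log2(p)   # p_i log2 (1 / p_i)
    total += contribution
    print(symbol, contribution)        # 0.5, 0.5, 0.375, and 0.375 bits

print(total)                           # 1.75 bits per symbol
```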


NOTES

1. The author of this book proposed the adoption of the shannon [Sh] unit during the IEEE International Conference on Communications (ICC 2001), in Helsinki, Finland, shortly after Shannon's death.

INDEX

A
Absolutely secure event, 124
Algebra, 127, 129
  Borel, 130
  closure, 129
Algebraic coding theory, 77
Alzheimer, Aloysius, 32
Appearance
  equivocation, 121
Asymmetric key, 120
Axiom
  choice, 126
  Peano, 126
  specification, 126
Axioms, 126
Ayer, Alfred Jules, 1

B
Belonging, 126
Binary
  Huffman, 27
Bit, 17
Bletchley Park, 117
Borel
  algebra, 130
Borel, Félix Edouard Justin Émile, 130
BSC, 44

C
Cantor, Georg, 125
Capacity
  bound, 107
  channel, 41
  sum, 94
Cardinal
  number, 130
  numbers, 125
Cardinality, 130
Carrier suppression, 80
CDMA, 72, 76, 87
  non-cooperative, 89
  performance analysis, 76, 90
Channel
  binary symmetric, 44
  communication, 34
  discrete, 43
  independent output, 36
  noiseless, 35
  noisy, 43
  non-cooperative, 89
Ciphertext
  entropy, 121
Clausius, Rudolph, 31
Closure, 129
CMOS, 17
Code
  chip rate, 88
  classification, 19
  comma, 21
  decision tree, 14
  decoding tree, 15
  Huffman, 27
  instantaneous, 21, 23
  prefix, 14, 16
    extended, 17
  uniquely decodable, 20
Coder
  codewords, 11
  source, 13
Codes
  block, 19
  non-singular, 19
Codewords, 11
Coding
  efficiency, 12
  source, 11
Communications
  personal, 71
Computer network, 118
Cryptography, 72, 118
  history, 117
  information theory, 121
  model, 119
  principles, 120
  uncertainty, 121
Cryptosystem
  absolute security, 124
  information theory, 123

D
Dedekind, J. W. R., 125
Direct sequence, 72
Disjoint, 126
DPSK, 87
DS, 72
DSSS, 78, 87

E
Efficiency, 43
Empty, 126
Enigma, 117
Entropy, 3
  conditional, 34
  cryptography, 120
  cryptosystem, 122
  extended source, 12
  inequality, 37
  joint, 33, 39
  properties, 6
  relations, 37
Error probability, 109
Extension
  source, 12

F
Fading, 72
Families, 128
Fano, Robert, 27
FDMA, 76
FH, 72
Fractals, 125
Frequency hopping, 72
Fuzzy set, 2

G
Gold sequence, 82

H
Hartley, Ralph Vinton Lyon, 1, 31
Hilbert, David, 125
Huffman
  non-unicity, 29
  procedure, 28
Huffman, David, 27

I
Inclusion, 126
Indexing, 129
Inequality
  Kraft, 24
  Kraft-McMillan, 16
Information
  average mutual, 39
  channel, 34
  entropy, 2, 3
  joint, 32
  measure of, 2
  mutual, 38
  semantics, 1
  theory, 1
Internet
  backbone, 118
  protocol, 118

J
Jammer, 74
  broad band noise, 74
  partial band, 74
  partial time, 74
  repeater, 74
  tone, 74
Jamming, 72
Joint entropy
  cryptosystem, 122
Joint random variables, 136

K
Kasami
  sequence, 84
Key
  entropy, 120
  equivocation, 121
Khinchin, Aleksandr Yakovlevich, 2
Kolmogorov, Andrei Nikolaevich, 31, 131
Kraft
  inequality, 24
Kraft-McMillan
  inequality, 16
Kronecker, Leopold, 125

L
Laser, 18
Lebesgue, Henri Léon, 40

M
MAI, 78
Matched filter, 91
Maximal length
  linear sequence, 80
  sequence, 79
Maximum likelihood receiver, 91
Measure, 126
  Lebesgue, 40
  probability, 126
Message
  equivocation, 121
  uncertainty, 121
Moments, 134
Multipath, 72, 88
Multiple access, 71
  DSSS, 78
  interference, 78

N
Non-cooperative channel, 89
Non-repudiation, 118
Nyquist, Harry, 1, 31

P
Password, 118
PCN, 87
Peano
  axiom, 126
Peirce, Charles Sanders, 1
Personal communications, 71
Photon, 17
Pierce, John Robinson, 1
Plaintext
  entropy, 120
Prefix
  code, 14, 16
Probability, 126
  joint random variables, 136
  moments, 134
  random variables, 133
Probability theory, 133
Pseudo-random, 80
Public key infrastructure, 118

R
Rényi, Alfréd, 2
Radiation
  electromagnetic, 18
Random variables, 133
Rate
  transmission, 43
Redundancy, 12
  absolute, 42
  relative, 43

S
Schröder-Bernstein
  theorem, 126
Sequence
  Gold, 82
  Kasami, 84
  maximal length, 81
  maximal-length, 79, 80
  Mersenne, 81
  Welch bound, 82
Sequence design, 79
Session
  hijacking, 118
Set, 125
  algebra, 127
  disjoint, 126
  families, 128
  fuzzy, 2
  infinite, 125
  operations, 127
  universal, 126
  universal set, 125
Set theory, 125, 126
Sets
  algebra, 129
  indexing, 129
Shannon
  Claude Elwood, 39, 41
  first theorem, 12
Shannon, Claude Elwood, 1, 31
Signal
  space, 74
Sniffing, 118
Source
  efficiency, 12
  extended, 12
Source coding
  theorem, 12
Spectral
  compression, 75
Spoofing, 118
Spread spectrum, 71
  m-sequence, 79
  carrier suppression, 80
  direct sequence, 72
  DSSS, 87
  frequency hopping, 72
  interference, 77
  performance analysis, 76
  pseudo-random, 80
  sequence design, 79
  time hopping, 73
  time-frequency hopping, 78
Substitution, 120
Sum capacity, 94
Symmetric key, 120

T
TCP/IP, 118
TDMA, 76
TH, 73
Theorem
  Schröder-Bernstein, 126
Time hopping, 73
Transfinite arithmetic, 125
Transformation
  cryptography, 119
Transposition, 120
TTL, 17
Turing, Alan Mathison, 117

U
Universal, 126
University of Halle, 125

V
Vectorial
  space, 74
Venn diagram, 126

W
Welch bound, 82

Z
Zadeh, Lotfi Asker, 2
Zenon, 125
Zorn's lemma, 126