
COURSE MATERIAL (LECTURE NOTES)

CS6304 - ADC, UNIT 4

UNIT IV

SOURCE AND ERROR CONTROL CODING

Information theory is a branch of applied mathematics, electrical engineering,

and computer science involving the quantification of information. Information theory was

developed by Claude E. Shannon to find fundamental limits on signal processing operations such

as compressing data and on reliably storing and communicating data.

Since its inception it has broadened to find applications in many other areas,

including statistical inference, natural language processing, cryptography, neurobiology, the

evolution and function of molecular codes, model selection in ecology, thermal physics, quantum

computing, plagiarism detection and other forms of data analysis.

Uncertainty

It is impossible to know with unlimited accuracy the position and momentum of a particle. The

principle arises because in order to locate a particle exactly, an observer must bounce light (in

the form of a photon) off the particle, which must alter its position in an unpredictable way.

Entropy

It is defined as the average amount of information and it is denoted by H(S).

H(S) = Σk pk log2(1/pk)
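For example, the sketch below computes H(S) for a small hypothetical source (the four probabilities are chosen only for illustration):

    import math

    def entropy(probs):
        """Average information H(S) = sum_k p_k * log2(1/p_k), in bits per symbol."""
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    # Hypothetical four-symbol source; the probabilities are an assumption for illustration.
    p = [0.5, 0.25, 0.125, 0.125]
    print(entropy(p))   # -> 1.75 bits/symbol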

Huffman Coding Techniques

Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file). The goal is a prefix-free binary code (a set of codewords) with minimum expected codeword length (equivalently, a tree with minimum weighted path length).

SOURCE CODING


Entropy

It is defined as the average amount of information and it is denoted by H(S).

H(S) = Σk pk log2(1/pk)

Property of entropy

Entropy is bounded by

0 ≤ H(X) ≤ log2 K,

where K is the number of symbols in the source alphabet.

The entropy, H(X), of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

Suppose one transmits 1000 bits (0s and 1s). If these bits are known ahead of transmission (to be

a certain value with absolute probability), logic dictates that no information has been transmitted.

If, however, each is equally and independently likely to be 0 or 1, 1000 bits (in the information

theoretic sense) have been transmitted. Between these two extremes, information can be

quantified as follows. If M is the set of all messages x that X could be, and p(x) is the probability of X taking the value x, then the entropy of X is defined:

H(X) = EX[I(x)] = −Σx∈M p(x) log2 p(x)

(Here, I(x) = −log2 p(x) is the self-information, which is the entropy contribution of an individual message, and EX is the expected value.) An important property of entropy is that it is maximized when all the messages in the message space are equiprobable, i.e. most unpredictable, in which case H(X) = log2 |M|.

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to logarithmic base 2:

Hb(p) = −p log2 p − (1 − p) log2(1 − p)
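As a small check, the sketch below evaluates the binary entropy function at a few values of p; it is maximal (1 bit) at p = 0.5, matching the 1000-bit discussion above:

    import math

    def binary_entropy(p):
        """Hb(p) = -p*log2(p) - (1-p)*log2(1-p); equals 0 at p in {0, 1}."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(p, round(binary_entropy(p), 4))
    # Hb is maximal (1 bit) at p = 0.5: 1000 equally likely bits carry 1000 bits of
    # information, while perfectly predictable bits carry none.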

• Two types of coding:

1) Fixed length code

2) Variable length code (Morse code)

• In Morse code, letters and digits are encoded as sequences of dots "." and dashes "-". Short codewords are assigned to frequently occurring source symbols.

We have seen that the Shannon entropy is a useful measure to determine how best to encode a set of source symbols and by how much we can compress it. We will now investigate how much we can gain by encoding only a subset of the source symbols and accepting a lossy code. To outline the idea we define first a simpler measure of information: the raw bit content of X is

H0 := log2 |AX|.


Example. Let

AX = {a, b, c, d, e, f, g, h}

and PX = {1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64}.

The raw bit content is 3 bits. But notice that P (x ∈ {a, b, c, d}) = 15/16. So, if we are willing to

run a risk of not having a code word 1/16 of the time we get by with a 2-bit code and hence save

1 bit in our encoding scheme.

δ = 0:
    x       a     b     c     d     e     f     g     h
    c(x)    000   001   010   011   100   101   110   111

δ = 1/16:
    x       a    b    c    d    e    f    g    h
    c(x)    00   01   10   11   -    -    -    -

The smallest δ-sufficient subset Sδ is the smallest subset of AX satisfying

P(x ∈ Sδ) ≥ 1 − δ.

The subset is constructed by taking the element of largest probability first and then adding the other elements in order of decreasing probability until the total probability is at least 1 − δ. Having defined an acceptable error threshold we now define a new measure of information content on the δ-sufficient subset: the essential bit content of X, for a given 0 < δ < 1, is

Hδ(X) := log2 |Sδ|.
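A minimal sketch of these two definitions, applied to the example distribution PX above (the function name is ours, chosen for illustration):

    import math

    def essential_bit_content(probs, delta):
        """Return H_delta(X) = log2 |S_delta| for the smallest delta-sufficient subset."""
        total, count = 0.0, 0
        for p in sorted(probs, reverse=True):   # add symbols in order of decreasing probability
            if total >= 1 - delta:
                break
            total += p
            count += 1
        return math.log2(count)

    PX = [1/4, 1/4, 1/4, 3/16, 1/64, 1/64, 1/64, 1/64]
    print(essential_bit_content(PX, 0.0))      # 3.0  (raw bit content, all 8 symbols)
    print(essential_bit_content(PX, 1/16))     # 2.0  (only {a, b, c, d} are kept)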

Figure 1. The essential bit content Hδ(X) for the distribution PX in the example above.

Now encode strings of symbols of length N, with N increasingly large and the symbols i.i.d., using a code with δ > 0. Note that entropy is additive for independent random variables, and hence H(X^N) = N·H(X). Then, for larger and larger N, the essential bit content becomes more and more independent of δ. Indeed, for large enough N, the per-symbol essential bit content (1/N)·Hδ(X^N) will become arbitrarily close to H(X). This means that we can encode a given message with about N·H(X) bits if we allow for a tiny but positive probability of error. And even if we did not allow for any error, the encoding would not be possible with fewer than N·H(X) bits. This is the source coding theorem.
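The sketch below illustrates the theorem numerically for an assumed i.i.d. two-symbol (Bernoulli) source with p = 0.1 and δ = 0.05; both values are arbitrary choices for the demonstration. The per-symbol essential bit content Hδ(X^N)/N approaches H(X) as N grows:

    import math

    def per_symbol_essential_bits(p, N, delta):
        """(1/N) * H_delta(X^N) for an i.i.d. Bernoulli(p) source and block length N.
        All sequences with k ones share the probability p**k * (1-p)**(N-k), so the
        sequences can be handled in groups instead of enumerating all 2**N of them."""
        groups = sorted(((p**k * (1 - p)**(N - k), math.comb(N, k)) for k in range(N + 1)),
                        reverse=True)                        # most probable sequences first
        target, mass, count = 1 - delta, 0.0, 0
        for q, n in groups:
            if mass >= target:
                break
            need = min(n, math.ceil((target - mass) / q))    # take only as many as needed
            mass += need * q
            count += need
        return math.log2(count) / N

    p, delta = 0.1, 0.05                                     # assumed source and tolerance
    print("H(X) =", round(-p * math.log2(p) - (1 - p) * math.log2(1 - p), 4))
    for N in (10, 100, 1000):
        print(N, round(per_symbol_essential_bits(p, N, delta), 4))
    # The per-symbol essential bit content approaches H(X) ≈ 0.469 as N grows.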


Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file).

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code": the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols.

Huffman was able to design the most efficient compression method of this type: no other

mapping of individual source symbols to unique strings of bits will produce a smaller average

output size when the actual symbol frequencies agree with those used to create the code.

Although Huffman coding is optimal for symbol-by-symbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, its optimality can sometimes accidentally be overstated. For example, arithmetic coding and LZW coding often have better compression capability.

Given

A set of symbols and their weights (usually proportional to probabilities).

Find

A prefix-free binary code (a set of codewords) with minimum expected codeword length

(equivalently, a tree with minimum weighted path length).

Input.

Alphabet A = (a1, a2, ..., an), which is the symbol alphabet of size n.

Set W = (w1, w2, ..., wn), which is the set of the (positive) symbol weights (usually proportional to probabilities), i.e. wi = weight(ai), 1 ≤ i ≤ n.

Output.

Code C = (c1, c2, ..., cn), which is the set of (binary) codewords, where ci is the codeword for ai.

Goal.

Let L(C) = Σi wi·li be the weighted path length of code C, where li is the length of ci. Condition: L(C) ≤ L(T) for any code T for the same weights.

Input (A, W):
    Symbol (ai)                       a      b      c      d      e      Sum
    Weights (wi)                      0.10   0.15   0.30   0.16   0.29   = 1

Output C:
    Codewords (ci)                    000    001    10     01     11
    Codeword length in bits (li)      3      3      2      2      2
    Weighted path length (li wi)      0.30   0.45   0.60   0.32   0.58   L(C) = 2.25
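A minimal Huffman coder sketch that reproduces the codeword lengths in the table above (the particular 0/1 labels may differ, since a Huffman code is not unique, as noted below):

    import heapq
    from itertools import count

    def huffman_code(weights):
        """Build a Huffman code; returns {symbol: codeword}. weights: {symbol: weight}."""
        tiebreak = count()                        # avoids comparing dicts when weights tie
        heap = [(w, next(tiebreak), {s: ""}) for s, w in weights.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)       # the two least likely subtrees...
            w2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))   # ...are merged
        return heap[0][2]

    W = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29}
    code = huffman_code(W)
    print(code)                                    # lengths 3, 3, 2, 2, 2 as in the table
    print("L(C) =", sum(W[s] * len(c) for s, c in code.items()))   # 2.25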


For any code that is biunique, meaning that the code is uniquely decodable, the sum of the

probability budgets across all symbols is always less than or equal to one. In this example, the

sum is strictly equal to one; as a result, the code is termed a complete code. If this is not the case,

you can always derive an equivalent code by adding extra symbols (with associated null

probabilities), to make the code complete while keeping it biunique.

In general, a Huffman code need not be unique, but it is always one of the codes

minimizing L(C).
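A quick check of the probability budget (Kraft sum) for the codeword lengths in the example above:

    # Kraft sum for the codeword lengths in the example above: a complete, uniquely
    # decodable code has sum(2**-l_i) exactly equal to 1.
    lengths = [3, 3, 2, 2, 2]           # lengths of 000, 001, 10, 01, 11
    kraft_sum = sum(2 ** -l for l in lengths)
    print(kraft_sum)                     # 1.0 -> the code is complete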

Optimality of the code in the example above:

    Probability budget (2^-li)               1/8     1/8     1/4     1/4     1/4     = 1.00
    Information content (-log2 wi) ≈         3.32    2.74    1.74    2.64    1.79
    Contribution to entropy (-wi log2 wi)    0.332   0.411   0.521   0.423   0.518   H(A) = 2.205

The average codeword length L(C) = 2.25 bits exceeds the source entropy H(A) = 2.205 bits, as it must for any uniquely decodable symbol code.

Extended Huffman coding is the procedure of determining the optimal lengths of codewords for blocks of two or more symbols.

Example of extended Huffman code construction: forming the tree and assigning the codewords. The resulting codebook is tabulated below.


Probabilities of symbol blocks    Huffman codeword    Length of the codeword
P(1,1,1) = 0.2778                 10                  2
P(1,1,2) = 0.1479                 00                  2
P(1,2,1) = 0.1479                 01                  2
P(1,2,2) = 0.0788                 11110               5
P(2,1,1) = 0.1479                 1110                4
P(2,1,2) = 0.0788                 1101                4
P(2,2,1) = 0.0788                 1100                4
P(2,2,2) = 0.0421                 11111               5

Extended Huffman codebook
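From the codebook above (blocks of three symbols, each drawn from a two-symbol source), the average number of code bits per source symbol can be computed directly; it comes out just below 1 bit/symbol, which no per-symbol binary code for a two-symbol source can achieve:

    # Block probabilities and codeword lengths taken from the extended Huffman codebook above.
    blocks = {
        "111": (0.2778, 2), "112": (0.1479, 2), "121": (0.1479, 2), "122": (0.0788, 5),
        "211": (0.1479, 4), "212": (0.0788, 4), "221": (0.0788, 4), "222": (0.0421, 5),
    }
    avg_bits_per_block = sum(p * l for p, l in blocks.values())
    print(avg_bits_per_block / 3)   # ≈ 0.99 bits per source symbol, below the 1 bit/symbol
                                    # needed by any code for single symbols of a 2-symbol source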

D        t1     p1      p2      H       R̄(M=2)   R̄(M=3)   R̄(M=4)   R̄(M=5)
0.6297   1.2    0.8849  0.1150  0.5149  0.6658   0.5622   0.5313   0.5221
0.5953   1.1    0.8643  0.1356  0.5727  0.6941   0.6027   0.5847   0.5768
0.5614   1.0    0.8413  0.1586  0.6313  0.7252   0.6477   0.6407   0.6363
0.5286   0.9    0.8159  0.1840  0.6889  0.7589   0.6970   0.6966   0.6933
0.4974   0.8    0.7881  0.2118  0.7450  0.7951   0.7504   0.7532   0.7510
0.4684   0.7    0.7580  0.2419  0.7983  0.8335   0.8075   0.8063   0.8006
0.4421   0.6    0.7257  0.2742  0.8475  0.8736   0.8678   0.8534   0.8507
0.4190   0.5    0.6915  0.3084  0.8914  0.9149   0.9159   0.9039   0.8976
0.3995   0.4    0.6554  0.3445  0.9291  0.9573   0.9455   0.9410   0.9322
0.3839   0.3    0.6179  0.3820  0.9595  0.9998   0.9697   0.9718   0.9635
0.3726   0.2    0.5793  0.4206  0.9818  0.9998   0.9939   0.9900   0.9883
0.3657   0.1    0.5398  0.4601  0.9955  0.9998   0.9997   0.9996   0.9993
0.3634   0.0    0.5     0.5     1       1        1        1        1

Channel Capacity Theorem

The Shannon theorem states that given a noisy channel with channel capacity C and

information transmitted at a rate R, then if R < C there exist codes that allow the probability of

error at the receiver to be made arbitrarily small. This means that theoretically, it is possible to

transmit information nearly without error at any rate below a limiting rate, C.

The converse is also important. If R > C, an arbitrarily small probability of error is not

achievable. All codes will have a probability of error greater than a certain positive minimal

level, and this level increases as the rate increases. So, information cannot be guaranteed to be

transmitted reliably across a channel at rates beyond the channel capacity. The theorem does not

address the rare situation in which rate and capacity are equal.

The channel capacity C can be calculated from the physical properties of a channel; for a band-limited channel with Gaussian noise, it is given by the Shannon–Hartley theorem discussed below.

Channel Models

• Channel models play a crucial role in designing and developing multimedia applications

– For example, it is important to understand the impact of channel impairments on

compressed data that are transmitted over these channels.

• Simple channel models that represent "links" can serve as the basis for developing more comprehensive models for "routes", "paths", and "networks".

Channels, in general, can be classified into "memoryless channels" and "channels with memory".

• In particular, we focus on models that are based on the Discrete Memoryless Channel (DMC)

model

Popular DMC models include:

– The Binary Erasure Channel (BEC)

– The Binary Symmetric Channel (BSC)

• Extensions of these channels are also used to model more realistic scenarios in multimedia applications.

Binary Erasure Channel (BEC)

• The Binary Erasure Channel (BEC) model (and its extensions) are widely used to represent

channels or links that "lose" data.

• Prime examples of such channels are Internet links and routes.

• A BEC channel has a binary input X and a ternary output Y.
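A minimal sketch of a BEC, assuming an erasure probability of 0.2 purely for illustration (each input bit is delivered intact or replaced by the erasure symbol 'e'):

    import random

    def bec(bits, p_erasure, seed=0):
        """Binary Erasure Channel: each input bit arrives intact with probability
        1 - p_erasure, otherwise it is replaced by the erasure symbol 'e'."""
        rng = random.Random(seed)
        return [b if rng.random() > p_erasure else "e" for b in bits]

    tx = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    print(bec(tx, p_erasure=0.2))   # 0.2 is an assumed erasure probability for illustration
    # (The capacity of a BEC with erasure probability p is 1 - p bits per channel use.)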


Binary Symmetric Channel (BSC)

• The Binary Symmetric Channel (BSC) model (and its extensions) are widely used to represent

channels or links that exhibit errors.

• Examples of such channels include wireless links and low-quality wired channels.

• A BSC channel has a binary input X and a binary

output Y.
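A minimal sketch of a BSC with an assumed crossover probability of 0.1, together with its well-known capacity 1 − Hb(p):

    import math
    import random

    def bsc(bits, p_error, seed=0):
        """Binary Symmetric Channel: each bit is flipped independently with probability p_error."""
        rng = random.Random(seed)
        return [b ^ 1 if rng.random() < p_error else b for b in bits]

    def bsc_capacity(p_error):
        """Standard result: C = 1 - Hb(p) bits per channel use."""
        if p_error in (0.0, 1.0):
            return 1.0
        hb = -p_error * math.log2(p_error) - (1 - p_error) * math.log2(1 - p_error)
        return 1 - hb

    tx = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    print(bsc(tx, p_error=0.1))         # 0.1 is an assumed crossover probability
    print(bsc_capacity(0.1))            # ≈ 0.531 bits per channel use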

Shannon capacity of a graph

If G is an undirected graph, it can be used to define a communications channel in which

the symbols are the graph vertices, and two codewords may be confused with each other if their

symbols in each position are equal or adjacent. The computational complexity of finding the

Shannon capacity of such a channel remains open, but it can be upper bounded by another

important graph invariant, the Lovász number.

Noisy-channel coding theorem

The noisy-channel coding theorem states that for any ε > 0 and for any rate R less than the

channel capacity C, there is an encoding and decoding scheme that can be used to ensure that the

probability of decoding error is less than ε for a sufficiently large block length. Also, for any rate

greater than the channel capacity, the probability of error at the receiver goes to one as the block

length goes to infinity.

Example application

An application of the channel capacity concept to an additive white Gaussian noise (AWGN)

channel with B Hz bandwidthand signal-to-noise ratio S/N is the Shannon–Hartley theorem:

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if

the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are

measured in watts or volts2, so the signal-to-noise ratio here is expressed as a power

Smartzworld.com Smartworld.asia

jntuworldupdates.org Specworld.in

Page 11: UNIT IV SOURCE AND ERROR CONTROL CODING

COURSE MATERIAL (LECTURE NOTES)

CS6304 - ADC, UNIT 4 Page 11

ratio, not in decibels (dB); since figures are often cited in dB, a conversion may be needed. For

example, 30 dB is a power ratio of .
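A short numerical sketch of the theorem, using an assumed 3 kHz channel at 30 dB SNR:

    import math

    def shannon_hartley(bandwidth_hz, snr_db):
        """C = B * log2(1 + S/N), with the SNR converted from dB to a power ratio."""
        snr_linear = 10 ** (snr_db / 10)          # 30 dB -> power ratio of 1000
        return bandwidth_hz * math.log2(1 + snr_linear)

    # Assumed example values: a 3 kHz telephone-like channel at 30 dB SNR.
    print(shannon_hartley(3000, 30))              # ≈ 29.9 kbit/s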

Channel capacity in wireless communications

This section focuses on the single-antenna, point-to-point scenario. For channel capacity in

systems with multiple antennas, see the article on MIMO.

AWGN channel

If the average received signal power is P [W], the noise power spectral density is N0 [W/Hz], and the bandwidth is B [Hz], the AWGN channel capacity is

C_AWGN = B log2(1 + P/(N0·B))  [bits/s],

where P/(N0·B) is the received signal-to-noise ratio (SNR). This result is known as the Shannon–Hartley theorem.

When the SNR is large (SNR >> 0 dB), the capacity is logarithmic in power and approximately

linear in bandwidth. This is called the bandwidth-limited regime.

When the SNR is small (SNR << 0 dB), the capacity is linear in power but insensitive to

bandwidth. This is called the power-limited regime.

The bandwidth-limited regime and power-limited regime are illustrated in the figure.

(Figure: AWGN channel capacity with the power-limited regime and bandwidth-limited regime indicated.)
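The sketch below sweeps the bandwidth for fixed (assumed) values of received power P and noise density N0, showing the transition from the bandwidth-limited to the power-limited regime:

    import math

    def awgn_capacity(P, N0, B):
        """C = B * log2(1 + P / (N0 * B)) in bits/s."""
        return B * math.log2(1 + P / (N0 * B))

    P, N0 = 1.0, 1e-6                       # assumed received power [W] and noise PSD [W/Hz]
    for B in (1e3, 1e4, 1e5, 1e6, 1e7):     # bandwidth sweep [Hz]
        print(f"B = {B:8.0f} Hz  SNR = {P/(N0*B):10.1f}  C = {awgn_capacity(P, N0, B)/1e3:8.1f} kbit/s")
    # At small B (high SNR) the capacity grows roughly linearly with B (bandwidth-limited);
    # at large B (low SNR) it saturates near P/(N0*ln 2) ≈ 1.44 Mbit/s (power-limited).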

Frequency-selective channel

The capacity of the frequency-selective channel is given by the so-called water-filling power allocation:

C = Σn log2(1 + |hn|² Pn* / N0),

where Pn* = max(μ − N0/|hn|², 0) and |hn|² is the gain of subchannel n, with the water level μ chosen to meet the total power constraint Σn Pn* = P.
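A sketch of water-filling by bisection on the water level, for a set of assumed subchannel gains (the gains, N0, and total power are illustrative values only):

    import math

    def water_filling(gains, N0, P_total, iters=100):
        """Allocate P_n = max(mu - N0/g_n, 0) with the water level mu found by bisection
        so that sum(P_n) = P_total; returns the per-subchannel powers and the capacity."""
        lo, hi = 0.0, P_total + N0 / min(gains)           # bracket for the water level
        for _ in range(iters):
            mu = (lo + hi) / 2
            powers = [max(mu - N0 / g, 0.0) for g in gains]
            if sum(powers) > P_total:
                hi = mu
            else:
                lo = mu
        capacity = sum(math.log2(1 + g * p / N0) for g, p in zip(gains, powers))
        return powers, capacity        # capacity in bits per use of the block of subchannels

    gains = [0.1, 0.5, 1.0, 2.0]       # assumed subchannel gains |h_n|^2
    powers, C = water_filling(gains, N0=1.0, P_total=4.0)
    print([round(p, 3) for p in powers], round(C, 3))
    # Strong subchannels get more power; very weak ones may get none at all.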

Slow-fading channel

In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity, as the maximum rate of reliable communication supported by the channel, log2(1 + |h|² SNR), depends on the random channel gain |h|², which is unknown to the transmitter. If the transmitter encodes data at rate R [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,

pout = P( log2(1 + |h|² SNR) < R ),

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in the strict sense is zero. However, it is possible to determine the largest value of R such that the outage probability pout is less than ε. This value is known as the ε-outage capacity.
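A Monte Carlo sketch of the outage probability, assuming Rayleigh fading (so |h|² is exponentially distributed with unit mean) and illustrative values of R and SNR:

    import math
    import random

    def outage_probability(rate_bps_hz, snr, trials=100_000, seed=0):
        """Monte Carlo estimate of P( log2(1 + |h|^2 * SNR) < R ) under Rayleigh fading,
        where |h|^2 is exponentially distributed with unit mean (an assumed fading model)."""
        rng = random.Random(seed)
        outages = sum(math.log2(1 + rng.expovariate(1.0) * snr) < rate_bps_hz
                      for _ in range(trials))
        return outages / trials

    print(outage_probability(rate_bps_hz=1.0, snr=10.0))   # assumed R = 1 bit/s/Hz, SNR = 10 (10 dB)
    # Exact value for this model: 1 - exp(-(2**R - 1)/SNR) ≈ 0.095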

Fast-fading channel

In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of E[ log2(1 + |h|² SNR) ] [bits/s/Hz], and it is meaningful to speak of this value as the capacity of the fast-fading channel.
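Under the same assumed Rayleigh fading model, the fast-fading (ergodic) capacity E[log2(1 + |h|² SNR)] can be estimated the same way:

    import math
    import random

    def ergodic_capacity(snr, trials=100_000, seed=0):
        """Monte Carlo estimate of E[ log2(1 + |h|^2 * SNR) ] for unit-mean exponential |h|^2
        (Rayleigh fading assumed), i.e. the fast-fading capacity in bits/s/Hz."""
        rng = random.Random(seed)
        return sum(math.log2(1 + rng.expovariate(1.0) * snr) for _ in range(trials)) / trials

    print(ergodic_capacity(snr=10.0))   # ≈ 2.9 bits/s/Hz, below log2(1 + 10) ≈ 3.46 for a
                                        # non-fading AWGN link at the same average SNR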

