The Origin of Entropy
Rick Chang


Page 1: The Origin of Entropy

The Origin of Entropy
Rick Chang

Page 2: The Origin of Entropy


Agenda
• Introduction
• References
• What is information?
• A straightforward way to derive the form of entropy
• A mathematical way to derive the form of entropy
• Conclusion


Page 3: The Origin of Entropy


Introduction
• We use entropy matrices to measure dependencies between pairs of genes, but why?
• What is entropy?


Page 4: The Origin of Entropy


Introduction – cont.
• I will: try to explain what information and entropy are.
• I will not: tell you how entropy is related to GA; I don’t know (maybe future work).


Page 5: The Origin of Entropy


References
• C. E. Shannon, “A Mathematical Theory of Communication,” 1948 (Part I, Appendix 2)
• David J. C. MacKay, Information Theory, Inference, and Learning Algorithms, 2003 (Chapters 1 and 4)
• Robert G. Gallager, Information Theory and Reliable Communication, 1968 (Chapter 2)


Page 6: The Origin of Entropy


Claude E. Shannon (1916–2001)


Page 7: The Origin of Entropy


What is information?
• Ensemble: the outcome x is the value of a random variable, which takes on one of a set of possible values $A_X = \{a_1, a_2, \dots, a_I\}$, having probabilities $P_X = \{p_1, p_2, \dots, p_I\}$, with $P(x = a_i) = p_i$, $p_i \ge 0$, and $\sum_{a_i \in A_X} P(x = a_i) = 1$.

Page 8: The Origin of Entropy


What is information?


Page 9: The Origin of Entropy


What is information?


• R. V. L. Hartley, “Transmission of Information”: If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely.

Page 10: The Origin of Entropy


A straightforward way
• When we try to measure the influence of event y on event x, we may consider the ratio $\frac{p(x \mid y)}{p(x)}$:
  > 1 : the occurrence of event y increases our belief in event x
  = 1 : events x and y are independent
  < 1 : the occurrence of event y decreases our belief in event x

Page 11: The Origin of Entropy


A straightforward way – cont.
• We define the information provided about event x by the occurrence of event y as
  $I(x; y) = \log \frac{p(x \mid y)}{p(x)}$
  > 0 : the occurrence of event y increases our belief in event x
  = 0 : events x and y are independent
  < 0 : the occurrence of event y decreases our belief in event x
  (a small numerical sketch follows below)
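A minimal numerical sketch of this definition, not part of the original slides; the joint probabilities below are hypothetical, chosen only to illustrate the sign of the quantity:

```python
import math

# Hypothetical joint distribution p(x, y) over two binary events,
# used only to illustrate I(x; y) = log( p(x|y) / p(x) ).
p_xy = {("x", "y"): 0.4, ("x", "not_y"): 0.1,
        ("not_x", "y"): 0.1, ("not_x", "not_y"): 0.4}

p_x = sum(v for (a, _), v in p_xy.items() if a == "x")   # p(x) = 0.5
p_y = sum(v for (_, b), v in p_xy.items() if b == "y")   # p(y) = 0.5
p_x_given_y = p_xy[("x", "y")] / p_y                     # p(x|y) = 0.8

# Positive here: the occurrence of y increases our belief in x.
I_xy = math.log2(p_x_given_y / p_x)
print(f"I(x;y) = {I_xy:.3f} bits")                       # ~0.678
```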

Page 12: The Origin of Entropy


Why use the logarithm?
• It is more convenient:
  1. Practically more useful.
  2. Nearer to our intuitive feeling: we intuitively measure entities by linear comparison.
  3. Mathematically more suitable: many of the limiting operations are simple in terms of the logarithm.

Page 13: The Origin of Entropy


Mutual information
• Mutual information between event x and event y:
  $I(x; y) = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(y \mid x)}{p(y)} = I(y; x)$

Page 14: The Origin of Entropy


Mutual information – cont.
• Mutual information uses the logarithm to quantify the difference between our belief in event x given event y and our belief in event x.
  => the amount of uncertainty about event x that we can resolve after the occurrence of event y


Page 15: The Origin of Entropy


Self-information
• Consider an event y with p(x | y) = 1
  => the amount of uncertainty about event x that we resolve once we know event x will certainly occur
  => the prior uncertainty of event x
• Define the self-information of event x:
  $I(x) = \log \frac{1}{p(x)}$


Page 16: The Origin of Entropy


Intuitively
[Diagram: a bar labeled "Information about the system", showing "Our prior knowledge about event x" relative to "We know everything about the system".]

Page 17: The Origin of Entropy


Intuitively – cont.
[Diagram: the same bar of "Information about the system"; after we know event x will certainly occur, our knowledge moves from "Our prior knowledge about event x" toward "We know everything about the system".]

Page 18: The Origin of Entropy


Intuitively – cont.
[Diagram: within "Information about the system", the same gap is labeled both "Information of event x" and "Uncertainty of event x".]

Page 19: The Origin of Entropy


Conditional Self-information
• Similarly, define the conditional self-information of event x, given the occurrence of event y:
  $I(x \mid y) = \log \frac{1}{p(x \mid y)}$
• We now have
  $I(x; y) = \log \frac{p(x \mid y)}{p(x)} = \log p(x \mid y) - \log p(x) = I(x) - I(x \mid y)$

Page 20: The Origin of Entropy


Intuitively – cont.
[Diagram: a bar labeled "Information about event x", from "Our prior knowledge about event x" up to "We know everything about event x (we know event x will certainly occur)"; after the occurrence of event y, part of the gap is filled.]

Page 21: The Origin of Entropy


Intuitively – cont.
[Diagram: within "Information about event x", the part resolved by event y is labeled "Mutual information between event x and event y".]

Page 22: The Origin of Entropy


A straightforward way – cont.
• As above, define the self-information of event x and event y jointly:
  $I(x, y) = \log \frac{1}{p(x, y)}$
• We now have, using $p(y \mid x) = \frac{p(x, y)}{p(x)}$ and $I(x; y) = I(y) - I(y \mid x)$:
  $I(x, y) = I(x) + I(y \mid x) = I(x) + I(y) - I(x; y)$
  (a pointwise check follows below)
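A small pointwise check of these identities, a sketch not from the slides; the probabilities are hypothetical but mutually consistent:

```python
import math

# Assumed event probabilities: p(x), p(y), and the joint p(x, y).
p_x, p_y, p_xy = 0.5, 0.5, 0.4
p_y_given_x = p_xy / p_x                    # p(y|x) = p(x,y) / p(x)

I = lambda p: -math.log2(p)                 # self-information in bits
I_joint = I(p_xy)                           # I(x,y)
I_mutual = math.log2(p_y_given_x / p_y)     # I(x;y) = log p(y|x)/p(y)

# I(x,y) = I(x) + I(y|x)  and  I(x,y) = I(x) + I(y) - I(x;y)
assert abs(I_joint - (I(p_x) + I(p_y_given_x))) < 1e-9
assert abs(I_joint - (I(p_x) + I(p_y) - I_mutual)) < 1e-9
```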

Page 23: The Origin of Entropy


A straightforward way – cont.
• The uncertainty of event y is never increased by knowledge of x:
  $I(x) + I(y) \ge I(x, y) = I(x) + I(y \mid x)$
  $\Rightarrow I(y) \ge I(y \mid x)$

Page 24: The Origin of Entropy


From instance to expectation
• Taking the average over the ensemble turns each per-event quantity into its expected value (a numerical check follows below):
  I(x;y)                            →  I(X;Y)
  I(x)                              →  H(X)
  I(x|y)                            →  H(X|Y)
  I(x,y)                            →  H(X,Y)
  I(x;y) = I(x) - I(x|y)            →  I(X;Y) = H(X) - H(X|Y)
  I(x,y) = I(x) + I(y) - I(x;y)     →  H(X,Y) = H(X) + H(Y) - I(X;Y)
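A minimal numerical check of the averaged identities, a sketch under an assumed joint distribution (not from the slides):

```python
import math

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in p_xy.items() if b == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
# H(X|Y) computed directly as the average entropy of the conditionals p(x | y).
H_X_given_Y = sum(p_y[y] * H({x: p_xy[(x, y)] / p_y[y] for x in (0, 1)}) for y in (0, 1))

I_XY = H_X - H_X_given_Y                        # I(X;Y) = H(X) - H(X|Y)
assert abs(H_XY - (H_X + H_Y - I_XY)) < 1e-9    # H(X,Y) = H(X) + H(Y) - I(X;Y)
print(f"H(X)={H_X:.3f}  H(X|Y)={H_X_given_Y:.3f}  I(X;Y)={I_XY:.3f}  H(X,Y)={H_XY:.3f}")
```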

Page 25: The Origin of Entropy


Relationship
[Diagram: H(X,Y) spans H(X) and H(Y); H(X) splits into H(X|Y) and I(X;Y), and H(Y) splits into I(X;Y) and H(Y|X).]

Page 26: The Origin of Entropy


Entropy
• The entropy of an ensemble is defined to be the average value of the self-information over all events x:
  $H(X) = \sum_{i=1}^{n} p(x_i) \log \frac{1}{p(x_i)}$
• The average prior uncertainty of an ensemble (a worked example follows below).
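A worked example of the formula, not from the slides: for a three-outcome ensemble with probabilities 1/2, 1/4, 1/4,

$H(X) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 = 0.5 + 0.5 + 0.5 = 1.5 \text{ bits}$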

Page 27: The Origin of Entropy


Interesting Properties of H(X)
• H = 0 if and only if all the $p_i$ but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.
• For a given n, H is a maximum and equal to $\log n$ when all the $p_i$ are equal, i.e., $p_i = 1/n$. This is also intuitively the most uncertain situation.
• Any change toward equalization of the probabilities $p_1, p_2, \dots, p_n$ increases H.
(a quick numerical illustration follows below)
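A quick numerical illustration of these properties, a sketch not from the slides:

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A certain outcome gives H = 0; equalizing the probabilities increases H;
# the uniform distribution reaches the maximum log2(n).
print(H([1.0, 0.0, 0.0, 0.0]))       # 0.0
print(H([0.7, 0.1, 0.1, 0.1]))       # ~1.357
print(H([0.4, 0.3, 0.2, 0.1]))       # ~1.846
print(H([0.25, 0.25, 0.25, 0.25]))   # 2.0 = log2(4)
```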

Page 28: The Origin of Entropy


A mathematical way
• Can we find a measure of how uncertain we are of an ensemble?
• If there is such a measure, say $H(p_1, p_2, \dots, p_n)$, it is reasonable to require of it the following properties:
  1. H should be continuous in the $p_i$.
  2. If all the $p_i$ are equal, $p_i = 1/n$, then H should be a monotonic increasing function of n.
  3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

Page 29: The Origin of Entropy


A mathematical way – cont.
3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. For example:
  $H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\,H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$
  The coefficient $\tfrac{1}{2}$ appears because the second choice occurs only half the time.
  (a numerical check follows below)
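A numerical check of this decomposition, a sketch not from the slides:

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Direct choice among {1/2, 1/3, 1/6} versus a fair binary choice followed,
# half the time, by a second choice with probabilities {2/3, 1/3}.
lhs = H([1/2, 1/3, 1/6])
rhs = H([1/2, 1/2]) + 1/2 * H([2/3, 1/3])
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)    # both ~1.459 bits
```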

Page 30: The Origin of Entropy


A mathematical way – cont.
• Theorem: the only H satisfying the three above properties is of the form
  $H = K \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$

Page 31: The Origin of Entropy


A mathematical way – cont.
• Proof: let $A(n) = H\!\left(\tfrac{1}{n}, \tfrac{1}{n}, \dots, \tfrac{1}{n}\right)$. From property (3) we can decompose a choice from $s^m$ equally likely possibilities into a series of m choices from s equally likely possibilities and obtain
  $A(s^m) = m\,A(s)$

Page 32: The Origin of Entropy


A mathematical way – cont.
• Similarly, $A(t^n) = n\,A(t)$.
• We can choose n arbitrarily large and find an m to satisfy
  $s^m \le t^n < s^{m+1}$
  Taking logarithms and dividing by $n \log s$:
  $m \log s \le n \log t < (m+1) \log s \;\Rightarrow\; \frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}$
  $\Rightarrow\; \left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \epsilon$, where $\epsilon$ is arbitrarily small.   (1)

Page 33: The Origin of Entropy


A mathematical way – cont.
• From the monotonic property of A(n):
  $A(s^m) \le A(t^n) \le A(s^{m+1})$
  $m\,A(s) \le n\,A(t) \le (m+1)\,A(s)$
  $\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}$
  $\Rightarrow\; \left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \epsilon$, where $\epsilon$ is arbitrarily small.   (2)

Page 34: The Origin of Entropy


A mathematical way – cont.
• From equations (1) and (2):
  $\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| < 2\epsilon$, where $\epsilon$ is arbitrarily small
• We get $A(t) = K \log t$, where K must be positive to satisfy property (2).

Page 35: The Origin of Entropy


A mathematical way – cont.
• Now suppose we have a choice from n possibilities with commeasurable probabilities $p_i = \frac{n_i}{\sum_i n_i}$, where all the $n_i$ are integers.
• We can break down a choice from $\sum_i n_i$ equally likely possibilities into a choice from n possibilities with probabilities $p_1, \dots, p_n$, and then, if the i-th was chosen, a choice from $n_i$ possibilities with equal probabilities.

Page 36: The Origin of Entropy


A mathematical way – cont.
• Using property (3) again, we equate the total choice from $\sum_i n_i$ possibilities as computed by the two methods (a numerical check follows below):
  $K \log \sum_i n_i = H(p_1, \dots, p_n) + K \sum_i p_i \log n_i$
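A numerical check of this identity with K = 1 (base-2 logarithm) and hypothetical integers n_i = [3, 2, 1], a sketch not from the slides:

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = [3, 2, 1]                       # assumed integers n_i
total = sum(n)                      # sum_i n_i = 6
p = [ni / total for ni in n]        # p_i = n_i / sum_i n_i

lhs = math.log2(total)                                          # K log(sum_i n_i)
rhs = H(p) + sum(pi * math.log2(ni) for pi, ni in zip(p, n))    # H(p) + K sum_i p_i log n_i
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)    # both ~2.585 bits
```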

Page 37: The Origin of Entropy


A mathematical way – cont.
• Hence
  $H(p_1, \dots, p_n) = K\left[\sum_i p_i \log \sum_i n_i - \sum_i p_i \log n_i\right] = -K \sum_i p_i \log \frac{n_i}{\sum_i n_i} = K \sum_i p_i \log \frac{1}{p_i}$
• If the $p_i$ are not commeasurable, they may be approximated by rationals and the same expression must hold by our continuity assumption (property (1)).
• The choice of the coefficient K is a matter of convenience and amounts to the choice of a unit of measure.

Page 38: The Origin of Entropy


Conclusion
• We first used an intuitive method to measure the information content of an event or an ensemble.
• We explained, intuitively, why we choose the logarithm.
• Mutual information and entropy were introduced.
• We showed the relationship between information content and uncertainty.
• Finally, we set three assumptions and derived the only measure of information content satisfying them, showing that the logarithm must be adopted.

Page 39: The Origin of Entropy


Thanks