The Origin of Entropy
The Origin of Entropy
Rick Chang
Agenda
• Introduction
• References
• What is information?
• A straightforward way to derive the form of entropy
• A mathematical way to derive the form of entropy
• Conclusion
Introduction
• We use entropy matrices to measure the dependency between pairs of genes, but why?
• What is entropy?
Introduction – cont.
• I will: try to explain what information and entropy are
• I will not: tell you how entropy is related to GA. I don't know (maybe a topic for future work).
References
• C. E. Shannon, "A Mathematical Theory of Communication," 1948: Part I, Appendix 2
• David J. C. MacKay, Information Theory, Inference, and Learning Algorithms, 2003: Chapters 1 and 4
• Robert G. Gallager, Information Theory and Reliable Communication, 1968: Chapter 2
[Photo: Claude E. Shannon, 1916–2001]
What is information?
• Ensemble: the outcome x is the value of a random variable, which takes on one of a set of possible values $A_X = \{a_1, \ldots, a_I\}$, having probabilities $P_X = \{p_1, \ldots, p_I\}$, with $P(x = a_i) = p_i$, $p_i \ge 0$, and
$$\sum_{a_i \in A_X} P(x = a_i) = 1$$
What is information?
• Hartley, R. V. L., "Transmission of Information": "If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely."
A straightforward way
• When we try to measure the influence of event y on event x, we may consider the ratio
$$\frac{p(x \mid y)}{p(x)}$$
> 1 : the occurrence of event y increases our belief in event x
= 1 : events x and y are independent
< 1 : the occurrence of event y decreases our belief in event x
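As a quick illustration, here is a minimal Python sketch of this ratio; the toy joint distribution is made up for the example, not taken from the slides:

```python
# Toy joint distribution p(x, y) over two binary events (illustrative values).
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}

def p_x(x):
    """Marginal probability of x."""
    return sum(p for (xi, _), p in p_xy.items() if xi == x)

def p_y(y):
    """Marginal probability of y."""
    return sum(p for (_, yi), p in p_xy.items() if yi == y)

def belief_ratio(x, y):
    """p(x|y) / p(x): how the occurrence of y changes our belief in x."""
    p_x_given_y = p_xy[(x, y)] / p_y(y)
    return p_x_given_y / p_x(x)

print(belief_ratio(1, 1))  # 1.6 > 1: y = 1 makes x = 1 more believable
print(belief_ratio(0, 1))  # 0.4 < 1: y = 1 makes x = 0 less believable
```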
A straightforward way – cont.
• We define the information provided about event x by the occurrence of event y as
$$I(x;y) = \log \frac{p(x \mid y)}{p(x)}$$
> 0 : the occurrence of event y increases our belief in event x
= 0 : events x and y are independent
< 0 : the occurrence of event y decreases our belief in event x
Why use the logarithm?
• It is more convenient:
1. practically more useful
2. nearer to our intuitive feeling: we intuitively measure entities by linear comparison
3. mathematically more suitable: many of the limiting operations are simple in terms of the logarithm
Mutual information
• The mutual information between event x and event y:
$$I(x;y) = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(x,y)}{p(x)\,p(y)} = I(y;x)$$
Mutual information – cont.
• Mutual information uses the logarithm to quantify the difference between our belief in event x given event y and our prior belief in event x
• It is the amount of uncertainty about event x that we can resolve after the occurrence of event y
Self-information
• Consider an event y such that p(x | y) = 1
=> I(x;y) is then the amount of uncertainty about event x that we resolve once we know event x will certainly occur
=> that is, the prior uncertainty of event x
• So define the self-information of event x as
$$I(x) = \log \frac{1}{p(x)} = -\log p(x)$$
Intuitively
[Diagram: a bar representing the information about the system; one part is our prior knowledge about event x, and the full bar is what we would have if we knew everything about the system]
Intuitively – cont.
[Diagram: the same bar of information about the system; after we know event x will certainly occur, the part beyond our prior knowledge about event x is filled in]
Intuitively – cont.
[Diagram: within the information about the system, the information of event x corresponds to the uncertainty of event x that is resolved]
Conditional self-information
• Similarly, define the conditional self-information of event x, given the occurrence of event y:
$$I(x \mid y) = -\log p(x \mid y)$$
• We now have
$$I(x;y) = \log \frac{p(x \mid y)}{p(x)} = \log p(x \mid y) - \log p(x) = I(x) - I(x \mid y)$$
Intuitively – cont.
[Diagram: a bar of the information about event x; our prior knowledge about event x grows after the occurrence of event y, and is complete when we know event x will certainly occur]
Intuitively – cont.
[Diagram: the part of the information about event x gained from event y is the mutual information between event x and event y]
A straightforward way – cont.
• Likewise, define the joint self-information of events x and y as $I(x,y) = -\log p(x,y)$
• Since $p(y \mid x) = p(x,y)/p(x)$, we now have
$$I(x,y) = I(x) + I(y \mid x)$$
• And using $I(x;y) = I(y) - I(y \mid x)$:
$$I(x,y) = I(x) + I(y \mid x) = I(x) + I(y) - I(x;y)$$
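A quick numeric check of these identities, reusing an illustrative joint distribution (the numbers are mine, not the slides'):

```python
from math import log

# Illustrative joint distribution p(x, y) over two binary events.
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}   # marginal of x, consistent with p_xy
py = {0: 0.5, 1: 0.5}   # marginal of y

def I(p):
    """Self-information -log p of an outcome with probability p."""
    return -log(p)

x, y = 1, 1
I_joint = I(p_xy[(x, y)])                       # I(x,y)
I_mut = log((p_xy[(x, y)] / py[y]) / px[x])     # I(x;y) = log(p(x|y)/p(x))

# I(x,y) = I(x) + I(y|x)
assert abs(I_joint - (I(px[x]) + I(p_xy[(x, y)] / px[x]))) < 1e-12
# I(x,y) = I(x) + I(y) - I(x;y)
assert abs(I_joint - (I(px[x]) + I(py[y]) - I_mut)) < 1e-12
# Symmetry: I(x;y) = I(y;x)
assert abs(I_mut - log((p_xy[(x, y)] / px[x]) / py[y])) < 1e-12
```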
A straightforward way – cont.
• The uncertainty of event y is never increased by knowledge of x:
$$I(x) + I(y) \ge I(x,y) = I(x) + I(y \mid x) \;\Rightarrow\; I(y) \ge I(y \mid x)$$
• Strictly, this holds on average: for individual events I(x;y) can be negative, but its expectation is never negative, giving H(Y) ≥ H(Y|X) (see the next slides).
From instance to expectation
Averaging each instance quantity over the ensemble gives its expected counterpart:
• I(x;y) → I(X;Y)
• I(x) → H(X)
• I(x|y) → H(X|Y)
• I(x,y) → H(X,Y)
• I(x;y) = I(x) − I(x|y) → I(X;Y) = H(X) − H(X|Y)
• I(x,y) = I(x) + I(y) − I(x;y) → H(X,Y) = H(X) + H(Y) − I(X;Y)
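In Python, the averaged quantities and their identities can be checked directly; the joint distribution is again illustrative:

```python
from math import log2

# Illustrative joint distribution P(X, Y).
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
py = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

H_X = sum(p * log2(1 / p) for p in px.values())
H_Y = sum(p * log2(1 / p) for p in py.values())
H_XY = sum(p * log2(1 / p) for p in p_xy.values())
# H(X|Y): the expectation of -log2 p(x|y) over the joint distribution
H_X_given_Y = sum(p * log2(py[y] / p) for (x, y), p in p_xy.items())

I_XY = H_X - H_X_given_Y                         # I(X;Y) = H(X) - H(X|Y)
assert abs(H_XY - (H_X + H_Y - I_XY)) < 1e-12    # H(X,Y) = H(X) + H(Y) - I(X;Y)
print(I_XY)                                      # ~0.278 bits shared between X and Y
```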
Relationship
[Diagram: H(X,Y) as two overlapping circles H(X) and H(Y); H(X) splits into H(X|Y) and I(X;Y), and H(Y) into I(X;Y) and H(Y|X)]
Entropy
• The entropy of an ensemble X is defined to be the average value of the self-information over all events x:
$$H(X) = \sum_{i=1}^{n} p(x_i) \log \frac{1}{p(x_i)}$$
• It is the average prior uncertainty of the ensemble.
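A minimal sketch of this definition in Python; the base-2 logarithm (bits) is my choice of unit, not something the slides fix:

```python
from math import log2

def entropy(probs):
    """H(X) = sum_i p_i * log2(1/p_i); zero-probability outcomes contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))     # ~0.469 bits: a biased coin is less uncertain
print(entropy([0.25] * 4))     # 2.0 bits: four equally likely outcomes
```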
Interesting properties of H(X)
• H = 0 if and only if all the p_i but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.
• For a given n, H is a maximum and equal to log(n) when all the p_i are equal, i.e., p_i = 1/n. This is also intuitively the most uncertain situation.
• Any change toward equalization of the probabilities p_1, …, p_n increases H.
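These properties are easy to verify numerically (the distributions below are illustrative; the entropy helper is repeated so the snippet stands alone):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0]))    # 0.0: H vanishes for a certain outcome
print(entropy([1/3, 1/3, 1/3]))    # ~1.585 = log2(3): the maximum for n = 3
print(entropy([0.5, 0.3, 0.2]))    # ~1.485: in between
print(entropy([0.4, 0.35, 0.25]))  # ~1.559: equalizing the previous line raises H
```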
A mathematical way
• Can we find a measure of how uncertain we are of an ensemble?
• If there is such a measure, say H(p_1, …, p_n), it is reasonable to require of it the following properties:
1. H should be continuous in the p_i.
2. If all the p_i are equal, p_i = 1/n, then H should be a monotonic increasing function of n.
3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
A mathematical way – cont.
3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. For example,
$$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\,H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$$
The coefficient 1/2 appears because the second choice occurs only half the time.
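The decomposition can be checked numerically (base-2 logarithms here; any base works):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + (1/2) * entropy([2/3, 1/3])
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)   # both ~1.459 bits
```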
A mathematical way – cont.
• Theorem: the only H satisfying the three properties above is of the form
$$H = K \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$$
A mathematical way – cont.
• Proof: let
$$A(n) = H\!\left(\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)$$
From property 3 we can decompose a choice from s^m equally likely possibilities into a series of m choices, each from s equally likely possibilities, and obtain
$$A(s^m) = m\,A(s)$$
[Diagram: a tree of m successive s-way choices reaching s^m equally likely outcomes]
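A numeric illustration of this step, with the entropy of a uniform distribution standing in for A(n):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

s, m = 2, 3
lhs = entropy([1 / s**m] * s**m)   # A(s^m): one choice among 8 equal outcomes
rhs = m * entropy([1 / s] * s)     # m * A(s): three successive binary choices
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)                    # both 3.0 bits
```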
A mathematical way – cont.
• Similarly, $A(t^n) = n\,A(t)$.
• We can choose n arbitrarily large and find an m to satisfy
$$s^m \le t^n \le s^{m+1}$$
Taking logarithms and dividing by $n \log s$:
$$m \log s \le n \log t \le (m+1) \log s
\;\Rightarrow\;
\frac{m}{n} \le \frac{\log t}{\log s} \le \frac{m}{n} + \frac{1}{n}
\;\Rightarrow\;
\left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \epsilon,\ \epsilon\ \text{arbitrarily small} \tag{1}$$
A mathematical way – cont.
• From the monotonic property of A(n) (property 2):
$$A(s^m) \le A(t^n) \le A(s^{m+1})
\;\Rightarrow\;
m\,A(s) \le n\,A(t) \le (m+1)\,A(s)$$
$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}
\;\Rightarrow\;
\left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \epsilon,\ \epsilon\ \text{arbitrarily small} \tag{2}$$
A mathematical way – cont.
• From equations (1) and (2):
$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| < 2\epsilon,\ \epsilon\ \text{arbitrarily small}$$
• We get A(t) = K log(t), where K must be positive to satisfy property 2.
A mathematical way – cont.
• Now suppose we have a choice from n possibilities with commensurable probabilities
$$p_i = \frac{n_i}{\sum_j n_j}$$
where all the n_i are integers.
• We can break down a choice from $\sum_i n_i$ equally likely possibilities into a choice from n possibilities with probabilities $p_1, \ldots, p_n$ and then, if the i-th alternative was chosen, a choice from $n_i$ possibilities with equal probabilities.
A mathematical way – cont.
• Using property 3 again, we equate the total choice from $\sum_i n_i$ possibilities as computed by the two methods:
$$K \log \sum_i n_i = H(p_1, \ldots, p_n) + K \sum_i p_i \log n_i$$
[Diagram: the $\sum_i n_i$ equally likely outcomes grouped into n branches of sizes $n_1, \ldots, n_n$]
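A quick numeric check of this identity, taking K = 1, the natural logarithm, and an illustrative choice of integers n_i:

```python
from math import log

n = [3, 2, 1]                       # illustrative integers n_i
total = sum(n)
p = [ni / total for ni in n]        # p_i = n_i / sum_j n_j

H = sum(pi * log(1 / pi) for pi in p)
lhs = log(total)                                     # K log(sum_i n_i) with K = 1
rhs = H + sum(pi * log(ni) for pi, ni in zip(p, n))  # H(p) + K sum_i p_i log n_i
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)                     # both ~1.792 (= ln 6)
```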
A mathematical way – cont.
• Hence
$$H(p_1, \ldots, p_n) = K\left[\sum_i p_i \log \sum_j n_j - \sum_i p_i \log n_i\right]
= -K \sum_i p_i \log \frac{n_i}{\sum_j n_j}
= K \sum_i p_i \log \frac{1}{p_i}$$
• If the p_i are not commensurable, they may be approximated by rationals, and the same expression must hold by our continuity assumption (property 1).
• The choice of the coefficient K is a matter of convenience and amounts to the choice of a unit of measure.
Conclusion
• We first used an intuitive method to measure the information content of an event or an ensemble
• We explained intuitively why we choose the logarithm
• Mutual information and entropy were introduced
• We showed the relationship between information content and uncertainty
• Finally, we set three assumptions and derived the only measure of information content satisfying them, showing that the logarithm must be adopted
Thanks