
Page 1:

CS 621 Artificial Intelligence

Lecture 12 – 30/08/05

Prof. Pushpak Bhattacharyya

Fundamentals of Information Theory

Page 2:

Weather (O)   Temp (T)   Humidity (H)   Windy (W)   Decision (D)
Sunny         High       High           F           N
Sunny         High       High           T           N
Cloudy        High       High           F           Y
Rain          Med        High           F           Y
Rain          Cold       Low            F           Y
Rain          Cold       Low            T           N
Cloudy        Cold       Low            T           Y

Page 3:

Weather (O)   Temp (T)   Humidity (H)   Windy (W)   Decision (D)
Sunny         Med        High           F           N
Sunny         Cold       Low            F           Y
Rain          Med        Low            F           Y
Sunny         Med        Low            T           Y
Cloudy        Med        High           T           Y
Cloudy        High       Low            F           Y
Rain          High       High           T           N

Page 4:

[Decision tree figure]
Root: Outlook, with branches Sunny, Cloudy, Rain.
Sunny leads to a Humidity test (High: No, Low: Yes).
Cloudy leads to Yes.
Rain leads to a Windy test (T / F) with Yes / No leaves.

Page 5:

Rule Base

R1: If outlook is sunny and humidity is high, then the decision is No.

R2: If outlook is sunny and humidity is low, then the decision is Yes.

R3: If outlook is cloudy, then the decision is Yes.
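The same rule base can be written directly as executable code. A minimal Python sketch (the function name and the lower-case value strings are illustrative choices, not from the lecture):

```python
def decide(outlook, humidity):
    """Rules R1-R3 from the slide, expressed as code (rain is handled by the Windy test in the tree)."""
    if outlook == "sunny" and humidity == "high":
        return "No"     # R1
    if outlook == "sunny" and humidity == "low":
        return "Yes"    # R2
    if outlook == "cloudy":
        return "Yes"    # R3
    return None         # no rule fires

print(decide("sunny", "high"))   # -> No
```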

Page 6:

Making Sense of Information

• Classification

• Clustering

• Giving a short and nice description

Page 7:

Short Description

Occam's Razor principle

(Shortest/simplest description is the best for generalization)

Page 8:

Representation Language

• Decision tree.

• Neural network.

• Rule base.

• Boolean expression.

Page 9:

Information & Entropy

The example data, presented as rows with labels, contains less ordered/structured information than the succinct descriptions (the decision tree and the rule base).

Define "information"; measure the lack of structure in information by "Entropy".

Page 10:

Define Entropy of S (Labeled data)

E(S) = - ( P+ log2 P+ + P- log2 P- )

P+ = proportion of positively labeled data.

P- = proportion of negatively labeled data.
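A minimal Python sketch of this definition (the function name is illustrative); by convention a term with probability 0 contributes 0:

```python
import math

def entropy(p_pos, p_neg):
    """E(S) = -(P+ log2 P+ + P- log2 P-)."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

# The example on the next page: 9 positive and 5 negative instances out of 14.
print(round(entropy(9/14, 5/14), 3))   # ~0.94
```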

Page 11:

Example

P+ = 9/14

P- = 5/14

E(S) = - 9/14 log2 (9/14) - 5/14 log2 (5/14)

     = 0.94

Page 12:

Partitioning the Data

“Windy” as the attribute

Windy = [ T, F]

Partition the data on Windy = T and Windy = F.

Page 13:

Partitioning the Data (Contd)

Partitioning by focusing on a particular attribute produces an "information gain", i.e., a reduction in entropy.

Page 14:

Partitioning the Data (Contd)

Information gain when we split on Windy = [ T, F ]:

Windy = T : 6 positive, 2 negative

Windy = F : 3 positive, 3 negative

Page 15:

Partitioning the Data (Contd)

[Diagram: a Windy node with two branches: T leads to the subset with 6 positive, 2 negative; F leads to the subset with 3 positive, 3 negative.]

Page 16:

Partitioning the Data (Contd)

Gain(S, A) = E(S) - ∑ ( |Sv| / |S| ) E(Sv),   summed over v ∈ values of A

E(S) = 0.94

E(S, Windy):

E(Windy = T) = - 6/8 log2 (6/8) - 2/8 log2 (2/8) = 0.811

Page 17:

Partitioning the Data (Contd)

E(Windy = F) = - 3/6 log2 (3/6) - 3/6 log2 (3/6) = 1.0

Page 18:

Partitioning the Data (Contd)

Gain(S, Windy) = 0.94 - ( 8/14 * 0.811 + 6/14 * 1.0 ) = 0.048

Exercise: Find the information gain for each attribute: Outlook, Temp, Humidity, and Windy.
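As a worked check of the gain formula, here is a Python sketch that recomputes the gains on the 14-row table from pages 2 and 3; the DATA list, the ATTRS mapping and the helper names are illustrative encodings, not from the slides, and the output can be used to verify the exercise:

```python
import math
from collections import Counter

# (Weather, Temp, Humidity, Windy, Decision) rows from pages 2-3.
DATA = [
    ("Sunny","High","High","F","N"),  ("Sunny","High","High","T","N"),
    ("Cloudy","High","High","F","Y"), ("Rain","Med","High","F","Y"),
    ("Rain","Cold","Low","F","Y"),    ("Rain","Cold","Low","T","N"),
    ("Cloudy","Cold","Low","T","Y"),  ("Sunny","Med","High","F","N"),
    ("Sunny","Cold","Low","F","Y"),   ("Rain","Med","Low","F","Y"),
    ("Sunny","Med","Low","T","Y"),    ("Cloudy","Med","High","T","Y"),
    ("Cloudy","High","Low","F","Y"),  ("Rain","High","High","T","N"),
]
ATTRS = {"Weather": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def entropy(rows):
    """Entropy of the Decision labels (last field) of a set of rows."""
    n = len(rows)
    return -sum(c/n * math.log2(c/n) for c in Counter(r[-1] for r in rows).values())

def gain(rows, attr):
    """Gain(S, A) = E(S) - sum over v of (|Sv|/|S|) * E(Sv)."""
    idx = ATTRS[attr]
    remainder = 0.0
    for v in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for a in ATTRS:
    print(a, round(gain(DATA, a), 3))   # Windy comes out at about 0.048
```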

Page 19:

ID3 Algorithm

Calculating the gain for every attribute and choosing the one with maximum gain at each node, to finally arrive at the decision tree, is called the "ID3" algorithm for building a classifier.
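A compact recursive sketch of this idea, reusing DATA, ATTRS, entropy and gain from the previous sketch (the nested-dict tree representation is an illustrative choice, not prescribed by the lecture):

```python
from collections import Counter

def id3(rows, attrs):
    """Grow a decision tree by always splitting on the attribute with maximum gain."""
    labels = {r[-1] for r in rows}
    if len(labels) == 1:                 # pure node: return the label
        return labels.pop()
    if not attrs:                        # no attributes left: return the majority label
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    idx = ATTRS[best]
    subtree = {}
    for v in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == v]
        subtree[v] = id3(subset, [a for a in attrs if a != best])
    return {best: subtree}

print(id3(DATA, list(ATTRS)))            # the root split is the highest-gain attribute
```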

Page 20:

Origin of Information Theory

1) C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, 1948.

2) T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley, 1991.

Page 21:

Example

Motivation with the example of a horse race.

8 horses: h1, h2, …, h8.

Person P would like to bet on one of the horses. The horses have the following probabilities of winning:

Page 22:

Example (Contd 1)

• h1 = 1/2    h5 = 1/64

• h2 = 1/4    h6 = 1/64

• h3 = 1/8    h7 = 1/64

• h4 = 1/16   h8 = 1/64

∑ hi = 1.

Page 23:

Example (Contd 2)

Send a message specifying the horse on which to bet.

If the situation is "unbiased", i.e., all horses have equal probability of winning, then we need 3 binary units (bits):

3 = log2 8

Page 24:

Example (Contd 3)

Compute the bias:

E(s) = - ∑ Pi log2 Pi,   i = 1, …, 8

Pi = probability of hi winning

E(s) = - ( 1/2 log2 1/2 + 1/4 log2 1/4 + … + 1/64 log2 1/64 ) = 2
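A quick numerical check of this computation, using the probabilities from the previous page (with h1 = 1/2 so that they sum to 1):

```python
import math

# Winning probabilities of h1..h8.
p = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
assert abs(sum(p) - 1.0) < 1e-12

bits = -sum(pi * math.log2(pi) for pi in p)
print(bits)   # 2.0 bits on average, versus 3 bits in the unbiased case
```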

Page 25:

Example (Contd 4)

On the average we do not need more than 2 bits to communicate the desired horse.

Actual length of the code?

Design of an optimal code is a separate problem.

Page 26:

Example 2 (Letter Guessing Game)

Letter:        p     t     k     a     i     u
Probability:   1/8   1/4   1/8   1/8   1/4   1/8

(A 20-question game.)

E(s) = - ∑ Pi log2 Pi,   i ∈ {p, t, k, a, i, u}

     = 2.5

Page 27:

On the average we need no more than 2.5 questions.

Design a code:

Letter:        p     t     k     a     i     u
Probability:   1/8   1/4   1/8   1/8   1/4   1/8
Code:          100   00    101   110   01    111

Page 28:

Q1) Is the letter t or i?

Q2) Is it a consonant?

Expected number of questions = ∑ Pi * Ni

where Ni = number of questions for situation i.
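A short sketch verifying that the code on the previous page achieves this average, with Ni taken as the length of each codeword (the dictionary names are illustrative):

```python
# Letters with their probabilities and codewords, as given on the previous page.
probs = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/8, "i": 1/4, "u": 1/8}
code  = {"p": "100", "t": "00", "k": "101", "a": "110", "i": "01", "u": "111"}

# Expected number of questions/bits: sum over letters of Pi * Ni.
expected = sum(probs[c] * len(code[c]) for c in probs)
print(expected)   # 2.5, matching the entropy computed two pages earlier
```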

Page 29:

What has all this got to do with AI?

Why entropy?

Why design codes?

Why communicate?

Page 30:

Bridge

Multiparty participation in intelligent information processing.

Information gain sets up theoretical limits on communicability.

Page 31:

Summary

• Haphazard presentation of data is not acceptable to the mind.
• Focusing attention on an attribute automatically leads to information gain.
• Defined entropy.
• In parallel, defined information gain.
• Related this to message communication.