TRANSCRIPT
Final Exam Review
Final Exam: May 10 Thursday
If event E occurs, then the probability that event H will occur is p(H|E).

IF E (evidence) is true THEN H (hypothesis) is true with probability p.
Bayesian reasoning

p(H|E) = p(E|H) p(H) / [ p(E|H) p(H) + p(E|¬H) p(¬H) ]
Bayesian reasoning Example: Cancer and Test

P(C) = 0.01, P(¬C) = 0.99
P(+|C) = 0.9, P(−|C) = 0.1
P(+|¬C) = 0.2, P(−|¬C) = 0.8

P(C|+) = ?

P(C|+) = P(+|C) P(C) / [ P(+|C) P(C) + P(+|¬C) P(¬C) ]
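As a worked check, the posterior for the cancer/test example can be computed directly from the rule above (a minimal sketch; the variable names are illustrative):

```python
# Bayes' rule for the cancer/test example:
# P(C|+) = P(+|C) P(C) / (P(+|C) P(C) + P(+|notC) P(notC))
p_c = 0.01         # prior P(C)
p_not_c = 0.99     # prior P(notC)
p_pos_c = 0.9      # P(+|C)
p_pos_not_c = 0.2  # P(+|notC)

posterior = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_not_c * p_not_c)
print(round(posterior, 3))  # a positive test raises P(C) from 0.01 to about 0.043
```

Note how small the posterior stays: the 0.2 false-positive rate on the large ¬C population dominates the numerator.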
Bayesian reasoning with multiple hypotheses and evidences

Expand the Bayesian rule to work with multiple hypotheses (H1...Hm) and evidences (E1...En), assuming conditional independence among evidences E1...En:

p(Hi | E1 E2 ... En) = p(E1|Hi) p(E2|Hi) ... p(En|Hi) p(Hi) / Σ(k=1..m) p(E1|Hk) p(E2|Hk) ... p(En|Hk) p(Hk)
Bayesian reasoning Example

Expert data:

| Probability | i = 1 | i = 2 | i = 3 |
|---|---|---|---|
| p(Hi) | 0.40 | 0.35 | 0.25 |
| p(E1\|Hi) | 0.3 | 0.8 | 0.5 |
| p(E2\|Hi) | 0.9 | 0.0 | 0.7 |
| p(E3\|Hi) | 0.6 | 0.7 | 0.9 |
The user observes evidences E3, E1, and E2.
p(Hi | E1 E2 E3) = p(E1|Hi) p(E2|Hi) p(E3|Hi) p(Hi) / Σ(k=1..3) p(E1|Hk) p(E2|Hk) p(E3|Hk) p(Hk), for i = 1, 2, 3

p(H1 | E1 E2 E3) = (0.3 × 0.9 × 0.6 × 0.40) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.45

p(H2 | E1 E2 E3) = (0.8 × 0.0 × 0.7 × 0.35) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0

p(H3 | E1 E2 E3) = (0.5 × 0.7 × 0.9 × 0.25) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.55
Bayesian reasoning Example

The expert system computes the posterior probabilities; the user observes E2.
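The posterior computation above can be sketched in a few lines of Python, using the expert-data table from this example (variable names are illustrative):

```python
# Expert data: priors p(Hi) and likelihoods p(Ej|Hi) for i = 1..3
priors = [0.40, 0.35, 0.25]
likelihoods = [
    [0.3, 0.8, 0.5],  # p(E1|Hi) for i = 1, 2, 3
    [0.9, 0.0, 0.7],  # p(E2|Hi)
    [0.6, 0.7, 0.9],  # p(E3|Hi)
]

# Numerator for each hypothesis: p(E1|Hi) p(E2|Hi) p(E3|Hi) p(Hi)
numerators = []
for i, prior in enumerate(priors):
    num = prior
    for row in likelihoods:
        num *= row[i]
    numerators.append(num)

total = sum(numerators)                   # normalizing constant (the denominator)
posteriors = [n / total for n in numerators]
print([round(p, 2) for p in posteriors])  # [0.45, 0.0, 0.55]
```

Observing E2 is decisive for H2: since p(E2|H2) = 0, its numerator (and hence its posterior) collapses to zero.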
Propagation of CFs

For a single antecedent rule:

cf(H, E) = cf(E) × cf(R)

cf(E) is the certainty factor of the evidence; cf(R) is the certainty factor of the rule.
Single antecedent rule example

IF patient has toothache THEN problem is cavity {cf 0.3}
Patient has toothache {cf 0.9}
What is cf(cavity, toothache)?
Propagation of CFs (multiple antecedents)

For conjunctive rules:

IF <evidence E1> AND <evidence E2> ... AND <evidence En> THEN <Hypothesis H> {cf}

For two evidences E1 and E2:
cf(E1 AND E2) = min(cf(E1), cf(E2))
Propagation of CFs (multiple antecedents)

For disjunctive rules:

IF <evidence E1> OR <evidence E2> ... OR <evidence En> THEN <Hypothesis H> {cf}

For two evidences E1 and E2:
cf(E1 OR E2) = max(cf(E1), cf(E2))
Exercise

IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3)
Assume cf(P1) = 0.6, cf(P2) = 0.4, cf(P3) = 0.2.
What is cf(C1) and cf(C2)?
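The propagation rules above translate directly into code (a sketch; the function names are illustrative). Applied to the exercise, the antecedent gets cf = max(min(0.6, 0.4), 0.2) = 0.4:

```python
def cf_and(*cfs):
    """Conjunctive antecedents: take the minimum certainty factor."""
    return min(cfs)

def cf_or(*cfs):
    """Disjunctive antecedents: take the maximum certainty factor."""
    return max(cfs)

def cf_rule(cf_evidence, cf_of_rule):
    """Single antecedent propagation: cf(H, E) = cf(E) * cf(R)."""
    return cf_evidence * cf_of_rule

# Exercise: IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3)
cf_p1, cf_p2, cf_p3 = 0.6, 0.4, 0.2
cf_antecedent = cf_or(cf_and(cf_p1, cf_p2), cf_p3)  # max(min(0.6, 0.4), 0.2) = 0.4
print(round(cf_rule(cf_antecedent, 0.7), 2))  # cf(C1) = 0.28
print(round(cf_rule(cf_antecedent, 0.3), 2))  # cf(C2) = 0.12
```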
Defining fuzzy sets with fit-vectors

A fuzzy set A can be defined as a fit-vector: a list of membership-degree/element pairs. So, for example:

Tall men = (0/180, 1/190)
Short men = (1/160, 0/170)
Average men = (0/165, 1/175, 0/185)
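One simple way to hold a fit-vector in code is a dict mapping each domain value to its membership degree (a sketch; linear interpolation between the listed points is an assumption, not something the slides specify):

```python
# Fit-vector for "tall men": membership degree at each listed height (cm)
tall_men = {180: 0.0, 190: 1.0}

def membership(fit_vector, x):
    """Membership of x, linearly interpolated between the fit-vector points."""
    points = sorted(fit_vector.items())
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

print(membership(tall_men, 185))  # halfway between 180 and 190 -> 0.5
```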
Qualifiers & Hedges

What about linguistic values with qualifiers? e.g. very tall, extremely short, etc.

Hedges are qualifying terms that modify the shape of fuzzy sets, e.g. very, somewhat, quite, slightly, extremely, etc.
Representing Hedges

| Hedge | Mathematical Expression |
|---|---|
| A little | [μA(x)]^1.3 |
| Slightly | [μA(x)]^1.7 |
| Very | [μA(x)]^2 |
| Extremely | [μA(x)]^3 |

[Figure: graphical representation of each hedge]
Representing Hedges

| Hedge | Mathematical Expression |
|---|---|
| Very very | [μA(x)]^4 |
| More or less | √μA(x) |
| Somewhat | √μA(x) |
| Indeed | 2[μA(x)]^2 if 0 ≤ μA ≤ 0.5; 1 − 2[1 − μA(x)]^2 if 0.5 < μA ≤ 1 |

[Figure: graphical representation of each hedge]
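The hedge expressions in the two tables above translate directly into code (a sketch; μA values are assumed to lie in [0, 1]):

```python
import math

def very(mu):
    return mu ** 2            # [muA(x)]^2

def extremely(mu):
    return mu ** 3            # [muA(x)]^3

def very_very(mu):
    return mu ** 4            # [muA(x)]^4

def more_or_less(mu):
    return math.sqrt(mu)      # sqrt(muA(x))

def indeed(mu):
    """Intensification: push memberships away from 0.5 toward 0 or 1."""
    if mu <= 0.5:
        return 2 * mu ** 2
    return 1 - 2 * (1 - mu) ** 2

print(very(0.5))    # 0.25
print(indeed(0.5))  # 0.5 (the crossover point is unchanged)
print(indeed(0.7))  # pushed up toward 1
```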
Crisp Set Operations

[Figure: crisp set operations — complement (Not A), containment (A ⊂ B), intersection (A ∩ B), union (A ∪ B)]
Fuzzy Set Operations

Complement: to what degree do elements not belong to this set?

μ¬A(x) = 1 − μA(x)

tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
not tall men = {1/180, 0.75/182, 0.5/185, 0.25/187, 0/190};

[Figure: membership functions for complement (Not A), containment (A ⊆ B), intersection (A ∩ B), and union (A ∪ B)]
Containment: which sets belong to other sets?

tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
very tall men = {0/180, 0.06/182, 0.25/185, 0.56/187, 1/190};

Each element of the fuzzy subset has a smaller membership than in the containing set.
Intersection: to what degree is the element in both sets?

μA∩B(x) = min[μA(x), μB(x)]
tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};

tall men ∩ average men = {0/165, 0/175, 0/180, 0.25/182, 0/185, 0/190};
or
tall men ∩ average men = {0/180, 0.25/182, 0/185};

μA∩B(x) = min[μA(x), μB(x)]
Union: to what degree is the element in either or both sets?

μA∪B(x) = max[μA(x), μB(x)]
tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};

tall men ∪ average men = {0/165, 1/175, 0.5/180, 0.25/182, 0.5/185, 1/190};

μA∪B(x) = max[μA(x), μB(x)]
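The complement, intersection, and union definitions can be sketched over fit-vectors stored as dicts (the dict representation is an illustrative assumption):

```python
tall_men    = {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.5, 190: 1.0}
average_men = {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.0, 190: 0.0}

def complement(a):
    """mu_notA(x) = 1 - mu_A(x)"""
    return {x: 1 - m for x, m in a.items()}

def intersection(a, b):
    """mu_(A and B)(x) = min(mu_A(x), mu_B(x))"""
    return {x: min(a[x], b[x]) for x in a}

def union(a, b):
    """mu_(A or B)(x) = max(mu_A(x), mu_B(x))"""
    return {x: max(a[x], b[x]) for x in a}

print(intersection(tall_men, average_men))  # 0.25 at 182, 0 elsewhere
print(union(tall_men, average_men)[185])    # max(0.5, 0.0) -> 0.5
```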
Choosing the Best Attribute: Binary Classification

We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum when it makes no distinction.

Information theory (Shannon and Weaver, 1949).
Entropy: a measure of the uncertainty of a random variable.
- A coin that always comes up heads → 0 bits
- A flip of a fair coin (heads or tails) → 1 bit
- The roll of a fair four-sided die → 2 bits

Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute.
Formula for Entropy

H(p1, ..., pn) = − Σi pi log2 pi

Examples:
- A collection of 10 examples, 5 positive and 5 negative:
  H(1/2, 1/2) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit
- A collection of 100 examples, 1 positive and 99 negative:
  H(1/100, 99/100) = −0.01 log2 0.01 − 0.99 log2 0.99 ≈ 0.08 bits
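The entropy formula is a one-liner; reproducing the two examples above (a minimal sketch):

```python
import math

def entropy(probs):
    """H(p1, ..., pn) = -sum(pi * log2(pi)), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0 bit (fair coin)
print(round(entropy([0.01, 0.99]), 2))  # 0.08 bits (very skewed split)
print(entropy([0.25] * 4))              # 2.0 bits (fair four-sided die)
```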
Information gain

Information gain (from an attribute test) = the difference between the original information requirement and the new requirement.

Information Gain (IG), the reduction in entropy from the attribute test (p positive and n negative examples overall; branch k of the test receives pk positives and nk negatives):

Gain(A) = I(p/(p+n), n/(p+n)) − Σk (pk + nk)/(p + n) · I(pk/(pk+nk), nk/(pk+nk))

Choose the attribute with the largest IG.
Information gain

For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.

Consider the attributes Patrons and Type (and others too):
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.
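A sketch of the comparison, using positive/negative branch counts for Patrons and Type as in the standard 12-example restaurant data (the counts below are stated as an assumption, since the table itself is not reproduced in these slides):

```python
import math

def entropy(p, n):
    """Entropy I(p/(p+n), n/(p+n)) of a p-positive / n-negative split."""
    total = p + n
    return -sum(q * math.log2(q) for q in (p / total, n / total) if q > 0)

def gain(branches, p, n):
    """Gain(A) = I(p, n) - sum over branches of weighted branch entropy."""
    remainder = sum((pk + nk) / (p + n) * entropy(pk, nk) for pk, nk in branches)
    return entropy(p, n) - remainder

# Branch counts (positives, negatives) for the 12-example training set
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(gain(patrons, 6, 6), 3))  # ~0.541 bits
print(round(gain(type_, 6, 6), 3))    # 0.0 bits: Type tells us nothing
```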
Example contd.

Decision tree learned from the 12 examples: substantially simpler than the “true” tree.
Perceptrons

[Figure: a perceptron — inputs x1 and x2 with weights w1 and w2 feed a linear combiner X = x1w1 + x2w2; a hard limiter with a threshold produces the output Y = Ystep(X)]
Perceptrons

How does a perceptron learn?
- A perceptron has initial (often random) weights, typically in the range [−0.5, 0.5]
- Apply an established training dataset
- Calculate the error as expected output minus actual output:
  error e = Yexpected − Yactual
- Adjust the weights to reduce the error
Perceptrons

How do we adjust a perceptron’s weights to produce Yexpected?
If e is positive, we need to increase Yactual (and vice versa).
Use this formula:

wi(p + 1) = wi(p) + Δwi, where Δwi = α × xi × e

α is the learning rate (between 0 and 1); e is the calculated error.
Perceptron Example – AND

Train a perceptron to recognize logical AND.
Use threshold Θ = 0.2 and learning rate α = 0.1.
Perceptron Example – AND

Repeat until convergence, i.e. the final weights do not change and there is no error.
Use threshold Θ = 0.2 and learning rate α = 0.1.
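Putting the pieces together, the AND training loop can be sketched as follows. The slides only fix Θ = 0.2 and α = 0.1; the initial weights 0.3 and −0.1 below are an illustrative assumption:

```python
THETA, ALPHA = 0.2, 0.1  # threshold and learning rate from the example
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND

w = [0.3, -0.1]  # illustrative initial weights (not specified in the slides)

def predict(x):
    """Hard limiter: fire if the weighted sum reaches the threshold."""
    total = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if total >= THETA else 0

# Repeat until an entire epoch passes with no error (convergence)
for epoch in range(100):
    errors = 0
    for x, expected in data:
        e = expected - predict(x)         # error e = Yexpected - Yactual
        if e != 0:
            errors += 1
            for i in range(2):
                w[i] += ALPHA * x[i] * e  # delta rule: dwi = alpha * xi * e
    if errors == 0:
        break

print([predict(x) for x, _ in data])  # [0, 0, 0, 1] once converged
```

Because AND is linearly separable (e.g. w1 = w2 = 0.15 works with Θ = 0.2), the loop reaches a zero-error epoch after a handful of passes.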