
15-381: Artificial Intelligence

Probabilistic Reasoning and Inference

Advantages of probabilistic reasoning

• Appropriate for complex, uncertain environments - Will it rain tomorrow?

• Applies naturally to many domains - a robot predicting the direction of the road, biology, the Microsoft Word paper clip assistant

• Allows us to generalize acquired knowledge and incorporate prior beliefs

- Medical diagnosis

• Easy to integrate different information sources - a robot's sensors

Examples: Unmanned vehicles

Examples: Speech processing

Example: Biological data

ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAGTGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTACTCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGGCTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT

Basic notations

• Random variable - refers to an element / event whose status is unknown: A = "it will rain tomorrow"

• Domain (usually denoted by Ω) - the set of values a random variable can take (illustrated in the sketch after this list):

- "A = The stock market will go up this year": Binary

- "A = Number of Steelers wins in 2007": Discrete

- "A = % change in Google stock in 2007": Continuous
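To make the three domain types concrete, here is a toy sketch, not from the original slides; the RandomVariable class and its fields are invented for illustration:

```python
# A toy representation of random variables with the three domain types
# named above (binary, discrete, continuous); names are made up.
from dataclasses import dataclass

@dataclass
class RandomVariable:
    name: str
    domain: object  # a set of values, or a description of an interval

binary     = RandomVariable("the stock market will go up this year", {True, False})
discrete   = RandomVariable("number of Steelers wins in 2007", set(range(17)))
continuous = RandomVariable("% change in Google stock in 2007", "any real number")

for rv in (binary, discrete, continuous):
    print(rv.name, "->", rv.domain)
```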

Axioms of probability (Kolmogorov's axioms)

• A variety of useful facts can be derived from just three axioms:

1. 0 ≤ P(A) ≤ 1
2. P(true) = 1, P(false) = 0
3. P(A ∨ B) = P(A) + P(B) – P(A ∧ B)


Example: P(Steelers win the 05-06 season) = 1


There have been several other attempts to provide a foundation for probability theory. Kolmogorov's axioms are the most widely used.
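As a quick sanity check, the three axioms can be verified numerically on a small sample space; a minimal sketch, assuming a fair six-sided die:

```python
# Checking Kolmogorov's axioms on a small discrete sample space
# (a fair six-sided die); an editorial sketch, not from the slides.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in omega}

def P(event):
    """Probability of an event, i.e. a subset of the sample space."""
    return sum(p[o] for o in event & omega)

A = {1, 2, 3}   # "roll is at most 3"
B = {2, 4, 6}   # "roll is even"

assert 0 <= P(A) <= 1                       # axiom 1
assert P(omega) == 1 and P(set()) == 0      # axiom 2: P(true)=1, P(false)=0
assert P(A | B) == P(A) + P(B) - P(A & B)   # axiom 3 (inclusion-exclusion)
print("all three axioms hold on this example")
```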

Example of using the axioms

• Assume a probability for dice as follows:
  P(1) = P(2) = P(3) = 0.2
  P(4) = P(5) = 0.1
  P(6) = ?
• P({heads, tails}) = ?
• P(heads) + (1 – P(tails)) = ?

Using the axioms

• How can we use the axioms to prove that P(¬A) = 1 – P(A)?
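One standard derivation from the three axioms, sketched here in LaTeX:

```latex
% P(not A) = 1 - P(A), derived from Kolmogorov's axioms
\begin{align*}
P(A \lor \lnot A)  &= P(\mathrm{true}) = 1                           && \text{(axiom 2)} \\
P(A \lor \lnot A)  &= P(A) + P(\lnot A) - P(A \land \lnot A)         && \text{(axiom 3)} \\
P(A \land \lnot A) &= P(\mathrm{false}) = 0                          && \text{(axiom 2)} \\
\Rightarrow\quad 1 &= P(A) + P(\lnot A)
  \quad\Longrightarrow\quad P(\lnot A) = 1 - P(A)
\end{align*}
```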

Priors

P(rain tomorrow) = 0.2

P(no rain tomorrow) = 0.8

[Pie chart: Rain vs. No rain]

Degree of belief in an event in the absence of any other information.

Conditional probability

• What is the probability of an event given knowledge of another event?
• Example:
  - P(raining | sunny)
  - P(raining | cloudy)
  - P(raining | cloudy, cold)

Conditional probability

• P(A = 1 | B = 1): the fraction of cases where A is true given that B is true

[Diagram: Venn-style illustration with P(A) = 0.2 and P(A | B) = 0.5]

Conditional probability

• In some cases, given knowledge of one or more random variables, we can improve upon our prior belief of another random variable
• For example:
  P(slept in movie) = 0.5
  P(slept in movie | liked movie) = 1/3
  P(didn't sleep in movie | liked movie) = 2/3

Liked movie | Slept |  P
     0      |   1   | 0.3
     0      |   0   | 0.1
     1      |   0   | 0.4
     1      |   1   | 0.2

Joint distributions

• The probability that a set of random variables will take a specific value is their joint distribution.
• Notation: P(A ∧ B) or P(A, B)
• Example: P(liked movie, slept)

Liked movie | Slept |  P
     0      |   1   | 0.3
     0      |   0   | 0.1
     1      |   0   | 0.4
     1      |   1   | 0.2

Joint distribution (cont)

Evaluation of classes:

Class size | Time (regular = 2, summer = 1) | Evaluation (1-3)
    43     |               2                |        1
    13     |               1                |        3
    51     |               2                |        2
    15     |               2                |        3
    65     |               2                |        1
    12     |               1                |        2
    34     |               2                |        3
    10     |               1                |        2

P(class size > 20) = 0.5

P(summer) = 1/3

P(class size > 20, summer) = ?


P(class size > 20, summer) = 0


P(eval = 1) = 2/9

P(class size > 20, eval = 1) = 2/9
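These queries can be checked mechanically over the table rows, counting each class as equally likely; a sketch using the eight rows recoverable above (the slide's own P(summer) = 1/3 and P(eval = 1) = 2/9 suggest the original table may have had one more row than survives here):

```python
# Joint-table queries as fractions of rows in the class table.
from fractions import Fraction

# (class_size, time, evaluation); time: 2 = regular, 1 = summer
classes = [(43, 2, 1), (13, 1, 3), (51, 2, 2), (15, 2, 3),
           (65, 2, 1), (12, 1, 2), (34, 2, 3), (10, 1, 2)]

def P(pred):
    return Fraction(sum(1 for c in classes if pred(c)), len(classes))

print(P(lambda c: c[0] > 20))                # P(class size > 20) = 1/2
print(P(lambda c: c[0] > 20 and c[1] == 1))  # P(class size > 20, summer) = 0
```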

Chain rule

• The joint distribution can be specified in terms of conditional probability: P(A, B) = P(A | B) · P(B)
• Together with Bayes rule (which is actually derived from it), this is one of the most powerful rules in probabilistic reasoning

Bayes rule

• One of the most important rules for AI usage.
• Derived from the chain rule: P(A, B) = P(A | B) P(B) = P(B | A) P(A)
• Thus,

(Thomas Bayes was an English clergyman who set out his theory of probability in 1764.)

P(A | B) = P(B | A) P(A) / P(B)

Bayes rule (cont)

Often it is useful to derive the rule a bit further:

P(A | B) = P(B | A) P(A) / P(B) = P(B | A) P(A) / ∑A P(B | A) P(A)

This results from: P(B) = ∑A P(B, A)

[Diagram: the event B split into the two parts P(B, A = 1) and P(B, A = 0)]

Using Bayes rule

• Cards game: Place your bet on the location of the King!
• After one card is revealed: do you want to change your bet?

Using Bayes rule

Computing the (posterior) probability: P(C = k | selB)

A   B   C

P(C = k | selB) = P(selB | C = k) P(C = k) / P(selB)          (Bayes rule)

                = P(selB | C = k) P(C = k) / [P(selB | C = k) P(C = k) + P(selB | C ≠ k) P(C ≠ k)]

Using Bayes rule

A   B   C

With priors P(A = k) = P(B = k) = P(C = k) = 1/3, and the dealer revealing B with probability 1/2 when the King is under A and probability 1 when it is under C:

P(C = k | selB) = (1 × 1/3) / (1 × 1/3 + 1/2 × 1/3) = (1/3) / (1/2) = 2/3

Important points

• Random variables
• Chain rule
• Bayes rule
• Joint distribution, independence, conditional independence

Joint distributions

• The probability that a set of random variables will take a specific value is their joint distribution.
• Requires a joint probability table to specify the possible assignments
• The table can grow very rapidly…

Liked movie | Slept |  P
     0      |   1   | 0.3
     0      |   0   | 0.1
     1      |   0   | 0.4
     1      |   1   | 0.2

How can we decrease the number of rows in the table?
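To see how fast the full joint grows, note that n binary variables need 2^n table entries (2^n − 1 free parameters); a one-line sketch:

```python
# Growth of the full joint probability table with n binary variables.
for n in (2, 5, 10, 20, 30):
    print(n, 2**n)   # 20 variables already need about a million entries
```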
