DATA15001 Introduction to Artificial Intelligence · 2019-09-19 · Episode 6: Bayesian Networks

INTRODUCTION TO ARTIFICIAL INTELLIGENCE DATA15001 EPISODE 6: BAYESIAN NETWORKS


Page 1:

INTRODUCTION TO ARTIFICIAL INTELLIGENCE

DATA15001

EPISODE 6: BAYESIAN NETWORKS

Page 2:

TODAY'S MENU

1. BAYESIAN NETWORKS

2. CAR EXAMPLE

3. INFERENCE (EXACT AND APPROXIMATE)

Page 3:

BAYESIAN NETWORKS

• A Bayesian network is a representation of a probabilistic model

• The nodes of the network (X, Y, Z, Å) are random variables (r.v.), such as the outcome of a die roll, or a medical condition, ...

• The edges correspond to direct dependency: no edge ⇔ conditional independence (exact definition will be studied in DATA12002 Probabilistic Graphical Models)

• Each r.v. V is given a conditional distribution of the form P(V = v | Pa_V = pa_V), where Pa_V are the parents of node V

[Figure: a network with nodes X, Y, Z, Å]

Page 4:

BAYESIAN NETWORKS

• No directed cycles allowed

• Joint probabilities are obtained as P(x,y,z,å) = P(x) P(y) P(z | x,y) P(å | x)

• Compare this with the chain rule P(x,y,z,å) = P(x) P(y | x) P(z | x,y) P(å | x,y,z)

[Figure: the network X, Y, Z, Å with edges X → Z, Y → Z, X → Å; X and Y are the parents of Z]
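The factorization above can be made concrete in a short Python sketch. The CPT numbers below are made up purely for illustration (A stands in for Å); only the factorization P(x,y,z,å) = P(x) P(y) P(z | x,y) P(å | x) comes from the slide:

```python
# Made-up CPTs for binary variables X, Y, Z, A (A stands in for Å).
P_X = {1: 0.6, 0: 0.4}                    # P(X = x)
P_Y = {1: 0.3, 0: 0.7}                    # P(Y = y)
P_Z = {(1, 1): 0.9, (1, 0): 0.5,          # P(Z = 1 | X = x, Y = y)
       (0, 1): 0.4, (0, 0): 0.1}
P_A = {1: 0.8, 0: 0.2}                    # P(A = 1 | X = x)

def joint(x, y, z, a):
    """P(x, y, z, a) = P(x) P(y) P(z | x, y) P(a | x)."""
    pz = P_Z[(x, y)] if z == 1 else 1 - P_Z[(x, y)]
    pa = P_A[x] if a == 1 else 1 - P_A[x]
    return P_X[x] * P_Y[y] * pz * pa

# Sanity check: the joint distribution sums to 1 over all 16 assignments.
total = sum(joint(x, y, z, a)
            for x in (0, 1) for y in (0, 1)
            for z in (0, 1) for a in (0, 1))
```

Note how few numbers are needed: 1 + 1 + 4 + 2 = 8 parameters instead of the 15 a full joint over four binary variables would require.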

Page 5:

BAYESIAN NETWORKS

• (Same network as above, annotated:) replacing the chain-rule factor P(å | x,y,z) by the network factor P(å | x) is exactly where conditional independence is used!

Page 6:

BAYESIAN NETWORKS

• The power of BNs:
  – easier to define conditional distributions, e.g., P(å | x) rather than P(å | x,y,z)
  – efficient inference procedures for computing posterior probabilities


Page 7:

EXAMPLE: CAR PROBLEMS?

Page 8:

EXAMPLE: CAR PROBLEMS?

[Figure: the car network — BATTERY → RADIO; BATTERY → IGNITION; IGNITION, GAS → STARTS; STARTS → MOVES]

Page 9:

BAYESIAN NETWORKS

• If the battery is dead, no radio and no ignition

• If there's no ignition, the car won't start

• If there's no gas, the car won't start

• If the car won't start, it won't move

• Car won't move: where is the problem? We want P(state | obs)

• Music on the radio? Gas meter? ← these are the observations (obs)

Page 10:

EXAMPLE: CAR PROBLEMS?

[Figure: the same car network]

[R.I.P. Chester Bennington (1976–2017)]

Page 11:

EXAMPLE: CAR PROBLEMS?

[Figure: the same car network, annotated: "quite sure?"]

Page 12:

EXAMPLE: CAR PROBLEMS?

[Figure: the car network annotated with probabilities 90%, 90%, 95%, 95%, 99%, 99% — made precise by the CPTs on the next page]

Page 13:

EXAMPLE: CAR PROBLEMS?

• P("battery alive") = 0.9

• P("radio ok" | "battery alive") = 0.9; P("radio ok" | ¬"battery alive") = 0

• P("ignition" | "battery alive") = 0.95; P("ignition" | ¬"battery alive") = 0

• P("gas") = 0.95

• P("starts" | "ignition" AND "gas") = 0.99; P("starts" | ¬"ignition" OR ¬"gas") = 0

• P("moves" | "starts") = 0.99; P("moves" | ¬"starts") = 0

Page 14:

EXAMPLE: CAR PROBLEMS?

• P("battery alive" | ¬"starts" AND "radio ok" AND "gas") = ?

• Exact approach: P(B | ¬S,R,G) = P(B,¬S,R,G) / P(¬S,R,G)

Page 15:

EXAMPLE: CAR PROBLEMS?

• In P(B | ¬S,R,G) = P(B,¬S,R,G) / P(¬S,R,G), the numerator is
  P(B,¬S,R,G) = P(B,R,I,G,¬S,M) + P(B,R,I,G,¬S,¬M) + P(B,R,¬I,G,¬S,M) + P(B,R,¬I,G,¬S,¬M)

• Again, the probability of an event, (B,¬S,R,G), is a sum of elementary event probabilities

Page 16:

EXAMPLE: CAR PROBLEMS?

• The elementary event probabilities are conveniently obtained from the Bayesian network, e.g.,
  P(B,R,I,G,¬S,M) = P(B) P(R|B) P(I|B) P(G) P(¬S|I,G) P(M|¬S)

Page 17:

EXAMPLE: CAR PROBLEMS?

• For instance, P(B,R,I,G,¬S,M) = P(B) P(R|B) P(I|B) P(G) P(¬S|I,G) P(M|¬S) = 0.9 · 0.9 · 0.95 · 0.95 · 0.01 · 0.0 = 0

• Note that the product has terms of the form P(V | Pa_V)

• This gives a numerical value for P(B,¬S,R,G)

• A similar sum yields P(¬S,R,G)
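The enumeration described above is small enough to carry out directly. A Python sketch (the function names are my own; the CPT numbers are the ones from the slides) that computes the posterior exactly — note that with these CPTs the answer is forced to 1, because P("radio ok" | ¬"battery alive") = 0 means a working radio rules out a dead battery:

```python
# Exact inference by enumeration in the car network (variables are 0/1).

def p(var, value, given):
    """P(var = value | given), using the CPTs from the slides."""
    if var == 'B':   q = 0.9
    elif var == 'R': q = 0.9 if given['B'] else 0.0
    elif var == 'I': q = 0.95 if given['B'] else 0.0
    elif var == 'G': q = 0.95
    elif var == 'S': q = 0.99 if given['I'] and given['G'] else 0.0
    else:            q = 0.99 if given['S'] else 0.0   # var == 'M'
    return q if value else 1.0 - q

def joint(b, r, i, g, s, m):
    """Elementary event probability as a product of P(V | Pa_V) terms."""
    v = {'B': b, 'R': r, 'I': i, 'G': g, 'S': s, 'M': m}
    return (p('B', b, v) * p('R', r, v) * p('I', i, v) *
            p('G', g, v) * p('S', s, v) * p('M', m, v))

def evidence_prob(b):
    """P(B = b, R, G, ¬S): sum out the unobserved variables I and M."""
    return sum(joint(b, 1, i, 1, 0, m) for i in (0, 1) for m in (0, 1))

posterior = evidence_prob(1) / (evidence_prob(0) + evidence_prob(1))
# posterior = P(B | ¬S, R, G) = 1.0 exactly, since evidence_prob(0) = 0
```

For a larger network the same code would need a sum over exponentially many terms, which is the scaling problem discussed on the next page.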

Page 18:

EXAMPLE: CAR PROBLEMS?

• This direct approach always gives the exact solution

• However, the sums can quickly become very large (the number of terms is exponential in the size of the network)

• More clever inference algorithms exploit the structure of the network

• For example, in tree-shaped networks (any two nodes are connected by at most one path), belief propagation runs in time linear in the number of nodes

• These algorithms are not discussed in this course

Page 19:

APPROXIMATE INFERENCE

• Instead of exact inference algorithms, we take a "hacker's approach" to probability

• The probability of any event can be approximated by the Monte Carlo method / sampling: repeat the trial many times and calculate the relative frequency of the event

• E.g., toss a coin 10⁶ times: P(heads) ≈ #heads / #tosses

• To approximate a conditional probability P(A | B):

1. generate N tuples (A, B)

2. discard all but those where B occurs

3. among the remaining tuples, calculate the proportion where A occurs
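The three-step recipe can be tried on a toy example. Here the events A and B are my own invented example on two dice, chosen so that the exact answer P(A | B) = 3/6 = 0.5 is easy to check by hand:

```python
import random

random.seed(0)
N = 100_000

# B = "first die shows 6", A = "the sum of the two dice is at least 10".
kept = 0   # step 2: tuples where B occurs
hits = 0   # step 3: tuples where A also occurs
for _ in range(N):          # step 1: generate N tuples
    d1 = random.randint(1, 6)
    d2 = random.randint(1, 6)
    if d1 == 6:             # discard all but those where B occurs
        kept += 1
        if d1 + d2 >= 10:   # count how often A occurs among the rest
            hits += 1

estimate = hits / kept      # close to the exact value 0.5 for large N
```

Note the price of rejection sampling: only about 1/6 of the generated tuples survive the filter, so rare evidence B makes the method expensive.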

Page 20:

APPROXIMATE INFERENCE

• In the car problem, to approximate P(B | ¬S, R, G):

1. generate N cases (tuples) from the car BN

2. choose the tuples where the car doesn't start, the radio is ok, and there is gas

3. calculate the proportion of these where the battery is alive

• As N → ∞, the approximation converges to the exact value

generate_tuples(N, model):
    for i = 1 to N:
        v = empty array
        for V in model.variables:
            pa = v[V.Pa]               # parents of V
            v.append(sample(V.CPT(pa)))
        output v

(CPT = conditional probability table)
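The pseudocode can be rendered as runnable Python, together with the rejection step. The dictionary-of-functions encoding of the CPTs is my own choice; the numbers are taken from the slides. With these CPTs the estimate comes out as exactly 1, since R = 1 is only ever sampled when B = 1:

```python
import random

random.seed(0)

# CPT[V](v) returns P(V = 1 | parents, as already recorded in v).
CPT = {
    'B': lambda v: 0.9,
    'R': lambda v: 0.9 if v['B'] else 0.0,
    'I': lambda v: 0.95 if v['B'] else 0.0,
    'G': lambda v: 0.95,
    'S': lambda v: 0.99 if v['I'] and v['G'] else 0.0,
    'M': lambda v: 0.99 if v['S'] else 0.0,
}
ORDER = ('B', 'R', 'I', 'G', 'S', 'M')   # parents always come before children

def generate_tuples(n):
    """Sample n complete tuples from the car network, ancestors first."""
    for _ in range(n):
        v = {}
        for name in ORDER:
            v[name] = int(random.random() < CPT[name](v))
        yield v

# Approximate P(B | ¬S, R, G) by rejection sampling:
kept = hits = 0
for v in generate_tuples(200_000):
    if not v['S'] and v['R'] and v['G']:   # keep only tuples matching the evidence
        kept += 1
        hits += v['B']

estimate = hits / kept   # exactly 1.0 here: a working radio implies a live battery
```

Sampling parents before children is what makes the single pass work: by the time a variable is sampled, its parents' values are already in `v`.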

Page 21:

[Same slide as above, now also showing the car network nodes B, R, I, G, S, M]

Page 22:

[Sampling trace, step 1 (sampling B): v = []; V = 'B'; V.Pa = pa = []; V.CPT(pa) = [0.1, 0.9]]

Page 23:

[Step 2 (sampling R): v = [1]; V = 'R'; V.Pa = 'B'; pa = [1]; V.CPT(pa) = [0.1, 0.9]]

CPT of 'Radio': (1.0, 0.0) if Battery = 0; (0.1, 0.9) if Battery = 1

Page 24:

[Step 3 (sampling I): v = [1,1]; V = 'I'; V.Pa = 'B'; pa = [1]; V.CPT(pa) = [0.05, 0.95]]

Page 25:

[Step 4 (sampling G): v = [1,1,1]; V = 'G'; V.Pa = pa = []; V.CPT(pa) = [0.05, 0.95]]

Page 26:

[Step 5 (sampling S): v = [1,1,1,1]; V = 'S'; V.Pa = 'I,G'; pa = [1,1]; V.CPT(pa) = [0.01, 0.99]]

CPT of 'Starts': (1.00, 0.00) if Ignition = 0, Gas = 0; (1.00, 0.00) if Ignition = 0, Gas = 1; (1.00, 0.00) if Ignition = 1, Gas = 0; (0.01, 0.99) if Ignition = 1, Gas = 1

Page 27:

[Step 6 (sampling M): v = [1,1,1,1,1]; V = 'M'; V.Pa = 'S'; pa = [1]; V.CPT(pa) = [0.01, 0.99]]

Page 28:

[The completed tuple: v = [1,1,1,1,1,1]]

Page 29:

[Another sampled tuple: v = [1,1,1,0,0,0]]

Page 30:

[A third sampled tuple: v = [1,1,0,1,0,0]]

Page 31:

BAYESIAN NETWORK APPLICATIONS

• Spam filters! (and a million other naive Bayes classifiers)

• Dynamic Bayesian networks for ecological modelling

• Medical diagnostics (causal factors → disease status → symptoms)

• Player matching: Microsoft TrueSkill™ (well, factor graphs really, but closely related to Bayesian networks)

Page 32:

BAYESIAN NETWORK APPLICATIONS

Source: R. Herbrich, T. Minka, T. Graepel, "TrueSkill™: A Bayesian Skill Rating System", NIPS 2006

Page 33:

BAYESIAN NETWORK APPLICATIONS

• Error-correcting codes ("turbo codes", e.g., Mars missions)

• Football score prediction

• ...

Page 34:

SUMMARY

1. NETWORK STRUCTURES

2. CAR EXAMPLE

3. INFERENCE (EXACT AND APPROXIMATE)

[Figure: the example network X, Y, Z, Å; sampled tuples [1,1,1,1,1,1], [1,1,1,0,0,0], [1,1,0,1,0,0], ⋮; and the formula P(B | ¬S,R,G) = P(B,¬S,R,G) / P(¬S,R,G)]

Page 35:

NEXT WEEK: MACHINE LEARNING