Intelligent Systems (AI-2)
Computer Science CPSC 422, Lecture 18
Feb 25, 2015

Slide Sources: Raymond J. Mooney, University of Texas at Austin; D. Koller, Stanford CS - Probabilistic Graphical Models

TRANSCRIPT

Page 1:

Intelligent Systems (AI-2)

Computer Science CPSC 422, Lecture 18

Feb 25, 2015

Slide Sources:
Raymond J. Mooney, University of Texas at Austin
D. Koller, Stanford CS - Probabilistic Graphical Models

Page 2:

Lecture Overview

Probabilistic Graphical Models
• Recap Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (Exact and Approx.)
• Conditional Random Fields

Page 3:

Parameterization of Markov Networks

Factors define the local interactions (like CPTs in Bnets)

What about the global model? What do you do with Bnets?


Page 4:

How do we combine local models?

As in BNets by multiplying them!

Page 5:

Step Back…. From structure to factors/potentials

In a Bnet the joint is factorized….


In a Markov Network you have one factor for each maximal clique
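For reference, the two factorizations being contrasted here are (standard forms, stated explicitly since the slides show them only graphically):

$$\text{Bnet:}\quad P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$

$$\text{Markov network:}\quad P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{j} \phi_j(D_j), \qquad Z = \sum_{x_1, \dots, x_n} \prod_{j} \phi_j(D_j)$$

where each $D_j$ is a maximal clique and $Z$ is the normalizing (partition) constant.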

Page 6:

General definitions

Two nodes in a Markov network are independent if and only if every path between them is cut off by evidence.

So the Markov blanket of a node is…?

E.g., for C

E.g., for A, C

Page 7:

Lecture Overview

Probabilistic Graphical Models
• Recap Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (Exact and Approx.)
• Conditional Random Fields

Page 8:

Markov Networks Applications (1): Computer Vision

Called Markov Random Fields
• Stereo reconstruction
• Image segmentation
• Object recognition

Typically pairwise MRFs
• Each variable corresponds to a pixel (or superpixel)
• Edges (factors) correspond to interactions between adjacent pixels in the image
• E.g., in segmentation: from generically penalizing discontinuities, to "road under car"

Page 9:

Image segmentation

Page 10:

Markov Networks Applications (2): Sequence Labeling in NLP and Bioinformatics

Conditional random fields

Page 11:

Lecture Overview

Probabilistic Graphical Models
• Recap Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (Exact and Approx.)
• Conditional Random Fields

Page 12:

Variable elimination algorithm for Bnets

To compute P(Z | Y1=v1, …, Yj=vj):
1. Construct a factor for each conditional probability.
2. Set the observed variables to their observed values.
3. Given an elimination ordering, simplify/decompose the sum of products.
4. Perform products and sum out each Zi.
5. Multiply the remaining factors.
6. Normalize: divide the resulting factor f(Z) by Σ_Z f(Z).

Variable elimination algorithm for Markov Networks…..
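Since the slide leaves the Markov-network variant implicit, here is a minimal sketch (my own illustration, assuming binary variables and factors stored as (variable-tuple, table) pairs): the only change for Markov networks is step 1, where the initial factors are clique potentials rather than CPTs; the normalization in step 6 then cancels the partition function Z automatically, so the rest of the procedure is unchanged.

```python
import numpy as np
from functools import reduce

def multiply(f1, f2):
    """Pointwise product of two factors (vars, table), aligned by variable name."""
    vars1, t1 = f1
    vars2, t2 = f2
    all_vars = list(dict.fromkeys(list(vars1) + list(vars2)))  # ordered union
    def expand(vs, t):
        # Permute t's axes into all_vars order, inserting size-1 axes
        # for missing variables so the product can broadcast.
        shape = [2 if v in vs else 1 for v in all_vars]
        perm = [vs.index(v) for v in all_vars if v in vs]
        return t.transpose(perm).reshape(shape)
    return tuple(all_vars), expand(list(vars1), t1) * expand(list(vars2), t2)

def sum_out(var, factor):
    vs, t = factor
    return tuple(v for v in vs if v != var), t.sum(axis=vs.index(var))

def variable_elimination(factors, elim_order):
    """Evidence (step 2) is assumed already sliced out of the tables."""
    for var in elim_order:                         # steps 3-4
        related = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(var, reduce(multiply, related)))
    vs, t = reduce(multiply, factors)              # step 5
    return vs, t / t.sum()                         # step 6

# Tiny example: P(B) in a two-node network with potentials phi(A), phi(A,B).
phiA  = (("A",), np.array([1.0, 3.0]))
phiAB = (("A", "B"), np.array([[2.0, 1.0], [1.0, 2.0]]))
print(variable_elimination([phiA, phiAB], ["A"]))  # (('B',), [5/12, 7/12])
```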

Page 13:

Gibbs sampling for Markov Networks

Example: P(D | C=0)

Resample non-evidence variables in a pre-defined order or a random order.

Suppose we begin with A.

[Figure: a Markov network over the variables A, B, C, D, E, F]

Current sample:
A B C D E F
1 0 0 1 1 0

Note: never change evidence!

What do we need to sample?

A. P(A | B=0)

B. P(A | B=0, C=0)

C. P(B=0, C=0 | A)

Page 14:

Example: Gibbs sampling

Resample A: compute the distribution P(A | B=0, C=0)

[Figure: the Markov network over A, B, C, D, E, F, with the factors ϕ1, ϕ2, ϕ3 highlighted]

        A=1   A=0
C=1      1     2
C=0      3     4

        A=1   A=0
B=1      1     5
B=0     4.3   0.2

A B C D E F
1 0 0 1 1 0
? 0 0 1 1 0

ϕ1 × ϕ2 × ϕ3 =      A=1    A=0
                    12.9    0.8

Normalized result =  A=1    A=0
                     0.94   0.06

Page 15:

Example: Gibbs sampling

Resample B: compute the distribution P(B | A=1, D=1)

[Figure: the Markov network over A, B, C, D, E, F, with the factors ϕ1, ϕ2, ϕ4 highlighted]

        D=1   D=0
B=1      1     2
B=0      2     1

        A=1   A=0
B=1      1     5
B=0     4.3   0.2

A B C D E F
1 0 0 1 1 0
1 0 0 1 1 0
1 ? 0 1 1 0

ϕ1 × ϕ2 × ϕ4 =      B=1    B=0
                     1     8.6

Normalized result =  B=1    B=0
                     0.10   0.90
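These two updates can be checked with a few lines of code (a sketch of my own; the names phi_AC, phi_AB, phi_BD are mine and hold the tables above). Only the factors mentioning the resampled variable (its Markov blanket) enter the computation; everything else cancels when normalizing.

```python
import random

# Factor tables from the slides, keyed by variable assignments.
phi_AC = {(1, 1): 1.0, (0, 1): 2.0, (1, 0): 3.0, (0, 0): 4.0}  # (A, C)
phi_AB = {(1, 1): 1.0, (0, 1): 5.0, (1, 0): 4.3, (0, 0): 0.2}  # (A, B)
phi_BD = {(1, 1): 1.0, (1, 0): 2.0, (0, 1): 2.0, (0, 0): 1.0}  # (B, D)

def gibbs_resample_A(b, c):
    """Return P(A=1 | B=b, C=c) and a sample drawn from it."""
    score = {a: phi_AB[(a, b)] * phi_AC[(a, c)] for a in (1, 0)}
    p1 = score[1] / (score[1] + score[0])
    return p1, int(random.random() < p1)

def gibbs_resample_B(a, d):
    """Return P(B=1 | A=a, D=d) and a sample drawn from it."""
    score = {b: phi_AB[(a, b)] * phi_BD[(b, d)] for b in (1, 0)}
    p1 = score[1] / (score[1] + score[0])
    return p1, int(random.random() < p1)

print(gibbs_resample_A(b=0, c=0))  # p1 = 12.9 / 13.7 ≈ 0.94
print(gibbs_resample_B(a=1, d=1))  # p1 = 1 / 9.6 ≈ 0.10
```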

Page 16:

Lecture Overview

Probabilistic Graphical Models
• Recap Markov Networks
• Applications of Markov Networks
• Inference in Markov Networks (Exact and Approx.)
• Conditional Random Fields

Page 17:

We want to model P(Y1 | X1 … Xn) … where all the Xi are always observed

• Which model is simpler, MN or BN?

[Figure: the same model drawn as a Markov network (MN) and as a Bayesian network (BN): Y1 connected to X1, X2, …, Xn]

• Naturally aggregates the influence of different parents

Page 18:

Conditional Random Fields (CRFs)

• Model P(Y1 … Yk | X1 … Xn)

• Special case of Markov Networks where all the Xi are always observed

• Simple case: P(Y1 | X1 … Xn)

Page 19:

What are the Parameters?

Page 20:

Let’s derive the probabilities we need

$$\phi_i(X_i, Y_1) = \exp\{w_i\,\mathbf{1}\{X_i = 1,\, Y_1 = 1\}\}$$

$$\phi_0(Y_1) = \exp\{w_0\,\mathbf{1}\{Y_1 = 1\}\}$$

[Figure: Y1 connected to X1, X2, …, Xn]


Page 23:

Let's derive the probabilities we need

$$P(Y_1 = 1,\, x_1, \dots, x_n) \propto \exp\Big(w_0 + \sum_{i=1}^{n} w_i x_i\Big)$$

$$P(Y_1 = 0,\, x_1, \dots, x_n) \propto 1$$

$$P(Y_1 = 1 \mid x_1, \dots, x_n) = \frac{\exp\big(w_0 + \sum_{i=1}^{n} w_i x_i\big)}{1 + \exp\big(w_0 + \sum_{i=1}^{n} w_i x_i\big)}$$

[Figure: Y1 connected to X1, X2, …, Xn]
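A quick numeric sketch of this conditional (my own illustration; the weights below are made up): the conditional is exactly a sigmoid of a weighted sum of the observed features.

```python
import math

def p_y1_given_x(w0, w, x):
    """P(Y1=1 | x1..xn) for the naive Markov (logistic regression) model."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return math.exp(z) / (1.0 + math.exp(z))   # = sigmoid(z)

print(p_y1_given_x(w0=-1.0, w=[2.0, -0.5, 1.0], x=[1, 0, 1]))  # ≈ 0.88
```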


Page 25:

Sigmoid Function used in Logistic Regression

• Great practical interest

• Number of parameters wi is linear instead of exponential in the number of parents

• Natural model for many real-world applications

• Naturally aggregates the influence of different parents

[Figure: the same model drawn as a Markov network and as a Bayesian network: Y1 with X1, X2, …, Xn]

Page 26:

Logistic Regression as a Markov Net (CRF)

Logistic regression is a simple Markov Net (a CRF), aka a naïve Markov model

[Figure: Y connected to X1, X2, …, Xn]

• But it only models the conditional distribution P(Y | X), not the full joint P(X, Y)


Page 27:

Naïve Bayes vs. Logistic Regression

[Figure: Naïve Bayes is drawn as a Bnet with class Y and children X1, X2, …, Xn (labeled Generative); Logistic Regression (Naïve Markov) as a Markov net with Y connected to X1, X2, …, Xn (labeled Conditional, Discriminative)]

Page 28:

Learning Goals for today’s class

You can:

• Perform Exact and Approx. Inference in Markov Networks

• Describe a few applications of Markov Networks

• Describe a natural parameterization for a Naïve Markov model (which is a simple CRF)

• Derive how P(Y|X) can be computed for a Naïve Markov model

• Explain the discriminative vs. generative distinction and its implications

Page 29:

Next class

Revise generative temporal models (HMM)

To Do

Linear-chain CRFs

Midterm will be marked by Fri

Page 30:

Generative vs. Discriminative Models

Generative models (like Naïve Bayes): not directly designed to maximize performance on classification. They model the joint distribution P(X,Y).

Classification is then done using Bayesian inference

But a generative model can also be used to perform any other inference task, e.g. P(X1 | X2, …, Xn)
• "Jack of all trades, master of none."

Discriminative models (like CRFs): specifically designed and trained to maximize performance of classification. They only model the conditional distribution P(Y | X ).

By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data.


Page 31:

On Fri: Sequence Labeling

[Figure: the HMM is drawn as a directed chain Y1 → Y2 → … → YT with each Yt emitting Xt (labeled Generative); the Linear-chain CRF as an undirected chain over Y1, Y2, …, YT with each Yt connected to Xt (labeled Conditional, Discriminative)]

Page 32:

Lecture Overview

• Indicator function
• P(X,Y) vs. P(Y|X) and Naïve Bayes
• Model P(Y|X) explicitly with Markov Networks
  • Parameterization
  • Inference
• Generative vs. Discriminative models

Page 33:

P(X,Y) vs. P(Y|X)

Assume that you always observe a set of variables

X = {X1…Xn} and you want to predict one or more variables Y = {Y1…Ym}

You can model P(X,Y) and then infer P(Y|X)
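In symbols (the standard route, not spelled out on the slide), the conditional is obtained from the joint by marginalizing over Y:

$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(X, Y)}{\sum_{Y'} P(X, Y')}$$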


Page 34:

P(X,Y) vs. P(Y|X)

With a Bnet we can represent a joint as the product of conditional probabilities

With a Markov Network we can represent a joint as the product of factors

We will see that Markov Networks are also suitable for representing the conditional probability P(Y|X) directly

Page 35:

Directed vs. Undirected


Factorization

Page 36:

Naïve Bayesian Classifier P(Y,X)

A very simple and successful Bnet that allows classifying entities into a set of classes Y1, given a set of features (X1…Xn)

Example:
• Determine whether an email is spam (only two classes, spam=T and spam=F)
• Useful attributes of an email?

Assumptions:
• The value of each attribute depends on the classification
• (Naïve) The attributes are independent of each other given the classification:
P("bank" | "account", spam=T) = P("bank" | spam=T)
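Under these assumptions the joint takes the standard Naïve Bayes factorization (filled in here; the slide leaves it implicit):

$$P(Y_1, X_1, \dots, X_n) = P(Y_1) \prod_{i=1}^{n} P(X_i \mid Y_1)$$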

Page 37:

Naïve Bayesian Classifier for Email Spam

• What is the structure?

[Figure: a Bnet with root "Email Spam" and children "Email contains 'free'", "Email contains 'money'", "Email contains 'ubc'", "Email contains 'midterm'"]

The corresponding Bnet represents: P(Y1, X1…Xn)

Page 38:

NB Classifier for Email Spam: Usage

Can we derive P(Y1 | X1…Xn) for any x1…xn?

"free money for you now"

[Figure: the same Bnet, "Email Spam" with children "Email contains 'free'", "Email contains 'money'", "Email contains 'ubc'", "Email contains 'midterm'"]

But you can also perform any other inference… e.g., P(X1 | X3)
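As a sketch of this classification inference (my own code; every probability below is made up for illustration), Bayes' rule combines the class prior with the per-word likelihoods from the Naïve Bayes factorization:

```python
# Classifying "free money for you now": the email contains "free" and
# "money" but not "ubc" or "midterm". All numbers are illustrative.
p_spam = 0.4                                     # prior P(spam=T)
p_word_given_spam  = {"free": 0.30, "money": 0.25, "ubc": 0.01, "midterm": 0.01}
p_word_given_clean = {"free": 0.02, "money": 0.03, "ubc": 0.20, "midterm": 0.10}

def p_spam_given_email(contains):  # contains: dict word -> True/False
    def joint(prior, likelihood):
        p = prior                                # P(Y1) * prod_i P(Xi | Y1)
        for w, present in contains.items():
            p *= likelihood[w] if present else (1 - likelihood[w])
        return p
    s = joint(p_spam, p_word_given_spam)
    c = joint(1 - p_spam, p_word_given_clean)
    return s / (s + c)                           # Bayes' rule

email = {"free": True, "money": True, "ubc": False, "midterm": False}
print(p_spam_given_email(email))   # ≈ 0.99 with these made-up numbers
```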

