4190.408 Artificial Intelligence (2015-Spring)
Bayesian Networks – 3, 4: Inference with Probabilistic Graphical Models
Byoung-Tak Zhang, Biointelligence Lab, Seoul National University


Page 1:

4190.408 2015-Spring

Bayesian Networks – 3, 4: Inference with Probabilistic Graphical Models

Byoung-Tak Zhang

Biointelligence Lab

Seoul National University

Page 2:

What is Machine Learning?

• Learning system:

– A system that automatically builds a model M from empirical data D acquired through interaction with an environment E, and thereby improves its own performance P

• Self-improving Systems (artificial intelligence perspective)

• Knowledge Discovery (data mining perspective)

• Data-Driven Software Design (software engineering perspective)

• Automatic Programming (computer engineering perspective)

Page 3:

Machine Learning as Automatic Programming

Traditional Programming:  Data + Program → Computer → Output

Machine Learning:  Data + Output → Computer → Program

Page 4:

Machine Learning (ML): Three Tasks

• Supervised Learning
– Estimate an unknown mapping from known input and target output pairs
– Learn f_w from training set D = {(x, y)} s.t. f_w(x) = y ≈ f(x)
– Classification: y is discrete
– Regression: y is continuous

• Unsupervised Learning
– Only input values are provided
– Learn f_w from D = {(x)} s.t. f_w(x) ≈ x
– Density estimation and compression
– Clustering, dimension reduction

• Sequential (Reinforcement) Learning
– Not target, but rewards (critiques) are provided "sequentially"
– Learn a heuristic function f_w(s_t, a_t, r_t) from D_t = {(s_t, a_t, r_t) | t = 1, 2, …}
– With respect to the future, not just past
– Sequential decision-making
– Action selection and policy learning

Page 5:

Machine Learning Models

• Supervised learning models
– Neural Nets

– Decision Trees

– K-Nearest Neighbors

– Support Vector Machines

• Unsupervised learning models
– Self-Organizing Maps

– Clustering Algorithms

– Manifold Learning

– Evolutionary Learning

• Probabilistic graphical models
– Bayesian Networks

– Markov Networks

– Hidden Markov Models

– Hypernetworks

• Dynamical system models
– Kalman Filters

– Sequential Monte Carlo

– Particle Filters

– Reinforcement Learning

Page 6:

Outline

• Bayesian Inference
– Monte Carlo
– Importance Sampling
– MCMC

• Probabilistic Graphical Models
– Bayesian Networks
– Markov Random Fields

• Hypernetworks
– Architecture and Algorithms
– Application Examples

• Discussion

Page 7:

Bayes Theorem
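For reference, Bayes' theorem relates the posterior probability of a hypothesis h given data D to the likelihood and the prior:

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}, \qquad P(D) = \sum_{h'} P(D \mid h')\, P(h')$$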

Page 8:

MAP vs. ML

• What is the most probable hypothesis given data?

• From Bayes Theorem

• MAP (Maximum A Posteriori)

• ML (Maximum Likelihood)
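Stated as formulas over a hypothesis space H:

$$h_{\mathrm{MAP}} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)$$

$$h_{\mathrm{ML}} = \arg\max_{h \in H} P(D \mid h)$$

ML coincides with MAP when the prior P(h) is uniform over H.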

Page 9:

Bayesian Inference
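In outline, Bayesian inference keeps the whole posterior over the unknowns θ rather than a single point estimate, and predicts by averaging over it:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{\int P(D \mid \theta')\, P(\theta')\, d\theta'}, \qquad P(x_{\mathrm{new}} \mid D) = \int P(x_{\mathrm{new}} \mid \theta)\, P(\theta \mid D)\, d\theta$$

These integrals are usually intractable, which is why the Monte Carlo methods on the following slides are needed.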

Page 10:

Prof. Schrater’s Lecture Notes

(Univ. of Minnesota)

Page 11:

Page 12:

Monte Carlo (MC) Approximation
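The core of MC approximation is E_p[f(x)] ≈ (1/N) Σ_n f(x⁽ⁿ⁾) with the x⁽ⁿ⁾ drawn from p(x). A minimal sketch; the standard-normal target and the integrand f(x) = x² are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sampler, n_samples=100_000):
    """Approximate E_p[f(x)] by averaging f over samples drawn from p."""
    samples = sampler(n_samples)
    return f(samples).mean()

# Example: E[x^2] under a standard normal is exactly 1.
estimate = mc_expectation(lambda x: x**2, lambda n: rng.normal(size=n))
print(estimate)  # close to 1.0 for large n_samples
```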

Page 13:

Markov chain Monte Carlo
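A minimal random-walk Metropolis sampler, the simplest MCMC scheme: propose a local move and accept it with probability min(1, p(x')/p(x)), so the chain's stationary distribution is the target even when p is known only up to a constant. The Gaussian-mixture target and the step size here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Unnormalized target density: a two-component Gaussian mixture."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

def metropolis(n_steps=50_000, step=1.0, x0=0.0):
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)                 # symmetric random-walk proposal
        if rng.uniform() < min(1.0, target(x_new) / target(x)):
            x = x_new                                      # accept; otherwise keep current state
        samples.append(x)
    return np.array(samples)

chain = metropolis()
print(chain[10_000:].mean())   # estimate of the target mean, after discarding burn-in
```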

Page 14:

MC with Importance Sampling
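Importance sampling rewrites E_p[f(x)] = E_q[f(x) p(x)/q(x)], so samples from an easy proposal q are reweighted by w = p/q. A sketch with an illustrative target p = N(0, 1) and proposal q = N(0, 2²); the self-normalized form below also works when p is only known up to a constant:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def importance_sampling(f, n=100_000):
    x = rng.normal(scale=2.0, size=n)            # samples from the proposal q = N(0, 2^2)
    w = gauss_pdf(x) / gauss_pdf(x, sigma=2.0)   # importance weights p(x)/q(x)
    return np.sum(w * f(x)) / np.sum(w)          # self-normalized estimate of E_p[f]

print(importance_sampling(lambda x: x**2))       # close to 1.0, the variance of N(0, 1)
```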

Page 15:

Graphical Models

[Figure: taxonomy of graphical models (GM). GMs split into directed GMs, undirected GMs, dependency networks, and other semantics such as causal models and chain graphs. Directed GMs include Bayesian networks, with special cases such as DBNs, FSTs, HMMs, factorial HMMs, mixed-memory Markov models, BMMs, Kalman filters, segment models, mixture models, decision trees, simple models, PCA, and LDA. Undirected GMs include Markov random fields / Markov networks and Gibbs/Boltzmann distributions.]

Page 16:

BAYESIAN NETWORKS

Page 17:

Bayesian Networks

• Bayesian network

– DAG (Directed Acyclic Graph)

– Express dependence relations between variables

– Can use prior knowledge on the data (parameters)

[Figure: example DAG over the variables A, B, C, D, E with edges A→B, A→D, B→C, B→D, B→E, C→E, D→E, as implied by the factorization below.]

$$P(\mathbf{X}) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i))$$

P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)
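A sketch of how this factorization is evaluated in code for binary variables; every CPT value below is an illustrative placeholder, not taken from the slides:

```python
# Joint probability of one assignment under
# P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D).
p_A_true = 0.3                                               # P(A=True)
p_B_given_A = {True: 0.8, False: 0.1}                        # P(B=True | A)
p_C_given_B = {True: 0.6, False: 0.2}                        # P(C=True | B)
p_D_given_AB = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.05}    # P(D=True | A, B)
p_E_given_BCD = {(b, c, d): 0.7 if (b or c or d) else 0.1    # P(E=True | B, C, D)
                 for b in (True, False) for c in (True, False) for d in (True, False)}

def bern(p_true, value):
    """Probability of a binary value, given the probability that it is True."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    return (bern(p_A_true, a)
            * bern(p_B_given_A[a], b)
            * bern(p_C_given_B[b], c)
            * bern(p_D_given_AB[(a, b)], d)
            * bern(p_E_given_BCD[(b, c, d)], e))

print(joint(True, True, False, True, True))
```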

Page 18:

Representing Probability Distributions

• Probability distribution = a probability for each combination of values of the attributes

• Naïve representations (such as tables) run into trouble
– 20 binary attributes already require more than 2^20 ≈ 10^6 parameters

– Real applications usually involve hundreds of attributes


Hospital patients described by

• Background: age, gender, history of diseases, …

• Symptoms: fever, blood pressure, headache, …

• Diseases: pneumonia, heart attack, …

Page 19:

Bayesian Networks - Key Idea

• Utilize conditional independence

• Graphical representation of conditional independences and of "causal" dependencies


Exploit regularities!!!

Page 20:

Bayesian Networks

1. Finite, directed acyclic graph

2. Nodes: (discrete) random variables

3. Edges: direct influences

4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node

[Figure: example DAG with root nodes E and B, their child A, and A's children M and J (the classic burglary-earthquake-alarm network).]

Page 21:

Bayesian Networks

[Figure: a three-node network in which X1 and X2 are parents of X3.]

P(X1) = (0.2, 0.8)    P(X2) = (0.6, 0.4)

P(X3 | X1, X2):
X1 = true,  X2 = 1: (0.2, 0.8)
X1 = true,  X2 = 2: (0.5, 0.5)
X1 = false, X2 = 1: (0.23, 0.77)
X1 = false, X2 = 2: (0.53, 0.47)

Page 22:

Example: Use a DAG to model the causality

[Figure: example DAG with nodes Train Strike, Martin Oversleep, Norman Oversleep, Boss Failure-in-Love, Martin Late, Norman Late, Norman Untidy, Office Dirty, Project Delay, and Boss Angry; the directed edges model the causal influences among them.]

Page 23:

Example: Attach prior probabilities to all root nodes

[Figure: the same DAG, with prior probability tables attached to the root nodes.]

Martin Oversleep:      P(T) = 0.01, P(F) = 0.99
Train Strike:          P(T) = 0.1,  P(F) = 0.9
Norman Oversleep:      P(T) = 0.2,  P(F) = 0.8
Boss Failure-in-Love:  P(T) = 0.01, P(F) = 0.99

Page 24:

Example: Attach conditional probabilities to non-root nodes

[Figure: the same DAG, with conditional probability tables attached to the non-root nodes.]

P(Norman Untidy | Norman Oversleep):

              Oversleep = T   Oversleep = F
  Untidy = T       0.6             0.2
  Untidy = F       0.4             0.8

P(Martin Late | Train Strike, Martin Oversleep):

              Strike = T                Strike = F
              Oversleep = T    F        Oversleep = T    F
  Late = T         0.95       0.8            0.7        0.05
  Late = F         0.05       0.2            0.3        0.95

Each column sums to 1.
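With the root priors from the previous slide (P(Train Strike=T) = 0.1, P(Martin Oversleep=T) = 0.01) and this table, the marginal P(Martin Late = T) follows by summing out the two parents, which the DAG makes independent of each other. A short sketch of that computation:

```python
# P(Martin Late = T) = sum over Strike, Oversleep of
#   P(Late=T | Strike, Oversleep) * P(Strike) * P(Oversleep)
p_strike = {True: 0.1, False: 0.9}
p_oversleep = {True: 0.01, False: 0.99}
p_late_given = {(True, True): 0.95, (True, False): 0.8,      # keys are (Strike, Oversleep)
                (False, True): 0.7, (False, False): 0.05}

p_late = sum(p_late_given[(s, o)] * p_strike[s] * p_oversleep[o]
             for s in (True, False) for o in (True, False))
print(p_late)   # approximately 0.131
```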

Page 25:

Example: Attach conditional probabilities to non-root nodes

[Figure: the same DAG; conditional probability table for Boss Angry.]

P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty):

  Failure-in-Love:   T     T     T     T     F     F     F     F
  Project Delay:     T     T     F     F     T     T     F     F
  Office Dirty:      T     F     T     F     T     F     T     F
  Angry = very      0.98  0.85  0.6   0.5   0.3   0.2   0     0.01
  Angry = mid       0.02  0.15  0.3   0.25  0.5   0.5   0.2   0.02
  Angry = little    0     0     0.1   0.25  0.2   0.3   0.7   0.07
  Angry = no        0     0     0     0     0     0     0.1   0.9

Each column sums to 1.

Page 26:

Inference
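As an example of the kind of query the network supports, the tables above already allow diagnostic reasoning, e.g. how likely a train strike is once we observe that Martin is late (using P(Martin Late = T) ≈ 0.131 from the marginalization above):

$$P(TS{=}T \mid ML{=}T) = \frac{P(ML{=}T \mid TS{=}T)\, P(TS{=}T)}{P(ML{=}T)} = \frac{(0.95 \cdot 0.01 + 0.8 \cdot 0.99) \cdot 0.1}{0.131} \approx 0.61$$

Observing that Martin is late thus raises the probability of a strike from the prior 0.1 to about 0.61.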

Page 27:

MARKOV RANDOM FIELDS (MARKOV NETWORKS)

Page 28:

Graphical Models

Directed Graph (e.g. Bayesian Network)

Undirected Graph (e.g. Markov Random Field)

Page 29:

Bayesian Image Analysis

[Figure: the original image is corrupted by noise during transmission, producing the degraded (observed) image.]

Pr(Original Image | Degraded Image) = Pr(Degraded Image | Original Image) × Pr(Original Image) / Pr(Degraded Image)

(a posteriori probability = degradation process (likelihood) × a priori probability / marginal)

Page 30:

Image Analysis

• We could thus represent both the observed image (X) and the true image (Y) as Markov random fields.

• And invoke the Bayesian framework to find P(Y|X)


X – observed image

Y – true image

Page 31:

Details

• Remember

• P(Y|X) proportional to P(X|Y)P(Y)

– P(X|Y) is the data model.

– P(Y) models the label interaction.

• Next we need to compute the prior P(Y=y) and the likelihood P(X|Y).

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)} \;\propto\; P(X \mid Y)\, P(Y)$$

Page 32:

Back to Image Analysis

Likelihood can be modeled as a mixture of Gaussians.

The potential is modeled to capture the domain knowledge. One common model is the Ising model, with pairwise potentials of the form βy_i y_j.
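Putting these pieces together, one concrete instantiation (assuming a single Gaussian noise term per pixel with class means μ_{y_i} and variance σ², and the Ising prior just mentioned) gives a posterior of the form

$$P(Y \mid X) \;\propto\; \exp\!\Big(-\sum_{i} \frac{(x_i - \mu_{y_i})^2}{2\sigma^2} \;+\; \beta \sum_{(i,j)} y_i y_j\Big)$$

where the second sum runs over neighboring pixel pairs; maximizing P(Y|X) is then the same as minimizing the corresponding energy, which is what the inference methods on the following slides do.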

Page 33:

Bayesian Image Analysis

• Let X be the observed image = {x_1, x_2, …, x_mn}

• Let Y be the true image = {y_1, y_2, …, y_mn}

• Goal: find Y = y* = {y_1*, y_2*, …} such that P(Y = y*|X) is maximum.

• Labeling problem with a search space of size |L|^(m·n)

– L is the set of labels.

– m*n observations.

Page 34:

Unfortunately

[Figure: comparison of the observed image with the SVM result and the MRF result.]

Page 35:

Markov Random Fields (MRFs)

• Introduced in the 1960s, a principled approach for incorporating context information.

• Incorporates domain knowledge.

• Works within the Bayesian framework.

• Widely worked on in the 70s, largely disappeared during the 80s, and finally made a big comeback in the late 90s.

Page 36:

Markov Random Field

• Random Field: Let F = {F_1, F_2, …, F_M} be a family of random variables defined on the set S, in which each random variable F_i takes a value f_i in a label set L. The family F is called a random field.

• Markov Random Field: F is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied:

$$\text{Positivity: } P(f) > 0, \quad \forall f \in \mathbb{F}$$

$$\text{Markovianity: } P(f_i \mid f_{S \setminus \{i\}}) = P(f_i \mid f_{N_i})$$

Page 37:

Inference

• Finding the optimal y* such that P(Y=y*|X) is maximum.

• Search space is exponential.

• Exponential algorithm - simulated annealing (SA)

• Greedy algorithm – iterated conditional modes (ICM)

• There are other more advanced graph cut based strategies.

Page 38:

Sampling and Simulated Annealing

• Sampling
– A way to generate random samples from a (potentially very complicated) probability distribution.
– Gibbs/Metropolis.

• Simulated annealing
– A schedule for modifying the probability distribution so that, at "zero temperature", you draw samples only from the MAP solution.

• If you can find the right cooling schedule, the algorithm will converge to a global MAP solution.

• Flip side: it is SLOW, and finding the correct schedule is non-trivial.

Page 39:

Iterated Conditional Modes

• Greedy strategy, fast convergence

• Idea is to maximize the local conditional probabilities iteratively, given an initial solution.

• Simulated annealing with T = 0.
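A minimal ICM sketch for binary denoising with labels y_i ∈ {-1, +1}, using the Gaussian-likelihood-plus-Ising-prior energy sketched earlier; the toy image, noise level, β, and σ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a binary (+1/-1) square image corrupted by Gaussian noise.
true = np.full((32, 32), -1.0)
true[8:24, 8:24] = 1.0
observed = true + rng.normal(scale=0.8, size=true.shape)

beta, sigma = 1.5, 0.8                        # Ising coupling and assumed noise std
labels = np.where(observed > 0, 1.0, -1.0)    # initial solution: threshold the observation

def local_energy(y, i, j):
    """Energy of assigning label y to pixel (i, j), given its current neighbors."""
    data_term = (observed[i, j] - y) ** 2 / (2 * sigma ** 2)
    nbrs = []
    if i > 0: nbrs.append(labels[i - 1, j])
    if i < labels.shape[0] - 1: nbrs.append(labels[i + 1, j])
    if j > 0: nbrs.append(labels[i, j - 1])
    if j < labels.shape[1] - 1: nbrs.append(labels[i, j + 1])
    prior_term = -beta * y * sum(nbrs)        # Ising prior: agreeing neighbors lower the energy
    return data_term + prior_term

for sweep in range(5):                        # a few sweeps usually suffice for ICM
    for i in range(labels.shape[0]):
        for j in range(labels.shape[1]):
            # Greedy local update: keep the label with the lower conditional energy.
            labels[i, j] = min((-1.0, 1.0), key=lambda y: local_energy(y, i, j))

print(np.mean(labels == true))                # fraction of pixels recovered correctly
```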

Page 40:

Parameter Learning

• Supervised learning (easiest case)

• Maximum likelihood:

$$\theta^* = \arg\max_{\theta} P(f \mid \theta)$$

• For an MRF:

$$P(f \mid \theta) = \frac{1}{Z(\theta)}\, e^{-U(f \mid \theta)/T}$$

Page 41:

Pseudo Likelihood

• So we approximate

• Large lattice theorem: in the large-lattice limit (as M → ∞), the PL estimate converges to the ML estimate.

• It turns out that a local learning method like pseudo-likelihood, when combined with a local inference method such as ICM, does quite well, giving close-to-optimal results.

$$PL(f) = \prod_{i} P(f_i \mid f_{N_i}) = \prod_{i} \frac{e^{-U(f_i,\, f_{N_i})}}{\sum_{f_i' \in L} e^{-U(f_i',\, f_{N_i})}}$$

$$\text{where } U(f) = \sum_{i} U(f_i,\, f_{N_i})$$
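A sketch of this pseudo-likelihood for the Ising-style potential used earlier, taking U(f_i, f_{N_i}) = -β f_i Σ_{j∈N_i} f_j with f_i ∈ {-1, +1} on a 2-D grid; maximizing log PL over β would give the local (pseudo-likelihood) parameter estimate:

```python
import numpy as np

def log_pseudo_likelihood(f, beta):
    """log PL(f) = sum_i log P(f_i | f_{N_i}) for an Ising model on a 2-D grid of +/-1 labels."""
    # Sum of the 4-neighborhood at every site (implicit zero padding at the border).
    nbr_sum = np.zeros_like(f, dtype=float)
    nbr_sum[1:, :] += f[:-1, :]
    nbr_sum[:-1, :] += f[1:, :]
    nbr_sum[:, 1:] += f[:, :-1]
    nbr_sum[:, :-1] += f[:, 1:]
    # P(f_i | f_Ni) = exp(beta * f_i * s_i) / (exp(beta * s_i) + exp(-beta * s_i)), s_i = nbr_sum
    logits = beta * nbr_sum
    log_norm = np.logaddexp(logits, -logits)   # log(e^{+logits} + e^{-logits}), elementwise
    return np.sum(f * logits - log_norm)

f = np.ones((8, 8))     # a perfectly smooth configuration...
f[0, 0] = -1            # ...with one flipped site
print(log_pseudo_likelihood(f, beta=0.5))
```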