Bayesian Networks for Data Mining David Heckerman Microsoft Research (Data Mining and Knowledge Discovery 1, 79-119 (1997))

Page 1

Bayesian Networks for Data Mining

David Heckerman

Microsoft Research

(Data Mining and Knowledge Discovery 1, 79-119 (1997))

Page 2

The Bayesian approach

Question #1: What is Bayesian probability?

• A person’s degree of belief in a certain event.

• Personal (subjective).

• Example: your degree of belief that the coin will land heads.

Page 3

The Classical approach

• Probability is a physical property of the world.

• Measured by the frequency of an outcome over repeated trials.

• Example: the probability that a coin will land heads.

Page 4

Question #2: What are the advantages and disadvantages of the Bayesian and classical interpretations of probability?

Bayesian probability:

+ Reflects an expert’s knowledge.

+ Complies with the rules of probability.

- Arbitrary.

Classical probability:

+ Objective, unbiased.

- Not available in most situations.

Page 5

Bayes Theorem

Posterior = (likelihood × prior) / evidence

$$p(h \mid D) = \frac{p(D \mid h)\, p(h)}{p(D)}$$
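As a worked illustration of the theorem, the following minimal sketch compares two hypotheses about a coin after observing three heads in a row. The two hypotheses, the 0.8 bias, the equal priors, and the function name `posterior` are all assumptions made up for this example; none of them come from the slides.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem for each hypothesis h: p(h|D) = p(D|h) p(h) / p(D)."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}  # p(D|h) p(h)
    evidence = sum(joint.values())                            # p(D)
    return {h: joint[h] / evidence for h in joint}

# Hypothetical setup: the coin is either fair or biased towards heads.
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}
priors = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

# Data D: three heads in a row, so p(D|h) = p_heads[h] ** 3.
likelihoods = {h: p ** 3 for h, p in p_heads.items()}

result = posterior(priors, likelihoods)
print(result)
```

With these made-up numbers, the degree of belief in the biased hypothesis rises from 1/2 to 512/637 (about 0.80) after seeing the data.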

Page 6

Bayesian Networks

• A graphical model that encodes the joint probability distribution (JPD) for a set of variables X.

• Its structure is a directed acyclic graph (DAG): it contains no directed cycles.

• Each node represents one variable and has an associated set of local probability distributions (LPDs).
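Taken together, the graph and the local distributions determine the joint distribution through the standard factorization for Bayesian networks. In the expression below, $x_i$ is a value of the $i$-th variable and $\mathbf{pa}_i$ denotes the values of its parents in the graph; this notation is an assumption of the sketch, since the slide does not introduce it.

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathbf{pa}_i)
```

Each factor on the right-hand side is one local probability distribution, so the joint distribution never has to be stored as a single large table.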

Page 7

Bayesian Networks

• Nodes
  – Parents
  – Children

• Conditional probability tables

• Construction
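To make the pieces listed above concrete, here is a minimal sketch of how nodes, parents, and conditional probability tables might be stored and combined into a joint probability. The three-variable network (Rain and Sprinkler as parents of WetGrass), every number in the tables, and the names `parents`, `cpt`, `local_prob`, and `joint_prob` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical structure: Rain -> WetGrass <- Sprinkler.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# cpt[node][parent_values] = P(node = True | parents = parent_values).
# All numbers are made up for this example.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}

def local_prob(node, value, assignment):
    """Look up P(node = value | parents) in the node's table."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true

def joint_prob(assignment):
    """Joint probability of a full assignment: product of the local probabilities."""
    prob = 1.0
    for node, value in assignment.items():
        prob *= local_prob(node, value, assignment)
    return prob

# Sanity check: the joint probabilities of all 2**3 assignments sum to 1.
total = sum(
    joint_prob(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
print(total)
```

Storing one small table per node keeps the representation compact: this sketch needs six numbers, whereas a full joint table over three binary variables has eight entries, and the gap widens quickly as variables are added.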

Page 8

Inference

The computation of a probability of interest given a model is known as probabilistic inference.

P(X | e) = P(X, e) / P(e) = c P(X, e), where e is the observed evidence and c = 1 / P(e) is a normalizing constant.

Example on board.
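The board example is not part of this transcript, so the following is a hypothetical stand-in rather than the presenter's example. It sketches inference by enumeration: sum the joint probability over the unobserved variables, then normalize by c. The network, its numbers, and the names `joint_prob` and `posterior` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (made-up numbers).
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
cpt = {  # cpt[node][parent_values] = P(node = True | parents)
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}

def joint_prob(assignment):
    """Joint probability of a full assignment: product of local probabilities."""
    prob = 1.0
    for node, value in assignment.items():
        key = tuple(assignment[p] for p in parents[node])
        p_true = cpt[node][key]
        prob *= p_true if value else 1.0 - p_true
    return prob

def posterior(query, evidence):
    """Return P(query | evidence) as a dict, using P(X | e) = c P(X, e)."""
    hidden = [n for n in parents if n != query and n not in evidence]
    unnormalized = {}
    for q in (True, False):
        total = 0.0
        # Sum out every variable that is neither queried nor observed.
        for values in product([True, False], repeat=len(hidden)):
            assignment = {**evidence, query: q, **dict(zip(hidden, values))}
            total += joint_prob(assignment)
        unnormalized[q] = total           # P(X = q, e)
    c = 1.0 / sum(unnormalized.values())  # c = 1 / P(e)
    return {q: c * p for q, p in unnormalized.items()}

rain_given_wet = posterior("Rain", {"WetGrass": True})
print(rain_given_wet)
```

Enumeration is exponential in the number of unobserved variables, so it is only practical for small networks like this one.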

Page 9

Learning

• Learning from data
  – Refine the structure and LPDs of a BN
  – Combine prior knowledge with data

• Result: IMPROVED KNOWLEDGE
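One common way to combine prior knowledge with data for a single binary variable is a Beta prior updated by observed counts. The slide does not name a specific method, so treat the sketch below as an assumption: the function names, the Beta(2, 2) prior, and the counts of 7 heads and 3 tails are all invented for illustration.

```python
from fractions import Fraction

def update_beta(alpha_heads, alpha_tails, heads, tails):
    """Posterior Beta parameters: prior pseudo-counts plus observed counts."""
    return alpha_heads + heads, alpha_tails + tails

def prob_heads(alpha_heads, alpha_tails):
    """Mean of Beta(alpha_heads, alpha_tails): predicted probability of heads."""
    return Fraction(alpha_heads, alpha_heads + alpha_tails)

prior = (2, 2)  # hypothetical prior knowledge: mild belief that the coin is fair
data = (7, 3)   # hypothetical data: 7 heads and 3 tails

post = update_beta(*prior, *data)
before = prob_heads(*prior)
after = prob_heads(*post)
print(before, after)
```

The prediction moves from 1/2 to 9/14, between the prior belief and the observed frequency of 7/10; with more data, the counts dominate the prior.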

Page 10

Question #3: Mention at least 3 advantages of Bayesian networks for data analysis. Explain each one.

• Handle incomplete data sets.

• Learn about causal relationships.

• Combine domain knowledge with data.

• Avoid overfitting.