Parameter Related Domain Knowledge for Learning in Bayesian Networks

Stefan Niculescu, PhD Candidate, Carnegie Mellon University

Joint work with Professor Tom Mitchell and Dr. Bharat Rao

April 2005

Domain Knowledge

• In the real world, data is often too sparse to build an accurate model

• Domain knowledge can help alleviate this problem

• Several types of domain knowledge:
  – Relevance of variables (feature selection)
  – Conditional Independences among variables
  – Parameter Domain Knowledge

Parameter Domain Knowledge

• In a Bayes Net for a real world domain:
  – can have a huge number of parameters
  – not enough data to estimate them accurately

• Parameter Domain Knowledge constraints:
  – reduce the number of parameters to estimate
  – reduce the variance of parameter estimates

Outline

• Motivation

Parameter Related Domain Knowledge

• Experiments

• Related Work

• Summary / Future Work

Parameters and Counts

CPT for variable Xi

Theorem. The Maximum Likelihood estimators are given by:
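For a CPT with no parameter constraints, the standard maximum likelihood estimate of P(Xi = k | Parents = j) is the count ratio N_ijk / N_ij. The sketch below illustrates that ratio for a single variable; the function name `mle_cpt` and the toy data are hypothetical and not taken from the slides.

```python
from collections import Counter

def mle_cpt(samples):
    """Estimate P(X = k | Pa = j) as N_jk / N_j.

    samples: list of (parent_configuration, value) pairs for one variable.
    """
    joint = Counter(samples)                    # N_jk: joint counts
    parent = Counter(j for j, _ in samples)     # N_j: counts per parent configuration
    return {(j, k): n / parent[j] for (j, k), n in joint.items()}

# Toy data (hypothetical): parent = weather, child = ground state.
data = [("rain", "wet"), ("rain", "wet"), ("rain", "dry"), ("sun", "dry")]
cpt = mle_cpt(data)
```

Each row of the resulting table sums to one over the values seen for that parent configuration; unseen combinations are simply absent, which is where sparse data starts to hurt.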

Parameter Sharing

[Figure: conditional distributions $c_1$, $c_2$, $c_3$ sharing parameters $g_1$ and $g_2$; $C_1 = \{c_1, c_2, c_3\}$]

Theorem. The Maximum Likelihood estimators are given by:
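One way to make the idea concrete is the following sketch. It assumes a simplified setting that may differ from the one the theorem covers: several distributions contain the same shared parameters g_i, plus local parameters of their own, and every distribution must sum to one. Maximizing the multinomial likelihood under those constraints with Lagrange multipliers gives g_i = N_i / N, where N_i pools the counts of g_i over all distributions and N is the total count; the remaining mass is split among each distribution's local parameters in proportion to their counts. The function name and the counts are hypothetical.

```python
def mle_shared_local(shared_counts, local_counts):
    """ML estimates when parameters g_i are shared by all distributions.

    shared_counts[c][i]: count for shared parameter g_i in distribution c.
    local_counts[c][j]:  count for local parameter j of distribution c
                         (each distribution needs at least one local count).
    """
    # Pool the counts of each shared parameter over all distributions.
    n_shared = [sum(col) for col in zip(*shared_counts)]
    n_total = sum(n_shared) + sum(sum(row) for row in local_counts)
    g = [n / n_total for n in n_shared]
    # Mass left for local parameters is the same in every distribution.
    local_mass = 1.0 - sum(g)
    local = [[local_mass * n / sum(row) for n in row] for row in local_counts]
    return g, local

# Two distributions, two shared parameters, two local parameters each.
g, local = mle_shared_local([[4, 2], [2, 2]], [[3, 1], [2, 4]])
```

With these counts the shared parameters are estimated from all 20 observations rather than from one distribution's data alone, which is the variance reduction that parameter sharing is meant to deliver.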

Incomplete Data, Frequentist

Dependent Dirichlet Priors

Bayesian Averaging

Hierarchical Parameter Sharing


Probability Mass Sharing

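DK: Parameters of a given color have the same sum across all distributions.

[Figure: parameters $\theta_{ij}$ of distributions $i = 1, \ldots, k$, colored by group]

$\theta_{11} + \theta_{12} = \theta_{21} + \theta_{22} = \ldots = \theta_{k1} + \theta_{k2}$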

Probability Ratio Sharing

DK: Parameters of a given color preserve their relative ratios across all distributions.

[Figure: parameters $\theta_{ij}$ of distributions $i = 1, \ldots, k$, colored by group]

Where are we right now?

Outline

• Motivation

• Parameter Related Domain Knowledge

Experiments

• Related Work

Datasets

• Project World - CALO
  – 6 persons, ~200 emails
  – Manually labeled as About / Not About Meetings
  – Data: (Person, Email, Topic)

• Artificial Datasets
  – Kept most of the characteristics of the data BUT ...
  – ... new emails were generated where frequencies of certain words were shared across users
  – Purpose:
    • Domain Knowledge readily available
    • To be able to study the effect of training set size (up to 5000)
    • To be able to compare our estimated distribution to the true distribution

Approach

• Can model Email using a Naive Bayes model:
  – Without Parameter Sharing (PSNB)
  – With Parameter Sharing (SSNB)

• Also compare with a model that assumes the sender is irrelevant (GNB)
  – the frequencies of words within a topic are learnt from all examples
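The two extremes can be sketched as follows, assuming emails arrive as (sender, topic, words) triples: a GNB-style estimate pools word counts per topic over all senders, while a PSNB-style estimate keeps separate word frequencies for every (sender, topic) pair. SSNB sits in between by sharing only some word frequencies across senders and is not shown here. The function name, the flag, and the toy emails are hypothetical.

```python
from collections import Counter, defaultdict

def word_frequencies(emails, per_sender):
    """Relative word frequencies from (sender, topic, words) triples.

    per_sender=False: pool all senders, giving P(word | topic).
    per_sender=True:  keep senders apart, giving P(word | sender, topic).
    """
    counts = defaultdict(Counter)
    for sender, topic, words in emails:
        key = (sender, topic) if per_sender else topic
        counts[key].update(words)
    return {
        key: {w: n / sum(c.values()) for w, n in c.items()}
        for key, c in counts.items()
    }

emails = [
    ("ann", "meeting", ["agenda", "room"]),
    ("bob", "meeting", ["agenda", "agenda"]),
    ("bob", "other", ["lunch"]),
]
pooled = word_frequencies(emails, per_sender=False)      # sender ignored
separate = word_frequencies(emails, per_sender=True)     # no sharing
```

The pooled estimate uses every example but ignores differences between senders; the separate estimate respects those differences but has far fewer examples per parameter.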

[Figure: model diagrams over the Sender and Topic variables]

Effect of Training Set Size

As expected:

• SSNB performs better than both PSNB and GNB

• SSNB and PSNB tend to perform similarly as the training set grows, but SSNB performs much better when data is sparse

Outline

• Motivation

• Experiments

Related Work

Dirichlet Priors in a Bayes Net

Prior Belief

Spread

The Domain Expert specifies an assignment of parameters (Prior Belief), but leaves room for some error (Spread)
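As an illustration, the sketch below encodes the expert's assignment as the mean of a Dirichlet prior and controls the spread with an equivalent sample size, which is an assumed parameterization: a larger value means less spread and more trust in the expert. The posterior mean after observing multinomial counts is then (alpha_k + N_k) / (sum of alphas + N). All names and numbers are hypothetical.

```python
def posterior_mean(prior_belief, equivalent_sample_size, counts):
    """Posterior mean of multinomial parameters under a Dirichlet prior.

    prior_belief: expert's parameter assignment (sums to one).
    equivalent_sample_size: assumed stand-in for the inverse of the spread.
    counts: observed counts N_k for each value.
    """
    alphas = [equivalent_sample_size * p for p in prior_belief]
    total = sum(alphas) + sum(counts)
    return [(a + n) / total for a, n in zip(alphas, counts)]

# Expert believes in a fair split and weights it like 10 observations.
estimate = posterior_mean([0.5, 0.5], 10, [8, 2])
```

Here the raw frequencies would be 0.8 and 0.2; the prior pulls the estimate back toward the expert's belief, and the pull weakens as more data arrives.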

HMMs and DBNs

[Figure: chain $\ldots, X_{t-1}, X_t, X_{t+1}, \ldots$ with observations $Y_{t-1}, Y_t, Y_{t+1}$]

Module Networks

In a Module:

• Same parents

• Same CPTs

Image from “Learning Module Networks” by Eran Segal and Daphne Koller

Context Specific Independence

[Figure: example network with nodes Set and Burglary]

Outline

• Motivation

• Experiments

• Related Work

Summary / Future Work

Summary

• Parameter Related Domain Knowledge is needed when data is scarce

• Developed methods to estimate parameters:

– For each of four types of Domain Knowledge presented

– From both complete and incomplete Data

• Markov Models, Module Nets, and Context Specific Independence are particular cases of our parameter sharing domain knowledge

• Models using Parameter Sharing performed better than two classical Bayes Nets on synthetic data

Future Work

• Automatically find Shared Parameters

• Study interactions among different types of Domain Knowledge

• Incorporate Domain Knowledge about continuous variables

• Investigate Domain Knowledge in the form of inequality constraints

Questions?

THE END

Backup Slides

Hierarchical Parameter Sharing

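[Figure: hierarchy of distribution sets, each paired with the parameters shared at that level]

$(\{c_1, c_2, c_3, c_4, c_5, c_6\}, \{\theta_1\})$

$(\{c_1, c_2, c_3\}, \{\theta_2, \theta_3\})$, $(\{c_4, c_5, c_6\}, \{\theta_4\})$

$(\{c_1, c_2\}, \{\theta_5\})$

$(\{c_1\}, \{\ldots\})$, $(\{c_2\}, \{\ldots\})$, $(\{c_3\}, \{\ldots\})$, $(\{c_4\}, \{\ldots\})$, $(\{c_5\}, \{\ldots\})$, $(\{c_6\}, \{\ldots\})$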

Full Data Observability, Frequentist

$c_1 = P(\mathrm{Word} \mid \mathrm{English})$

$c_2 = P(\mathrm{Word} \mid \mathrm{Spanish})$

• Want to model P(Word|Language)

• Two languages: English, Spanish

• Different sets of words

• Domain Knowledge:
  – Aggregate Probability Mass of Nouns the same in both
  – Same holds for adjectives, verbs, etc.

$T_1$ = Nouns

$T_2$ = Verbs
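The language example can be sketched in code under one assumption: data is complete and each group has at least one observation in each language. Maximizing the likelihood subject to "every group has the same total mass in every distribution" gives a group mass of N_t / N (group count over total count, pooled over languages), which is then split inside each language in proportion to that language's own word counts. This is a derivation for this simplified setting, not necessarily the estimator presented in the talk; the function name and counts are hypothetical.

```python
def mle_mass_sharing(counts):
    """ML estimates when each group has the same total mass in every distribution.

    counts[c][t] is the list of word counts for group t in distribution c.
    Groups may contain different numbers of words in different distributions.
    """
    n_groups = len(counts[0])
    # N_t: count of group t pooled over all distributions.
    group_totals = [sum(sum(dist[t]) for dist in counts) for t in range(n_groups)]
    n_total = sum(group_totals)
    return [
        [
            [group_totals[t] / n_total * n / sum(dist[t]) for n in dist[t]]
            for t in range(n_groups)
        ]
        for dist in counts
    ]

# Groups: [nouns, verbs]. English has 2 nouns and 1 verb, Spanish 3 nouns and 2 verbs.
english = [[6, 2], [2]]
spanish = [[1, 1, 2], [3, 3]]
theta = mle_mass_sharing([english, spanish])
```

In this toy data nouns receive mass 12/20 in both languages, even though English alone would suggest 8/10 and Spanish alone 4/10.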

$c_1 = P(\mathrm{Word} \mid \mathrm{English})$

$c_2 = P(\mathrm{Word} \mid \mathrm{Spanish})$

• Want to model P(Word|Language)

• Two languages: English, Spanish

• Different sets of words

• Domain Knowledge:
  – Word groups:
    • About computers: computer, mouse, monitor, etc.
    • Relative frequency of “computer” to “mouse” same in both languages
    • Aggregate mass can be different

$T_1$ = Computer Words

$T_2$ = Business Words
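A corresponding sketch for ratio sharing, assuming complete data and that the words of a group are aligned across languages (position 0 is "computer" and its translation, and so on). If each parameter is written as (group mass in that language) × (within-group ratio shared by all languages), the likelihood separates: the group mass is estimated per language as N_ct / N_c, and the ratios are estimated from counts pooled over languages. This is a derivation for this simplified setting, and the names and counts are hypothetical.

```python
def mle_ratio_sharing(counts):
    """ML estimates when within-group ratios are shared across distributions.

    counts[c][t][j]: count of the j-th word of group t in distribution c.
    Words of a group must be aligned across distributions.
    """
    n_groups = len(counts[0])
    # Pool counts over distributions to estimate the shared within-group ratios.
    pooled = [
        [sum(col) for col in zip(*(dist[t] for dist in counts))]
        for t in range(n_groups)
    ]
    ratios = [[n / sum(group) for n in group] for group in pooled]
    theta = []
    for dist in counts:
        n_c = sum(sum(group) for group in dist)   # total count in this distribution
        theta.append(
            [[sum(dist[t]) / n_c * r for r in ratios[t]] for t in range(n_groups)]
        )
    return theta

# Groups: [computer words, business words].
english = [[5, 3], [2]]
spanish = [[1, 3], [6]]
theta = mle_ratio_sharing([english, spanish])
```

Here computer words carry mass 0.8 in English and 0.4 in Spanish, but the two computer words are split in the same pooled ratio in both languages.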
