Parameter Related Domain Knowledge for Learning in Bayesian Networks



Stefan Niculescu, PhD Candidate, Carnegie Mellon University

Joint work with Professor Tom Mitchell and Dr. Bharat Rao

April 2005


Domain Knowledge

• In the real world, data is often too sparse to build an accurate model

• Domain knowledge can help alleviate this problem

• Several types of domain knowledge:
  – Relevance of variables (feature selection)
  – Conditional independences among variables
  – Parameter Domain Knowledge


Parameter Domain Knowledge

• In a Bayes Net for a real world domain:
  – there can be a huge number of parameters
  – there is not enough data to estimate them accurately

• Parameter Domain Knowledge constraints:
  – reduce the number of parameters to estimate
  – reduce the variance of parameter estimates
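As a rough illustration of how quickly parameters add up, the sketch below counts the free parameters of a CPT. Each parent configuration contributes (number of child values - 1) free parameters. The example is a word distribution conditioned on Topic and Sender, with the 2 topics (About / Not About Meetings) and 6 persons of the email data used in the experiments. The 1000-word vocabulary is a hypothetical value chosen for illustration. The "fully shared" case is the extreme in which all senders use the same word distribution for a topic; partial sharing falls in between.

```python
def free_parameters(cardinality, parent_cardinalities):
    """Number of free parameters in one CPT.

    Each parent configuration has its own distribution over the child's
    `cardinality` values, which has cardinality - 1 free entries.
    """
    configs = 1
    for c in parent_cardinalities:
        configs *= c
    return configs * (cardinality - 1)


VOCABULARY = 1000  # hypothetical vocabulary size

# P(Word | Topic, Sender): one word distribution per (topic, sender) pair,
# with 2 topics and 6 senders
per_sender = free_parameters(VOCABULARY, [2, 6])

# Extreme sharing: all senders use the same word distribution for a topic
fully_shared = free_parameters(VOCABULARY, [2])

print(per_sender, fully_shared)
```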


Outline

• Motivation

• Parameter Related Domain Knowledge

• Experiments

• Related Work

• Summary / Future Work


Parameters and Counts

CPT for variable X_i

Theorem. The Maximum Likelihood estimators are given by:
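For complete data, the standard result is that each CPT entry θ_ijk = P(X_i = x_k | Pa(X_i) = j) has ML estimate N_ijk / N_ij. This is the relative frequency of x_k among the records whose parents take configuration j. The notation is assumed here and is not taken from the slide. Below is a minimal Python sketch, assuming a hypothetical list-of-dicts record format:

```python
from collections import Counter, defaultdict


def ml_cpt(records, child, parents):
    """Maximum likelihood CPT for `child` given `parents` from complete data.

    records: list of dicts mapping variable name -> observed value
             (a hypothetical input format chosen for this sketch).
    Returns {parent_config: {child_value: theta}}, theta = N_ijk / N_ij.
    Unobserved parent configurations are omitted (ML is undefined there);
    child values never seen in a row have estimate 0 and are omitted too.
    """
    counts = defaultdict(Counter)  # counts[j][k] = N_ijk
    for record in records:
        j = tuple(record[p] for p in parents)
        counts[j][record[child]] += 1
    cpt = {}
    for j, row in counts.items():
        n_ij = sum(row.values())  # N_ij = sum over k of N_ijk
        cpt[j] = {k: n / n_ij for k, n in row.items()}
    return cpt


data = [
    {"Topic": "meeting", "Word": "agenda"},
    {"Topic": "meeting", "Word": "agenda"},
    {"Topic": "meeting", "Word": "lunch"},
    {"Topic": "other", "Word": "lunch"},
]
print(ml_cpt(data, "Word", ["Topic"]))
```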


Parameter Sharing

[Figure: distributions c1, c2, c3, grouped as C1 = {c1, c2, c3}, with some parameters shared across them]

Theorem. The Maximum Likelihood estimators are given by:
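The slide groups distributions c1, c2, c3 into a set C1 that shares parameters. One simple case can be worked out directly. Assume every distribution in the set contains the same shared parameters g_k plus its own local parameters. Maximizing the likelihood with one Lagrange multiplier per distribution then gives two results:

- g_k = N_k / N for each shared parameter, where N_k is pooled over all distributions in the set and N is the total count.
- θ_cj = (N_local / N) · N_cj / N_c for the local parameters of distribution c.

This is a sketch derived under that assumption, not necessarily the exact theorem on the slide. The function name and example counts are hypothetical.

```python
def ml_partial_sharing(shared_counts, local_counts):
    """ML estimates for distributions that share some of their parameters.

    Assumed setting: every distribution c contains the same shared
    parameters g_1..g_s plus its own local parameters, and each
    distribution sums to one.

    shared_counts: list over distributions of {shared_value: count}
    local_counts:  list over distributions of {local_value: count};
                   each dict must list at least one local value.
    Returns (g, thetas): shared estimates and per-distribution local estimates.
    """
    pooled = {}
    for counts in shared_counts:
        for value, n in counts.items():
            pooled[value] = pooled.get(value, 0) + n
    n_shared = sum(pooled.values())
    n_local = sum(sum(counts.values()) for counts in local_counts)
    n_total = n_shared + n_local

    # Shared parameters: pooled count over the grand total, g_k = N_k / N.
    g = {value: n / n_total for value, n in pooled.items()}

    # Mass left for local parameters; the same 1 - sum(g) in every distribution.
    local_mass = n_local / n_total

    thetas = []
    for counts in local_counts:
        n_c = sum(counts.values())
        if n_c == 0:
            # No local observations: the likelihood does not depend on how
            # local_mass is split, so split it uniformly.
            thetas.append({v: local_mass / len(counts) for v in counts})
        else:
            thetas.append({v: local_mass * n / n_c for v, n in counts.items()})
    return g, thetas


g, thetas = ml_partial_sharing(
    shared_counts=[{"g1": 2}, {"g1": 4}],
    local_counts=[{"a1": 3, "a2": 1}, {"b1": 2}],
)
print(g, thetas)
```

In the example, each distribution still sums to one: 0.5 shared mass plus 0.5 split among its own local parameters.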


Incomplete Data, Frequentist


Dependent Dirichlet Priors


Bayesian Averaging


Hierarchical Parameter Sharing



Probability Mass Sharing

DK: Parameters of a given color have the same sum across all distributions.

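[Figure: parameters of k distributions, grouped by color; θ_ji denotes parameter i of distribution j]

$$\theta_{11} + \theta_{12} = \theta_{21} + \theta_{22} = \cdots = \theta_{k1} + \theta_{k2}$$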
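Assume the colored groups partition the values of each distribution. The log-likelihood then separates into two parts: one term for the shared group masses, and one term per distribution for the split inside each group. Maximizing each term gives:

- the shared group mass α_t = (count of group t pooled over all distributions) / (total count);
- θ_jv = α_t · N_jv / N_jt, where N_jt is the count of group t in distribution j.

The sketch below implements that derivation. It illustrates the idea under this assumption and is not the general estimator from the talk. The English/Spanish counts are made up, in the spirit of the backup example.

```python
def ml_mass_sharing(counts, groups):
    """ML estimates under probability mass sharing.

    counts[j][v]: count of value v in distribution j (e.g. word counts per language).
    groups[j][v]: group label of value v in distribution j (e.g. "noun", "verb").
    Assumptions: every value belongs to some group, every group occurs in every
    distribution, and each group's total mass is the same in all distributions.
    """
    n_total = sum(sum(c.values()) for c in counts)

    # Shared aggregate mass per group, from counts pooled across distributions.
    pooled = {}
    for c, grp in zip(counts, groups):
        for v, n in c.items():
            pooled[grp[v]] = pooled.get(grp[v], 0) + n
    alpha = {t: n / n_total for t, n in pooled.items()}

    estimates = []
    for c, grp in zip(counts, groups):
        within, size = {}, {}
        for v, n in c.items():
            within[grp[v]] = within.get(grp[v], 0) + n
            size[grp[v]] = size.get(grp[v], 0) + 1
        est = {}
        for v, n in c.items():
            t = grp[v]
            # Split the group's shared mass by this distribution's own
            # within-group frequencies (uniformly if the group is unobserved here).
            share = n / within[t] if within[t] > 0 else 1 / size[t]
            est[v] = alpha[t] * share
        estimates.append(est)
    return alpha, estimates


english = {"house": 3, "car": 1, "run": 4}
spanish = {"casa": 2, "correr": 2, "comer": 0}
groups = [
    {"house": "noun", "car": "noun", "run": "verb"},
    {"casa": "noun", "correr": "verb", "comer": "verb"},
]
alpha, (p_en, p_es) = ml_mass_sharing([english, spanish], groups)
print(alpha, p_en, p_es)
```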


Probability Ratio Sharing

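DK: Parameters of a given color preserve their relative ratios across all distributions.

[Figure: parameters of k distributions, grouped by color; θ_ji denotes parameter i of distribution j]

$$\frac{\theta_{11}}{\theta_{12}} = \frac{\theta_{21}}{\theta_{22}} = \cdots = \frac{\theta_{k1}}{\theta_{k2}}$$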
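A similar derivation works for ratio sharing if every value outside the colored groups is treated as a group of its own. Write θ_ji = α_jt · β_i for value i in group t. The shared within-group ratios are then β_i = Σ_j N_ji / Σ_j N_jt, with counts pooled over distributions. Each distribution keeps its own group mass α_jt = N_jt / N_j. Below is a sketch under these assumptions, with values aligned by position across distributions and made-up counts:

```python
def ml_ratio_sharing(counts, groups):
    """ML estimates under probability ratio sharing.

    counts[j][i]: count of the i-th value in distribution j; values are aligned
                  by position across distributions (e.g. "computer" in English
                  and its Spanish counterpart share an index).
    groups[i]:    group label of value i; a value outside every group gets a
                  label of its own, which leaves it unconstrained.
    Within a group, the ratios between parameters are shared by all
    distributions, while each distribution keeps its own group mass.
    Assumes every distribution has at least one observation.
    """
    n_values = len(groups)
    pooled = [sum(row[i] for row in counts) for i in range(n_values)]
    pooled_group, size = {}, {}
    for i, t in enumerate(groups):
        pooled_group[t] = pooled_group.get(t, 0) + pooled[i]
        size[t] = size.get(t, 0) + 1

    # Shared within-group ratios, estimated from counts pooled over distributions.
    beta = [
        pooled[i] / pooled_group[t] if pooled_group[t] > 0 else 1 / size[t]
        for i, t in enumerate(groups)
    ]

    estimates = []
    for row in counts:
        n_j = sum(row)
        mass = {}
        for i, t in enumerate(groups):
            mass[t] = mass.get(t, 0) + row[i]
        # Distribution-specific group mass times the shared within-group ratio.
        estimates.append([mass[t] / n_j * beta[i] for i, t in enumerate(groups)])
    return beta, estimates


# Index 0 = "computer", 1 = "mouse" (group T1); index 2 = all other words.
groups = ["T1", "T1", "rest"]
english = [6, 2, 2]
spanish = [1, 1, 8]
beta, (p_en, p_es) = ml_ratio_sharing([english, spanish], groups)
print(beta, p_en, p_es)
```

In the example, "computer" and "mouse" keep the same 7:3 ratio in both languages, while the total mass of the group differs (0.8 vs. 0.2).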


Where are we right now?


Outline

• Motivation

• Parameter Related Domain Knowledge

• Experiments

• Related Work

• Summary / Future Work


Datasets

• Project World - CALO
  – 6 persons, ~200 emails
  – Manually labeled as About / Not About Meetings
  – Data: (Person, Email, Topic)

• Artificial Datasets
  – Kept most of the characteristics of the data, BUT ...
  – ... new emails were generated where frequencies of certain words were shared across users
  – Purpose:
    • Domain Knowledge readily available
    • To be able to study the effect of training set size (up to 5000)
    • To be able to compare our estimated distribution to the true distribution


Approach

• Can model Email using a Naive Bayes model:
  – Without Parameter Sharing (PSNB)
  – With Parameter Sharing (SSNB)

• Also compare with a model that assumes the sender is irrelevant (GNB)
  – the frequencies of words within a topic are learned from all examples

[Figure: two network structures over the variables Sender, Topic, and Word]


Effect of Training Set Size

As expected:

• SSNB performs better than both other models

• SSNB and PSNB perform similarly as the training set grows, but SSNB performs much better when data is sparse


Outline

• Motivation

• Parameter Related Domain Knowledge

• Experiments

• Related Work

• Summary / Future Work


Dirichlet Priors in a Bayes Net

[Figure: Dirichlet prior, annotated with the prior belief and its spread]

The domain expert specifies an assignment of parameters, but leaves room for some error (the spread).
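One standard way to encode "prior belief plus spread" is a Dirichlet prior with parameters α_k = s · p_k. Here p is the expert's assignment and s is an equivalent sample size. The prior mean is p and Var(θ_k) = p_k(1 - p_k)/(s + 1), so a smaller s means a wider spread. With observed counts N_k, the posterior mean is (N_k + s · p_k)/(N + s). Below is a minimal sketch; the function name and numbers are illustrative.

```python
def dirichlet_posterior_mean(counts, prior_belief, strength):
    """Posterior mean of multinomial parameters under a Dirichlet prior.

    prior_belief: the expert's assignment of parameters (sums to one).
    strength:     equivalent sample size s; the prior is
                  Dirichlet(s * p_1, ..., s * p_K). Small s = wide spread.
    """
    n = sum(counts)
    return [(c + strength * p) / (n + strength) for c, p in zip(counts, prior_belief)]


counts = [3, 1]      # observed counts for a binary variable
belief = [0.5, 0.5]  # expert's prior belief
print(dirichlet_posterior_mean(counts, belief, strength=2))    # loose prior
print(dirichlet_posterior_mean(counts, belief, strength=100))  # tight prior
```

With the tight prior, the estimate stays close to the expert's 0.5 despite the 3:1 counts. With the loose prior, the data dominates.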


HMMs and DBNs

[Figure: unrolled chain over time slices, with variables X_{t-1}, X_t, X_{t+1} and Y_{t-1}, Y_t, Y_{t+1}]
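In an HMM or DBN, the same CPT is reused at every time slice, which is parameter sharing across time. For a fully observed Markov chain, the shared transition CPT P(X_t | X_{t-1}) is therefore estimated from transition counts pooled over all time steps. When the states are hidden, EM (Baum-Welch) pools expected counts over time in the same way. Below is a sketch for the fully observed case, assuming a hypothetical input format:

```python
from collections import Counter, defaultdict


def tied_transition_ml(sequences):
    """ML estimate of a time-homogeneous transition CPT P(X_t | X_{t-1}).

    Every time step uses the same CPT, so transition counts from all steps
    (and all sequences) are pooled before normalizing. Assumes the state
    sequences are fully observed.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    cpt = {}
    for prev, row in counts.items():
        total = sum(row.values())
        cpt[prev] = {cur: n / total for cur, n in row.items()}
    return cpt


print(tied_transition_ml([["a", "a", "b"], ["b", "a"]]))
```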


Module Networks

In a Module:

• Same parents

• Same CPTs

Image from “Learning Module Networks” by Eran Segal and Daphne Koller


Context Specific Independence

[Figure: network over the variables Alarm, Set, and Burglary]
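Context specific independence can be read as parameter sharing between CPT rows. In a context where a parent is irrelevant, the rows that differ only in that parent are the same distribution, so their counts are pooled. Purely for illustration, the sketch assumes that Alarm has parents Set and Burglary, and that Burglary is irrelevant when the alarm is not set. The data and the helper alarm_row are hypothetical.

```python
from collections import Counter, defaultdict


def ml_tied_rows(records, child, parents, row_id):
    """ML CPT in which several parent configurations share one row.

    row_id maps a parent configuration (tuple) to the row it uses;
    configurations mapped to the same id share parameters, so their
    counts are pooled before normalizing.
    """
    counts = defaultdict(Counter)
    for record in records:
        config = tuple(record[p] for p in parents)
        counts[row_id(config)][record[child]] += 1
    rows = {}
    for key, row in counts.items():
        total = sum(row.values())
        rows[key] = {value: n / total for value, n in row.items()}
    return rows


def alarm_row(config):
    # Hypothetical context: if the alarm is not set, Burglary is irrelevant,
    # so both Burglary values share one row.
    is_set = config[0]
    return config if is_set else ("not set",)


data = [
    {"Set": False, "Burglary": True, "Alarm": False},
    {"Set": False, "Burglary": False, "Alarm": False},
    {"Set": False, "Burglary": False, "Alarm": True},
    {"Set": True, "Burglary": True, "Alarm": True},
]
print(ml_tied_rows(data, "Alarm", ["Set", "Burglary"], alarm_row))
```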


Outline

• Motivation

• Parameter Related Domain Knowledge

• Experiments

• Related Work

• Summary / Future Work


Summary

• Parameter Related Domain Knowledge is needed when data is scarce

• Developed methods to estimate parameters:

– For each of the four types of Domain Knowledge presented

– From both complete and incomplete data

• Markov Models, Module Nets, and Context Specific Independence are particular cases of our parameter sharing domain knowledge

• Models using Parameter Sharing performed better than two classical Bayes Nets on synthetic data


Future Work

• Automatically find Shared Parameters

• Study interactions among different types of Domain Knowledge

• Incorporate Domain Knowledge about continuous variables

• Investigate Domain Knowledge in the form of inequality constraints


Questions?


THE END


Backup Slides


Hierarchical Parameter Sharing

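[Figure: sharing hierarchy over distributions c1, …, c6. Each node pairs a set of distributions with the indices of the parameters it shares: ({c1, …, c6}, {1}); ({c1, c2, c3}, {2, 3}); ({c4, c5, c6}, {4}); ({c1, c2}, {5}); and leaves ({c1}, {...}), …, ({c6}, {...})]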


Full Data Observability, Frequentist


Probability Mass Sharing

[Figure: c1 = P(Word | English) with parameters θ_11, …, θ_15; c2 = P(Word | Spanish) with parameters θ_21, …, θ_25; word groups T1 = Nouns, T2 = Verbs]

• Want to model P(Word | Language)

• Two languages: English, Spanish

• Different sets of words

• Domain Knowledge:
  – The aggregate probability mass of nouns is the same in both languages
  – The same holds for adjectives, verbs, etc.


Probability Mass Sharing


Full Data Observability, Frequentist


Probability Ratio Sharing

[Figure: c1 = P(Word | English) with parameters θ_11, …, θ_15; c2 = P(Word | Spanish) with parameters θ_21, …, θ_25; word groups T1 = Computer Words, T2 = Business Words]

• Want to model P(Word | Language)

• Two languages: English, Spanish

• Different sets of words

• Domain Knowledge:
  – Word groups, e.g. about computers: computer, mouse, monitor, etc.
  – The relative frequency of "computer" to "mouse" is the same in both languages
  – The aggregate mass of a group can be different


Probability Ratio Sharing


Full Data Observability, Frequentist