
Building Global Models from Local Patterns

A.J. Knobbe

Feature-continuum

attributes → (constructed) features → patterns → classifiers → target concept

Two-phased process

Break discovery up into two phases: transform a complex problem into a simpler one.

Pattern Discovery phase
– frequent patterns, correlated patterns, interesting subgroups, decision boundaries, …

Pattern Combination phase
– redundancy reduction, dependency modeling, global model building, …
– yields Pattern Teams, pattern networks, global predictive models, …

Task: Subgroup Discovery

Subgroup Discovery:

Find subgroups that show substantially different distribution of target concept.

top-down search for patterns
inductive constraints (sometimes monotonic)
evaluation measures: novelty, χ², information gain
also known as rule discovery, correlated pattern mining

(a minimal search sketch follows below)
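As a hedged illustration of the bullets above (not code from the talk), a levelwise top-down search with a beam could look as follows. The function names, the pandas/numpy encoding, and the beam heuristic are assumptions; the quality argument can be any evaluation measure, e.g. a closure over the novelty measure introduced on the next slide, and min_support plays the role of a monotonic inductive constraint.

```python
import numpy as np

def beam_search_subgroups(df, quality, literals, beam_width=10,
                          max_depth=3, min_support=10):
    """Levelwise top-down subgroup search with a beam (a heuristic sketch).

    df         : pandas DataFrame of examples
    quality    : maps a boolean membership mask to a score
                 (e.g. lambda s: novelty(s, target))
    literals   : list of (attribute, value) pairs used as refinement conditions
    min_support: monotonic inductive constraint used to prune the search
    """
    def members(conditions):
        mask = np.ones(len(df), dtype=bool)
        for attr, value in conditions:
            mask &= (df[attr] == value).to_numpy()
        return mask

    beam = [()]                        # start from the empty description
    found = []                         # (score, conditions) of kept candidates
    for _ in range(max_depth):
        scored = []
        for conds in beam:
            for lit in literals:       # top-down refinement: add one literal
                if lit in conds:
                    continue
                cand = conds + (lit,)
                mask = members(cand)
                if mask.sum() < min_support:   # monotonic pruning
                    continue
                scored.append((quality(mask), cand))
        scored.sort(key=lambda x: x[0], reverse=True)
        found.extend(scored[:beam_width])
        beam = [c for _, c in scored[:beam_width]]
    found.sort(key=lambda x: x[0], reverse=True)
    return found
```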

Novelty

Also known as weighted relative accuracy
Balance between coverage and unexpectedness
nov(S,T) = p(ST) − p(S)p(T)
between −0.25 and 0.25; 0 means uninteresting

Example contingency table (subgroup S in rows, target T in columns):

              target T   target F
subgroup T      .42        .13      .55
subgroup F      .12        .33      .45
                .54        .46      1.0

nov(S,T) = p(ST) − p(S)p(T) = .42 − .55 × .54 = .42 − .297 = .123

(a small computation sketch follows below)
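A small sketch of the novelty computation above, assuming boolean membership vectors for the subgroup and the target; the encoding and function name are mine, not from the slides.

```python
import numpy as np

def novelty(subgroup, target):
    """Weighted relative accuracy: nov(S,T) = p(ST) - p(S)p(T)."""
    subgroup = np.asarray(subgroup, dtype=bool)   # membership of the subgroup
    target = np.asarray(target, dtype=bool)       # membership of the target class
    p_st = np.mean(subgroup & target)             # p(ST)
    p_s = np.mean(subgroup)                       # coverage p(S)
    p_t = np.mean(target)                         # p(T)
    return p_st - p_s * p_t

# For the contingency table above: p(ST) = .42, p(S) = .55, p(T) = .54,
# so novelty = .42 - .55 * .54 = .123
```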

Demo Subgroup Discovery

redundancy exists in the set of local patterns


Pattern Combination phase

Feature selection, redundancy reduction
– Pattern Teams

Dependency modeling
– Bayesian networks
– Association rules

Global modeling
– Classifiers, regression models

Pattern Teams & Pattern Networks

Pattern Teams

Pattern Discovery typically produces very many patterns with high levels of redundancy

Report small informative subset with specific properties

Promote dissimilarity of patterns reported
Additional value of individual patterns
Consider extent of patterns
– treat patterns as binary features/items

Intuitions

No two patterns should cover the same set of examples
No pattern should cover the complement of another pattern
No pattern should cover a logical combination of two or more other patterns
Patterns should be mutually exclusive
The pattern set should lead to the best performing classifier
Patterns should lie on the convex hull in ROC-space

Quality measures for pattern sets

Judge pattern sets on the basis of a quality function (a small joint-entropy sketch follows below)

unsupervised:
– Joint Entropy (maximally informative k-itemsets, MIKI)
– Exclusive Coverage

supervised:
– Wrapper accuracy
– Area Under Curve in ROC-space
– Bayesian Dirichlet equivalent uniform (BDeu)
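A hedged sketch of the unsupervised joint-entropy measure above, with patterns treated as binary features: each example's truth-assignment over the pattern set is one outcome, and the entropy of that distribution scores the set. Names and matrix encoding are assumptions.

```python
import numpy as np
from collections import Counter

def joint_entropy(pattern_matrix):
    """Joint entropy (bits) of a pattern set, with patterns as binary features.

    pattern_matrix: examples x patterns array of 0/1 values; every row is the
    truth-assignment of one example over the pattern set.
    """
    rows = [tuple(row) for row in np.asarray(pattern_matrix, dtype=int)]
    counts = np.array(list(Counter(rows).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# k patterns can reach at most k bits; a diverse pattern team scores high.
```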

Pattern Teams

[two plots omitted]

82 subgroups discovered; 4 subgroups in the pattern team (a greedy selection sketch follows below)
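The later demo note mentions selecting a team by maximizing joint entropy. As a hedged sketch (not code from the talk), a simple greedy forward selection over the pattern columns can approximate that choice; the function name and encoding are assumptions, and it reuses the joint_entropy sketch above.

```python
import numpy as np

def greedy_pattern_team(pattern_matrix, k, quality):
    """Greedy forward selection of a pattern team of size k.

    quality: a set-quality function over an examples x patterns 0/1 matrix,
    e.g. the joint_entropy sketch above.  Greedy selection is a heuristic;
    exhaustive search over all size-k subsets is exact but exponential.
    """
    X = np.asarray(pattern_matrix, dtype=int)
    remaining = list(range(X.shape[1]))
    team = []
    for _ in range(k):
        best = max(remaining, key=lambda j: quality(X[:, team + [j]]))
        team.append(best)
        remaining.remove(best)
    return team
```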

Pattern Network

Again, treat patterns as binary features
Bayesian networks
– conditional independence of patterns

Explain relationships between patterns
Explain role of patterns in the Pattern Team

(a minimal dependency sketch follows below)
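The slide proposes Bayesian networks over the pattern indicators. As a minimal, hypothetical stand-in for that dependency modeling (not the method from the talk), pairwise mutual information between pattern columns flags which patterns are strongly dependent and are therefore candidate edges; learning an actual Bayesian network would require a separate structure-learning step not sketched here.

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (bits) between two binary pattern columns."""
    a, b = np.asarray(a, dtype=int), np.asarray(b, dtype=int)
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

# Strongly dependent pattern pairs (high mutual information) are candidate
# edges when the patterns are modeled jointly, e.g. in a Bayesian network.
```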

Demo Pattern Team & Network

redundancy removed to find truly diverse patterns, in this case using maximization of joint entropy

[plots: distributions for the selected patterns, with peaks around 89k, 16k and 39k]

The pattern team and its related patterns can be presented in a Bayesian network.

Properties of SD phase in PC

What knowledge about Subgroup Discovery parameters can be exploited in Combination?

Interestingness
– Are interesting subgroups diverse?
– Are interesting subgroups correlated?

Information content
Support of patterns

joint entropy of 2 interesting subgroups

[plot: joint entropy of pairs of subgroups (0 to 2.5 bits) against novelty (0 to 0.25)]

subgroups are very novel, 1 bit of information
subgroups are relatively novel, up to 2 bits of information

correlation of interesting subgroups

[plot: inter-novelty of subgroup pairs (−0.25 to 0.25) against novelty of subgroups (0 to 0.25)]

subgroups are novel, but potentially independent
subgroups are very novel, and correlate

Building Classifiers from Local Patterns

Combination strategies

How to interpret a pattern set?
– Conjunctive (intersection of patterns)
– Disjunctive (union of patterns)
– Majority vote (equal-weight linear separator)
– …
– Contingencies/Classifiers

Decision Table Majority (DTM)

Treat every truth-assignment as a contingency
Classification based on conditional probability
Use the majority class for empty contingencies
Only works with a Pattern Team (else overfitting)

(a minimal sketch follows below)
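A hedged sketch of a Decision Table Majority classifier over binary pattern features, following the bullets above; the class name and data encoding are assumptions, not code from the talk.

```python
import numpy as np
from collections import Counter, defaultdict

class DecisionTableMajority:
    """Decision Table Majority over binary pattern features (a sketch).

    Every truth-assignment over the pattern set is one contingency; the
    prediction is the majority class within that contingency, falling back
    to the global majority class for contingencies never seen in training.
    With k patterns the table has up to 2**k cells, hence the restriction
    to a small Pattern Team.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=int)
        self.default_ = Counter(y).most_common(1)[0][0]   # global majority class
        cells = defaultdict(Counter)
        for row, label in zip(X, y):
            cells[tuple(row)][label] += 1
        self.table_ = {key: c.most_common(1)[0][0] for key, c in cells.items()}
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=int)
        return np.array([self.table_.get(tuple(row), self.default_) for row in X])
```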

Support Vector Machine (SVM)

SVM with linear kernel
Binary data
All dimensions have the same scale
Works with large pattern sets
Subgroup discovery has removed XOR-like dependencies
Interesting subgroups correlate

(a minimal sketch follows below)
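A minimal sketch of the linear-kernel SVM combination step, assuming scikit-learn's LinearSVC (the slides do not name a library) and a 0/1 pattern-membership matrix; the data here is a placeholder only.

```python
import numpy as np
from sklearn.svm import LinearSVC

# X: examples x patterns matrix of 0/1 memberships; y: class labels.
# Placeholder data only, to keep the sketch runnable.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))
y = rng.integers(0, 2, size=200)

clf = LinearSVC()        # linear kernel; all dimensions share the same 0/1 scale
clf.fit(X, y)
predictions = clf.predict(X)
```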

XOR-like dependencies

[figure: two patterns p1 and p2; the four truth-assignments (0,0), (1,0), (0,1), (1,1) form an XOR-like configuration that no single linear separator over p1 and p2 can handle; a small illustration follows below]
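For concreteness, a small illustration of the XOR configuration in the figure; the labeling is an assumption for illustration, not taken from the slides.

```python
import numpy as np

# The four truth-assignments over two patterns p1, p2.  With labels that
# follow p1 XOR p2 (an assumed, illustrative labeling), the positive and
# negative examples sit on opposite diagonals of the unit square, so no
# linear function of (p1, p2) separates them; such dependencies must be
# resolved in the discovery phase or by a non-linear combination step.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = X[:, 0] ^ X[:, 1]      # labels 0, 1, 1, 0
```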

Division of labour between 2 phases

Subgroup Discovery phase
– Feature selection
– Decision boundary finding/thresholding
– Multivariate dependencies (XOR)

Pattern Combination phase
– Pattern selection
– Combination (XOR?)
– Class assignment

Combination-aware Subgroup Discovery

Better global model
Superficially uninteresting patterns can be reported
– e.g. individual subgroups are not novel, yet the team is optimal
Pruning of search space (new rule-measures)

Combination-aware Subgroup Discovery

Subgroup Discovery ++:

Find a set of subgroups that show substantially different distribution of target concept.

Considerations
– support of pattern
– diversity of pattern
– …

Conclusions

Less hasty approach to model building
Interesting patterns serve two purposes
– understandable knowledge
– building blocks of a global model

Pattern discovery without combination is limited
Information exchange between phases
Integration of the two phases is non-trivial