Characterizing Model Errors and Differences

Stephen D. Bay and Michael J. Pazzani

Information and Computer Science

University of California, Irvine

{sbay,pazzani}@ics.uci.edu

Evaluation Tools

• loss/accuracy

• confusion matrices

• ROC curves

• Kappa statistic (Cohen, 1960)

Problem: Cannot answer questions like

• “On which types of examples is my classifier most and least accurate?”

• “What are the differences between these two classifiers given that they have the same accuracy?”

Adult data set

• Census database
  – 48000 examples
  – 12 demographic variables
  – classification task: predict salary >$50K or ≤$50K
  – C5 accuracy ~85%

• available from UCI Machine Learning Repository (Blake & Merz) http://www.ics.uci.edu/~mlearn/MLRepository.html

Our Goal

• Characterize model errors or model differences in the feature space of the problem

Examples:

Classifier MC4 is 21% less accurate than average on people who are between 45 and 55 years of age, are high school graduates, and are married. This represents 115 misclassified instances.

MC4 and naive Bayes are 9% less likely to agree than average on people who have Masters degrees and are married. This represents 50 instances with different predictions.

MC4 is a C4.5 clone (Kohavi, Sommerfield & Dougherty, 1997)

Framework

• Simple meta-learning framework

Age   Sex   Occupation        Salary   Agree
34    M     Tech-Support      >$50K    0
49    F     Prof-Specialty    >$50K    1
24    F     Exec-Managerial   ≤$50K    0
57    M     Admin-Clerical    ≤$50K    1

MErr: does the model agree with the true class labels?

MDiff: do two models agree with each other?
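The Agree column can be derived mechanically from predictions. The following is a minimal sketch, assuming predictions and labels are plain Python lists; the helper name `agreement_labels` and the toy data are hypothetical and not taken from the original work.

```python
def agreement_labels(predictions, reference):
    """Return 1 where the two label sequences match, else 0."""
    if len(predictions) != len(reference):
        raise ValueError("sequences must have the same length")
    return [int(p == r) for p, r in zip(predictions, reference)]


# Hypothetical toy data: true salary classes and two models' predictions.
true_labels = [">50K", ">50K", "<=50K", "<=50K"]
model_a = ["<=50K", ">50K", ">50K", "<=50K"]
model_b = ["<=50K", ">50K", "<=50K", "<=50K"]

# MErr: compare one model against the true class labels.
merr_target = agreement_labels(model_a, true_labels)

# MDiff: compare two models against each other.
mdiff_target = agreement_labels(model_a, model_b)
```

Either target list can then be attached to the original feature columns and handed to a rule learner as the new class attribute.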

Exploratory Research (Dietterich, 1996)

• new task: generating descriptive rule sets for model errors and differences

• existing solutions do not work well
  – (i.e., although C5 is a very good classifier, it is not appropriate for this task)

• qualitative and quantitative results

• define criteria for measuring quality of results

C5

[Figure: example C5 decision tree. Internal nodes test salary (≤$50K vs. >$50K), marital status (divorced, never married, married), capital gains (≤$3500 vs. >$3500), and education; the visible leaves are labeled agree=1.]

STUCCO

Let $\mathbf{X}$ be a conjunction of attribute-value pairs such as occupation=sales or sex=female ^ age=[45,55]

Find all $\mathbf{X}$ such that

$$P(\mathbf{X} \mid y = c_1) \neq P(\mathbf{X} \mid y = c_2)$$

Two stages:

• search

• summarization

Bay, S.D. & Pazzani, M.J. (1999). Detecting change in categorical data: Mining contrast sets. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
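As a rough illustration of what the search stage compares, the sketch below computes how often a conjunction of attribute-value pairs holds within each group and takes the difference. It assumes rows are dictionaries with an `agree` field; the function name `support` and the toy rows are hypothetical. The published algorithm also applies statistical significance testing and pruning, which are omitted here.

```python
def support(rows, contrast_set, group):
    """Fraction of rows in `group` that satisfy every attribute-value pair."""
    in_group = [r for r in rows if r["agree"] == group]
    if not in_group:
        return 0.0
    hits = sum(
        all(r.get(attr) == value for attr, value in contrast_set.items())
        for r in in_group
    )
    return hits / len(in_group)


# Hypothetical toy rows; `agree` is the meta-level class (0 or 1).
rows = [
    {"occupation": "sales", "sex": "female", "agree": 0},
    {"occupation": "sales", "sex": "male", "agree": 0},
    {"occupation": "tech", "sex": "female", "agree": 1},
    {"occupation": "sales", "sex": "female", "agree": 1},
    {"occupation": "tech", "sex": "male", "agree": 1},
    {"occupation": "tech", "sex": "male", "agree": 1},
]

x = {"occupation": "sales"}
# Estimate of P(X | agree=0) minus P(X | agree=1) on the toy rows.
diff = support(rows, x, 0) - support(rows, x, 1)
```

A large absolute difference marks the conjunction as a candidate contrast between the two groups.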

Discriminative vs. Characteristic Learning

• Classifiers can be broadly classified as discriminative or characteristic (Rubinstein & Hastie, 1997)

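• normally, given $\mathbf{X}$, select class $y$ so that $P(y \mid \mathbf{X})$ is maximized

Bayes Rule:

$$P(y \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid y)\,P(y)}{P(\mathbf{X})}$$

discriminative: $P(y \mid \mathbf{X})$

characteristic: $P(\mathbf{X} \mid y)$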
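To make the distinction concrete, here is a small worked sketch of Bayes rule in Python. The class names and all probabilities are made-up illustrative values, and the function name `posterior` is hypothetical.

```python
def posterior(likelihoods, priors):
    """Compute P(y | X) for each class y from P(X | y) and P(y)."""
    joint = {y: likelihoods[y] * priors[y] for y in priors}
    evidence = sum(joint.values())  # P(X), summed over all classes
    return {y: joint[y] / evidence for y in joint}


# Hypothetical values: X is four times as frequent among disagreements,
# but disagreements are rare overall.
likelihoods = {"agree": 0.10, "disagree": 0.40}  # characteristic view, P(X | y)
priors = {"agree": 0.85, "disagree": 0.15}       # P(y)

post = posterior(likelihoods, priors)            # discriminative view, P(y | X)
```

With these numbers the posterior still favors "agree", so a learner that only maximizes $P(y \mid \mathbf{X})$ would have no reason to report this $\mathbf{X}$, even though it is strongly characteristic of the disagreement group.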

C5 vs. STUCCO

• discriminative vs. characteristic

• incomplete vs. complete

• unordered vs. hierarchical rule sets

Leads to very different rule sets

Rule Set Examples, C5

Rule                                                                        Model accuracy   Effect size
Never married AND capital gains = 0 AND salary > $50K                       -84.8%           -145.1
Divorced AND capital gains = 0 AND salary > $50K                            -84.8%           -122.2
Education = Bachelors AND married AND occupation = Sales
  AND capital gains = 0 AND salary < $50K                                   -63.9%           -42.8

MC4 Errors on Adult

Rule Set Examples, STUCCO

Rule                                                     Model accuracy   Effect size
occupation = Adm-clerical                                +4.6%            +84.3
marital status = married                                 -12.5%           -924.9
occupation = Adm-clerical AND marital status = married   -17.4%           -88.8

MC4 Errors on Adult

Practical Differences

C5 has a fragmentation problem:

MC4 is 6% more accurate than average on people who have a Bachelors degree, are married, work in a professional specialty, reported a capital gain of $0, and have a salary > $50K. This represents 13 correctly classified instances.

C5 is incomplete and misses the following rules:

MC4 is 26% less accurate than average on people who have a salary > $50K. This represents 1013 misclassified instances.

MC4 is 13% less accurate than average on people who are married. This represents 925 misclassified instances.

Evaluation

Queries:

• MC4 Errors

• 1NN vs. 5NN

• Naïve Bayes vs. SuperParent (Keogh & Pazzani, 1999)

Criteria:

• substantial effect

• comprehensible

• stable

Results

                     MC4 Errors      1NN vs. 5NN     NB vs. SP
                     C5      S       C5      S       C5      S
Number of rules      52      143     685     123     46      192
Median effect size   44      148     2       64      38      115
Average rule size    2.9     2.0     3.8     2.1     2.6     2.6
Stability            0.25    0.54    0.05    0.48    0.24    0.40

(S = STUCCO)

Stability: expected agreement between rule sets generated from the same distribution.

$$\mathrm{agreement}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$

Effect Size: if we could make the agreement the same as the average, how many examples would be affected?
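Both measures can be sketched in a few lines of Python. This is an illustration under two assumptions: that agreement between rule sets is measured as intersection over union, and that effect size is the number of covered examples times the gap between the group's accuracy and the average. The function names, the rule encoding as frozensets, and the numbers are all hypothetical.

```python
def agreement(rules_x, rules_y):
    """Intersection-over-union overlap between two rule sets."""
    x, y = set(rules_x), set(rules_y)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def effect_size(n_covered, accuracy_on_covered, average_accuracy):
    """Assumed reading: examples affected if the group matched the average."""
    return n_covered * (accuracy_on_covered - average_accuracy)


# Hypothetical rule sets from two samples; each rule is a frozenset of conditions.
run_1 = {
    frozenset({"married"}),
    frozenset({"adm-clerical"}),
    frozenset({"married", "adm-clerical"}),
}
run_2 = {frozenset({"married"}), frozenset({"salary>50K"})}

stability_estimate = agreement(run_1, run_2)  # 1 shared rule out of 4 distinct

# Hypothetical group: 500 covered examples, 64% accuracy vs. 85% on average.
effect = effect_size(500, 0.64, 0.85)
```

Averaging `agreement` over many pairs of resampled rule sets would give an estimate of the expected agreement.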

Stability, MC4 Errors

[Figure: plot of Size of Union (y-axis, 60 to 180) against Agreement (x-axis, 0 to 0.8), with series for C5 and STUCCO.]

Stability, 1NN vs. 5NN

[Figure: plot of Size of Union (y-axis, 0 to 800) against Agreement (x-axis, 0 to 0.8), with series for C5 and STUCCO.]

Stability, NB vs SP

[Figure: plot of Size of Union (y-axis, 0 to 200) against Agreement (x-axis, 0 to 0.8), with series for C5 and STUCCO.]

Accuracy Difference vs. Effect Size

[Figure: plot of Effect Size (y-axis, -1500 to 1500) against Accuracy Difference (x-axis, -0.8 to 0.2), with series for C5 and STUCCO.]

Summary

• Can treat problem of characterizing model performance as a meta-learning problem

• may require a different bias from discriminative learners

• other factors important beyond validity of rules

Future Work

• generalize to loss

• investigate how to summarize rules for humans

• classifier comparisons
  – single vs. multiple models
  – comparing ensemble methods

Set-Enumeration Search (Rymon, 1992)

{}
{1} {2} {3} {4}
{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
{1,2,3} {1,2,4} {1,3,4} {2,3,4}
{1,2,3,4}
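The tree above visits every subset exactly once by extending a node only with items that come later in a fixed ordering. A minimal sketch of that enumeration follows; the function name `set_enumeration` is hypothetical, the traversal is depth-first for brevity, and no pruning is applied, so it should not be read as the search procedure used in the original work.

```python
def set_enumeration(items, prefix=()):
    """Yield every subset of `items` exactly once, depth-first."""
    yield prefix
    for i, item in enumerate(items):
        # A child extends the prefix only with items later in the ordering,
        # which prevents the same subset from being reached twice.
        yield from set_enumeration(items[i + 1:], prefix + (item,))


nodes = list(set_enumeration((1, 2, 3, 4)))
```

For four items this produces the 16 nodes shown in the tree, starting from the empty set.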

Rule Summarization

• Rules are summarized hierarchically to present only surprising findings

Given $A \Rightarrow C$ and $B \Rightarrow C$, when do we show $A \wedge B \Rightarrow C$?

Iterative Process of Building Machine Learning Systems

reprinted with permission from

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996) The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39, 27-34.

Copyright 1996 ACM
