Characterizing Model Errors and Differences Stephen D. Bay and Michael J. Pazzani Information and Computer Science University of California, Irvine {sbay,pazzani}@ics.uci.edu


Characterizing Model Errors and Differences

Stephen D. Bay and Michael J. Pazzani

Information and Computer Science

University of California, Irvine

{sbay,pazzani}@ics.uci.edu

Evaluation Tools

• loss/accuracy

• confusion matrices

• ROC curves

• Kappa statistic (Cohen, 1960)

Problem: Cannot answer questions like

• “On which types of examples is my classifier most and least accurate?”

• “What are the differences between these two classifiers given that they have the same accuracy?”


Adult data set

• Census database
  – 48000 examples
  – 12 demographic variables
  – classification task: predict salary >$50K or ≤$50K
  – C5 accuracy ~85%

• available from UCI Machine Learning Repository (Blake & Merz) http://www.ics.uci.edu/~mlearn/MLRepository.html


Our Goal

• Characterize model errors or model differences in the feature space of the problem

Examples:

Classifier MC4 is 21% less accurate than average on people who are between 45 and 55 years of age, are high school graduates, and are married. This represents 115 misclassified instances.

MC4 and naive Bayes are 9% less likely to agree than average on people who have Masters degrees and are married. This represents 50 instances with different predictions.

MC4 is a C4.5 clone (Kohavi, Sommerfield & Dougherty, 1997)


Framework

• Simple meta-learning framework

Age  Sex  Occupation       Salary  Agree
34   M    Tech-Support     >$50K   0
49   F    Prof-Specialty   >$50K   1
24   F    Exec-Managerial  ≤$50K   0
57   M    Admin-Clerical   ≤$50K   1

MErr: does the model agree with the true class labels?

MDiff: do two models agree with each other?
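The Agree column can be derived mechanically from predictions. The following is a minimal sketch, not the authors' code: the function names, the dictionary representation of examples, and the predictions are all hypothetical and chosen only to mirror the first two rows of the table.

```python
# Hypothetical helper names; the slides do not give an implementation.

def merr_labels(predictions, true_labels):
    """MErr: 1 where the model agrees with the true class label, else 0."""
    return [int(p == t) for p, t in zip(predictions, true_labels)]


def mdiff_labels(predictions_a, predictions_b):
    """MDiff: 1 where two models make the same prediction, else 0."""
    return [int(a == b) for a, b in zip(predictions_a, predictions_b)]


def build_meta_dataset(rows, agree):
    """Attach the Agree column to each original example (a dict of features)."""
    return [dict(row, Agree=flag) for row, flag in zip(rows, agree)]


rows = [
    {"Age": 34, "Sex": "M", "Occupation": "Tech-Support", "Salary": ">50K"},
    {"Age": 49, "Sex": "F", "Occupation": "Prof-Specialty", "Salary": ">50K"},
]
model_predictions = ["<=50K", ">50K"]  # made-up predictions for illustration
true_salary = [row["Salary"] for row in rows]
meta = build_meta_dataset(rows, merr_labels(model_predictions, true_salary))
```

Any rule learner can then be run on `meta` with Agree as the target, which is what makes this a meta-learning setup.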


Exploratory Research (Dietterich, 1996)

• new task: generating descriptive rule sets for model errors and differences

• existing solutions do not work well
  – (i.e., although C5 is a very good classifier, it is not appropriate for this task)

• qualitative and quantitative results

• define criteria for measuring quality of results


C5

[Figure: C5 decision tree for the Agree label. Splits shown: salary (≤$50K, >$50K), marital status (divorced, never married, married), capital gains (≤$3500, >$3500), and education; several leaves are labeled agree=1.]


STUCCO

Let X be a conjunction of attribute-value pairs such as occupation=sales or sex=female ^ age = [45,55]

Find all X such that $P(X \mid y = c_1) \neq P(X \mid y = c_2)$

Two stages:
• search
• summarization

Bay, S.D. & Pazzani, M.J. (1999). Detecting change in categorical data: Mining contrast sets. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
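To make the condition on X concrete, here is a minimal sketch that estimates the probability of a conjunction within each of two groups and reports the difference. It is not the STUCCO implementation: it omits the search over conjunctions and any significance testing, and the function names and toy examples are hypothetical.

```python
def matches(example, conjunction):
    """True if the example satisfies every attribute-value pair in X."""
    return all(example.get(attr) == value for attr, value in conjunction.items())


def support(examples, conjunction):
    """Estimate P(X | group) as the fraction of the group's examples matching X."""
    if not examples:
        return 0.0
    return sum(matches(e, conjunction) for e in examples) / len(examples)


def support_difference(group1, group2, conjunction):
    """Return the estimated P(X | y = c1) - P(X | y = c2)."""
    return support(group1, conjunction) - support(group2, conjunction)


# Toy groups, e.g. examples with Agree = 1 versus Agree = 0 (made-up data).
agree_group = [
    {"occupation": "sales", "sex": "female"},
    {"occupation": "sales", "sex": "male"},
    {"occupation": "tech", "sex": "female"},
    {"occupation": "tech", "sex": "male"},
]
disagree_group = [
    {"occupation": "sales", "sex": "female"},
    {"occupation": "sales", "sex": "female"},
]
x = {"occupation": "sales", "sex": "female"}
diff = support_difference(agree_group, disagree_group, x)
```

A large absolute difference marks X as a candidate contrast between the two groups; whether it is reported would depend on the search and summarization stages.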


Discriminative vs. Characteristic Learning

• Classifiers can be broadly classified as discriminative or characteristic (Rubinstein & Hastie, 1997)

• normally, given X, select the class y so that $P(y \mid X)$ is maximized

Bayes Rule:

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

discriminative: $P(y \mid X)$

characteristic: $P(X \mid y)$
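The two views can be compared on a small table of counts. The sketch below uses made-up counts and hypothetical function names; it only illustrates that the discriminative quantity P(y | X) and the characteristic quantity P(X | y) are different numbers linked by Bayes rule.

```python
from fractions import Fraction

# Made-up joint counts of (X holds?, class y), for illustration only.
counts = {
    (True, "agree"): 30,
    (True, "disagree"): 10,
    (False, "agree"): 50,
    (False, "disagree"): 10,
}
total = sum(counts.values())


def p_y(y):
    """Prior probability of class y."""
    return Fraction(sum(n for (_, cls), n in counts.items() if cls == y), total)


def p_x():
    """Probability that the conjunction X holds."""
    return Fraction(sum(n for (holds, _), n in counts.items() if holds), total)


def p_x_given_y(y):
    """Characteristic view: how typical is X within class y?"""
    in_class = sum(n for (_, cls), n in counts.items() if cls == y)
    return Fraction(counts[(True, y)], in_class)


def p_y_given_x(y):
    """Discriminative view: how likely is class y once X is known?"""
    with_x = sum(n for (holds, _), n in counts.items() if holds)
    return Fraction(counts[(True, y)], with_x)


# Bayes rule recovers the discriminative quantity from the characteristic one.
bayes = p_x_given_y("disagree") * p_y("disagree") / p_x()
```

With these counts, X covers half of the "disagree" class while only a quarter of the examples matching X are in that class, so the two views rank X quite differently.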


C5 vs. STUCCO

• discriminative vs. characteristic

• incomplete vs. complete

• unordered vs. hierarchical rule sets

Leads to very different rule sets


Rule Set Examples, C5

MC4 Errors on Adult

Model accuracy   Effect size   Rule
-84.8%           -145.1        Never married AND capital gains = 0 AND salary > $50K
-84.8%           -122.2        Divorced AND capital gains = 0 AND salary > $50K
-63.9%           -42.8         Education = Bachelors AND married AND occupation = Sales AND capital gains = 0 AND salary < $50K


Rule Set Examples, STUCCO

MC4 Errors on Adult

Model accuracy   Effect size   Rule
+4.6%            +84.3         occupation = Adm-clerical
-12.5%           -924.9        marital status = married
-17.4%           -88.8         occupation = Adm-clerical AND marital status = married


Practical Differences

C5 has a fragmentation problem:

MC4 is 6% more accurate than average on people who have a Bachelors degree, are married, work in a professional specialty, reported a capital gain of $0, and have a salary > $50K. This represents 13 correctly classified instances.

C5 is incomplete and misses the following rules:

MC4 is 26% less accurate than average on people who have a salary > $50K. This represents 1013 misclassified instances.

MC4 is 13% less accurate than average on people who are married. This represents 925 misclassified instances.


Evaluation

Queries:
• MC4 Errors
• 1NN vs. 5NN
• Naïve Bayes vs. SuperParent (Keogh & Pazzani, 1999)

Criteria:
• substantial effect
• comprehensible
• stable


Results

                     MC4 Errors      1NN vs. 5NN     NB vs. SP
                     C5      S       C5      S       C5      S
Number of rules      52      143     685     123     46      192
Median effect size   44      148     2       64      38      115
Average rule size    2.9     2.0     3.8     2.1     2.6     2.6
Stability            0.25    0.54    0.05    0.48    0.24    0.40

(S = STUCCO)

Stability: expected agreement between rule sets generated from the same distribution.

$$\mathrm{agreement}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$

Effect Size: if we could make the agreement the same as the average, how many examples would be affected?
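Both measures are simple to compute once the Agree labels and the rule sets are available. The sketch below rests on two assumptions that the slides do not spell out: effect size is read as (subset agreement rate minus overall rate) times the subset size, and agreement between two rule sets is read as the size of their intersection divided by the size of their union, which matches the "Size of Union" axis on the stability plots. Function names and data are hypothetical.

```python
def effect_size(agree, in_subset):
    """Examples affected if the subset's agreement rate matched the average.

    Assumed reading: (subset rate - overall rate) * subset size.
    """
    subset = [a for a, inside in zip(agree, in_subset) if inside]
    if not subset:
        return 0.0
    overall_rate = sum(agree) / len(agree)
    subset_rate = sum(subset) / len(subset)
    return (subset_rate - overall_rate) * len(subset)


def agreement(rules_x, rules_y):
    """Shared rules divided by the size of the union of the two rule sets."""
    x, y = set(rules_x), set(rules_y)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Toy Agree labels: overall rate 5/8; the rule covers four examples with rate 1/4.
agree = [1, 1, 1, 0, 1, 0, 0, 1]
in_subset = [False, False, False, True, False, True, True, True]
size = effect_size(agree, in_subset)

# Rule sets from two samples, each rule written as a hashable string.
overlap = agreement({"a", "b", "c"}, {"b", "c", "d"})
```

Under this reading, the STUCCO rule "marital status = married" (-12.5%, -924.9) would correspond to roughly 7,400 matching examples, since 924.9 / 0.125 is about 7,399.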


Stability, MC4 Errors

[Figure: Size of Union (y-axis, 60 to 180) plotted against Agreement (x-axis, 0 to 0.8) for C5 and STUCCO.]

Stability, 1NN vs. 5NN

[Figure: Size of Union (y-axis, 0 to 800) plotted against Agreement (x-axis, 0 to 0.8) for C5 and STUCCO.]

Stability, NB vs SP

[Figure: Size of Union (y-axis, 0 to 200) plotted against Agreement (x-axis, 0 to 0.8) for C5 and STUCCO.]

Accuracy Difference vs. Effect Size

[Figure: Effect Size (y-axis, -1500 to 1500) plotted against Accuracy Difference (x-axis, -0.8 to 0.2) for C5 and STUCCO.]

Summary

• Can treat problem of characterizing model performance as a meta-learning problem

• may require a different bias from discriminative learners

• other factors important beyond validity of rules


Future Work

• generalize to loss

• investigate how to summarize rules for humans

• classifier comparisons
  – single vs. multiple models
  – comparing ensemble methods

Set-Enumeration Search (Rymon, 1992)

{}

{1} {2} {3} {4}

{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}

{1,2,3} {1,2,4} {1,3,4} {2,3,4}

{1,2,3,4}
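The tree above can be generated level by level. This is a minimal sketch of the enumeration order only, with a hypothetical function name; it visits every subset and does none of the pruning a practical search would need.

```python
def set_enumeration(items):
    """Yield every subset of `items` exactly once, breadth-first by size.

    Each node is extended only with items that come after its last item,
    so {1,2} is generated from {1} and never again from {2}.
    """
    items = list(items)
    frontier = [((), 0)]  # (subset as a tuple, index of the next usable item)
    while frontier:
        next_frontier = []
        for subset, start in frontier:
            yield subset
            for i in range(start, len(items)):
                next_frontier.append((subset + (items[i],), i + 1))
        frontier = next_frontier


nodes = list(set_enumeration([1, 2, 3, 4]))
```

For four items this visits 16 nodes, from the empty set down to {1,2,3,4}, in the same left-to-right order as the levels shown on the slide.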


Rule Summarization

• Rules are summarized hierarchically to present only surprising findings

Given $A \Rightarrow C$ and $B \Rightarrow C$, when do we show $A \wedge B \Rightarrow C$?

Iterative Process of Building Machine Learning Systems

reprinted with permission from

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996) The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39, 27-34.

Copyright 1996 ACM