Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions

F. Provost and T. Fawcett


DESCRIPTION

F. Provost and T. Fawcett. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions. Presented by Ramazan Bitirgen, CSL - ECE. Covers the confusion matrix and why data mining requires experiments with a wide variety of learning algorithms.

TRANSCRIPT

Page 1: Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions

F. Provost and T. Fawcett

Page 2

Confusion Matrix

Bitirgen - CS678

Page 3

Introduction

Data mining requires experiments with a wide variety of learning algorithms:

Using different algorithm parameters
Varying output threshold values
Using different training regimens

Using accuracy alone is inadequate because:

Class distributions are skewed
Misclassification (FP, FN) costs are not uniform
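A hedged illustration of why accuracy misleads under skew (the class counts below are made up, not from the paper): with a 10:1 class ratio, a classifier that never predicts positive still scores over 90% accuracy.

```python
# Illustrative only: assumed class counts with a 10:1 skew.
n_neg, n_pos = 1000, 100

# A degenerate classifier that always predicts "negative": TP = 0, FP = 0.
tp, fp = 0, 0
tn = n_neg - fp
accuracy = (tp + tn) / (n_neg + n_pos)

print(f"accuracy = {accuracy:.3f}")   # ~0.909 despite finding no positives
print(f"TP rate  = {tp / n_pos:.3f}")  # 0.000
```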

Page 4

Class Distributions - Problems with Accuracy

Accuracy assumes that the class distribution among examples is constant and relatively balanced, which is not the case in real life.

Classifiers are generally used to scan a large number of normal entities to find a small number of unusual ones:

Looking for defrauded customers
Checking an assembly line
Skews of 10^6 have been reported (Clearwater & Stern 1991)

Page 5

Misclassification Costs - Problems with Accuracy

The "equal error costs" assumption does not hold in real-life problems: disease tests, fraud detection, etc.

Instead of maximizing the accuracy, we need to minimize the error cost:

Cost = FP • c(Y,n) + FN • c(N,p)
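The cost formula above translates directly into a function (a minimal sketch; the counts and unit costs below are made-up placeholders):

```python
def misclassification_cost(fp, fn, c_fp, c_fn):
    """Total error cost = FP * c(Y,n) + FN * c(N,p), per the formula above."""
    return fp * c_fp + fn * c_fn

# Hypothetical fraud-detection numbers: a missed fraud (FN) costs far more
# than a false alarm (FP).
total = misclassification_cost(fp=30, fn=5, c_fp=1.0, c_fn=50.0)
print(total)  # 280.0
```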

Page 6

Page 7

ROC Plot and ROC Area

Receiver Operating Characteristic: developed in WWII to statistically model the "false positive" and "false negative" detections of radar operators.

Becoming more popular in ML, and a standard measure in medicine and biology.

However, an ROC plot alone does a poor job of deciding the choice among classifiers.

Page 8

ROC graph of four classifiers

Informally, one point in ROC space is better than another if it lies to the northwest of it (higher TP rate, lower FP rate, or both).
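A minimal sketch of how such ROC points arise: sweep the decision threshold down through a ranking classifier's scores and record an (FP rate, TP rate) point at each cut. This assumes distinct scores; tied scores would need grouping.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs obtained by sweeping the threshold
    down through the scores. labels: 1 = positive, 0 = negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit examples from highest score to lowest, i.e. lowering the threshold.
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Tiny made-up example: two positives, ranked 1st and 3rd by score.
print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```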

Page 9

Page 10

Iso-performance Lines

The expected cost of classification by a classifier at ROC point (FP, TP):

Cost = p(p) • (1 − TP) • c(N,p) + p(n) • FP • c(Y,n)

Therefore, two points (FP1, TP1) and (FP2, TP2) have the same performance if

(TP2 − TP1) / (FP2 − FP1) = (p(n) • c(Y,n)) / (p(p) • c(N,p))

Iso-performance line: all classifiers corresponding to points on the line have the same expected cost.
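Under the cost model in Provost & Fawcett, the slope of an iso-performance line is m = p(n) • c(Y,n) / (p(p) • c(N,p)); a sketch of that formula, checked against the scenarios later in the deck:

```python
def iso_performance_slope(p_neg, p_pos, c_fp, c_fn):
    """Slope m of an iso-performance line:
    m = p(n) * c(Y,n) / (p(p) * c(N,p))."""
    return (p_neg * c_fp) / (p_pos * c_fn)

# 10:1 class skew with equal error costs -> slope 10.
print(iso_performance_slope(10, 1, 1.0, 1.0))    # 10.0
# Same skew, a false negative 100x as costly as a false positive -> slope 0.1.
print(iso_performance_slope(10, 1, 1.0, 100.0))  # 0.1
```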

Page 11

ROC Convex Hull

If a point is not on the convex hull, the classifier represented by that point cannot be optimal.

In this example, B and D cannot be optimal because none of their points are on the convex hull.
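The hull itself can be computed with a standard monotone-chain upper hull over the ROC points (a sketch, not the paper's implementation). The endpoints (0,0) and (1,1) are included because the trivial "always negative" and "always positive" classifiers are always available.

```python
def roc_convex_hull(points):
    """Upper convex hull of ROC points, from (0,0) to (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for p in pts:
        # Pop points that would create a left (or straight) turn: the upper
        # hull only turns clockwise as we sweep left to right.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Made-up ROC points: (0.3, 0.6) lies below the hull, so it drops out.
print(roc_convex_hull([(0.1, 0.7), (0.3, 0.6), (0.5, 0.9)]))
```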

Page 12

How to use the ROC Convex Hull

p(n):p(p) = 10:1

Scenario A: c(N,p) = c(Y,n), giving m(iso_perf) = 10

Scenario B: c(N,p) = 100 • c(Y,n), giving m(iso_perf) = 0.1
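Given a hull and a slope m, picking the cost-minimizing classifier reduces to maximizing TP − m • FP over the hull points (an equivalence under the linear cost model; the hull points below are made up for illustration):

```python
def best_on_hull(hull, slope):
    """Hull point minimizing expected cost for iso-performance slope m,
    i.e. maximizing TP - m * FP (equivalent under the linear cost model)."""
    return max(hull, key=lambda pt: pt[1] - slope * pt[0])

hull = [(0.0, 0.0), (0.1, 0.7), (0.5, 0.9), (1.0, 1.0)]  # assumed hull
print(best_on_hull(hull, 2.0))  # steep slope favors the conservative classifier
print(best_on_hull(hull, 0.3))  # shallow slope favors the liberal one
```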

Page 13

Adding New Classifiers

Adding new classifiers may or may not extend the existing hull.

E may be optimal under some circumstances, since it extends the hull.

F and G cannot be optimal.

Page 14

What if Distributions & Costs are Unknown?

With no information at all, the ROC convex hull identifies every classifier that could be optimal under some conditions.

With complete information, the method identifies the single optimal classifier.

What about the cases in between?

Page 15

Sensitivity Analysis

Imprecise distribution information defines a range of slopes for iso-performance lines.

p(n):p(p) = 10:1

Scenario C:
○ $5 < c(Y,n) < $10
○ $500 < c(N,p) < $1000
○ 0.05 < m(iso_perf) < 0.2
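Scenario C's slope interval can be recovered mechanically from the cost bounds (a sketch; the bounds are the slide's own numbers):

```python
def slope_range(neg_to_pos, c_fp_lo, c_fp_hi, c_fn_lo, c_fn_hi):
    """Interval of iso-performance slopes m = (p(n)/p(p)) * c(Y,n)/c(N,p)
    implied by interval bounds on the two error costs."""
    return (neg_to_pos * c_fp_lo / c_fn_hi,
            neg_to_pos * c_fp_hi / c_fn_lo)

# Scenario C: 10:1 skew, $5 < c(Y,n) < $10, $500 < c(N,p) < $1000.
print(slope_range(10, 5, 10, 500, 1000))  # (0.05, 0.2)
```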

Page 16

Sensitivity Analysis - 2

Imprecise distribution information defines a range of slopes for iso-performance lines.

p(n):p(p) = 10:1

Scenario D:
○ 0.2 < m(iso_perf) < 2

Page 17

Sensitivity Analysis - 3

Can the "do nothing" strategy (the trivial classifier at the origin) be better than any of the available classifiers?

Page 18

Conclusion

Accuracy alone as a performance metric is inadequate for various reasons.

ROC plots give more accurate information about the performance of classifiers.

The ROC convex hull method:

Is an efficient solution to the problem of comparing multiple classifiers in imprecise environments
Allows us to incorporate new classifiers easily
Allows us to select the classifiers that are potentially optimal
