
Formal Evaluation Techniques

Chapter 7

• Test set error rates, confusion matrices, and lift charts

• Focus: formal evaluation methods for supervised learning and unsupervised clustering


7.1 What Should Be Evaluated?

1. Supervised Model

2. Training Data

3. Attributes

4. Model Builder

5. Parameters

6. Test Set Evaluation

[Figure: Components of supervised model evaluation: Instances, Attributes, Training Data, Test Data, Parameters, Model Builder, Supervised Model, and Evaluation]

7.2 Tools for Evaluation

Single-Valued Summary Statistics

• Mean

• Variance

• Standard deviation
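As a minimal sketch, the three single-valued summary statistics can be computed as below. It assumes the variance is the sample variance, which divides by n - 1; some texts divide by n instead. The function name `summary_stats` is hypothetical.

```python
import math


def summary_stats(values):
    """Return (mean, variance, standard deviation) for a list of numbers.

    Assumption: variance is the sample variance (divide by n - 1).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    # The standard deviation is the square root of the variance.
    return mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    print(summary_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```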

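The Normal Distribution

[Figure: Normal curve f(x) plotted against x, marked from -3 to +3 standard deviations around the mean. On each side of the mean, about 34.13% of the area lies within 1 standard deviation, 13.59% between 1 and 2, 2.14% between 2 and 3, and 0.13% beyond 3.]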
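The areas marked on the normal curve can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and uses `math.erf` from Python's standard library to build the cumulative distribution function; the helper names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(a, b):
    """Area under the standard normal curve between a and b standard deviations."""
    return normal_cdf(b) - normal_cdf(a)


if __name__ == "__main__":
    for lo, hi in [(0, 1), (1, 2), (2, 3)]:
        print(f"{lo} to {hi} SD: {area_between(lo, hi):.2%}")
    print(f"beyond 3 SD: {1 - normal_cdf(3):.2%}")
    print(f"within 2 SD: {area_between(-2, 2):.2%}")
```

Run as a script, it prints 34.13%, 13.59%, 2.14%, and 0.13% for the four one-sided regions, and 95.45% for the area within two standard deviations of the mean, which is the basis for the "two standard errors, 95% of the time" rule of thumb.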

Normal Distributions and Sample Means

• The means of random, independent samples of equal size are approximately normally distributed (the central limit theorem).

• About 95% of the time, a sample mean will lie within two standard errors of the population mean.
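A small simulation illustrates both points. Assuming a Uniform(0, 1) population (mean 0.5, standard deviation sqrt(1/12)), the sketch below draws many samples of equal size, computes each sample mean, and reports the fraction that lie within two standard errors of the population mean. The population, sample size, trial count, and seed are illustrative choices, not values from the chapter.

```python
import math
import random


def coverage_within_two_se(sample_size=30, trials=2000, seed=1):
    """Fraction of sample means within 2 standard errors of the population mean.

    Assumption: the population is Uniform(0, 1), so its mean is 0.5 and its
    standard deviation is sqrt(1/12).
    """
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    se = sigma / math.sqrt(sample_size)  # standard error of the mean
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(sample_size)) / sample_size
        if abs(sample_mean - mu) < 2 * se:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage_within_two_se())
```

With these settings the fraction should land close to 0.95, even though the uniform population itself is not normal.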

Computing the Standard Error

• The variance of the sample mean is estimated by dividing the sample variance by the sample size.

• The standard error is computed by taking the square root of this estimated variance.
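Following the two steps on this slide, a minimal sketch of the standard error of a sample mean. The function name `standard_error` is illustrative; `statistics.variance` computes the sample variance with an n - 1 divisor.

```python
import math
from statistics import variance


def standard_error(sample):
    """Estimate the standard error of the mean of `sample`."""
    n = len(sample)
    # Step 1: estimated variance of the sample mean = sample variance / sample size
    var_of_mean = variance(sample) / n
    # Step 2: the standard error is its square root
    return math.sqrt(var_of_mean)


if __name__ == "__main__":
    print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))
```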

[Figure: Three random samples (Sample 1, Sample 2, Sample 3) drawn from a population of instances X1 through X10; the same instance (for example, X4) may be drawn more than once.]

A Classical Model for Hypothesis Testing

• Hypothesis: an educated guess about the outcome of some event

• Experimental group and control group

• Null hypothesis: There is no significant difference in the mean increase or decrease of total allergic reactions per day between patients in the group receiving treatment X and patients in the group receiving the placebo.

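A Classical Model for Hypothesis Testing

$$P = \frac{|\bar{X}_1 - \bar{X}_2|}{\sqrt{v_1/n_1 + v_2/n_2}}$$

where

• $P$ is the significance score;

• $\bar{X}_1$ and $\bar{X}_2$ are sample means for the independent samples;

• $v_1$ and $v_2$ are variance scores for the respective means;

• $n_1$ and $n_2$ are the corresponding sample sizes.

To be 95% confident that the difference between the means is significant, P must be greater than or equal to 2.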
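The significance score can be sketched in Python as follows. This assumes v1 and v2 are the sample variances of the two groups (the slide calls them variance scores without fixing the formula); the function names are illustrative.

```python
import math
from statistics import mean, variance


def significance_score(sample1, sample2):
    """P = |mean1 - mean2| / sqrt(v1/n1 + v2/n2) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # assumption: sample variances
    return abs(mean(sample1) - mean(sample2)) / math.sqrt(v1 / n1 + v2 / n2)


def is_significant_95(p):
    """Rule of thumb from the slide: P >= 2 for 95% confidence."""
    return p >= 2


if __name__ == "__main__":
    p = significance_score([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
    print(p, is_significant_95(p))
```

For these two samples the means differ by 2 and each v/n term is 0.5, so P = 2.0, exactly at the 95% threshold.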

Table 7.1 • A Confusion Matrix for the Null Hypothesis

| | Computed Accept | Computed Reject |
| Accept Null Hypothesis | True Accept | Type 1 Error |
| Reject Null Hypothesis | Type 2 Error | True Reject |
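To make the four cells of Table 7.1 concrete, the sketch below tallies the outcomes of a series of hypothetical test decisions, given whether the null hypothesis actually holds and whether the test rejected it. The input format and the function name are assumptions for illustration.

```python
from collections import Counter


def tally_null_hypothesis_outcomes(null_is_true, null_rejected):
    """Count outcomes using the cell names of Table 7.1.

    null_is_true[k]:  True if the null hypothesis actually holds in trial k
    null_rejected[k]: True if the test rejected the null hypothesis in trial k
    """
    counts = Counter()
    for truth, rejected in zip(null_is_true, null_rejected):
        if truth and not rejected:
            counts["True Accept"] += 1
        elif truth and rejected:
            counts["Type 1 Error"] += 1   # rejected a true null hypothesis
        elif not truth and not rejected:
            counts["Type 2 Error"] += 1   # accepted a false null hypothesis
        else:
            counts["True Reject"] += 1
    return counts
```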

7.3 Computing Test Set Confidence Intervals

$$\text{Classifier Error Rate } (E) = \frac{\#\text{ of test set errors}}{\#\text{ of test set instances}}$$

Computing 95% Confidence Intervals

1. Given a test set sample S of size n and error rate E

2. Compute sample variance as V = E(1 - E)

3. Compute the standard error (SE) as the square root of V/n

4. Calculate an upper bound error as E + 2(SE)

5. Calculate a lower bound error as E - 2(SE)
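The five steps translate directly into a short function. This sketch takes the raw error count and test set size (an illustrative interface) and uses the 2-standard-error bound from the slide rather than the more precise 1.96.

```python
import math


def error_rate_confidence_interval(errors, n):
    """Return (lower, upper) bounds of an approximate 95% interval for the error rate."""
    e = errors / n              # classifier error rate E
    v = e * (1 - e)             # step 2: sample variance V
    se = math.sqrt(v / n)       # step 3: standard error
    return e - 2 * se, e + 2 * se  # steps 4 and 5


if __name__ == "__main__":
    print(error_rate_confidence_interval(10, 100))
```

For 10 errors in 100 test instances, E = 0.10, SE = 0.03, and the interval runs from about 0.04 to 0.16.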

Three general comments

• The test data has been randomly chosen from the pool of all possible test set instances

• Test, training, and validation data must represent disjoint sets

• The instances in each class should be distributed in the training, validation, and test data as they are seen in the entire dataset
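The last two comments can be satisfied together with a stratified split: each class is shuffled and divided separately, so the three subsets are disjoint and keep roughly the class proportions of the whole dataset. This is a sketch; the split fractions, seed, and function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(instances, labels, train_frac=0.6, val_frac=0.2, seed=0):
    """Split into disjoint training, validation, and test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append((x, y))
    train, val, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)  # random selection within each class
        n_train = round(len(members) * train_frac)
        n_val = round(len(members) * val_frac)
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # whatever remains becomes test data
    return train, val, test
```

With 60 instances of class A and 40 of class B, the default fractions give 36 + 24 training, 12 + 8 validation, and 12 + 8 test instances.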


7.4 Comparing Supervised Learner Models

Comparing Models with Independent Test Data

$$P = \frac{|E_1 - E_2|}{\sqrt{q(1-q)\left(1/n_1 + 1/n_2\right)}}$$

where

$E_1$ = the error rate for model $M_1$

$E_2$ = the error rate for model $M_2$

$q = (E_1 + E_2)/2$

$n_1$ = the number of instances in test set A

$n_2$ = the number of instances in test set B
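A minimal sketch of the model comparison score, with error rates and test set sizes as inputs (the function name is illustrative):

```python
import math


def compare_models(e1, e2, n1, n2):
    """P = |E1 - E2| / sqrt(q(1 - q)(1/n1 + 1/n2)), with q the mean error rate."""
    q = (e1 + e2) / 2
    return abs(e1 - e2) / math.sqrt(q * (1 - q) * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    print(compare_models(0.20, 0.30, 100, 100))
```

With E1 = 0.20, E2 = 0.30, and 100 instances in each test set, P is about 1.63. That is below 2, the 95% threshold used for the classical hypothesis test, so the difference would not be judged significant.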

7.5 Attribute Evaluation

Locating Redundant Attributes with Excel

• Correlation Coefficient

• Positive Correlation

• Negative Correlation

• Curvilinear Relationship (a curved-line pattern)

– Two attributes having a low r value may still have a curvilinear relationship
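The correlation coefficient r referred to here is the Pearson coefficient, which Excel's CORREL function also returns. The sketch below computes it directly and reproduces the three cases pictured next: r = 1, r = -1, and a curvilinear relationship with r = 0. The data values are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    print(pearson_r(a, [2, 4, 6, 8, 10]))       # positive correlation
    print(pearson_r(a, [10, 8, 6, 4, 2]))       # negative correlation
    xs = [-2, -1, 0, 1, 2]
    print(pearson_r(xs, [x * x for x in xs]))   # curvilinear, yet r is 0
```

The last case shows why a low r value does not rule out a strong relationship: y is completely determined by x, but the relationship is not linear.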

[Figure: Positive Correlation, r = 1. Scatterplot of Attribute B (y-axis) against Attribute A (x-axis).]

[Figure: Negative Correlation, r = -1. Scatterplot of Attribute B (y-axis) against Attribute A (x-axis).]

[Figure: Curvilinear Relationship, r = 0. Scatterplot of Attribute B (y-axis) against Attribute A (x-axis).]

Creating a Scatterplot Diagram with MS Excel

[Figure: Blood Pressure vs. Cholesterol. Scatterplot of Cholesterol (y-axis) against Blood Pressure (x-axis).]

Hypothesis Testing for Numerical Attribute Significance

$$P_{ij} = \frac{|\bar{X}_i - \bar{X}_j|}{\sqrt{v_i/n_i + v_j/n_j}}$$

where

• $\bar{X}_i$ is the class $i$ mean and $\bar{X}_j$ is the class $j$ mean for attribute $A$;

• $v_i$ is the class $i$ variance and $v_j$ is the class $j$ variance for attribute $A$;

• $n_i$ is the number of instances in $C_i$ and $n_j$ is the number of instances in $C_j$.
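Table 7.2 reports class means and standard deviations, so a sketch that works from those summaries is convenient: each variance is the square of the reported standard deviation. The class sizes are not shown in this excerpt, so the counts in the example are made up, and the function name is illustrative.

```python
import math


def attribute_significance(mean_i, sd_i, n_i, mean_j, sd_j, n_j):
    """P_ij for one numeric attribute, from per-class mean, std. deviation, and count."""
    v_i, v_j = sd_i ** 2, sd_j ** 2  # variance = standard deviation squared
    return abs(mean_i - mean_j) / math.sqrt(v_i / n_i + v_j / n_j)


if __name__ == "__main__":
    # Hypothetical summaries: both classes have sd 2 and 4 instances.
    print(attribute_significance(10, 2, 4, 7, 2, 4))
```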

Table 7.2 • Cardiology Patient Data: Numerical Attribute Significance

| Attribute | Class Sick | Class Healthy | ESX Significance | Hypothesis Test for Significance |
| Age (Mean) | 56.50 | 52.50 | 0.45 | 4.076 |
| Age (Sd) | 7.96 | 9.55 | | |
| BP (Mean) | 134.40 | 129.30 | 0.29 | 2.511 |
| BP (Sd) | 18.73 | 16.17 | | |
| Chol (Mean) | 251.09 | 242.23 | 0.17 | 1.495 |
| Chol (Sd) | 49.46 | 53.55 | | |
| MHR (Mean) | 139.10 | 158.47 | 0.85 | 7.955 |
| MHR (Sd) | 22.60 | 19.1 | | |
| Peak (Mean) | 1.59 | 0.58 | 0.86 | 8.001 |
| Peak (Sd) | 1.30 | 0.78 | | |

7.6 Unsupervised Evaluation Techniques

• Unsupervised Clustering for Supervised Evaluation

– If the instances cluster into the predefined classes contained in the training data, a supervised learner model built with the training data is likely to perform well.

• Supervised Evaluation for Unsupervised Clustering

– Designate each formed cluster as a class

– Build a supervised learner model by choosing a random sampling of instances from each class

– Test the supervised learner model with the remaining instances

• Additional Methods
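The three steps of supervised evaluation for unsupervised clustering can be sketched as below. A nearest-centroid classifier stands in for the supervised learner purely for brevity; the slide does not prescribe a particular learner. The training fraction, seed, and function name are hypothetical.

```python
import math
import random
from collections import defaultdict


def evaluate_clusters_supervised(points, cluster_ids, train_frac=0.5, seed=0):
    """Return test-set accuracy of a learner trained on cluster labels."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for p, c in zip(points, cluster_ids):
        by_cluster[c].append(p)  # step 1: each formed cluster becomes a class

    train, test = defaultdict(list), []
    for c, members in by_cluster.items():
        members = members[:]
        rng.shuffle(members)
        k = max(1, round(len(members) * train_frac))
        train[c].extend(members[:k])  # step 2: random sample from each class
        test.extend((p, c) for p in members[k:])
    if not test:
        raise ValueError("no instances left for testing")

    # Nearest-centroid classifier (a stand-in for any supervised learner).
    centroids = {
        c: tuple(sum(dim) / len(ps) for dim in zip(*ps)) for c, ps in train.items()
    }

    def predict(p):
        return min(centroids, key=lambda c: math.dist(p, centroids[c]))

    # Step 3: test the model on the remaining instances.
    correct = sum(predict(p) == c for p, c in test)
    return correct / len(test)
```

High accuracy suggests the clusters are well separated; low accuracy suggests the clustering does not form distinct, learnable groups.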


Additional Methods

• Designate all instances as training data

• Apply an alternative technique’s measure of cluster quality

• Create your own measure of cluster quality

• Perform a between-cluster attribute-value comparison.
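As one illustration of creating your own measure of cluster quality, the sketch below compares the mean distance from each instance to its own cluster centroid with the mean distance between cluster centroids. This particular ratio is a hypothetical example, not a measure defined in the chapter; lower values suggest tighter, better-separated clusters.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def within_between_ratio(clusters):
    """Hypothetical quality score: mean within-cluster distance / mean centroid distance."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = [centroid(c) for c in clusters]
    within = [math.dist(p, cents[k]) for k, c in enumerate(clusters) for p in c]
    between = [math.dist(a, b) for a, b in combinations(cents, 2)]
    return (sum(within) / len(within)) / (sum(between) / len(between))
```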

7.7 Evaluating Supervised Models with Numeric Output

Mean Squared Error

$$mse = \frac{(a_1 - c_1)^2 + (a_2 - c_2)^2 + \cdots + (a_i - c_i)^2 + \cdots + (a_n - c_n)^2}{n}$$

where for the ith instance,

$a_i$ = actual output value

$c_i$ = computed output value
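The mse formula maps onto a single sum. The sketch below is checked against instances 8 and 12 of Table 7.3; the function name is illustrative.

```python
def mean_squared_error(actual, computed):
    """Average of (a_i - c_i)^2 over all instances."""
    if len(actual) != len(computed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - c) ** 2 for a, c in zip(actual, computed)) / len(actual)


if __name__ == "__main__":
    # Instances 8 and 12 of Table 7.3
    print(mean_squared_error([0.0, 1.0], [0.262, 0.776]))
```

For those two instances the squared errors are about 0.0686 and 0.0502, so mse is about 0.0594.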

Mean Absolute Error

$$mae = \frac{|a_1 - c_1| + |a_2 - c_2| + \cdots + |a_n - c_n|}{n}$$

where for the ith instance,

$a_i$ = actual output value

$c_i$ = computed output value
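The mae formula differs from mse only in using absolute rather than squared differences. A sketch using the same two instances of Table 7.3 (function name illustrative):

```python
def mean_absolute_error(actual, computed):
    """Average of |a_i - c_i| over all instances."""
    if len(actual) != len(computed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - c) for a, c in zip(actual, computed)) / len(actual)


if __name__ == "__main__":
    # Instances 8 and 12 of Table 7.3
    print(mean_absolute_error([0.0, 1.0], [0.262, 0.776]))
```

For these instances the absolute errors are 0.262 and 0.224, giving mae = 0.243. Because errors are not squared, a single large error affects mae less than it affects mse.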

Table 7.3 • Absolute and Squared Error

| Instance Number | Life Ins. Promo. Actual Output | Computed Output | Absolute Error | Squared Error |
| 1 | 0.0 | 0.024 | 0.024 | 0.0005 |
| 2 | 1.0 | 0.998 | 0.002 | 0.0000 |
| 3 | 0.0 | 0.023 | 0.023 | 0.0005 |
| 4 | 1.0 | 0.986 | 0.014 | 0.0002 |
| 5 | 1.0 | 0.999 | 0.001 | 0.0000 |
| 6 | 0.0 | 0.050 | 0.050 | 0.0025 |
| 7 | 1.0 | 0.999 | 0.001 | 0.0000 |
| 8 | 0.0 | 0.262 | 0.262 | 0.0686 |
| 9 | 0.0 | 0.060 | 0.060 | 0.0036 |
| 10 | 1.0 | 0.997 | 0.003 | 0.0000 |
| 11 | 1.0 | 0.999 | 0.001 | 0.0000 |
| 12 | 1.0 | 0.776 | 0.224 | 0.0502 |
| 13 | 1.0 | 0.999 | 0.001 | 0.0000 |
| 14 | 0.0 | 0.023 | 0.023 | 0.0005 |
| 15 | 1.0 | 0.999 | 0.001 | 0.0000 |
