Zhangxi Lin ISQS 7342-001 Texas Tech University Note: Most slides in this file are sourced from SAS@ Course Notes Lecture Notes 8 Continuous and Multiple Target Prediction


Structure of the Chapter

Section 2.1 raises the problem: standard decision tree methods did not produce good results

Section 2.2 analyzes the problem

Section 2.3 develops basic two-stage models to improve the results

Section 2.4 further improves the two-stage models

Section 2.1

Introduction

Motivation

The results of the 1998 KDD-Cup produced a surprise. Almost half of the entries yielded a total profit on the validation data that was less than that obtained by soliciting everyone.

Part of the problem lies in the method used to select cases for solicitation. This chapter extends the notion of profit introduced in Chapter 1 to allow for better selection of cases for solicitation.

1998 KDD-Cup Results

Rank   Total Profit   Overall Avg. Profit
 1.    $14,712        $0.153
 2.    $14,662        $0.152
 3.    $13,954        $0.145
 4.    $13,825        $0.143
 5.    $13,794        $0.143
 6.    $13,598        $0.141
 7.    $13,040        $0.135
 8.    $12,298        $0.128
 9.    $11,423        $0.119
10.    $11,276        $0.117
11.    $10,720        $0.111
12.    $10,706        $0.111
13.    $10,112        $0.105
14.    $10,049        $0.104
15.    $9,741         $0.101
16.    $9,464         $0.098
17.    $5,683         $0.059
18.    $5,484         $0.057
19.    $1,925         $0.020
20.    $1,706         $0.018

"Solicit everyone" model: total profit $10,560, avg. profit $0.110
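As a quick check of the claim on the Motivation slide, the short Python sketch below compares each ranked entry with the "solicit everyone" baseline. The numbers are copied from the results table. The variable names are illustrative only.

```python
# Total validation profit of the 20 ranked entries, copied from the results table.
total_profit = [14712, 14662, 13954, 13825, 13794, 13598, 13040, 12298, 11423, 11276,
                10720, 10706, 10112, 10049, 9741, 9464, 5683, 5484, 1925, 1706]
solicit_everyone = 10560  # total profit of the "solicit everyone" baseline

# Ranks (1-based) whose total profit falls below the baseline.
below = [rank for rank, profit in enumerate(total_profit, start=1)
         if profit < solicit_everyone]
print(len(below), "of", len(total_profit), "entries fall below the baseline:", below)
```

In this table, the entries ranked 13 through 20 earn less than mailing every case.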

Section 2.2

Generalized Profit Matrices

Random Profit Consequences

[Figure: 2x2 matrix of random profits, one column per decision (Primary Decision, Secondary Decision). Legend: negative profit.]

Outcome Conditioned Random Profits

In a more general context, the profit associated with a decision for an individual case can be thought of as a random variable. The goal of predictive modeling is to estimate the distribution of this profit random variable conditioned on case input measurements.

Because the decisions are usually associated with discrete outcomes, the random profits are conditioned on each of these outcomes. For a binary outcome and two decisions, the random profits form the elements of a 2x2 random matrix.

[Figure: 2x2 random profit matrix. Rows: Primary Outcome, Secondary Outcome. Columns: Primary Decision, Secondary Decision. Legend: negative profit.]

Expected Profit Matrix

[Figure: the 2x2 random profit matrix with each cell replaced by its expected value E(Profit). Rows: Primary Outcome, Secondary Outcome. Columns: Primary Decision, Secondary Decision. Legend: negative profit.]

Expected/Reduced Profit Matrix

Because it is easier to work with concrete numbers than random variables, statistical summaries of the random profit matrices are used to quantify the consequence of a decision.

One way to do this is to calculate the expected value of the profit random variable for each outcome and decision combination. Arrayed as a matrix, this is called the expected profit-consequence matrix, or the expected profit matrix for a case.

Often, generalized profit matrices have zeros in the secondary decision column. Without loss of generality (assuming the profit-consequence is measured by expected value), it is always possible to write the generalized profit matrix with a column of zero profits, by subtracting the secondary decision column from both columns.
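The reduction to a zero column can be illustrated with a small Python sketch. The profit values and the probability below are hypothetical, chosen only to show that the difference in expected profit between the two decisions is the same before and after the reduction.

```python
import math

def reduce_profit_matrix(expected):
    """Subtract the secondary-decision column from both columns.

    expected maps an outcome name to a pair
    (expected profit under primary decision, expected profit under secondary decision).
    """
    return {outcome: (primary - secondary, 0.0)
            for outcome, (primary, secondary) in expected.items()}

def decision_gap(matrix, p_primary):
    """Expected profit of the primary decision minus that of the secondary decision."""
    weights = {"primary outcome": p_primary, "secondary outcome": 1.0 - p_primary}
    primary = sum(weights[o] * matrix[o][0] for o in matrix)
    secondary = sum(weights[o] * matrix[o][1] for o in matrix)
    return primary - secondary

# Hypothetical expected profit matrix (not taken from the slides).
expected = {"primary outcome": (15.0, 2.0), "secondary outcome": (-1.0, 0.5)}
reduced = reduce_profit_matrix(expected)

gap_full = decision_gap(expected, p_primary=0.1)
gap_reduced = decision_gap(reduced, p_primary=0.1)
print(reduced, gap_full, gap_reduced)
```

Because only the gap between the two decisions matters for choosing between them, the reduced matrix leads to the same decision as the full one.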

Reduced Profit Matrix

[Figure: the expected profit matrix, with rows Primary Outcome and Secondary Outcome and columns Primary Decision and Secondary Decision, is reduced by taking the difference between the two decision columns. Legend: negative profit.]

Expected Profit-Consequence

[Figure: reduced profit matrix for a single case. The Primary Decision column holds the expected profits EPF for the Primary Outcome and Secondary Outcome rows, each paired with its outcome probability p; the other column is zero. Legend: negative profit.]

$EPC = EPF_1 \cdot p_1 + EPF_2 \cdot p_2$

Here the subscripts 1 and 2 denote the primary and secondary outcome. The same calculation is carried out for every case, giving one EPC per case.
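The expected profit-consequence of the primary decision is a probability-weighted sum, which the following Python sketch computes for one case. It assumes a binary outcome, so the secondary outcome probability is one minus the primary outcome probability. The function name and the numeric values are hypothetical.

```python
import math

def expected_profit_consequence(p_primary, epf_primary, epf_secondary):
    """EPC of the primary decision for one case with a binary outcome.

    p_primary     : estimated probability of the primary outcome
    epf_primary   : expected profit if the primary outcome occurs
    epf_secondary : expected profit if the secondary outcome occurs
    """
    return epf_primary * p_primary + epf_secondary * (1.0 - p_primary)

# Hypothetical case: 5% response probability, 14.62 profit on response, 0.68 loss otherwise.
epc = expected_profit_consequence(p_primary=0.05, epf_primary=14.62, epf_secondary=-0.68)
print(round(epc, 3))
```

A positive EPC means the primary decision is expected to beat the zero-profit secondary decision for that case.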

Sort Expected Profit-Consequence

Sort cases by decreasing EPC.

Total Expected Profit

[Figure: cases sorted by decreasing EPC, each compared against a threshold (EPC ≥ threshold).]

Sum EPCs in excess of threshold.

Observed Profit

[Figure sequence: for each case, sorted by decreasing EPC and compared against the threshold, the observed profit (OP) is read from the Primary Decision column of the profit matrix according to whether the Primary Outcome or the Secondary Outcome occurred. Legend: negative profit.]

Record observed profits.

Observed Total Profit

Sum OPs for cases with EPCs in excess of threshold.

Generalized Profit Assessment Data

[Figure: table of cases sorted by decreasing EPC, each with its EPC and OP.]

Total Profit Plot

[Figure: total profit plotted against depth.]

Observed and Expected Profit Plot

[Figure: observed and expected profit plotted against depth.]
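The assessment steps on these slides (sort cases by decreasing EPC, keep those at or above a threshold, then sum expected and observed profits) can be sketched in Python as follows. This is a minimal illustration with hypothetical data and function names, not the code of the Generalized Profit Assessment tool.

```python
import math

def assess(cases, threshold=0.0):
    """cases: list of (epc, observed_profit) pairs, one per case."""
    ranked = sorted(cases, key=lambda case: case[0], reverse=True)
    selected = [case for case in ranked if case[0] >= threshold]
    return {
        "depth": len(selected) / len(ranked),            # fraction of cases selected
        "total_expected": sum(epc for epc, _ in selected),
        "total_observed": sum(op for _, op in selected),
    }

def profit_by_depth(cases):
    """Cumulative expected and observed profit as cases are added in EPC order."""
    ranked = sorted(cases, key=lambda case: case[0], reverse=True)
    rows, expected, observed = [], 0.0, 0.0
    for count, (epc, op) in enumerate(ranked, start=1):
        expected += epc
        observed += op
        rows.append((count / len(ranked), expected, observed))
    return rows

# Hypothetical (EPC, OP) pairs for five cases.
cases = [(0.90, 14.32), (0.40, -0.68), (0.10, -0.68), (-0.20, -0.68), (-0.50, 9.32)]
result = assess(cases, threshold=0.0)
curve = profit_by_depth(cases)
print(result)
print(curve)
```

The rows returned by `profit_by_depth` are the kind of data a total profit plot against depth would draw on.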

Profit Confusion Matrix

                     Primary Decision                 Secondary Decision                 Total
Primary Outcome      true positive profit             false negative profit              total primary profit
Secondary Outcome    false positive profit            true negative profit               total secondary profit
Total                total primary decision profit    total secondary decision profit

Each cell is a sum of observed profits (OP).

True Positive Profit Fraction

[Figure: the profit confusion matrix above, repeated for the true positive profit fraction.]

False Positive Profit Fraction

[Figure: the profit confusion matrix above, repeated for the false positive profit fraction.]
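A profit confusion matrix can be tallied by summing observed profits into the four decision-by-outcome cells. The Python sketch below assumes that the primary decision is taken whenever a case's EPC is at or above a threshold, and that each case records its profit under either decision. The data layout and names are hypothetical. The sketch stops at the cells and marginal totals and does not define the profit fractions.

```python
import math

def profit_confusion_matrix(cases, threshold=0.0):
    """cases: list of (epc, primary_outcome, profit_if_primary, profit_if_secondary)."""
    cells = {"true_positive": 0.0, "false_positive": 0.0,
             "false_negative": 0.0, "true_negative": 0.0}
    for epc, primary_outcome, profit_primary, profit_secondary in cases:
        if epc >= threshold:   # primary decision taken
            key = "true_positive" if primary_outcome else "false_positive"
            cells[key] += profit_primary
        else:                  # secondary decision taken
            key = "false_negative" if primary_outcome else "true_negative"
            cells[key] += profit_secondary
    totals = {
        "primary_outcome": cells["true_positive"] + cells["false_negative"],
        "secondary_outcome": cells["false_positive"] + cells["true_negative"],
        "primary_decision": cells["true_positive"] + cells["false_positive"],
        "secondary_decision": cells["false_negative"] + cells["true_negative"],
    }
    return cells, totals

# Hypothetical cases with a reduced profit matrix (secondary decision profit is zero).
cases = [(0.9, True, 14.32, 0.0), (0.4, False, -0.68, 0.0),
         (-0.2, False, -0.68, 0.0), (-0.5, True, 9.32, 0.0)]
cells, totals = profit_confusion_matrix(cases)
print(cells)
print(totals)
```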

Section 2.3

Basic Two-Stage Models

Defining Two-Stage Model Components

Components: E(B|X), E(D|X)

Specified values: a constant (15.30) or an input value (X)

Separate predictive models

Joint predictive models: E(B,D|X)

Two-Stage Modeling Methods

A better estimate of the primary decision profit can be obtained by modeling both outcome probability and expected profit, using two-stage modeling methods.

There are several ways to estimate the components used in two-stage models.

The first is to simply specify values for certain components. This is simple to do, but it often produces poor results.

In a more sophisticated approach, you can use the value in an input or a look-up table as a surrogate for expected donation amount.

The most common approach is to estimate values for components with individual models.

At the extreme end of the sophistication scale, you can use a single model to predict both components simultaneously, for example, with the NEURAL procedure in SAS Enterprise Miner.

Basic Two-Stage Models

A two-stage model combines two models:

- One to estimate the donation propensity

- Another to estimate the donation amount

Two-Stage Model Tool

The Two Stage Model tool builds two models, one to predict TARGET_B and one to predict TARGET_D. Theoretically, you can use this node to combine predictions for the two target variables and get a prediction of expected donation amount.

The tool has two minor limitations:

- It does not recognize the prior vector. Thus, because responders are overrepresented in the training data, the probabilities in the TARGET_B model are biased.

- The node has no built-in diagnostic to assess overall average profit. Profit information passed to the Assessment node is incorrect.

Both of these limitations are easily overcome by the Generalized Profit Assessment tool.
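To see why overrepresented responders bias the TARGET_B probabilities, the Python sketch below applies the standard prior correction for oversampled training data. It is not taken from the Generalized Profit Assessment tool. The function name and the prior values are assumptions made for illustration.

```python
import math

def adjust_for_priors(p_sample, sample_prior, population_prior):
    """Rescale a probability estimated on oversampled data to the population priors.

    p_sample         : model probability of the primary outcome on the training sample
    sample_prior     : proportion of primary outcomes in the training sample
    population_prior : proportion of primary outcomes in the population
    """
    w_primary = population_prior / sample_prior
    w_secondary = (1.0 - population_prior) / (1.0 - sample_prior)
    numerator = p_sample * w_primary
    return numerator / (numerator + (1.0 - p_sample) * w_secondary)

# Hypothetical priors: responders are 50% of the sample but 5% of the population.
p_adjusted = adjust_for_priors(p_sample=0.5, sample_prior=0.5, population_prior=0.05)
print(round(p_adjusted, 4))
```

With these hypothetical priors, a case scored at the sample average probability of 0.5 maps back to the population rate of 0.05.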

The Model We Are Using

Basic model

Different from the book

Target Variables

Some Two-Stage Model Options

Model fitting approach: sequential or concurrent

- Sequential: couples the models by making the binary outcome model's prediction an input for the expected profit model

- Concurrent: fits a neural network model with two targets

FILTER: removes cases from the training data when building the value model

MULTIPLY: multiplies the class and value model predictions
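The FILTER and MULTIPLY options can be illustrated with a toy Python sketch. It assumes FILTER drops non-responders before the value model is fit, and it stands in for the value model with a plain mean of TARGET_D. Both are simplifications for illustration, and all names and numbers are hypothetical.

```python
import math

def filter_for_value_model(rows):
    """FILTER-style step (assumed): keep only responders for the value model."""
    return [row for row in rows if row["target_b"] == 1]

def two_stage_prediction(p_response, amount_given_response):
    """MULTIPLY-style step: class model prediction times value model prediction."""
    return p_response * amount_given_response

# Hypothetical training rows; non-responders have no donation amount.
rows = [
    {"target_b": 1, "target_d": 20.0},
    {"target_b": 0, "target_d": None},
    {"target_b": 1, "target_d": 10.0},
    {"target_b": 0, "target_d": None},
]
value_rows = filter_for_value_model(rows)
# Stand-in value model: the mean donation among responders.
mean_amount = sum(row["target_d"] for row in value_rows) / len(value_rows)
expected_donation = two_stage_prediction(p_response=0.05, amount_given_response=mean_amount)
print(mean_amount, round(expected_donation, 2))
```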

Results of the Two-Stage Node

Results of the GPA Node

Oddities in the assessment report.

1. The reported overall average profits from training data are extremely low.

2. The depth supposedly corresponding to optimum profit threshold is reported to be 100% (select all cases).

3. The total profit reported in the validation data is almost 40% higher than in the training data.

Stratification with BIN_TARGET_D

Improved Results of the GPA Node

The third problem has been solved.

But the performance of the model is still lower than that from “no model”.

Correct bias in GPA by setting the following parameter in the code:

%let EM_PROPERTY_adjustprobs = Y;

The model is no longer selecting all the data (it is now around 60%), but the overall average profit values remain low.

The average profit is 0.1105, slightly more than the 0.110 obtained without using a model.

Results from an Improved Two-Stage Model

Parameters:

Class Model: Regression

Selection Model: Stepwise

Selection Criteria: Validation Error

The Average Profit: 0.155

This result is good enough to win the KDD Cup!

Summary – Improving the Performance

Section 2.3

- Use two-stage models

- Stratification using the binned value target

- Correct bias in GPA: %let EM_PROPERTY_adjustprobs = Y;

Section 2.4

- Use regression settings in the Two Stage node

- Reduce MSE: interval target value transformation

- Construct the component models separately from the Two Stage node

- Use regression trees in a two-stage model (%let EM_PROPERTY_adjustprobs = N;)

- Use neural networks in a two-stage model (%let EM_PROPERTY_adjustprobs = N;)

Section 2.4

Constructing Component Models

Two-Stage Modeling Challenges

Model Assessment

Interval Model Specification: E(D) = g(x;w)

Two-Stage Modeling Challenges

Constructing a two-stage model (or, more generally, any multiple-component model) requires attention to several challenges not previously encountered.

Earlier modeling assessment efforts evaluated models based on profitability measures, assuming a fixed profit structure. Because the profit structure itself is being modeled in a two-stage model, you need a different mechanism to assess model performance.

Correct specification requires appropriately chosen inputs, link functions, and target error distribution.

By incorporating the predictions of the binary model into the interval model, it can be possible to make a more parsimonious specification of the interval model.

Estimating Mean Squared Error

[Figure: target D plotted against input X, training data.]

MSE: $E[(D - \hat{D})^2]$

Estimated MSE: $\frac{1}{N} \sum_{i=1}^{N} (D_i - \hat{D}_i)^2$

MSE Decomposition: Variance and Squared Bias

$E[(D - \hat{D})^2] = E[(D - ED)^2] + [E(\hat{D} - ED)]^2$

MSE = Variance + Bias^2

In theory, the MSE can be decomposed into two components, each involving a deviation from the true expected value of the target variable.

Variance: independent of any fitted model.

Bias^2: the squared difference between the predicted and actual expected value.

Honest MSE Estimation

[Figure: target D plotted against input X, validation data.]

Unbiased estimates can be obtained by correctly accounting for model degrees of freedom in the MSE estimate or simply by estimating MSE from an independent validation data set.
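The estimated MSE is the average squared difference between actual and predicted target values. The Python sketch below computes it for a held-out validation set. The donation amounts and predictions are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of (D_i - Dhat_i)^2 over the N cases supplied."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((d - d_hat) ** 2 for d, d_hat in zip(actual, predicted)) / len(actual)

# Hypothetical validation cases: observed TARGET_D values and model predictions.
validation_actual = [10, 20, 5, 25]
validation_predicted = [12, 18, 5, 21]
validation_mse = mean_squared_error(validation_actual, validation_predicted)
print(validation_mse)
```

Computing the same quantity on validation cases that played no part in fitting is what makes the estimate honest.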

MSE and Binary Target Models

[Figure: binary target B and prediction \hat{B} plotted against input X, validation data.]

Estimated MSE: $\frac{1}{N} \sum_{i=1}^{N} (B_i - \hat{B}_i)^2$

$E[(B - \hat{B})^2] = E[(B - EB)^2] + [E(\hat{B} - EB)]^2$

Inaccuracy (MSE) = Inseparability (Variance) + Imprecision (Bias^2)

The Binary Target

The estimated MSE of the binary target can be thought of as measuring the overall inaccuracy of model prediction.

This inaccuracy estimate can be decomposed into a term related to the inseparability of the two-target levels (corresponding to the variance component) plus a term related to the imprecision of the model estimate (corresponding to the bias-squared component).

In this way, the model with the smallest estimated MSE will also be the least imprecise.
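For a single case whose true primary outcome probability is known, the decomposition can be verified exactly. The Python sketch below treats B as a Bernoulli variable with probability p_true and the model prediction p_hat as fixed. Both values are hypothetical.

```python
import math

def binary_mse(p_true, p_hat):
    """E[(B - p_hat)^2] when B is 1 with probability p_true and 0 otherwise."""
    return p_true * (1.0 - p_hat) ** 2 + (1.0 - p_true) * p_hat ** 2

def decompose(p_true, p_hat):
    """Split the inaccuracy into inseparability (variance) and imprecision (bias squared)."""
    inseparability = p_true * (1.0 - p_true)   # does not depend on the model
    imprecision = (p_hat - p_true) ** 2        # shrinks as p_hat approaches p_true
    return inseparability, imprecision

# Hypothetical case: true response probability 0.05, model predicts 0.20.
inaccuracy = binary_mse(p_true=0.05, p_hat=0.20)
inseparability, imprecision = decompose(p_true=0.05, p_hat=0.20)
print(inaccuracy, inseparability, imprecision)
```

Since the inseparability term is the same for every model, differences in MSE between models come only from the imprecision term.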

Two-Stage Modeling Challenges

Model Assessment: Use Validation MSE

Interval Model Specification: E(D) = g(x;w)

To assess both the binary and the interval component models, it is reasonable to compare their validation data mean squared error. Models with the smallest MSE will have the smallest bias or imprecision.

A standard regression model may be ill suited for accurately modeling the relationship between the inputs and TARGET_D.

Matching the structure of the model to the specific modeling requirements is vital to obtaining good predictions.

Interval Model Requirements

Correct Error Distribution

Good Inputs: x1, x3, x10

Positive Predictions: E(D) > 0

Adequate Flexibility

Making Positive Predictions

Transform target: E( log(Y) | X )

Define appropriate link: log( E(Y | X) )

Hints:

The interval component of a two-stage model is often used to predict a monetary response. Random variables that represent monetary amounts usually assume a skewed distribution with positive range and a variance related to expected value. When the target variable represents a monetary amount, this limited range and skewness must be considered in the model specification.

Proper specification of the target range and error distribution increases the chances of selecting good inputs for the interval target model. With good inputs, the correct degree of flexibility can be incorporated into the model and predictions can be optimized.
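The two routes to positive predictions target different quantities. Averaging log(Y) and exponentiating gives a positive value, but by Jensen's inequality it is no larger than the mean of Y, whereas a log link models log(E(Y | X)) directly. The Python sketch below shows the gap on three hypothetical donation amounts, ignoring inputs for simplicity.

```python
import math

# Hypothetical, strongly skewed donation amounts.
amounts = [1.0, 10.0, 100.0]

# Quantity targeted by a log link: the mean itself (here, no inputs).
mean_amount = sum(amounts) / len(amounts)

# Quantity obtained by modeling log(Y) and exponentiating the fitted mean.
mean_log = sum(math.log(y) for y in amounts) / len(amounts)
back_transformed = math.exp(mean_log)

print(mean_amount, back_transformed)
```

Both numbers are positive, but the back-transformed value is the geometric mean of the amounts. Predictions from a log-transformed target therefore sit below E(Y | X) unless they are corrected.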

Error Distribution Requirements

Possess correct skewness.

Have conforming support.

Account for heteroscedasticity.

Specifying the Correct Error Distribution

Distribution          Variance
Normal (truncated)    constant*
Poisson               ∝ E(Y)
Gamma                 ∝ (E(Y))^2
Lognormal             ∝ (E(Y))^2

Normal: The normal distribution has a range from negative to positive infinity, whereas the target variable may have a more restricted range.

Poisson: One disadvantage of the Poisson distribution relates to its skewness properties. Poisson error distributions are limited to the Neural Network node.

Gamma and lognormal: The gamma distribution is limited to the Neural Network node. The lognormal distribution can be used with any modeling tool.

A few extreme outliers may indicate a lognormal distribution, whereas the absence of such may imply a gamma or less extreme distribution.
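The variance column can be made concrete with the textbook variance formulas for each distribution, written as functions of the mean. In the Python sketch below, the shape and sigma values are hypothetical fixed parameters. Doubling the mean doubles the Poisson variance and quadruples the gamma and lognormal variances.

```python
import math

def poisson_variance(mean):
    """Poisson: the variance equals the mean."""
    return mean

def gamma_variance(mean, shape):
    """Gamma with a fixed shape parameter: variance = mean^2 / shape."""
    return mean ** 2 / shape

def lognormal_variance(mean, sigma):
    """Lognormal with fixed log-scale sigma: variance = (exp(sigma^2) - 1) * mean^2."""
    return (math.exp(sigma ** 2) - 1.0) * mean ** 2

# Ratio of the variance at mean 20 to the variance at mean 10.
poisson_ratio = poisson_variance(20.0) / poisson_variance(10.0)
gamma_ratio = gamma_variance(20.0, shape=2.0) / gamma_variance(10.0, shape=2.0)
lognormal_ratio = lognormal_variance(20.0, sigma=0.5) / lognormal_variance(10.0, sigma=0.5)
print(poisson_ratio, gamma_ratio, lognormal_ratio)
```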

Two-Stage Modeling Challenges

Model Assessment: Use Validation MSE

Interval Model Specification: E(D) = g(x;w); log(Target) / Specify Link and Error

Interval Target Model

The Parameters and Results

Compare the Distributions of Residuals

[Figure: residual distributions in two panels: using log-transformed Target_D; using original Target_D.]

Using Regression Trees

Using Neural Network Models