Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools

5.1 Introduction

5.2 Input Selection

5.3 Stopped Training

5.4 Other Modeling Tools (Self-Study)

5.1 Introduction

Model Essentials – Neural Networks

Predict new cases: prediction formula.

Select useful inputs: none.

Optimize complexity: stopped training.

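Neural Network Prediction Formula

$$\hat{y} = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$$

$$H_j = \tanh(\hat{w}_{j0} + \hat{w}_{j1} x_1 + \hat{w}_{j2} x_2), \quad j = 1, 2, 3$$

Here $\hat{y}$ is the prediction estimate, each $\hat{w}$ is a weight estimate (the $\hat{w}_{00}$ and $\hat{w}_{j0}$ terms are bias estimates), each $H_j$ is a hidden unit, and tanh is the activation function.

[Figure: the tanh activation function, rising from -1 to 1 as its argument goes from -5 to 5.]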
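The prediction formula can be evaluated directly. The Python sketch below is a minimal illustration, assuming two inputs and three tanh hidden units as in the formula. The input-to-hidden weights reuse the initial values given in Section 5.3 (for example, H1 = tanh(-1.5 - .03x1 - .07x2)); the output weights in `OUTPUT` are hypothetical, chosen only for illustration. This is not how SAS Enterprise Miner computes scores internally.

```python
import math

# Input-to-hidden weights as (bias, x1 weight, x2 weight), one tuple per
# hidden unit; values match the initial weights shown for H1, H2, H3.
HIDDEN = [(-1.5, -0.03, -0.07), (0.79, -0.17, -0.16), (0.57, 0.05, 0.35)]
# Hypothetical output weights (w00, w01, w02, w03), for illustration only.
OUTPUT = [0.2, 1.0, -0.5, 0.8]


def hidden_units(x1, x2, hidden_weights):
    """Return [H1, H2, H3] with H_j = tanh(w_j0 + w_j1*x1 + w_j2*x2)."""
    return [math.tanh(b + w1 * x1 + w2 * x2) for b, w1, w2 in hidden_weights]


def predict(x1, x2, hidden_weights, output_weights):
    """Prediction estimate y_hat = w00 + w01*H1 + w02*H2 + w03*H3."""
    bias, *coefs = output_weights
    hs = hidden_units(x1, x2, hidden_weights)
    return bias + sum(w * h for w, h in zip(coefs, hs))


print(predict(0.5, 0.5, HIDDEN, OUTPUT))
```

Because tanh is bounded between -1 and 1, each hidden unit contributes at most the magnitude of its output weight to $\hat{y}$.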

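Neural Network Binary Prediction Formula

$$\operatorname{logit}(\hat{p}) = \log\frac{\hat{p}}{1 - \hat{p}} = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$$

The hidden units are the same as before, $H_j = \tanh(\hat{w}_{j0} + \hat{w}_{j1} x_1 + \hat{w}_{j2} x_2)$. For a binary target, the logit link function connects the weighted sum of hidden units to the probability estimate $\hat{p}$.

[Figure: the tanh activation function (range -1 to 1) and the inverse of the logit link function (range 0 to 1), each plotted for arguments from -5 to 5.]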
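For a binary target, the only change is the logit link: the weighted sum of hidden units is a logit, and solving for $\hat{p}$ applies the inverse logit. A minimal Python sketch, assuming the same two-input, three-hidden-unit layout; the hidden weights reuse the initial values shown for H1 to H3 in Section 5.3, and the output weights are hypothetical.

```python
import math


def logit(p):
    """Logit link: log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Solve logit(p) = eta for p, written to avoid overflow in math.exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predict_proba(x1, x2, hidden_weights, output_weights):
    """p_hat for one case: tanh hidden units, weighted sum, inverse logit."""
    bias, *coefs = output_weights
    hs = [math.tanh(b + w1 * x1 + w2 * x2) for b, w1, w2 in hidden_weights]
    return inverse_logit(bias + sum(w * h for w, h in zip(coefs, hs)))


hidden = [(-1.5, -0.03, -0.07), (0.79, -0.17, -0.16), (0.57, 0.05, 0.35)]
output = [0.2, 1.0, -0.5, 0.8]  # hypothetical output weights
print(predict_proba(0.5, 0.5, hidden, output))
```

Whatever the weights, the inverse logit keeps $\hat{p}$ strictly between 0 and 1 for moderate logits.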

Neural Network Diagram

[Figure: inputs $x_1$ and $x_2$ (input layer) feed hidden units $H_1$, $H_2$, $H_3$ (hidden layer), which feed the target $y$ (target layer).]

Prediction Illustration – Neural Networks

[Figure: the input space, with $x_1$ and $x_2$ each ranging from 0.0 to 1.0.]

The logit equation needs weight estimates. Weight estimates are found by maximizing the log-likelihood function:

$$\sum_{\text{primary outcomes}} \log(\hat{p}_i) + \sum_{\text{secondary outcomes}} \log(1 - \hat{p}_i)$$

Probability estimates are obtained by solving the logit equation for $\hat{p}$ for each $(x_1, x_2)$.

[Figure: contours of $\hat{p}$ over the input space, labeled from 0.30 to 0.70.]
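The log-likelihood can be computed case by case. A minimal Python sketch, assuming targets coded 1 for the primary outcome and 0 for the secondary outcome; the toy targets and probabilities are hypothetical. Probabilities that fit the targets better give a larger (less negative) log-likelihood, which is what the weight search maximizes.

```python
import math


def log_likelihood(targets, probs):
    """Sum of log(p_i) over primary outcomes (target 1) plus
    sum of log(1 - p_i) over secondary outcomes (target 0)."""
    total = 0.0
    for y, p in zip(targets, probs):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical toy data: three primary and two secondary outcomes.
y = [1, 1, 0, 1, 0]
flat = [0.5] * 5                     # an uninformative model
better = [0.8, 0.7, 0.2, 0.9, 0.3]   # probabilities closer to the targets
print(log_likelihood(y, flat), log_likelihood(y, better))
```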

Neural Nets: Beyond the Prediction Formula

Manage missing values.

Handle extreme or unusual values.

Use non-numeric inputs.

Account for nonlinearities.

Interpret the model.

Training a Neural Network

This demonstration illustrates using the Neural Network tool.

5.2 Input Selection


Model Essentials – Regressions

Predict new cases: prediction formula.

Select useful inputs: sequential selection.

Optimize complexity: best model from sequence.

5.01 Multiple Answer Poll

Which of the following are true about neural networks in SAS Enterprise Miner?

a. Neural networks are universal approximators.

b. Neural networks have no internal, automated process for selecting useful inputs.

c. Neural networks are easy to interpret and thus are very useful in highly regulated industries.

d. Neural networks cannot model nonlinear relationships.

Correct answers: a and b.

Selecting Neural Network Inputs

This demonstration illustrates how to use a logistic regression to select inputs for a neural network.

5.3 Stopped Training


Predict new cases: prediction formula.

Select useful inputs: sequential selection.

Fit Statistic versus Optimization Iteration

Training starts from an initial model:

$$\operatorname{logit}(\hat{p}) = \operatorname{logit}(\rho_1) + 0 \cdot H_1 + 0 \cdot H_2 + 0 \cdot H_3$$

The initial hidden unit weights (the weights on $H_1$, $H_2$, $H_3$) are 0, and the intercept is $\operatorname{logit}(\rho_1)$, where $\rho_1$ is the proportion of primary outcomes; here $\operatorname{logit}(0.5) = 0$.

The input weights and biases start at random values:

$$H_1 = \tanh(-1.5 - 0.03 x_1 - 0.07 x_2)$$

$$H_2 = \tanh(0.79 - 0.17 x_1 - 0.16 x_2)$$

$$H_3 = \tanh(0.57 + 0.05 x_1 + 0.35 x_2)$$
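With all hidden-to-target weights at 0, the random input weights have no effect on the first prediction: the starting network returns $\rho_1$ for every case. The Python sketch below checks this, assuming the initial weights listed for H1 to H3; the function name `initial_probability` is hypothetical.

```python
import math

# Random initial input weights and biases (bias, x1 weight, x2 weight)
# for H1, H2, H3, as listed for the starting network.
HIDDEN = [(-1.5, -0.03, -0.07), (0.79, -0.17, -0.16), (0.57, 0.05, 0.35)]


def logit(p):
    return math.log(p / (1.0 - p))


def initial_probability(x1, x2, rho1):
    """p_hat from the starting network: intercept logit(rho1),
    hidden-to-target weights all 0."""
    output = [logit(rho1), 0.0, 0.0, 0.0]
    hs = [math.tanh(b + w1 * x1 + w2 * x2) for b, w1, w2 in HIDDEN]
    eta = output[0] + sum(w * h for w, h in zip(output[1:], hs))
    return 1.0 / (1.0 + math.exp(-eta))


for point in [(0.0, 0.0), (0.2, 0.9), (1.0, 1.0)]:
    print(point, initial_probability(*point, rho1=0.5))
```

The input weights begin to influence $\hat{p}$ only after training moves the hidden-to-target weights away from 0.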

[Figure: training and validation ASE (average squared error) plotted against optimization iteration, built up one iteration at a time from iteration 0 to iteration 23.]

[Figure: the complete ASE plot with iteration 12 marked, beside the probability contours (0.30 to 0.70) of the model from that iteration.]

Stopped training keeps the weights from the iteration with the smallest validation ASE (here, iteration 12) rather than from the final iteration.
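Stopped training can be sketched in a few lines: train iteratively, compute validation ASE after every iteration, and keep the weights from the best iteration. The Python sketch below assumes plain batch gradient descent on the log-likelihood, a hypothetical learning rate and iteration limit, and hypothetical toy data; the starting weights are the initial values shown above. The Neural Network tool's optimizer and default selection criterion may differ, so treat this only as an illustration of the idea.

```python
import math
import random


def sigmoid(eta):
    """Inverse logit, written to avoid overflow in math.exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def forward(weights, x1, x2):
    """Return hidden unit values and p_hat for one case."""
    hidden, output = weights
    hs = [math.tanh(b + a1 * x1 + a2 * x2) for b, a1, a2 in hidden]
    eta = output[0] + sum(w * h for w, h in zip(output[1:], hs))
    return hs, sigmoid(eta)


def ase(weights, data):
    """Average squared error between 0/1 targets and p_hat."""
    return sum((y - forward(weights, x1, x2)[1]) ** 2 for x1, x2, y in data) / len(data)


def train_step(weights, data, lr):
    """One batch gradient step on the negative mean log-likelihood."""
    hidden, output = weights
    g_out = [0.0] * len(output)
    g_hid = [[0.0, 0.0, 0.0] for _ in hidden]
    for x1, x2, y in data:
        hs, p = forward(weights, x1, x2)
        err = p - y  # derivative of -log-likelihood with respect to the logit
        g_out[0] += err
        for j, h in enumerate(hs):
            g_out[j + 1] += err * h
            d = err * output[j + 1] * (1.0 - h * h)  # back through tanh
            g_hid[j][0] += d
            g_hid[j][1] += d * x1
            g_hid[j][2] += d * x2
    n = len(data)
    new_output = [w - lr * g / n for w, g in zip(output, g_out)]
    new_hidden = [tuple(w - lr * g / n for w, g in zip(unit, grads))
                  for unit, grads in zip(hidden, g_hid)]
    return new_hidden, new_output


def stopped_training(weights, train, valid, lr=0.5, max_iter=100):
    """Train max_iter iterations; keep the weights with lowest validation ASE."""
    history = [ase(weights, valid)]
    best_weights, best_iter = weights, 0
    for it in range(1, max_iter + 1):
        weights = train_step(weights, train, lr)
        history.append(ase(weights, valid))
        if history[-1] < history[best_iter]:
            best_weights, best_iter = weights, it
    return best_weights, best_iter, history


def make_data(rng, n):
    """Hypothetical toy data: y = 1 with probability 0.8 if x1 + x2 > 1, else 0.2."""
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        p = 0.8 if x1 + x2 > 1.0 else 0.2
        data.append((x1, x2, 1 if rng.random() < p else 0))
    return data


rng = random.Random(0)
train, valid = make_data(rng, 200), make_data(rng, 200)
start = ([(-1.5, -0.03, -0.07), (0.79, -0.17, -0.16), (0.57, 0.05, 0.35)],
         [0.0, 0.0, 0.0, 0.0])  # output intercept logit(0.5) = 0
best, best_iter, history = stopped_training(start, train, valid)
print("selected iteration:", best_iter, "validation ASE:", history[best_iter])
```

The strict `<` comparison keeps the earliest iteration when two iterations tie on validation ASE.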

Increasing Network Flexibility

This demonstration illustrates how to further improve neural network performance.


Using the AutoNeural Tool (Self-Study)

This demonstration illustrates how to use the AutoNeural tool.

5.4 Other Modeling Tools (Self-Study)

Model Essentials – Rule Induction

Predict new cases.




Rule Induction Predictions

[Figure: rule induction predictions over the input space ($x_1$ and $x_2$ from 0.0 to 1.0), with region estimates such as 0.74 and 0.39.]

Rips create prediction rules.

A binary model sequentially classifies and removes correctly classified cases.

A neural network predicts the remaining cases.
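The sequential removal step can be sketched as follows. This Python sketch assumes each rule is a hypothetical (condition, class) pair: a case leaves the pool once some rule covers it and assigns its correct class, and the cases left over are the ones a final neural network would predict. The Rule Induction tool builds each binary model differently; only the classify-and-remove flow is illustrated.

```python
def remaining_after_rules(cases, targets, rules):
    """Apply rules in order, removing cases each rule covers and classifies
    correctly; return the (case, target) pairs that are left."""
    remaining = list(zip(cases, targets))
    for condition, label in rules:
        remaining = [
            (case, y) for case, y in remaining
            if not (condition(*case) and y == label)
        ]
    return remaining


# Hypothetical rules over (x1, x2), for illustration only.
RULES = [
    (lambda x1, x2: x1 > 0.8 and x2 > 0.6, 1),
    (lambda x1, x2: x1 < 0.2, 0),
]

cases = [(0.9, 0.9), (0.1, 0.5), (0.5, 0.5), (0.1, 0.1)]
targets = [1, 1, 0, 0]
print(remaining_after_rules(cases, targets, RULES))
```

Here (0.1, 0.5) stays in the pool: the second rule covers it but assigns the wrong class.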

Model Essentials – Dmine Regression

Predict new cases.




Dmine Regression Predictions

Interval inputs are binned, and categorical inputs are grouped.

Forward selection picks from the binned and original inputs.
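The binning and grouping steps can be sketched as below. This Python sketch assumes equal-width bins for interval inputs and merges rare categorical levels into one group; both rules, the `n_bins` and `min_count` parameters, and the `OTHER` label are hypothetical stand-ins, not the Dmine Regression tool's actual algorithm. The binned and grouped columns would then join the original inputs as candidates for forward selection.

```python
from collections import Counter


def bin_interval(values, n_bins):
    """Map each value to an equal-width bin index 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def group_levels(levels, min_count):
    """Merge categorical levels seen fewer than min_count times into 'OTHER'."""
    counts = Counter(levels)
    return [lvl if counts[lvl] >= min_count else "OTHER" for lvl in levels]


print(bin_interval([0.0, 1.0, 2.0, 3.0, 4.0], 2))
print(group_levels(["a", "a", "b", "c", "c", "c"], 2))
```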

Model Essentials – DMNeural

Predict new cases.




DMNeural Predictions

[Figure: DMNeural predictions over the input space, $x_1$ from 0.0 to 1.0.]

Up to three principal components (PCs) with the highest target R square are selected.

One of eight continuous transformations is selected and applied to the selected PCs.

The process is repeated three times, with each stage modeling the residuals from the previous stage.

Model Essentials – Least Angle Regression

Predict new cases.




Least Angle Regression Predictions

Inputs are selected using a generalization of forward selection.

By default, the input combination in the sequence with the optimal penalized validation assessment is selected.

[Figure: least angle regression predictions over the input space, $x_1$ from 0.0 to 1.0.]

Model Essentials – MBR

Predict new cases.




MBR Prediction Estimates

[Figure: MBR prediction estimates over the input space, $x_1$ from 0.0 to 1.0.]

The sixteen nearest training cases predict the target for each point in the input space.

Scoring requires the training data and the PMBR procedure.
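The nearest-neighbor rule behind MBR can be sketched in Python as below. This brute-force sketch assumes Euclidean distance on the raw inputs and a plain average of the neighbors' targets; the PMBR procedure's own search methods and distance details are not reproduced. The default `k=16` matches the sixteen neighbors above, and `math.dist` requires Python 3.8 or later. Because every prediction searches the stored cases, scoring needs the training data.

```python
import math


def knn_predict(point, train_x, train_y, k=16):
    """Average the targets of the k training cases nearest to point."""
    by_distance = sorted(
        (math.dist(point, x), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training data: 16 primary cases near the origin, 4 secondary far away.
train_x = [(0.0, i * 0.01) for i in range(16)] + [(1.0, 1.0)] * 4
train_y = [1] * 16 + [0] * 4
print(knn_predict((0.0, 0.0), train_x, train_y))
```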

Model Essentials – Partial Least Squares

Predict new cases.




Partial Least Squares Predictions

[Figure: partial least squares predictions over the input space, $x_1$ from 0.0 to 1.0.]

Input combinations (factors) that optimally account for both predictor and response variation are successively selected.

The factor count with the minimum validation PRESS statistic is selected.

Inputs with small VIP (variable importance in projection) are rejected for subsequent diagram nodes.
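Choosing the factor count reduces to computing PRESS (here, the sum of squared validation prediction errors) for each candidate count and taking the minimum. The Python sketch below assumes the validation predictions for each factor count are already available; fitting the PLS factors is not shown, and the numbers in the example are hypothetical.

```python
def press(targets, predictions):
    """Sum of squared prediction errors on validation data."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions))


def choose_factor_count(targets, predictions_by_count):
    """Return the factor count with minimum PRESS and all PRESS values."""
    scores = {k: press(targets, preds) for k, preds in predictions_by_count.items()}
    return min(scores, key=scores.get), scores


# Hypothetical validation targets and predictions for 1, 2, and 3 factors.
valid_y = [1.0, 2.0, 3.0]
preds = {1: [1.5, 1.5, 1.5], 2: [1.1, 2.0, 2.8], 3: [1.0, 2.3, 3.4]}
best, scores = choose_factor_count(valid_y, preds)
print(best, scores)
```

Adding a third factor lowers training error in general but can raise validation PRESS, as in this toy example.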

Exercises

This exercise reinforces the concepts discussed previously.


Neural Network Tool Review

Create a multilayer perceptron on selected inputs. Control complexity with stopped training and the hidden unit count.
