1 Chapter 1: Stratified Data Analysis 1.1 Introduction 1.2 Examining Associations among Variables 1.3 Recursive Partitioning 1.4 Introduction to Logistic Regression

Upload: alexander-mcdaniel

Post on 04-Jan-2016



Chapter 1: Stratified Data Analysis

1.1 Introduction

1.2 Examining Associations among Variables

1.3 Recursive Partitioning

1.4 Introduction to Logistic Regression

1.1 Introduction

Objectives
- Recognize the differences between categorical and continuous data analysis.
- Identify the scale of measurement for your response variable.

Categorical versus Continuous Data Analysis

Identifying the Scale of Measurement

Before analyzing, select the measurement scale for each variable.

[Figure: a variable with the values Agree, No Opinion, and Disagree.]

Nominal Variables

Variable: Type of Beverage

[Figure: three types of beverage shown with two alternative labelings, each using the codes 1, 2, 3.]

Ordinal Variables

Variable: Size of Beverage

Small, Medium, Large

Continuous Variables

Variable: Volume of Beverage

[Figure: a continuous scale with tick marks at 0, 1.0, 2.0, 3.0, and 4.0.]

1.01 Quiz

A car dealer records several inventory variables, including Type (automatic or standard), Time (the number of seconds it takes for the car to go from 0 to 60 mph), and Model (basic, middle, or luxury).

Match the modeling type on the left with the appropriate component on the right.

1. Continuous    A. Type
2. Ordinal       B. Time
3. Nominal       C. Model

1.01 Quiz – Correct Answer

1-B, 2-C, 3-A: Time is continuous, Model is ordinal, and Type is nominal.

What’s Next?

[Figure: the variable opinion, with the values Agree, No Opinion, and Disagree, is identified as ordinal.]

1.2 Examining Associations among Variables

Objectives
- Examine the distribution of categorical variables.
- Determine whether an association exists among categorical variables.
- Perform a stratified analysis of categorical variables.

Sample Data Set

Demonstration: Examining Distributions

This demonstration illustrates the concepts discussed previously.

Association

An association exists between two variables if the distribution of one variable changes when the level (or value) of the other variable changes.

If there is no association, the distribution of the first variable is the same, regardless of the level of the other variable.
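The definition above can be checked on a two-way table by comparing row percents. The sketch below uses hypothetical counts (rows are two weather conditions, columns are two moods); the counts and the function name `row_percents` are illustrative assumptions, not data from the course.

```python
def row_percents(table):
    """Return each row of counts as percentages of its row total."""
    return [[round(100.0 * count / sum(row), 1) for count in row] for row in table]

# Hypothetical counts. Rows: weather condition; columns: mood.
no_association = [[36, 14], [18, 7]]   # both rows split 72% / 28%
association = [[41, 9], [30, 20]]      # rows split 82% / 18% and 60% / 40%

same_rows = row_percents(no_association)
different_rows = row_percents(association)
print(same_rows)
print(different_rows)
```

Identical row percents mean the distribution of the column variable does not change with the row variable in this sample; whether a difference is larger than chance is a separate question that needs a formal test.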

No Association

Is your manager’s mood associated with the weather?

[Figure: the mood percentages are 72% and 28% under both weather conditions.]

Association

Is your manager’s mood associated with the weather?

[Figure: the mood percentages are 82% and 18% under one weather condition, and 40% and 60% under the other.]

Demonstration: Recognizing Associations

This demonstration illustrates the concepts discussed previously.

1.02 Quiz

Is there an association between finishing a prescription (Rx) and experiencing a relapse?

1.02 Quiz – Correct Answer

Yes. The distribution of Yes/No for Did not finish Rx is different from the distribution of Yes/No for Finished Rx.

Tests for Association

Row percents of Income by Purchase

Income    Purchase $100 +    Purchase under $100
Low       32%                68%
Medium    32%                68%
High      48%                52%

Null Hypothesis
- There is no association between Income and Purchase.
- The probability of purchasing items of $100 or more is the same, regardless of income level.

Alternative Hypothesis
- There is an association between Income and Purchase.
- The probability of purchasing items of $100 or more differs among Low, Medium, and High income customers.

Chi-Square Test

NO ASSOCIATION: observed frequencies = expected frequencies

ASSOCIATION: observed frequencies ≠ expected frequencies

p-Value for Chi-Square Test

This p-value is the probability of observing a chi-square statistic at least as large as the one actually observed, given that there is no association between the variables. Informally, it is the probability that an association as strong as the one you observe in the data would occur by chance.

Chi-Square Tests

Chi-square tests and the corresponding p-values
- determine whether an association exists
- do not measure the strength of an association
- depend on and reflect the sample size.

Demonstration: Chi-Square Test

This demonstration illustrates the concepts discussed previously.

1.03 Quiz

Is there sufficient evidence that an association exists between Relapsed and Rx Status?

1.03 Quiz – Correct Answer

Yes, there is sufficient evidence that an association exists between Relapsed and Rx Status. The p-value for the Pearson chi-square statistic is .0005, so at alpha = .05, there is sufficient evidence to reject the null hypothesis (that no association exists) in favor of the alternative (that an association exists).

When Not to Use the Chi-Square Test

When more than 20% of the cells have expected counts less than five

Observed versus Expected Values

Observed Values        Expected Values
1   5   8              3.43   4.57   6.00
5   6   7              4.41   5.88   7.71
6   5   6              4.16   5.55   7.29
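The expected values on this slide follow from the usual rule, expected count = (row total × column total) / grand total. The sketch below reproduces them from the observed counts, computes the Pearson chi-square statistic, and checks the 20% rule. The function names are illustrative, and the p-value helper uses a closed form that holds only for even degrees of freedom (here df = 4).

```python
import math

def expected_counts(observed):
    """Expected counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def chi_square_p_value_even_df(x, df):
    """Upper-tail chi-square probability; this closed form requires even df."""
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

observed = [[1, 5, 8], [5, 6, 7], [6, 5, 6]]
expected = [[round(e, 2) for e in row] for row in expected_counts(observed)]
chi2 = pearson_chi_square(observed)
p_value = chi_square_p_value_even_df(chi2, df=4)  # df = (3 - 1) * (3 - 1)

# Share of cells whose expected count is below five (the 20% rule).
cells = [e for row in expected_counts(observed) for e in row]
small_share = sum(e < 5 for e in cells) / len(cells)
print(expected, round(chi2, 3), round(p_value, 3), round(small_share, 2))
```

Four of the nine expected counts fall below five, which is more than 20% of the cells, so this table is a case where the chi-square approximation should not be relied on.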

Small Samples – Fisher’s Exact Test

[Figure: Fisher’s exact test shown against a SAMPLE SIZE scale running from Small to Large.]

Example: Tea and Milk

Suppose you want to test whether someone can determine if a cup of tea with milk had the milk poured first or the tea poured first.

Fisher’s Exact Test Example

9 Cups of Tea: 4 with Milk First and 5 with Tea First

Predict which cups had tea poured first.

Fixed Marginal Totals

              Guess M    Guess T    Total
Actual M                              4
Actual T                              5
Total            4          5

Basis for Fisher’s Exact Test

With the row and column totals fixed, the table could take any of the following forms (rows: Actual M, Actual T; columns: Guess M, Guess T):

Actual M: 0 4    Actual T: 4 1
Actual M: 1 3    Actual T: 3 2
Actual M: 2 2    Actual T: 2 3
Actual M: 3 1    Actual T: 1 4
Actual M: 4 0    Actual T: 0 5

Fisher’s Exact Test Hypotheses

Null Hypothesis: There is no association.

Alternative Hypothesis: There is an association.
- Two-tailed
- Left-tailed
- Right-tailed

Left-Tailed Alternative Hypothesis

[Figure: the left-tailed p-value is based on the tables with 0 and 1 in the Actual M, Guess M cell.]

Right-Tailed Alternative Hypothesis

[Figure: the right-tailed p-value is based on the tables with 1, 2, 3, and 4 in the Actual M, Guess M cell.]

Two-Tailed Alternative Hypothesis

[Figure: the two-tailed p-value, shown with all five possible tables (0 through 4 in the Actual M, Guess M cell).]
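As a rough sketch of how the three p-values in the tea example could be computed, the code below assigns each possible table its hypergeometric probability given the fixed margins (4 and 5). It assumes the observed table has 1 in the Actual M, Guess M cell, which is inferred from the left- and right-tailed figures rather than stated in the slides. The two-tailed rule used here (sum the tables no more probable than the observed one) is a common convention and may differ from what a particular package reports.

```python
from math import comb

def table_probability(x, actual_m=4, guess_m=4, total=9):
    """Hypergeometric probability of x cups in the (Actual M, Guess M) cell."""
    return (comb(actual_m, x) * comb(total - actual_m, guess_m - x)
            / comb(total, guess_m))

probabilities = {x: table_probability(x) for x in range(5)}

observed = 1  # assumption: inferred from the figures, not given as data

left_p = sum(p for x, p in probabilities.items() if x <= observed)
right_p = sum(p for x, p in probabilities.items() if x >= observed)
two_tailed_p = sum(p for p in probabilities.values()
                   if p <= probabilities[observed] + 1e-12)

print(round(left_p, 4), round(right_p, 4), round(two_tailed_p, 4))
```

The left and right tails both include the observed table, so the two one-sided p-values add up to more than 1.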

Demonstration: Fisher’s Exact Test

This demonstration illustrates the concepts discussed previously.

1.04 Quiz

What can you conclude from each of the p-values from the Fisher’s Exact Test for the association between Relapsed and Rx Status?

1.04 Quiz – Correct Answer

The Left p-value = .0007, so there is sufficient evidence to conclude that the probability of a relapse is greater for those who did not finish the Rx than for those who did.

The Right p-value = .9999, so there is not sufficient evidence to conclude that the probability of a relapse is greater for those who finished the Rx than for those who did not.

The 2-Tail p-value = .0008, so there is sufficient evidence to conclude that the probability of a relapse is different depending on whether a Rx was finished or not.

What Happens If There Is a Third Variable?

[Figure: Income, Gender, and purchases of $100.]

Stratified Data Analysis

Stratified data analysis is the process of dividing subjects into groups defined by the levels of a third variable.

Use this analysis when you want to examine the association between two variables within the levels of a third variable.

Stratified Data Analysis

Of the 39 single people, 23% have lung cancer and 77% do not. Of the 36 married people, 17% have lung cancer and 83% do not.

Of the 28 single smokers, 28% have lung cancer and 72% do not. Of the 14 married smokers, 28% have lung cancer and 72% do not.
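The lung cancer example can be sketched numerically. The counts below are hypothetical: they were chosen to be consistent with the rounded percentages on the slides, and the nonsmoker stratum is obtained by subtraction, so treat them as an illustration only. The statistic computed is the Cochran-Mantel-Haenszel statistic for a set of 2 x 2 tables, without a continuity correction.

```python
def cmh_statistic(strata):
    """CMH statistic for a list of 2 x 2 tables [[a, b], [c, d]] (no correction)."""
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n                       # observed minus expected
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return diff_sum ** 2 / var_sum

# Hypothetical counts. Rows: single, married; columns: cancer, no cancer.
smokers = [[8, 20], [4, 10]]       # 8/28 and 4/14, both about 28%
nonsmokers = [[1, 10], [2, 20]]    # 1/11 and 2/22, both about 9%

# Pooling the strata gives 9/39 (23%) for single and 6/36 (17%) for married.
pooled = [[smokers[i][j] + nonsmokers[i][j] for j in range(2)] for i in range(2)]
pooled_rates = [round(100 * row[0] / sum(row)) for row in pooled]

cmh = cmh_statistic([smokers, nonsmokers])
print(pooled_rates, cmh)
```

With these counts the pooled rates differ (23% versus 17%), yet the rates agree within each smoking stratum, so the stratum-adjusted statistic is zero: the apparent association with marital status is explained by smoking.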

Cochran-Mantel-Haenszel Statistics

CMH versus Chi-Square

1. Correlation of Scores

Test linear association

2. Row Scores by Column Categories

Test equal row scores

3. Column Scores by Row Categories

Test equal column scores

4. General Association of Categories

Test general association

CMH Statistics and 2x2 Tables

For 2 x 2 tables, the CMH statistics are all equal.

When Do CMH Statistics Lack Power?

Response Reversed in Strata

Demonstration: CMH Tests

This demonstration illustrates the concepts discussed previously.

Exercise

This exercise reinforces the concepts discussed previously.

1.05 Multiple Choice Poll

The Correlation of Scores CMH test has which null hypothesis?

a. There is no linear association between the row and column variables in any stratum.
b. The mean scores for each column are equal in each stratum.
c. The mean scores for each row are equal in each stratum.
d. There is no association between the row and column variables in any stratum.

1.05 Multiple Choice Poll – Correct Answer

a. There is no linear association between the row and column variables in any stratum.

1.3 Recursive Partitioning

Objectives
- Define partitioning.
- Understand the splitting criteria used in JMP.
- Review algorithm parameters available in JMP.
- Use the Partition platform in JMP.

Recursive Partitioning

Partitioning refers to segmenting the data into groups that are as homogeneous as possible with respect to the dependent variable (Y).

Divide and Conquer

What factors affect the country from which cars are purchased?

[Figure: the response Country for n = 303 cars is split on size into Large (n = 42) and Medium, Small (n = 261).]

Tree Algorithm

1. Calculate the separation of the response for an independent variable (X1).
2. Find the best split for that independent variable.
3. Repeat for the other independent variables (X2, and so on).
4. Compare the best splits.
5. Partition with the best split.
6. Repeat within partitions.
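One step of the tree algorithm can be sketched as follows for categorical predictors and a two-level response. Every candidate two-way grouping of a predictor's levels is scored, and the best score across predictors wins. The score here is a simplified LogWorth, taken as -log10 of the unadjusted p-value of the likelihood-ratio chi-square for the resulting 2 x 2 table; JMP's actual criterion may adjust the p-value, so treat this as an approximation. The data and all names are hypothetical.

```python
import math
from itertools import combinations

def g_squared(table):
    """Likelihood-ratio chi-square (G^2) for a 2 x 2 table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(2.0 * table[i][j] * math.log(table[i][j] * n / (rows[i] * cols[j]))
               for i in range(2) for j in range(2) if table[i][j] > 0)

def logworth(table):
    """Simplified LogWorth: -log10 of the unadjusted p-value (1 df)."""
    p = math.erfc(math.sqrt(max(g_squared(table), 0.0) / 2.0))
    return -math.log10(p) if p > 0 else float("inf")

def best_split(records, predictors, response, event):
    """Return (score, predictor, left_levels) for the best two-way split."""
    best = None
    for name in predictors:
        levels = sorted({r[name] for r in records})
        for size in range(1, len(levels)):
            for left in combinations(levels, size):
                table = [[0, 0], [0, 0]]
                for r in records:
                    table[0 if r[name] in left else 1][0 if r[response] == event else 1] += 1
                score = logworth(table)
                if best is None or score > best[0]:
                    best = (score, name, left)
    return best

# Hypothetical data: country of purchase by car size; gender alternates within cells.
counts = {("Large", "Domestic"): 9, ("Large", "Foreign"): 1,
          ("Medium", "Domestic"): 5, ("Medium", "Foreign"): 5,
          ("Small", "Domestic"): 2, ("Small", "Foreign"): 8}
records = [{"size": s, "gender": "F" if i % 2 else "M", "country": c}
           for (s, c), k in counts.items() for i in range(k)]

score, name, left = best_split(records, ["size", "gender"], "country", "Domestic")
print(name, left, round(score, 2))
```

Repeating `best_split` on the records that fall in each resulting group gives the recursive part of the algorithm.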

Demonstration: Recursive Partitioning

This demonstration illustrates the concepts discussed previously.

Exercise

This exercise reinforces the concepts discussed previously.

1.06 Quiz

In which leaf, and on what variable, will JMP next split?

1.06 Quiz – Correct Answer

Of the leaves, the highest LogWorth is for Age (.7313), in the Gender(Female) leaf. This is where JMP will next split.

1.4 Introduction to Logistic Regression

Objectives
- Explain the concepts of logistic regression.
- Fit a logistic regression model using JMP software.
- Examine logistic regression output.

Overview

Predictor                    Response       Analysis
Categorical or Continuous    Continuous     Linear Regression Analysis
Categorical or Continuous    Categorical    Logistic Regression Analysis

Types of Logistic Regression

Response Variable                       Type of Logistic Regression
Two categories (Yes, No)                Binary
Three or more categories, nominal       Nominal
Three or more categories, ordinal       Ordinal

What Does Logistic Regression Do?

The logistic regression model uses the predictor variables, which can be categorical or continuous, to predict the probability of specific outcomes.

In other words, logistic regression is designed to describe probabilities associated with the values of the response variable.

The Logistic Curve

The relationship between the probability of a response variable and a predictor variable might be an S-shaped curve.

Linear regression cannot model this relationship, but logistic regression can.

Logistic Regression Curves

This graph shows the relationship between the probability of Sale and Price.

Logit Transformation

logit(pi) = log( pi / (1 - pi) )

where

i indexes all cases (observations).

pi is the probability that the event (a sale, for example) occurs in the ith case.

1 - pi is the probability that the event (a sale, for example) does not occur in the ith case.

log is the natural log (to the base e).
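A minimal sketch of the logit transformation and its inverse, using only the definition above. The function names are illustrative.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Map a logit value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.1, 0.25, 0.5, 0.9):
    print(p, round(logit(p), 4), round(inverse_logit(logit(p)), 4))
```

A probability of 0.5 maps to a logit of 0, probabilities below 0.5 map to negative logits, and probabilities above 0.5 map to positive ones, so the logit is unbounded in both directions while the probability stays between 0 and 1.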

Assumption

[Figure: pi plotted against the predictor, and the logit transform plotted against the predictor.]

Logistic Regression Model

logit(pi) = B0 + B1X1

where

logit(pi) is the logit transformation of the probability of the event

B0 is the intercept of the regression line

B1 is the slope of the regression line.

Likelihood Function

A likelihood function expresses the probability of the observed data as a function of the unknown parameters.

The goal is to derive values of the parameters such that the probability of the observed data is as large as possible.
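To make the idea concrete, the sketch below evaluates the log-likelihood of a one-predictor logistic model, logit(p) = B0 + B1*X, at a few candidate parameter values. The data and the candidate values are hypothetical; the candidate with the largest log-likelihood is the best of those tried, not necessarily the maximum likelihood estimate.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Sum of log P(observed y) over cases for logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical data: y = 1 marks the event.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

candidates = [(0.0, 0.0), (-3.5, 1.0), (-7.0, 2.0)]
scores = {c: log_likelihood(c[0], c[1], xs, ys) for c in candidates}
best = max(scores, key=scores.get)
print(best, {c: round(s, 3) for c, s in scores.items()})
```

Fitting software searches over all parameter values rather than a short list, but the quantity being maximized is the same.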

Maximum Likelihood Estimate

[Figure: the log-likelihood function.]

Model Inference

[Figure: the log-likelihood function, with the values LogL0 and LogL1 marked.]

Logistic Curve

[Figure: logistic curves for a weak relationship, a strong relationship, and a very strong relationship.]

Example of Binary Logistic Regression Model

You want to predict the probability of defaulting on credit card payments based on having or not having a history of late payments. You can postulate this model:

logit (Probability of Defaulting) = B0 + B1*(Late Payment)

Demonstration: Binary Logistic Regression

This demonstration illustrates the concepts discussed previously.

1.07 Quiz

You want to predict the probability of a defect, given the width of a product. What kind of association exists between Defect and Width – a strong relationship or a weak relationship?

1.07 Quiz – Correct Answer

Weak – The fitted regression line is nearly flat, indicating a weak association between Defect and Width.

Multiple Logistic Regression

Interaction

Demonstration: Multiple Logistic Regression

This demonstration illustrates the concepts discussed previously.

What Is an Odds Ratio?

An odds ratio indicates how much more likely, with respect to odds, a certain event occurs in one group relative to its occurrence in another group.

Example: How much more likely are females to purchase 100 dollars or more in items compared to males?

Example: How much more likely is a person with a history of late payments on credit cards to default on a loan relative to a person who does not have a history of late payments?


Probability of Outcome

                                    Default on Loan
                                    Yes     No      Total
Late Payments: Yes (Group A)        20      60      80
Late Payments: No (Group B)         10      90      100
Total                               30      150     180

Probability of defaulting in Group A = 20/80 = 0.25

Probability of not defaulting in Group A = 60/80 = 0.75
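The two probabilities for Group A come straight from the row counts in the table. A minimal Python sketch of that arithmetic follows; the dictionary layout and the function name `outcome_probabilities` are illustrative assumptions, not part of the course material.

```python
# Cell counts from the slide's table.
# Group A: history of late payments; Group B: no history of late payments.
counts = {
    "A": {"default": 20, "no_default": 60},
    "B": {"default": 10, "no_default": 90},
}

def outcome_probabilities(row):
    """Return (P(default), P(no default)) within a single group."""
    total = row["default"] + row["no_default"]
    return row["default"] / total, row["no_default"] / total

p_default_a, p_no_default_a = outcome_probabilities(counts["A"])
p_default_b, p_no_default_b = outcome_probabilities(counts["B"])

print(f"Group A: {p_default_a:.2f} default, {p_no_default_a:.2f} no default")
print(f"Group B: {p_default_b:.2f} default, {p_no_default_b:.2f} no default")
```

The same function applied to Group B gives the probabilities that feed the Group B odds used later in the odds ratio.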


Odds

Odds of Outcome in Group A

(probability of defaulting in group with history of late payments) ÷ (probability of not defaulting in group with history of late payments)

0.25 ÷ 0.75 = 0.33
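The odds are the probability of the event divided by the probability of the non-event. A short Python sketch of that conversion, using the Group A probability from the slide (the function name `odds` is an illustrative choice):

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds, p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

# Group A: probability of defaulting is 0.25, so not defaulting is 0.75.
odds_a = odds(0.25)
print(round(odds_a, 2))  # 0.33
```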


Odds Ratio

Odds Ratio of Group A to Group B

(odds of defaulting in group with history of late payments) ÷ (odds of defaulting in group with no history of late payments)

Odds of defaulting in Group B = 0.10 ÷ 0.90 = 0.11

0.33 ÷ 0.11 = 3
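The odds ratio can also be computed directly from the four cell counts, which avoids rounding the intermediate odds. This is a sketch under the assumption that the counts are those in the loan-default table (20 and 60 for Group A, 10 and 90 for Group B); the function name and argument order are illustrative.

```python
def odds_ratio(a_event, a_no_event, b_event, b_no_event):
    """Odds ratio of group A to group B from the four cell counts.

    Algebraically equal to (a_event / a_no_event) / (b_event / b_no_event),
    written as a cross-product so no intermediate odds are rounded.
    """
    return (a_event * b_no_event) / (a_no_event * b_event)

# Group A: 20 defaults, 60 non-defaults. Group B: 10 defaults, 90 non-defaults.
ratio = odds_ratio(20, 60, 10, 90)
print(ratio)  # 3.0
```

An odds ratio of 3 means the odds of default in the group with a history of late payments are three times the odds in the group without one.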


Properties of the Odds Ratio


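Odds Ratio from a Logistic Regression Model

For a predictor variable that has only two levels, you can exponentiate twice the parameter estimate that JMP provides to obtain the odds ratio. The factor of 2 arises because JMP codes the two levels of a nominal predictor as +1 and -1, so the levels are two coded units apart.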

Estimated odds ratio = exp(2*parameter estimate)

How much higher are the odds that a female purchases more than 100 dollars in items, compared with the odds for a male?

logit(p̂) = 0.6141 + 0.3019*(Gender)

Estimated odds ratio = exp(2*0.3019) = 1.83
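The calculation can be checked with a few lines of Python. This is a sketch, not JMP output: it assumes the two Gender levels are coded +1 and -1, and the helper names `odds_ratio_two_level` and `logit` are hypothetical. The loop shows that the intercept cancels, so the odds ratio depends only on the Gender estimate of 0.3019.

```python
import math

def odds_ratio_two_level(estimate):
    """Odds ratio between the +1 and -1 levels of a two-level coded predictor."""
    return math.exp(2 * estimate)

def logit(intercept, estimate, coded_level):
    """Linear predictor (log odds) for a single coded predictor."""
    return intercept + estimate * coded_level

gender_estimate = 0.3019  # parameter estimate shown on the slide
ratio = odds_ratio_two_level(gender_estimate)
print(round(ratio, 2))  # 1.83

# The difference in log odds between the two levels is 2 * estimate,
# whatever the intercept is (the intercept values here are arbitrary).
for intercept in (-1.0, 0.0, 2.5):
    diff = logit(intercept, gender_estimate, +1) - logit(intercept, gender_estimate, -1)
    assert math.isclose(math.exp(diff), ratio)
```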


This demonstration illustrates the concepts discussed previously.

Odds Ratios


Exercise

This exercise reinforces the concepts discussed previously.


1.08 Multiple Choice Poll

Suppose processes A and B are used to make a product, and each product is evaluated as defective or non-defective. Suppose the probability of a defective from A is .2 and of a non-defective from A is .8. Which is true?

a. The odds of a defective from group A is given by .8/.2 = 4.

b. The odds of a defective from group A is given by .2/.8 = .25.


1.08 Multiple Choice Poll – Correct Answer

Suppose processes A and B are used to make a product, and each product is evaluated as defective or non-defective. Suppose the probability of a defective from A is .2 and of a non-defective from A is .8. Which is true?

a. The odds of a defective from group A is given by .8/.2 = 4.

b. The odds of a defective from group A is given by .2/.8 = .25.

Correct answer: b. The odds of an event are the probability of the event divided by the probability of the non-event.


1.09 Multiple Choice Poll

The odds of getting a defective product from process A is .25. What is its interpretation?

a. You expect only 1/4 as many defectives as non-defectives from process A.

b. You expect only 1/4 as many defectives as non-defectives from process B.


1.09 Multiple Choice Poll – Correct Answer

The odds of getting a defective product from process A is .25. What is its interpretation?

a. You expect only 1/4 as many defectives as non-defectives from process A.

b. You expect only 1/4 as many defectives as non-defectives from process B.

Correct answer: a. The odds describe process A only and say nothing about process B.
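To connect the odds of .25 back to the probabilities in poll 1.08, the odds can be converted to a probability and to an expected count of defectives per non-defective. A small Python sketch, with illustrative function names and an arbitrary batch of 100 non-defectives:

```python
def probability_from_odds(odds):
    """Convert odds (>= 0) back into a probability, odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

def expected_events_per_non_events(odds, non_events):
    """Expected number of events for a given number of non-events."""
    return odds * non_events

odds_defective_a = 0.25  # odds of a defective from process A
p_defective_a = probability_from_odds(odds_defective_a)
per_100 = expected_events_per_non_events(odds_defective_a, 100)

print(p_defective_a)  # 0.2
print(per_100)        # 25.0
```

Odds of .25 correspond to a defect probability of .2, that is, about 25 defectives for every 100 non-defectives from process A.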
