Chapter 4: Design of Experiments
4.1 Why Experiment?
4.2 Introduction
4.3 Multi-Factor Experiments
4.4 Orthogonality and Blocking
4.5 Business Experiments with Continuous Responses
4.6 Recommended Reading
4.1 Why Experiment?
Objectives
Explain the role of experiments in answering business questions.
You Need to Know
Work is full of questions that you need answers to.
Some have answers that only require a lookup:
• What is the policy regarding the use of demographic variables in predictive models?
• When did you last send a marketing e-mail to segment 17?
Some do not have readily available answers:
• Does it really matter whether you use first-class postage when sending direct mailings for a cruise line?
• How should you advertise if you want to maximize the sales-to-expenditure ratio for football tickets?
Statistical Models Can Answer Questions
The models that you learn to use in this course can answer many of the questions that you have.
• Do you have the data to perform an analysis and answer the question?
• Did you account for the variables that are in your control as well as the variables over which you have no control?
Questions Often Mean Comparing Things
Does your question imply that a comparison is needed?
• First-class versus bulk-rate postage
• Primetime versus late-night advertising
Did you conduct an experiment?
Consider This…
What is the question that you want to answer?
1. Does postage make a difference in the response rate?
2. Is it worth the extra expense to advertise tickets for a football game in primetime?
What is the population that you want the answer to pertain to?
1. The “luxury traveler” segment
2. Football fans
What kinds of things do you want to compare that you can control?
1. The class of postage on the offer envelope
2. Whether the tickets are advertised during primetime (expensive) or late night (inexpensive)
How is the outcome measured (Yobs)?
1. The number of responses from each postage group
2. Ticket sales in the week following each type of advertisement
What else impacts Yobs that you cannot control?
1. Gender, vacation already taken that year, children
2. The team’s season performance (wins, losses), disposable income of viewing markets, broadcasting lineup
13
Who Cares about Things You Cannot Control?You do!
Only accounting for the things in the experiment that you can control:
14
Who Cares about Things You Cannot Control?You do!
Accounting for the things in the experiment that you can control plus one thing that you cannot control:
15
Who Cares about Things You Cannot Control?You do!
Accounting for the things in the experiment that you can control plus two things that you cannot control:
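A small simulation can illustrate why uncontrollable variables matter. The sketch below assumes a randomized postage test with a continuous response (for simplicity, rather than a response count) and two hypothetical uncontrollable customer characteristics, `prior_spend` and `has_children`; the names, effect sizes, and data are all invented for illustration. Each uncontrollable variable that is added to an ordinary least squares fit absorbs variation that would otherwise be noise, so the standard error of the postage effect shrinks.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 2_000

# Controllable factor, randomly assigned: postage class (0 = bulk rate, 1 = first class).
postage = rng.integers(0, 2, size=n)
# Uncontrollable customer characteristics (hypothetical names and effect sizes).
prior_spend = rng.normal(0.0, 1.0, size=n)   # standardized prior-spend index
has_children = rng.integers(0, 2, size=n)    # 0/1 indicator
# Simulated continuous response: small postage effect, larger nuisance effects, noise.
y = 5.0 + 0.3 * postage + 2.0 * prior_spend + 1.0 * has_children + rng.normal(0.0, 1.0, size=n)

def treatment_estimate(X, y):
    """OLS fit; return the coefficient and standard error of column 1 (postage)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1]), float(np.sqrt(cov[1, 1]))

ones = np.ones(n)
fits = {
    "controllable only": np.column_stack([ones, postage]),
    "plus one uncontrollable": np.column_stack([ones, postage, prior_spend]),
    "plus two uncontrollable": np.column_stack([ones, postage, prior_spend, has_children]),
}
results = {name: treatment_estimate(X, y) for name, X in fits.items()}
for name, (est, se) in results.items():
    print(f"{name:24s} effect={est:.3f}  se={se:.4f}")
```

Because postage is randomized, all three fits estimate the same postage effect; only the precision changes.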
Work smarter: design an experiment!
Idea Exchange
Have you ever conducted an experiment? If so, what was the business or scientific objective?
Web-based experiments are popular because they are relatively inexpensive to implement and they can be modified in real time. Can you describe any Web experiments you have seen?
What kinds of factors might influence click-through behavior on, for example, an ad for insurance? For retail clothing? For other types of products and services?
4.2 Introduction
Objectives
• Define experimental design concepts and terminology.
• Relate experimental design concepts and terminology to business marketing concepts and terminology.
Basic Terms in Design of Experiments (DOE)
Response
Factor
Factor Level
Effect
Power
Experimental Unit
Treatment
Replication
Balance
Orthogonality
Basic Terms in DOE: Response
A response is the variable of interest in the analyses. It is sometimes called the target or dependent variable.
Examples include the following:
• Response rate to direct mail solicitations
• Default (“Bad”) rate among credit customers
• Balance transfer amount
• Fraud
• Number of items purchased from a catalog
• Spend, six months after acquisition
Basic Terms in DOE: Factor
A factor is an independent variable that is a potential source of variation in the response metric.
Examples include the following:
• Teaser or introductory APR
• Color of envelope
• Balance transfer fee
• Presence or absence of a sticker on a catalog
• First-class versus third-class mail
• Others?
Basic Terms in DOE: Factor Level
A factor level is a particular value, or setting, of a factor.
Examples include the following:
• 1.99% introductory APR
• White envelope
• 2% balance transfer fee
• Airline mile reward offer
• Third-class mail
• Others?
Basic Terms in DOE: Effect
An effect captures and measures the relationship between changes in factor levels and changes in the response metric.
Examples of an Effect
• A 1% increase in introductory APR yields a 20% decrease in response rate.
• The white envelope has a 22% higher response rate than the grey envelope.
• An offer with a sticker on it garners $10 more in purchases than an offer without one.
Basic Terms in DOE: Treatment
A treatment is a combination of all of the factors, each at one level. In a typical marketing context, a treatment constitutes a unique offer.
Examples include the following (BT = balance transfer):
• 1.99% Intro Rate, in a White Envelope, no BT Fee
• 0% Intro Rate, in a Grey Envelope, 2% BT Fee
• 1.99% Intro Rate, in a Grey Envelope, 2% BT Fee
• 0% Intro Rate, in a White Envelope, no BT Fee
There are eight possible treatments when you have three factors, each at two levels (2 × 2 × 2 = 8).
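The full set of treatments is the Cartesian product of the factor levels. A minimal Python sketch, using the three two-level factors from the examples above (the level labels are copied from those examples; the dictionary layout is just one convenient choice):

```python
from itertools import product

# Factor levels taken from the treatment examples above.
factors = {
    "Intro Rate": ["0%", "1.99%"],
    "Envelope": ["White", "Grey"],
    "BT Fee": ["no BT Fee", "2% BT Fee"],
}

# A treatment sets every factor to one level: one element of the Cartesian product.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for i, t in enumerate(treatments, start=1):
    print(i, ", ".join(t.values()))
print(len(treatments), "treatments")  # 2 * 2 * 2
```

Adding a fourth two-level factor would double the list to 16 treatments, which is why the number of factors tested drives test volume so quickly.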
Other Terms in DOE
• An experimental unit is the smallest unit to which a treatment can be applied.
• Replication occurs when more than one experimental unit receives the same treatment.
• Power is the probability that you will detect an effect, if one exists.
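Power depends on the size of the effect and on how many experimental units receive each treatment. As a rough sketch, assuming a two-sided z-test that compares two response rates with the usual normal approximation (the response rates below are hypothetical), power can be computed with the standard library alone:

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test that compares two response rates.

    Normal approximation with unpooled variances; n_per_group experimental
    units receive each of the two treatments.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    shift = abs(p1 - p2) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical response rates: 2.0% for the control offer, 2.2% for the test offer.
for n in (10_000, 50_000, 100_000):
    print(n, round(power_two_proportions(0.020, 0.022, n), 3))
```

With these assumed rates, 50,000 units per treatment gives power of only about 0.6, which illustrates why business tests of small lifts need large volumes.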
4.3 Multi-Factor Experiments
Objectives
• Define multifactor experiments.
• State the advantages of multifactor experiments versus a sequence of one-factor-at-a-time (OFAT) experiments.
• Explain how experimental units should be allocated to the treatments.
• Define the term interaction.
• Analyze a simple multifactor experiment and identify interactions.
Two Factors, Each at Two Levels
Example: Credit card solicitation with an introductory, or teaser, rate
• The introductory (Intro) rate is High or Low.
• The go-to (Goto) rate is High or Low.
One Factor at a Time
[Figure: design grid with Intro rate (0%, 2.99%) on the horizontal axis and Goto rate (4.99%, 7.99%) on the vertical axis]
• Intro test: Goto = ??
• Goto test: Intro = ??
One Factor at a Time
• Intro test: hold Goto constant at 4.99%.
• Goto test: hold Intro constant at 0%.
[Figure: the Intro × Goto design grid]
One Factor at a Time
[Figure: the three OFAT cells on the Intro × Goto grid]
• "Control": Intro 0%, Goto 4.99%
• "Goto Test": Intro 0%, Goto 7.99%
• "Intro Test": Intro 2.99%, Goto 4.99%
Typical Volumes
Each of the three OFAT cells (Control, Goto Test, and Intro Test) receives 50,000 experimental units.
Efficiency
VP of Marketing: Either a large numerator or a small denominator, or both!
Experiment Designer: Can you quantify these terms?
• Candidate terms: number of items tested, margin of error, financial costs, total sample size
• Efficiency = number of items tested / total sample size
• For the OFAT test: two terms (the Intro effect and the Goto effect) from 150,000 observations
Efficiency?!?
[Figure: the three OFAT cells on the Intro × Goto grid]
The Intro test uses only two-thirds of the data (the Control and Intro Test cells), and the Goto test uses only two-thirds of the data (the Control and Goto Test cells).
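One Factor at a Time
[Figure: several alternative three-cell arrangements of the OFAT test on the Intro × Goto grid]
• There are many different ways to arrange the “same” test.
• They all assume no interaction between Intro and Goto.
• None of these eliminates the potential for bias in the estimates. Each arrangement estimates the Intro effect at only one Goto level and the Goto effect at only one Intro level, so if the factors interact, those estimates do not represent the factors' overall effects.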
Pick a Treatment Set
[Figure: four alternative three-cell OFAT treatment sets on the Intro × Goto grid]
Detecting Interactions between Factors
[Figure: interaction plot of Response Rate versus Intro Rate (Low, High), with separate lines for Low Goto and High Goto]
Factorial Arrangement of the Treatments
• Permits the testing and estimation of an Intro x Goto interaction term.
• Increases the precision of estimates for the same test volumes.
• Can use every individual in every test. Combinations of factor levels provide replication for individual factors.
[Figure: the four factorial cells, Intro (0%, 2.99%) crossed with Goto (4.99%, 7.99%)]
Efficiency! Reuse Observations
The Intro test uses every observation: each Intro level pools the units from both Goto levels.
[Figure: the factorial Intro × Goto grid]
Efficiency! Reuse Observations
The Goto test uses every observation: each Goto level pools the units from both Intro levels.
[Figure: the factorial Intro × Goto grid]
Efficiency! Additional Tests
Four treatment means support up to four model parameters: the intercept, the Intro and Goto main effects, and the Intro x Goto interaction. This treatment structure enables the estimation of the Intro x Goto interaction term.
[Figure: the factorial Intro × Goto grid]
Efficiency! Same or Smaller Sample Size
The OFAT approach uses 50,000 experimental units in each treatment, and one-third of that data is ignored at each stage of the analysis. To have the same power, the factorial test requires only 50,000 observations in each marginal total:

              Intro Low   Intro High   Total
Goto High     25,000      25,000       50,000
Goto Low      25,000      25,000       50,000
Total         50,000      50,000       100,000

The factorial test uses 100,000 experimental units in total, compared with 150,000 for the OFAT test.
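The comparison can be made concrete with a little arithmetic. This sketch assumes that "efficiency" means items tested divided by total sample size, and that every cell has the same hypothetical 2% response rate; under those assumptions the Intro comparison has the same standard error in both designs, while the factorial design tests one more item with 50,000 fewer units.

```python
def se_of_difference(n1, n2, p=0.02):
    """Standard error of a difference between two response rates that both equal p."""
    return (p * (1 - p) * (1 / n1 + 1 / n2)) ** 0.5

designs = {
    # OFAT: the Intro test compares the Control and Intro Test cells (50,000 each).
    "OFAT": {"cells": 3, "per_cell": 50_000, "items": 2,
             "intro_groups": (50_000, 50_000)},
    # Factorial: the Intro test compares the two Intro marginal totals (2 cells x 25,000).
    "Factorial": {"cells": 4, "per_cell": 25_000, "items": 3,
                  "intro_groups": (2 * 25_000, 2 * 25_000)},
}

results = {}
for name, d in designs.items():
    total = d["cells"] * d["per_cell"]
    results[name] = {
        "total": total,
        "efficiency": d["items"] / total,   # items tested per observation
        "se_intro": se_of_difference(*d["intro_groups"]),
    }
    print(name, results[name])
```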
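Efficiency?
Balance of the marginal totals might not be all that is required. Each allocation below has 50,000 observations in every marginal total:
• 49,999 in each of the two cells on one diagonal and 1 in each of the other two cells
• 49,500 in each of the two cells on one diagonal and 500 in each of the other two cells
In both allocations, nearly every observation sits on one diagonal, where Intro and Goto change together, so the data can barely separate the Intro effect from the Goto effect.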
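A short calculation shows why balanced margins are not enough. The sketch assumes hypothetical true response rates in which only Goto matters (Intro has no effect at all) and compares the expected Intro marginal difference under equal cells and under the 49,999/1 diagonal allocation. Even though every marginal total is 50,000 in both cases, the diagonal allocation makes the Goto effect show up as a spurious Intro effect.

```python
# Hypothetical true response rates keyed by (intro, goto) level.
# Goto changes the rate; Intro has no effect at all.
true_rate = {
    ("Low", "Low"): 0.020, ("High", "Low"): 0.020,
    ("Low", "High"): 0.015, ("High", "High"): 0.015,
}

def margin_total(counts, position, level):
    """Units at one level of one factor (position 0 = Intro, 1 = Goto)."""
    return sum(n for cell, n in counts.items() if cell[position] == level)

def marginal_intro_difference(counts):
    """Expected Intro High minus Intro Low difference in marginal response rates."""
    def margin_rate(level):
        cells = {cell: n for cell, n in counts.items() if cell[0] == level}
        return sum(true_rate[c] * n for c, n in cells.items()) / sum(cells.values())
    return margin_rate("High") - margin_rate("Low")

balanced = {cell: 25_000 for cell in true_rate}
diagonal = {("Low", "Low"): 49_999, ("High", "High"): 49_999,
            ("Low", "High"): 1, ("High", "Low"): 1}

for name, counts in (("balanced", balanced), ("diagonal", diagonal)):
    margins = [margin_total(counts, pos, lvl) for pos in (0, 1) for lvl in ("Low", "High")]
    print(name, margins, round(marginal_intro_difference(counts), 5))
```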
Efficiency Is Still a Balancing Act
Balancing the sample size over all of the treatments seems like a reasonable goal: 25,000 in each of the four cells, 100,000 in total.
Randomization
After the treatment structure is defined, the next step is to randomly assign treatments to experimental units. A typical approach to randomizing 100,000 customers to four treatments includes the following steps:
1. Define the population of interest.
2. Select a simple random sample from the population equal to the total sample size (for example, 100,000).
3. Randomly partition the sample into four equal groups (for example, 25,000 each).
4. Assign each group to one of the four treatments.
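The four steps above can be sketched in Python with the standard `random` module. The population of customer IDs and the seed are hypothetical; in practice, step 1 would come from your customer database.

```python
import random

def randomize(population_ids, total_n, treatments, seed=None):
    """Randomly assign treatments, following the four steps above.

    population_ids: a sequence (list or range) of unit identifiers (step 1).
    Returns a dict mapping each sampled unit to its treatment.
    """
    if total_n % len(treatments) != 0:
        raise ValueError("total_n must divide evenly among the treatments")
    rng = random.Random(seed)
    # Step 2: simple random sample without replacement. random.sample returns the
    # units in selection order, so consecutive slices are themselves random groups.
    sample = rng.sample(population_ids, total_n)
    group_size = total_n // len(treatments)
    assignment = {}
    # Steps 3 and 4: partition into equal groups and give each group one treatment.
    for i, treatment in enumerate(treatments):
        for unit in sample[i * group_size:(i + 1) * group_size]:
            assignment[unit] = treatment
    return assignment

# Hypothetical population of 1,000,000 customer IDs and the 2 x 2 Intro x Goto treatments.
treatments = [("Intro 0%", "Goto 4.99%"), ("Intro 0%", "Goto 7.99%"),
              ("Intro 2.99%", "Goto 4.99%"), ("Intro 2.99%", "Goto 7.99%")]
assignment = randomize(range(1_000_000), 100_000, treatments, seed=2024)
```

Fixing the seed makes the assignment reproducible, which helps when the treatment file must be regenerated or audited.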
Analyzing a 2-by-2 Factorial Experiment with Interaction
Credit Card Case Study
Task: Use SAS Enterprise Guide to graph, analyze, and interpret the results of the two-factor experiment testing two different levels of intro rate and goto rate.
Analyzing a 2-by-2 Factorial Experiment with No Interaction
Credit Card Case Study
Task: Use SAS Enterprise Guide to graph, analyze, and interpret the results of the two-factor experiment testing two different levels of intro rate and goto rate when no interaction is present.
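The course performs this analysis in SAS Enterprise Guide. As a tool-neutral sketch of the underlying arithmetic (not SAS output), the Python code below uses hypothetical responder counts for the four cells, computes the simple effects of Intro at each Goto level, and tests whether they differ with a normal-approximation z-test. This tests the interaction on the response-rate scale; a logistic model tests it on the log-odds scale, and the two can disagree.

```python
from statistics import NormalDist

# Hypothetical results: (mailed, responders) for each (intro, goto) cell.
cells = {
    ("Low", "Low"): (25_000, 700),
    ("High", "Low"): (25_000, 560),
    ("Low", "High"): (25_000, 520),
    ("High", "High"): (25_000, 480),
}

rate = {k: r / n for k, (n, r) in cells.items()}
var = {k: rate[k] * (1 - rate[k]) / cells[k][0] for k in cells}

# Simple effects of Intro (High minus Low) at each Goto level.
intro_at_low_goto = rate[("High", "Low")] - rate[("Low", "Low")]
intro_at_high_goto = rate[("High", "High")] - rate[("Low", "High")]

# Main effect = average simple effect; interaction = difference between simple effects.
intro_main = (intro_at_low_goto + intro_at_high_goto) / 2
interaction = intro_at_high_goto - intro_at_low_goto

# The four cells are independent, so the variance of the contrast is the sum of cell variances.
se_interaction = sum(var.values()) ** 0.5
z = interaction / se_interaction
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"Intro main effect={intro_main:.4f}, interaction={interaction:.4f}, z={z:.2f}, p={p_value:.3f}")
```

When the interaction is negligible, as in the second case study, the main effects summarize the Intro and Goto results on their own; when it is not, report the simple effects at each level.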
Idea Exchange
Consider the previous experiment.
• What attributes of the customer might affect an individual’s likelihood to respond to an offer?
• How could you use your knowledge of these attributes to improve the study’s design and treatment structure?
• How could you use your knowledge of the attributes to improve the analysis of the experimental data?
Exercise
This exercise reinforces the concepts discussed previously.
4.4 Orthogonality and Blocking
Objectives
• Explain the concept of orthogonality and why it is important.
• Explain the concept of blocking and why it is useful.
• Analyze and interpret a multifactor experiment with blocks.
Orthogonality
Another ideal property of an experimental design is orthogonality among the elements of interest. There are at least three ways to think about the importance of this property:
• Algebraic interpretation – Matrices behave well: the coded columns of the design matrix are mutually orthogonal, so X'X is diagonal.
• Geometric interpretation – Pictures look nice: the coded columns are perpendicular vectors.
• Statistical interpretation – Estimates have low variance and are uncorrelated with one another.
Two-Level Full Factorial Coding

I    A    B    AB
+1   +1   +1   +1
+1   +1   -1   -1
+1   -1   +1   -1
+1   -1   -1   +1

• The Effect of Factor A is estimated from column A: the average response in the rows coded +1 minus the average response in the rows coded -1.
• The Effect of Factor B is estimated from column B in the same way.
• The Interaction Effect AB is estimated from column AB, the row-by-row product of columns A and B.
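A short NumPy sketch makes the algebraic view concrete. It builds the coded columns from the table above, confirms that every pair of columns is orthogonal (X'X = 4I), and estimates the A, B, and AB effects from four hypothetical treatment means. Because X'X is diagonal, each least squares coefficient is simply the column's dot product with the responses divided by 4, which is half of the corresponding effect.

```python
import numpy as np

# Columns I, A, B, AB of the two-level full factorial (rows in the order of the table above).
A = np.array([+1, +1, -1, -1])
B = np.array([+1, -1, +1, -1])
X = np.column_stack([np.ones(4), A, B, A * B])

# Orthogonal columns: all off-diagonal dot products are zero, so X'X = 4I and the
# covariance of the estimates, sigma^2 (X'X)^-1, is diagonal (uncorrelated estimates).
print(X.T @ X)

# Hypothetical treatment means for the four rows.
y = np.array([2.6, 2.0, 2.1, 1.9])

def effect(column, y):
    """Effect of a coded column: mean response at +1 minus mean response at -1."""
    return y[column == 1].mean() - y[column == -1].mean()

effects = {name: effect(col, y) for name, col in {"A": A, "B": B, "AB": A * B}.items()}

# Because X'X is diagonal, each regression coefficient is (column . y) / 4.
coef = X.T @ y / 4
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(effects, coef)
```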
Factorial Arrangement versus OFAT: Factorial Treatment Structure
Pros
+ Reuses observations (more power for fewer experimental units)
+ Tests for interactions
+ Guarantees balanced and orthogonal treatment plans (with equal allocation to every treatment)
+ Is an efficient way to test many factors
Cons
- Can be more complicated to set up
- Can be more complicated to sell to a nontechnical audience
Factorial Arrangement versus OFAT: One-Factor-at-a-Time Tests
Pros
+ Are easy to set up – A/B and Champion/Challenger tests are typical in many industries
+ Might yield lower per-unit printing costs
+ Have a clear “control” offer and clear test offers
+ Do not require users to learn new words such as “balance” and “orthogonality”!
Cons
+/- Permit simple analysis that could be done with a pencil and paper!
- Do not allow a test for interactions
- Represent an inefficient use of experimental units
Blocking
It is typical to use the same statistic to test $H_0\colon p_{\text{men}} = p_{\text{women}}$ as to test $H_0\colon p_{\text{red envelope}} = p_{\text{blue envelope}}$.
Are these factors equivalent from the perspective of experimental design?
Blocking
You can control features of the offer you make:
• Creative
• Color
• Pricing
• Duration of offer
Any restrictions are typically self-imposed. These are usually factors in the test, not blocks.
You cannot control features of your experimental units:
• Risk profile
• Responsiveness
• Geography
• Age
• Gender
Restrictions here are typically features of the population of interest, and are often treated as blocks.
Blocking
Blocks are groups of experimental units that are homogeneous in some way. Typically, they represent nuisance variability.
Blocks might or might not be randomly selected.
Because units exist in blocks, rather than being assigned to them, blocks reflect a restriction on the randomization in an experiment.
Analyzing an Experiment with Blocks
Credit Card Case Study
Task: Incorporate a continuous measure such as risk score into a block/factor in an experiment.
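One way to turn a continuous measure such as a risk score into blocks is to cut it into equal-size quantile groups and then randomize treatments separately within each group, which is exactly the restricted randomization described above. The sketch below assumes hypothetical customers, simulated risk scores, three blocks, and placeholder treatment labels `T1` to `T4`; the block index would then enter the analysis as an additional term.

```python
import random
from collections import Counter

def block_and_randomize(units, score, n_blocks, treatments, seed=None):
    """Quantile blocks on a continuous score, then randomize treatments within each block.

    units: list of unit IDs; score: dict mapping unit ID -> continuous score.
    The last block absorbs any remainder when len(units) is not divisible by n_blocks.
    """
    rng = random.Random(seed)
    ordered = sorted(units, key=lambda u: score[u])
    block_size = len(ordered) // n_blocks
    assignment = {}
    for b in range(n_blocks):
        start = b * block_size
        end = len(ordered) if b == n_blocks - 1 else start + block_size
        block = ordered[start:end]
        # Cycle through the treatments so each appears (nearly) equally often in the block,
        # then shuffle the labels: randomization happens only within the block.
        labels = [treatments[i % len(treatments)] for i in range(len(block))]
        rng.shuffle(labels)
        for unit, label in zip(block, labels):
            assignment[unit] = (b, label)
    return assignment

# Hypothetical data: 1,200 customers with simulated risk scores, 3 blocks, 4 treatments.
gen = random.Random(7)
customers = list(range(1_200))
risk = {c: gen.gauss(650, 60) for c in customers}
plan = block_and_randomize(customers, risk, n_blocks=3,
                           treatments=["T1", "T2", "T3", "T4"], seed=1)
cell_counts = Counter(plan.values())   # (block, treatment) -> number of customers
print(sorted(cell_counts.items()))
```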
Idea Exchange
Consider the kinds of variables that you have no control over. These variables might be important with some types of product offers but not others. What types of product offers might have different response rates based on the following characteristics?
• Risk profile
• Geographic regions such as north, south, east, and west
• Age
• Gender
• Urban, suburban, rural
Can you think of others?
Statistically Well-Formulated Model
A well-formulated model maintains the hierarchy of the terms in the model as model reduction is performed: a term stays in the model as long as any higher-order term that contains it remains. Terms are removed one at a time, and the model is refit before any more terms are removed.
Hierarchy for two factors:
Intercept
A    B
A*B
Hierarchy for three factors:
Intercept
A    B    C
A*B    A*C    B*C
A*B*C
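The hierarchy rule can be checked mechanically. The sketch below assumes terms are written as factor names joined by `*` (as in the diagrams above) and that the intercept is always present. `removable_terms` lists the only terms that may be dropped at the next step without breaking hierarchy; choosing among them (for example, by significance) still requires refitting the model in your statistical software.

```python
from itertools import combinations

def is_hierarchical(terms):
    """True if every lower-order component of each term is also in the model."""
    term_sets = {frozenset(t.split("*")) for t in terms}
    for t in term_sets:
        for k in range(1, len(t)):
            for sub in combinations(sorted(t), k):
                if frozenset(sub) not in term_sets:
                    return False
    return True

def removable_terms(terms):
    """Terms that can be dropped without breaking hierarchy: those inside no other term."""
    term_sets = {t: frozenset(t.split("*")) for t in terms}
    return [t for t, s in term_sets.items()
            if not any(s < other for other in term_sets.values())]

full = ["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]
print(removable_terms(full))                         # only the three-way term
reduced = [t for t in full if t != "A*B*C"]
print(removable_terms(reduced))                      # now the two-way terms
print(is_hierarchical(["A", "A*B"]))                 # B is missing
```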
Exercise
This exercise reinforces the concepts discussed previously.
4.5 Business Experiments with Continuous Responses
Objectives
• Name several continuous response variables you might encounter in business experiments.
• Describe issues related to analyzing business experiments with continuous responses.
The Response Variable
In many business applications, the key target variables of interest are binary and can be expressed as a proportion:
• Did the customer purchase a product? (What proportion of customers purchased?)
• Did the product fail? (What proportion of products failed?)
• Did the customer churn? (What proportion of customers churned?)
• Was a purchase fraudulent? (What proportion of purchases are fraudulent?)
• Was there a claim on the policy? (What proportion of policies have claims?)
The Response Variable
It is also common to find continuous responses in business models:
• Revenue per store
• Number of new customers following an advertising campaign
• Customer value per mailing
• Time until churn
• Wait time on hold in a call center
• Expected lifetime for a manufactured product
• Average profit per SKU
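Where Traditional Statistics Meet the Road
Ordinary least squares (OLS) regression and ANOVA models (linear models) are designed to handle continuous responses.
However, not all continuous responses are suitable for OLS models. OLS inference assumes, among other things, roughly constant variance and approximately normal errors, and strongly skewed responses can violate these assumptions.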
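A quick way to see the problem is to look at skewness. The sketch below simulates a hypothetical right-skewed response, such as wait time on hold, under a lognormal assumption chosen only for illustration. The raw values are strongly right-skewed (the mean sits well above the median), while their logarithms are roughly symmetric. Checking the distribution is a reasonable first step; whether a transformation or a different model is appropriate is a judgment call for the analysis.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(seed=3)
# Hypothetical right-skewed response (lognormal assumption, illustration only).
value = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)

skew_raw = skewness(value)
skew_log = skewness(np.log(value))
print("mean:", value.mean(), "median:", np.median(value))
print("skewness raw:", skew_raw, "skewness of log:", skew_log)
```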
The Distribution
[Figures: distributions of revenue per store, customer value per mailing, and wait time on hold in a call center]
Experimental Design and Response Type
Design and analysis go hand in hand. Design the experiment so that the analysis will be easy.
Fortunately, the design of the experiment is not contingent on the type of response variable that the data generates. The same experimental design can be used for evaluating response rate, customer dollar value, lift in revenue, and many other features, regardless of whether they are continuous or categorical.
How Do You Analyze These Response Variables?
There are many statistical techniques available for modeling continuous responses that are not suited to either logistic regression or OLS.
Advances in computing power and technology make such techniques available for business applications through statistical software.
These techniques require an in-depth understanding of advanced and specialized statistical concepts, and should be used under the direction of a skilled statistician.
Idea Exchange
How could you incorporate what you know about the cost and profit resulting from different settings (for example, the cost of postage or the higher profit from a higher APR) to help you design an experiment?
4.6 Recommended Reading
Recommended Reading
• Ariely, Dan. 2010. “Why Businesses Don’t Experiment.” Harvard Business Review, April. http://hbr.org/2010/04/column-why-businesses-dont-experiment/ar/1
• Davenport, Thomas. 2009. “How to Design Smart Business Experiments.” Harvard Business Review, February. http://hbr.org/2009/02/how-to-design-smart-business-experiments/ar/1
• May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapters 2 and 3.