CausalImpact - tests using the R package

- Christophe Moinet [email protected]


http://google.github.io/CausalImpact/


Introduction

Data

Choice of control stores

Run of CausalImpact

Stability of the control stores

Conclusion


CausalImpact is a new open-source R package for estimating causal effects in time series, using a Bayesian model (Bayesian structural time series methods). It is developed by Google.

This test with CausalImpact uses daily store sales with a causal event over several periods.

The objective of this test is to compute the impact of this causal event using :
◦ a set of stores impacted by the causal event (test stores)
◦ a set of stores not impacted (control stores)

The data used for this test are simulated; they are not real data.

This document describes this test.


http://google-opensource.blogspot.fr/2014/09/causalimpact-new-open-source-package.html

Dimensions :

◦ Stores : around 300

◦ Days of the year : 365

◦ Causal : 1

◦ Product line : 1

Facts :

◦ Sales, by Store, Day, for this product line

◦ Causal event (Yes/No) by Store, day for this product line

◦ Reminder : Data used in this test have been simulated and are not real.



Sales data are averaged by day to build :
◦ A control set of stores that are not impacted by the studied causal event
◦ A test set of stores that are impacted by the causal event
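The averaging step can be sketched as follows. This is a minimal illustration in Python rather than R; the store names, the `store_group` mapping and the toy sales figures are hypothetical and only stand in for the simulated data.

```python
from collections import defaultdict

def average_by_day(rows, store_group):
    """Average sales per (group, day), where group is "test" or "control"."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for store, day, sales in rows:
        key = (store_group[store], day)
        totals[key] += sales
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical toy data: (store, day, sales)
rows = [
    ("S1", 1, 90.0), ("S2", 1, 94.0), ("S3", 1, 100.0),
    ("S1", 2, 92.0), ("S2", 2, 96.0), ("S3", 2, 98.0),
]
store_group = {"S1": "test", "S2": "control", "S3": "control"}

daily = average_by_day(rows, store_group)
print(daily[("control", 1)])  # 97.0, the mean of 94.0 and 100.0
```

The result is one averaged series per group, which is the shape of input the rest of the test works with.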

Statistics :
◦ 50 stores are impacted by the studied causal event from September to December (test stores)
◦ 250 stores are not impacted from January to December (control stores)

Average sales :

Stores    Pre.period   Post.period
Test      92           126
Control   93.5         105.5

Choosing the control stores is always a tricky part.

=> But it is essential for the impact calculation.

Issues appeared with the first set of control stores I prepared, because their sales trend was too different from the sales trend of the test stores.

These issues were detected using CausalImpact.


Pre.period (pre-intervention period) is the period without causal effect. Post.period is the period with causal effect.

Differences between actual and predicted sales during the pre.period are not randomly distributed around the horizontal zero axis.

This shows an issue with the control data.
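A simple numeric version of this visual check is sketched below. It is not part of CausalImpact: the function name `residual_diagnostics` and the toy series are hypothetical. The idea is that residuals scattered randomly around zero should have a mean near zero and little lag-1 autocorrelation, whereas a drifting prediction leaves a clear pattern.

```python
def residual_diagnostics(actual, predicted):
    """Summarize pre-period residuals (actual minus predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    var = sum(c * c for c in centered)
    # Lag-1 autocorrelation: close to 0 for random residuals, high when they trend
    lag1 = sum(centered[i] * centered[i + 1] for i in range(n - 1)) / var if var else 0.0
    share_positive = sum(r > 0 for r in residuals) / n
    return {"mean": mean, "share_positive": share_positive, "lag1_autocorr": lag1}

# Hypothetical pre-period series: the prediction does not follow the trend of the actuals
actual = [100, 101, 102, 103, 104, 105, 106, 107]
predicted = [103, 103, 103, 103, 104, 104, 104, 104]

diag = residual_diagnostics(actual, predicted)
print(diag)
```

In this toy case the mean residual is zero, yet the lag-1 autocorrelation is about 0.57: the residuals run from negative to positive instead of scattering around zero, which is the kind of pattern that points to a control-data issue.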



This issue is more visible if we try to predict sales during the second part of the pre.period (periods 151 to 250).

A negative effect is shown.

=> Data need to be cleaned.

This check with CausalImpact is really helpful for validating control stores.
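The logic of this check (fit on the first part of the pre.period, predict the second part, where no effect should exist) can be sketched as below. A plain least-squares line on the control series is used as a stand-in for the CausalImpact model, which is an assumption made to keep the example self-contained; the series and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def placebo_effect(test, control, split, end):
    """Relative effect on periods split+1..end, using periods 1..split for fitting."""
    slope, intercept = fit_line(control[:split], test[:split])
    predicted = [intercept + slope * c for c in control[split:end]]
    actual = test[split:end]
    cumulative = sum(a - p for a, p in zip(actual, predicted))
    return cumulative / sum(predicted)

days = range(250)
# Hypothetical control series with a weekly pattern
control = [100 + 5 * math.sin(2 * math.pi * d / 7) for d in days]
# Test series that follows the control closely
good_test = [0.98 * c for c in control]
# Test series with a downward trend that the control does not have
drifting_test = [0.98 * c - 0.05 * d for c, d in zip(control, days)]

print(placebo_effect(good_test, control, 150, 250))      # close to 0
print(placebo_effect(drifting_test, control, 150, 250))  # clearly negative
```

With a well-matched control the placebo effect on periods 151 to 250 is essentially zero; with a mismatched trend a spurious negative effect appears even though nothing happened in that window.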


[Figure: CausalImpact panels showing the difference between actual and estimated sales, and the cumulative effect]

To correct this issue, I simulated a better set of control stores.

Now, no impact appears on periods 151 to 250.

The trend of these control stores is closer to that of the test stores.

The earlier issues are fixed.

This approach could be used in real projects to validate the pre.period data.



[Figure: CausalImpact panels: original sales, cumulative effect]

CausalImpact shows a significant impact on sales :

relative effect = 20%


Additional basic statistics (average sales) :

Stores    Pre.period   Post.period
Test      92           126
Control   93.5         105.5

[Figure: CausalImpact panels: original sales, cumulative effect]
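As a rough cross-check of the 20% relative effect, the average sales above allow a back-of-the-envelope estimate. This is not how CausalImpact computes the effect; it only scales the test pre.period average by the growth observed in the control stores, which is a simplifying assumption.

```python
# Average sales taken from the table above
pre_test, post_test = 92.0, 126.0
pre_control, post_control = 93.5, 105.5

# Growth of the control stores between pre.period and post.period
control_growth = post_control / pre_control

# Naive counterfactual: test sales if they had grown like the control stores
naive_counterfactual = pre_test * control_growth

# Naive relative effect of the causal event on the test stores
naive_relative_effect = post_test / naive_counterfactual - 1

print(round(naive_counterfactual, 1))         # 103.8
print(round(100 * naive_relative_effect, 1))  # 21.4 (in %)
```

The naive estimate of about 21% is in the same range as the 20% reported by CausalImpact, which is reassuring but does not replace the model's own significance assessment.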



◦ I developed an R process simulating the value of the causal impact for 500 samples of control stores.

◦ Each sample contains 80% of the control stores, selected randomly.

◦ The algorithm is outlined below.



Algorithm :

◦ From the total set of stores : selection of the test stores and selection of the control stores

◦ Test stores : averaging by period (day)

◦ Control stores, loops 1 to 500 : random selection of 80% of control stores, then averaging by period (day)

◦ CausalImpact is run in each loop

◦ Result : a set of 500 causal impact computations
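The loop can be sketched as follows. The original process is written in R and calls CausalImpact in each loop; here a naive ratio-based estimate (`naive_relative_effect`, a hypothetical stand-in) replaces the model so that the example runs on its own, and the stores are simulated with arbitrary parameters.

```python
import random
import statistics

def mean_series(stores):
    """Average several per-store daily series into one daily series."""
    return [sum(day) / len(day) for day in zip(*stores)]

def naive_relative_effect(test, control, split):
    """Stand-in for CausalImpact: scale the test pre-period mean by control growth."""
    pre_t, post_t = statistics.mean(test[:split]), statistics.mean(test[split:])
    pre_c, post_c = statistics.mean(control[:split]), statistics.mean(control[split:])
    counterfactual = pre_t * post_c / pre_c
    return post_t / counterfactual - 1

def resample_effects(test_stores, control_stores, split, n_loops=500, share=0.8, seed=0):
    """Estimate the effect n_loops times, each with a random share of control stores."""
    rng = random.Random(seed)
    test = mean_series(test_stores)
    k = round(share * len(control_stores))
    effects = []
    for _ in range(n_loops):
        sample = rng.sample(control_stores, k)  # drawn without replacement
        effects.append(naive_relative_effect(test, mean_series(sample), split))
    return effects

# Hypothetical simulated stores: 25 control, 5 test with a 20% uplift after day 40
rng = random.Random(42)
n_days, split = 60, 40
control_stores = [[rng.gauss(100, 5) for _ in range(n_days)] for _ in range(25)]
test_stores = [
    [rng.gauss(100, 5) * (1.2 if d >= split else 1.0) for d in range(n_days)]
    for _ in range(5)
]

effects = resample_effects(test_stores, control_stores, split)
print(len(effects), statistics.mean(effects), statistics.stdev(effects))
```

The test series is averaged once, outside the loop, since only the control sample changes from one loop to the next.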

As a result :

- We get 500 different values of the percent impact.

- We can show the distribution of these impacts.

- 20% is the value of the impact from the whole control sample (the relative effect reported above). It is consistent with this distribution of impacts.

- The variation of impacts is quite low : good for the stability of the impact value.

- But it could be different with real cases.

[Figure: histogram of the 500 impacts. RelEffect is the causal impact (in %), Frequency is the number of simulations.]
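The distribution can be tabulated with a small helper like the one below. The impact values are hypothetical (not the 500 values of the test) and the 0.5-point bin width is an arbitrary choice.

```python
import math
from collections import Counter

def histogram(effects_pct, bin_width=0.5):
    """Count impacts per bin; each key is the lower bound of a bin (in %)."""
    bins = Counter(math.floor(e / bin_width) * bin_width for e in effects_pct)
    return dict(sorted(bins.items()))

# Hypothetical relative effects (in %) from a few simulations
effects_pct = [19.2, 19.8, 20.1, 20.3, 20.4, 20.9, 21.1]
hist = histogram(effects_pct)
print(hist)  # {19.0: 1, 19.5: 1, 20.0: 3, 20.5: 1, 21.0: 1}

# Check that the full-sample value lies inside the simulated range
full_sample_effect = 20.0
print(min(effects_pct) <= full_sample_effect <= max(effects_pct))  # True
```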


This package is really great :

◦ The general methodology can be understood and explained easily. However, the statistical methodology is a bit more difficult.

◦ Output charts are really easy to use and understand.

◦ The detailed report is helpful for a non-statistician.

◦ Helpful for validating any set of control stores.

◦ Possibility to go deeper than the standard use.

◦ Easy to use as a step of an R process : can be used in a loop to test different stores, different clusters, different causal events.


Results must be validated (significance of the effect). A good way to validate the results is to use cross-validation.

And :
◦ The difference between actual and estimated values must be checked for the pre-intervention period, as shown in this document.
◦ Any issue found in this check calls for a review of the data. Control stores might be biased.


Thank you

Feel free to contact me:

Christophe Moinet [email protected] +33 6 58 00 33 36
