
Page 1:

Matrix Completion Methods

for Causal Panel Data Models

Susan Athey, Mohsen Bayati,

Nikolay Doudchenko, Guido Imbens, & Khashayar Khosravi

(Stanford University)

NBER Summer Institute

Cambridge, July 27th, 2017

Page 2:

• We are interested in estimating the (average) effect of a binary treatment on a scalar outcome.

• We have data on N units, for T periods.

• We observe
  – the treatment, $W_{it} \in \{0,1\}$,
  – the realized outcome $Y_{it}$,
  – time-invariant characteristics of the units $X_i$,
  – unit-invariant characteristics of time $Z_t$,
  – time- and unit-specific characteristics $V_{it}$.

Page 3:

We observe (in addition to covariates):

$$Y = \begin{pmatrix}
Y_{11} & Y_{12} & Y_{13} & \dots & Y_{1T} \\
Y_{21} & Y_{22} & Y_{23} & \dots & Y_{2T} \\
Y_{31} & Y_{32} & Y_{33} & \dots & Y_{3T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
Y_{N1} & Y_{N2} & Y_{N3} & \dots & Y_{NT}
\end{pmatrix} \quad \text{(outcome)}$$

$$W = \begin{pmatrix}
1 & 1 & 0 & \dots & 1 \\
0 & 0 & 1 & \dots & 0 \\
1 & 0 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 1 & \dots & 0
\end{pmatrix} \quad \text{(treatment)}$$

• Rows are units, columns are time periods. (Important because some, but not all, methods treat units and time periods asymmetrically.)

Page 4:

In terms of potential outcomes:

$$Y(0) = \begin{pmatrix}
? & ? & X & \dots & ? \\
X & X & ? & \dots & X \\
? & X & ? & \dots & X \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
? & X & ? & \dots & X
\end{pmatrix}
\qquad
Y(1) = \begin{pmatrix}
X & X & ? & \dots & X \\
? & ? & X & \dots & ? \\
X & ? & X & \dots & ? \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X & ? & X & \dots & ?
\end{pmatrix}$$

To estimate the average treatment effect for the treated (or another average, e.g., the overall average effect),

$$\tau = \frac{\sum_{i,t} W_{it}\bigl(Y_{it}(1) - Y_{it}(0)\bigr)}{\sum_{i,t} W_{it}},$$

we need to impute the missing potential outcomes in at least one of Y(0) and Y(1).
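As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this estimand once the missing Y(0) entries have been imputed somehow; `att` and `Y0_imputed` are hypothetical names:

```python
import numpy as np

def att(Y, W, Y0_imputed):
    """Average effect on the treated: observed treated outcomes Y_it = Y_it(1)
    minus imputed control outcomes Y_it(0), averaged over cells with W_it = 1."""
    treated = W == 1
    return (Y[treated] - Y0_imputed[treated]).mean()

# Toy example: 3 units, 2 periods, unit 3 treated in its second period.
Y = np.array([[1.0, 2.0], [1.5, 2.5], [1.2, 3.0]])
W = np.array([[0, 0], [0, 0], [0, 1]])
Y0_imputed = np.array([[1.0, 2.0], [1.5, 2.5], [1.2, 2.2]])
print(att(Y, W, Y0_imputed))  # (3.0 - 2.2) = 0.8
```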

Page 5:

Focus on the problem of imputing the missing entries in Y (either Y(0) or Y(1)):

$$Y_{N\times T} = \begin{pmatrix}
? & ? & X & \dots & ? \\
X & X & ? & \dots & X \\
? & X & ? & \dots & X \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
? & X & ? & \dots & X
\end{pmatrix}$$

O and M are the sets of indices (i,t) with $Y_{it}$ observed and missing, with cardinalities |O| and |M|. Covariates: time-specific, unit-specific, time/unit-specific.

• This is a Matrix Completion Problem.

Page 6:

General set-up:

$$Y_{N\times T} = L_{N\times T} + \varepsilon_{N\times T}$$

• Key assumption, "Matrix Unconfoundedness":

$$W_{N\times T} \perp\!\!\!\perp \varepsilon_{N\times T} \;\Big|\; L_{N\times T} \qquad \text{(but } W \text{ may depend on } L\text{)}$$

• In addition:

$$L_{N\times T} \approx U_{N\times R} V_{T\times R}^{\top},$$

i.e., L is well approximated by a matrix with rank R low relative to N and T.

Page 7:

• Classification of practical problems depending on
  – magnitude of T and N,
  – pattern of missing data; fraction of observed data |O|/(|O| + |M|) close to zero or one.

• Different structure on L in the
  – average treatment effect under unconfoundedness literature,
  – synthetic control literature,
  – panel data / DID / fixed effects literature,
  – machine learning literature.

Page 8:

Classification of Problem I: Magnitude of N and T

Thin matrix (N large, T small), typical cross-section setting (many units, few time periods):

$$Y_{N\times T} = \begin{pmatrix}
? & X & ? \\
X & ? & X \\
? & ? & X \\
X & ? & X \\
? & ? & ? \\
\vdots & \vdots & \vdots \\
? & ? & X
\end{pmatrix}$$

Fat matrix (N small, T large), time-series setting (few units, many periods):

$$Y_{N\times T} = \begin{pmatrix}
? & ? & X & X & X & \dots & ? \\
X & X & X & X & ? & \dots & X \\
? & X & ? & X & ? & \dots & X
\end{pmatrix}$$

Or an approximately square matrix, with N and T of comparable magnitude.

Page 9:

Classification of Problem II: Pattern of Missing Data

Most of the econometric causal literature focuses on the case with a block of treated units / time periods:

$$Y_{N\times T} = \begin{pmatrix}
X & X & X & X & \dots & X \\
X & X & X & X & \dots & X \\
X & X & X & X & \dots & X \\
X & X & X & ? & \dots & ? \\
X & X & X & ? & \dots & ? \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
X & X & X & ? & \dots & ?
\end{pmatrix}
= \begin{pmatrix}
Y_{C,\text{pre}}(0) & Y_{C,\text{post}}(0) \\
Y_{T,\text{pre}}(0) & ?
\end{pmatrix}$$

This is easier because it allows for complete-data modeling of
• the conditional distribution of $Y_{C,\text{post}}(0)$ given $Y_{C,\text{pre}}(0)$ (matching), or
• the conditional distribution of $Y_{T,\text{pre}}(0)$ given $Y_{C,\text{pre}}(0)$ (synthetic control).

Page 10:

Two important special cases:

Single treated unit (Abadie et al., synthetic control):

$$Y_{N\times T} = \begin{pmatrix}
X & X & X & \dots & X \\
X & X & X & \dots & X \\
X & X & X & \dots & X \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X & X & ? & \dots & ?
\end{pmatrix} \quad \leftarrow \text{(treated unit, last row)}$$

Single treated period (most of the treatment effect literature):

$$Y_{N\times T} = \begin{pmatrix}
X & X & X & \dots & X \\
X & X & X & \dots & X \\
X & X & X & \dots & ? \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X & X & X & \dots & ?
\end{pmatrix} \quad \uparrow \text{(treated period, last column)}$$

Page 11:

Other Important Assignment Patterns

Staggered adoption (e.g., adoption of technology; Athey and Stern, 1998):

$$Y_{N\times T} = \begin{pmatrix}
X & X & X & X & \dots & X \\
X & X & X & X & \dots & ? \\
X & X & X & X & \dots & ? \\
X & X & ? & ? & \dots & ? \\
X & X & ? & ? & \dots & ? \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
X & ? & ? & ? & \dots & ?
\end{pmatrix}
\begin{matrix}
\text{(never adopter)} \\ \text{(late adopter)} \\ \\ \\ \text{(medium adopter)} \\ \\ \text{(early adopter)}
\end{matrix}$$

Page 12:

Netflix Problem

• Very, very large N (number of individuals),
• large T (number of movies),
• raises computational issues,
• general missing data pattern,
• fraction of observed data close to zero, $|O| \ll |M|$:

$$Y_{N\times T} = \begin{pmatrix}
? & ? & ? & ? & ? & X & \dots & ? \\
X & ? & ? & ? & X & ? & \dots & X \\
? & X & ? & ? & ? & ? & \dots & ? \\
? & ? & ? & ? & ? & X & \dots & ? \\
X & ? & ? & ? & ? & ? & \dots & X \\
? & X & ? & ? & ? & ? & \dots & ? \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
? & ? & ? & ? & X & ? & \dots & ?
\end{pmatrix}$$

Page 13:

Fat Matrix: Vertical Regression

$$Y = \begin{pmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,T-1} & y_{1,T} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,T-1} & y_{2,T} \\
y_{3,1} & y_{3,2} & \cdots & y_{3,T-1} & y_{3,T} \\
y_{N,1} & y_{N,2} & \cdots & y_{N,T-1} & ?
\end{pmatrix}$$

Impute $\hat y_{N,T} = \hat\omega_0 + \sum_{i<N} \hat\omega_i\, y_{i,T}$, with weights from the regression $y_{N,t} = \omega_0 + \sum_{i<N} \omega_i y_{i,t} + \epsilon_t$ over pretreatment periods t.

• Outcome: target unit's outcome in period t.
• Covariates: other units' outcomes in the same period.
• Observation is a time period.
• What is stable: patterns across units.
• Identification: $Y_{N,t}(0) \perp\!\!\!\perp W_{N,t} \mid Y_{1,t},\dots,Y_{N-1,t}$.
• Examples:
  – Synthetic control: $\omega_i \ge 0$, $\sum_{i<N} \omega_i = 1$, no intercept.
  – Doudchenko-Imbens: estimate $\omega_i$ with elastic net.
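A minimal sketch of this vertical regression in Python, using scikit-learn's ElasticNet for the Doudchenko-Imbens variant (an illustration with ad hoc penalty levels, not the authors' code); the horizontal regression on the next slide is the same procedure applied to the transposed matrix:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def vertical_regression_impute(Y, T0):
    """Impute Y[N-1, T0:] (last unit treated from period T0 onward) by
    regressing its pretreatment outcomes on the other units' outcomes."""
    controls_pre = Y[:-1, :T0].T                 # T0 observations, N-1 regressors
    target_pre = Y[-1, :T0]
    model = ElasticNet(alpha=0.1, l1_ratio=0.5)  # penalties chosen ad hoc here
    model.fit(controls_pre, target_pre)
    return model.predict(Y[:-1, T0:].T)          # imputed y_{N,t}(0) for t >= T0

# Usage with fake data: 5 units, 30 periods, last unit treated from t = 20.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 30)).cumsum(axis=1)
print(vertical_regression_impute(Y, T0=20))
```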

Page 14:

Thin Matrix: Horizontal Regression

$$Y = \begin{pmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,T} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,T} \\
y_{3,1} & y_{3,2} & \cdots & y_{3,T} \\
\vdots & \vdots & & \vdots \\
y_{N,1} & y_{N,2} & \cdots & ?
\end{pmatrix}$$

Impute $\hat y_{N,T} = \sum_{t<T} \hat\omega_t\, y_{N,t}$, with weights from the regression $y_{i,T} = \sum_{t<T} \omega_t y_{i,t} + \epsilon_i$ over control units i.

• Outcome: target time period's outcome.
• Covariates: other time periods' outcomes for the same unit.
• Observation is a unit.
• What is stable: time patterns within a unit.
• Identification: $Y_{i,T}(0) \perp\!\!\!\perp W_{i,T} \mid Y_{i,1},\dots,Y_{i,T-1}$.
• Examples:
  – Matching / ATE literature: average outcomes from units with the most similar $y_{i,-T}$.
  – With regularization: Chernozhukov et al.; Athey, Imbens and Wager (2017).
  – Closely related to transposed versions of synthetic control and elastic net.

Page 15:

General Matrix: Matrix Regression (Panel)

• Panel data regression: exploit additive structure in unit and time effects,
$$y_{i,t} = \gamma_i + \delta_t + \epsilon_{i,t}, \qquad \hat y_{N,T} = \hat\gamma_N + \hat\delta_T.$$

• Identification: $Y_{i,t}(0) \perp\!\!\!\perp W_{i,t} \mid \gamma_i, \delta_t$.

• Matrix formulation of identification: $Y_{N\times T}(0) \perp\!\!\!\perp W_{N\times T} \mid L_{N\times T}$, with

$$Y_{N\times T} = L_{N\times T} + \epsilon_{N\times T} =
\begin{pmatrix} \gamma_1 & 1 \\ \vdots & \vdots \\ \gamma_N & 1 \end{pmatrix}
\begin{pmatrix} 1 & \cdots & 1 \\ \delta_1 & \cdots & \delta_T \end{pmatrix}
+ \epsilon_{N\times T}$$

Page 16:

II. How/why do we regularize:

Potentially many parameters when (i) vertical regression on a thin matrix, (ii) horizontal regression on a fat matrix, (iii) the matrix is approximately square:

"Regularization theory was one of the first signs of the existence of intelligent inference." (Vapnik, 1999, p. 9)

• Need regularization to avoid overfitting.

• How you do the regularization matters for substantive and computational reasons: lasso/elastic-net/ridge are better than best subset in the simple regression setting.

Page 17:

Literature:

Regularization →     None                   Best Subset       ℓ1/LASSO, ℓ2, etc.
Regression ↓
Horizontal           earlier causal         –                 Chernozhukov et al.;
                     effect literature                        Athey et al.
Vertical             Abadie-Diamond-        –                 Doudchenko-Imbens;
                     Hainmueller                              Abadie-L'Hour
Matrix               two-way fixed          Bai (2003);       Current paper
                     effect literature      Xu (2017)

Page 18:

Econometric Literature I: Treatment Effect / Matching-Regression

• Thin matrix (many units, few periods), single treated period (period T).

Strategy: Use controls to regress $Y_{i,T}$ on lagged outcomes $Y_{i,1},\dots,Y_{i,T-1}$. $N_C$ observations, $T-1$ regressors.

• Does not work well if Y is fat (few units, many periods).

• Key identifying assumption: $Y_{iT}(0) \perp\!\!\!\perp W_{iT} \mid Y_{i1},\dots,Y_{i,T-1}$.

Page 19:

Econometric Literature II: Abadie-Diamond-Hainmueller Synthetic Control Literature

• Fat matrix, single treated unit (unit N), treatment starts in period $T_0$.

Strategy: Use pretreatment periods to regress $Y_{N,t}$ on contemporaneous outcomes $Y_{1,t},\dots,Y_{N-1,t}$. $T_0 - 1$ observations, $N-1$ regressors. Weights (regression coefficients) are nonnegative and sum to one; no intercept.

• Does not work well if the matrix is thin (many units).

• Key identifying assumption: $Y_{Nt}(0) \perp\!\!\!\perp W_{Nt} \mid Y_{1t},\dots,Y_{N-1,t}$.

Page 20:

Econometric Literature III: Doudchenko-Imbens

• Fat matrix, or N and T similar; single treated unit (unit N); treatment starts in period $T_0$.

Strategy: Use pretreatment periods to regress $Y_{N,t}$ on contemporaneous outcomes $Y_{1,t},\dots,Y_{N-1,t}$, using elastic net regularization. $T_0 - 1$ observations, $N-1$ regressors.

• Allows for negative weights, weights summing to something other than one, and a non-zero intercept; typically requires regularization.

Page 21:

Econometric Literature IV: Transposed Abadie-Diamond-Hainmueller or Doudchenko-Imbens (reverse the roles of time and units compared to ADH or DI)

• Fat matrix, single treated unit (N), treatment in period T.

Strategy: Use control units to regress $Y_{iT}$ on lagged outcomes $Y_{i1},\dots,Y_{i,T-1}$, using elastic net regularization. $N_C$ observations, $T-1$ regressors.

• Allows for negative weights, weights summing to something other than one, and a non-zero intercept.

• Similar to the regression estimator for the matching setting, with regularization.

Page 22:

Econometric Literature V: Fixed Effect Panel Data Literature / Difference-in-Differences

• T and N similar, general pattern for treatment assignment.

Model:

$$Y_{it} = \alpha_i + \gamma_t + \varepsilon_{it}$$

• Symmetric in the roles of units and time periods.

• Suppose T = 2, N = 2, $W_{2,2} = 1$, and $W_{i,t} = 0$ if $(i,t) \neq (2,2)$; then we have a classic DID setting, leading to the imputed value

$$\hat Y_{2,2}(0) = Y_{1,2} + Y_{2,1} - Y_{1,1}$$
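For instance, a quick numeric check of this imputation (numbers invented for illustration):

```python
# 2x2 DID: rows are units, columns are periods; unit 2's period-2
# control outcome is missing because it was treated there.
Y = [[5.0, 7.0],    # unit 1 (control)
     [6.0, None]]   # unit 2 (treated in period 2)
Y22_imputed = Y[0][1] + Y[1][0] - Y[0][0]  # 7 + 6 - 5 = 8
print(Y22_imputed)
```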

Page 23:

Questions: What to do if we are unsure about thin/fat/square, or face staggered adoption or a general assignment mechanism?

• We generalize the interactive fixed effects model (Bai, 2003, 2009; Xu, 2017; Gobillon and Magnac, 2013; Kim and Oka, 2014), allowing for large rank of L.

• We propose a new estimator with novel regularization that
  – can deal with staggered/general missing data patterns,
  – is computationally feasible because it is a convex optimization problem,
  – reduces to matching under assumptions in the thin case,
  – reduces to synthetic control under assumptions in the fat case.

Page 24:

$X_i$ is a P-vector, $Z_t$ is a Q-vector.

Model (generalized version of Xu, 2017):

$$Y_{it} = L_{it} + \sum_{p=1}^{P} \sum_{q=1}^{Q} X_{ip} H_{pq} Z_{qt} + \gamma_i + \delta_t + V_{it}\beta + \varepsilon_{it}$$

Unobserved: $L_{it}$, $\gamma_i$, $\delta_t$, $H_{pq}$, $\beta$, $\varepsilon_{it}$.

Page 25:

• We do not necessarily need the fixed effects $\gamma_i$ and $\delta_t$; these can be subsumed into L.

If $L_{it} = \gamma_i + \delta_t$, then L is a rank-2 matrix:

$$L = \begin{pmatrix} \gamma_{N\times 1} & \iota_{N\times 1} \end{pmatrix}
\begin{pmatrix} \iota_{T\times 1} & \delta_{T\times 1} \end{pmatrix}^{\top}
= \begin{pmatrix} \gamma_1 & 1 \\ \gamma_2 & 1 \\ \vdots & \vdots \\ \gamma_N & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & \dots & 1 \\ \delta_1 & \delta_2 & \dots & \delta_T \end{pmatrix}$$

• It may nevertheless be convenient to include the fixed effects, given that we regularize L.
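As a sanity check (illustration only, not from the slides), the rank-2 claim is easy to verify numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 9
gamma = rng.normal(size=(N, 1))   # unit effects
delta = rng.normal(size=(1, T))   # time effects
L = gamma + delta                 # broadcasting gives L_it = gamma_i + delta_t
print(np.linalg.matrix_rank(L))   # 2 (generically)
```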

Page 26:

Too many parameters (especially the N × T matrix L), so we need regularization:

We shrink L and H towards zero.

For H we use a lasso-type element-wise ℓ1 norm, defined as

$$\|H\|_{1,e} = \sum_{p=1}^{P} \sum_{q=1}^{Q} |H_{pq}|.$$

Page 27:

How do we regularize $L_{N\times T}$?

In linear regression with many regressors,

$$Y_i = \sum_{k=1}^{K} \beta_k X_{ik} + \varepsilon_i,$$

we often regularize by adding a penalty term $\lambda\|\beta\|$, where

$$\|\beta\|_0 = \sum_{k=1}^{K} \mathbf{1}_{\{|\beta_k| \neq 0\}} \qquad \text{(best subset selection)}$$

$$\|\beta\|_1 = \sum_{k=1}^{K} |\beta_k| \qquad \text{(LASSO)}$$

$$\|\beta\|_2^2 = \sum_{k=1}^{K} |\beta_k|^2 \qquad \text{(ridge)}$$

Page 28:

Matrix norms for the N × T matrix $L_{N\times T}$:

$$L_{N\times T} = S_{N\times N} \Sigma_{N\times T} R_{T\times T}^{\top} \qquad \text{(singular value decomposition)}$$

S and R are unitary; Σ is rectangular-diagonal with entries $\sigma_j(L)$, the singular values. Rank(L) is the number of non-zero $\sigma_j(L)$.

$$\|L\|_F^2 = \sum_{i,t} |L_{it}|^2 = \sum_{j=1}^{\min(N,T)} \sigma_j^2(L) \qquad \text{(Frobenius, like ridge)}$$

$$\|L\|_* = \sum_{j=1}^{\min(N,T)} \sigma_j(L) \qquad \text{(nuclear norm, like LASSO)}$$

$$\|L\|_R = \mathrm{rank}(L) = \sum_{j=1}^{\min(N,T)} \mathbf{1}_{\{\sigma_j(L) > 0\}} \qquad \text{(rank, like best subset)}$$
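A minimal NumPy illustration of these three norms (not from the slides):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])              # singular values are 4 and 3
sigma = np.linalg.svd(A, compute_uv=False)

print(np.sqrt((sigma**2).sum()))  # Frobenius norm: 5.0
print(sigma.sum())                # nuclear norm: 7.0
print((sigma > 1e-10).sum())      # rank: 2
```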

Page 29:

Xu (2017) focuses on the case with block assignment,

$$Y = \begin{pmatrix} Y_{C,\text{pre}} & Y_{C,\text{post}} \\ Y_{T,\text{pre}} & ? \end{pmatrix}$$

Following Bai (2009), Xu fixes the rank R of L, so we can write L as a matrix with an R-factor structure:

$$L = UV^{\top} = \begin{pmatrix} U_C \\ U_T \end{pmatrix} \begin{pmatrix} V_{\text{pre}} \\ V_{\text{post}} \end{pmatrix}^{\top},
\qquad \text{where } U \text{ is } N \times R \text{ and } V \text{ is } T \times R.$$

Page 30:

Xu (2017) two-step method:

First, use all controls to estimate $U_C$, $V_{\text{pre}}$, $V_{\text{post}}$:

$$\min_{U_C, V_{\text{pre}}, V_{\text{post}}} \left\| Y_C - U_C \begin{pmatrix} V_{\text{pre}} \\ V_{\text{post}} \end{pmatrix}^{\top} \right\|$$

Second, use the treated units in the pre-period to estimate $U_T$ given $\hat V_{\text{pre}}$:

$$\min_{U_T} \left\| Y_{T,\text{pre}} - U_T \hat V_{\text{pre}}^{\top} \right\|$$

Choose the rank of L through cross-validation (equivalent to regularization through rank).

Page 31:

Two Issues

• Xu's approach does not work with staggered adoption (there may be only a few units who never adopt) or with a general assignment pattern.

• Xu's method is not efficient, because it does not use the $Y_{T,\text{pre}}$ data to estimate V.

Page 32:

Modified Xu (2017) method:

$$\min_{L} \frac{1}{|O|} \sum_{(i,t)\in O} (Y_{it} - L_{it})^2 + \lambda_L \|L\|_R$$

• More efficient: uses all the data.

• Works with staggered adoption and general missing data patterns.

• Computationally intractable with large N and T because of the non-convexity of the objective function (like best subset selection in regression).

Page 33:

Our proposed method: regularize using the nuclear norm:

$$\hat L = \arg\min_{L} \frac{1}{|O|} \sum_{(i,t)\in O} (Y_{it} - L_{it})^2 + \lambda_L \|L\|_*$$

• The nuclear norm $\|\cdot\|_*$ generally leads to a low-rank solution for L, the way LASSO leads to selection of regressors.

• The problem is convex, so fast solutions are available.

Page 34:

Estimation: $\hat L$ is obtained via the following procedure*:

(1) Initialize $L_1 = 0_{N\times T}$.

(2) For k = 1, 2, ..., repeat until convergence (O is where we observe Y):

$$L_{k+1} = \mathrm{Shrink}_{\lambda}\bigl(P_O(Y) + P_O^{\perp}(L_k)\bigr)$$

Here $P_O$, $P_O^{\perp}$, and $\mathrm{Shrink}_\lambda$ are matrix operators on $\mathbb{R}^{N\times T}$. For any $A_{N\times T}$, $P_O(A)$ equals A on O and 0 outside of O; $P_O^{\perp}(A)$ is the opposite: 0 on O and A outside of O.

For the SVD $A = S\Sigma R^{\top}$ with $\Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_{\min(N,T)})$,

$$\mathrm{Shrink}_{\lambda}(A) = S\,\mathrm{diag}\bigl(\sigma_1 - \lambda,\dots,\sigma_\ell - \lambda,\underbrace{0,\dots,0}_{\min(N,T)-\ell}\bigr)\,R^{\top},$$

where $\sigma_\ell$ is the smallest singular value of A that is larger than λ.

*More details in Mazumder, Hastie, and Tibshirani (2010).
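A compact NumPy sketch of this iteration, in the spirit of the SOFT-IMPUTE algorithm of Mazumder, Hastie, and Tibshirani (2010) (an illustration, not the authors' implementation; the tolerance and iteration cap are ad hoc):

```python
import numpy as np

def shrink(A, lam):
    """Soft-threshold the singular values of A at level lam."""
    S, sigma, Rt = np.linalg.svd(A, full_matrices=False)
    return (S * np.maximum(sigma - lam, 0.0)) @ Rt

def soft_impute(Y, observed, lam, n_iter=500):
    """Iterate L <- Shrink_lam(P_O(Y) + P_O_perp(L)). `observed` is a boolean
    mask of O; missing entries of Y may hold anything (e.g., 0)."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        filled = np.where(observed, Y, L)   # P_O(Y) + P_O_perp(L)
        L_new = shrink(filled, lam)
        if np.linalg.norm(L_new - L) < 1e-8 * (1.0 + np.linalg.norm(L)):
            return L_new
        L = L_new
    return L

# Usage: recover a noisy rank-1 matrix with ~30% of entries missing.
rng = np.random.default_rng(2)
Y_true = np.outer(rng.normal(size=20), rng.normal(size=15))
observed = rng.random(Y_true.shape) > 0.3
L_hat = soft_impute(Y_true + 0.01 * rng.normal(size=Y_true.shape), observed, lam=0.5)
```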

Page 35:

General Case: We estimate H, L, δ, γ, and β as

$$\min_{H,L,\delta,\gamma,\beta} \frac{1}{|O|} \sum_{(i,t)\in O}
\Biggl( Y_{it} - L_{it} - \sum_{\substack{1\le p\le P \\ 1\le q\le Q}} X_{ip} H_{pq} Z_{qt} - \gamma_i - \delta_t - V_{it}\beta \Biggr)^2
+ \lambda_L \|L\|_* + \lambda_H \|H\|_{1,e}$$

• The same estimation procedure as before applies here, with an additional shrink operator for H.

• We choose $\lambda_L$ and $\lambda_H$ through cross-validation.
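One simple way such cross-validation could be implemented for $\lambda_L$ (a sketch without covariates, reusing `soft_impute` from above; holding out a random fraction of observed cells is an assumption of this illustration, not necessarily the exact scheme in the paper):

```python
import numpy as np

def cv_lambda(Y, observed, lams, frac=0.1, seed=0):
    """Pick lambda by masking a random fraction of the observed entries,
    imputing them, and scoring squared error on the held-out cells."""
    rng = np.random.default_rng(seed)
    holdout = observed & (rng.random(Y.shape) < frac)
    train = observed & ~holdout
    errs = [((soft_impute(Y, train, lam)[holdout] - Y[holdout]) ** 2).mean()
            for lam in lams]
    return lams[int(np.argmin(errs))]
```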

Page 36:

Additional Generalizations I:

• Allow for propensity score weighting, to focus the fit where it matters:

Model the propensity score $E_{it} = \mathrm{pr}(W_{it} = 1 \mid X_i, Z_t, V_{it})$; E is the N × T matrix with typical element $E_{it}$.

Possibly using matrix completion:

$$\min_{E} \frac{1}{NT} \sum_{i,t} (W_{it} - E_{it})^2 + \lambda_L \|E\|_*$$

and then

$$\min_{L} \frac{1}{|O|} \sum_{(i,t)\in O} \frac{E_{it}}{1 - E_{it}} (Y_{it} - L_{it})^2 + \lambda_L \|L\|_*$$
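A sketch of the first step, reusing `soft_impute` from above (since W is fully observed the iteration reduces to one pass of singular value shrinkage; the clipping bounds are an ad hoc choice to keep the weights finite):

```python
import numpy as np

rng = np.random.default_rng(3)
W = (rng.random((20, 15)) < 0.3).astype(float)   # toy treatment matrix

all_observed = np.ones(W.shape, dtype=bool)      # W has no missing entries
E_hat = np.clip(soft_impute(W, all_observed, lam=0.5), 0.01, 0.99)
weights = E_hat / (1.0 - E_hat)   # multiplies the squared errors over O
```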

Page 37:

Additional Generalizations II:

• Take account of time-series correlation in $\varepsilon_{it} = Y_{it} - L_{it}$: modify the objective function from the logarithm of a Gaussian likelihood based on independence to one with an autoregressive structure.

Page 38:

Adaptive Properties of Matrix Regression I

Suppose N is large, T is small, $W_{it} = 0$ if $t < T$ (the ATE-under-unconfoundedness setting), and the data-generating process is

$$Y_{iT} = \mu + \sum_{t=1}^{T-1} \alpha_t Y_{it} + \varepsilon_{iT}, \qquad \varepsilon_{iT} \perp\!\!\!\perp (Y_{i1},\dots,Y_{i,T-1})$$

Then matrix regression ≈ horizontal regression, with $\gamma_i = 0$, $\delta = (0,0,\dots,\mu)$, and the rank-$(T-1)$ matrix

$$L = \begin{pmatrix}
Y_{11} & Y_{12} & \dots & Y_{1,T-1} & \mu + \sum_{t=1}^{T-1}\alpha_t Y_{1t} \\
Y_{21} & Y_{22} & \dots & Y_{2,T-1} & \mu + \sum_{t=1}^{T-1}\alpha_t Y_{2t} \\
Y_{31} & Y_{32} & \dots & Y_{3,T-1} & \mu + \sum_{t=1}^{T-1}\alpha_t Y_{3t} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
Y_{N1} & Y_{N2} & \dots & Y_{N,T-1} & \mu + \sum_{t=1}^{T-1}\alpha_t Y_{Nt}
\end{pmatrix}$$

Page 39:

Adaptive Properties of Matrix Regression II

Suppose N is small, T is large, single treated unit (the synthetic control setting), and the data-generating process is

$$Y_{Nt} = \mu + \sum_{i=1}^{N-1} \omega_i Y_{it} + \varepsilon_{Nt}, \qquad \varepsilon_{Nt} \perp\!\!\!\perp (Y_{1t},\dots,Y_{N-1,t})$$

Then matrix regression ≈ vertical regression, with $\gamma = (0,0,\dots,\mu)$, $\delta_t = 0$, and the rank-$(N-1)$ matrix

$$L = \begin{pmatrix}
Y_{11} & Y_{12} & \dots & Y_{1T} \\
Y_{21} & Y_{22} & \dots & Y_{2T} \\
\vdots & \vdots & \ddots & \vdots \\
Y_{N-1,1} & Y_{N-1,2} & \dots & Y_{N-1,T} \\
\mu + \sum_{i=1}^{N-1}\omega_i Y_{i1} & \mu + \sum_{i=1}^{N-1}\omega_i Y_{i2} & \dots & \mu + \sum_{i=1}^{N-1}\omega_i Y_{iT}
\end{pmatrix}$$

Page 40:

Results I: Suppose there are no covariates (just L), O is sufficiently random, and the $\varepsilon_{it} = Y_{it} - L_{it}$ are i.i.d. with variance $\sigma^2$.

Recall $\|Y\|_F = \sqrt{\sum_{i,t} Y_{it}^2}$ and $\|Y\|_\infty = \max_{i,t} |Y_{it}|$.

Let $Y^*$ be the matrix including all the missing values, e.g., Y(0). Our estimate $\hat Y$ of $Y^*$ is $\hat L$.

The estimated matrix $\hat Y$ is close to $Y^*$ in the following sense*:

$$\frac{\|\hat Y - Y^*\|_F}{\|Y^*\|_F} \le C \max\left(\sigma, \frac{\|Y^*\|_\infty}{\|Y^*\|_F}\right) \frac{\mathrm{rank}(L)\,(N+T)\ln(N+T)}{|O|}.$$

Often the number of observed entries |O| is of order N × T, so if $\mathrm{rank}(L) \ll \min(N,T)$ and $\|Y^*\|_\infty/\|Y^*\|_F < \infty$, the error goes to 0 as N + T grows.

*Adapting the analysis of Negahban and Wainwright (2012).

Page 41:

Results II

To get a confidence interval for $Y_{it}(1) - Y_{it}(0)$ (for a treated unit with $W_{it} = 1$), we need a confidence interval for $L_{it}$ and a distributional assumption on $\varepsilon_{it} = Y_{it}(0) - L_{it}$ (e.g., normal, $N(0,\sigma^2)$).

• To estimate $L_{it}$ consistently, and to have distributional results, we need N and T to be large (even when rank(L) = 1).

• We assume $L_{N\times T}$ is a rank-R matrix, with R fixed as N and T increase. (This can probably be relaxed to let R increase slowly.)

Page 42:

Large-sample properties of $\hat L_{it}$, following Bai (2003). Decompose L as a rank-R matrix:

$$L_{N\times T} = U_{N\times R} V_{T\times R}^{\top}$$

Define

$$\Sigma_U = \frac{1}{N} U^{\top} U, \qquad \Omega_i = U_i^{\top} \Sigma_U^{-1} \sigma^2 \Sigma_U^{-1} U_i,$$

$$\Sigma_V = \frac{1}{T} V^{\top} V, \qquad \Psi_t = V_t^{\top} \Sigma_V^{-1} \sigma^2 \Sigma_V^{-1} V_t.$$

Then

$$\left( \frac{\Omega_i}{N} + \frac{\Psi_t}{T} \right)^{-1/2} \bigl( \hat L_{it} - L_{it} \bigr) \xrightarrow{\;d\;} N(0,1).$$
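As an illustration of turning this into a confidence interval for $L_{it}$ (a sketch assuming factor estimates U and V and a noise-variance estimate sigma2 are already in hand; these inputs are hypothetical):

```python
import numpy as np

def ci_L(U, V, sigma2, i, t, z=1.96):
    """95% normal-approximation CI for L_it, using the asymptotic
    variance Omega_i / N + Psi_t / T from the slide."""
    N, T = U.shape[0], V.shape[0]
    Su_inv = np.linalg.inv(U.T @ U / N)
    Sv_inv = np.linalg.inv(V.T @ V / T)
    omega_i = U[i] @ Su_inv @ (sigma2 * Su_inv) @ U[i]
    psi_t = V[t] @ Sv_inv @ (sigma2 * Sv_inv) @ V[t]
    se = np.sqrt(omega_i / N + psi_t / T)
    L_it = U[i] @ V[t]
    return (L_it - z * se, L_it + z * se)
```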

Page 43:

Illustrations

• The goal is to assess root-mean-squared error, not to get point estimates. We take a complete matrix Y, drop some entries, and compare imputed to actual values. We compare five estimators:

• DID
• SC-ADH (Abadie-Diamond-Hainmueller synthetic control)
• EN (elastic net, Doudchenko-Imbens)
• EN-T (elastic net transposed, Doudchenko-Imbens)
• MC-NNM (matrix completion via nuclear norm minimization)

Page 44:

Illustration I: California Smoking Example

Take the Abadie-Diamond-Hainmueller California smoking data. Consider two settings:

• Case 1: simultaneous adoption.

Page 45:

Illustration I: California Smoking Example (continued)

• Case 2: staggered adoption.

We report average RMSE for different ratios $T_0/T$.

Page 46:

Illustration I: California Smoking Example (N = 38, T = 31)

[Figure: average RMSE for the five estimators. Left panel: simultaneous adoption, $N_t = 8$; right panel: staggered adoption, $N_t = 35$.]

Page 47:

Illustrations II: Stock Market Data

Daily returns on ≈ 2,400 stocks for ≈ 3,000 days. We pick N stocks at random, for the first T periods; this is our sample.

We then pick $\lfloor N/2 \rfloor$ stocks at random from the sample, consider the simultaneous adoption case with $T_0 \in \{\lfloor 0.25T \rfloor, \lfloor 0.75T \rfloor\}$, impute the missing data, and compare to the actual data.

We repeat this 5 times for two pairs of (N,T): (N,T) = (1000, 5) (thin) and (N,T) = (5, 1000) (fat).

Page 48:

Illustrations II: Stock Market Data

[Figure: average RMSE for the five estimators. Left panel, thin: (N,T) = (1000, 5); right panel, fat: (N,T) = (5, 1000).]

Page 49:

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association 105.490 (2010): 493-505.

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. "Comparative politics and the synthetic control method." American Journal of Political Science 59.2 (2015): 495-510.

Abadie, Alberto, and Jeremy L'Hour. "A Penalized Synthetic Control Estimator for Disaggregated Data." Working paper.

Athey, Susan, Guido W. Imbens, and Stefan Wager. "Efficient inference of average treatment effects in high dimensions via approximate residual balancing." arXiv preprint arXiv:1604.07125v3 (2016).

Bai, Jushan. "Inferential theory for factor models of large dimensions." Econometrica 71.1 (2003): 135-171.

Bai, Jushan. "Panel data models with interactive fixed effects." Econometrica 77.4 (2009): 1229-1279.

Bai, Jushan, and Serena Ng. "Determining the number of factors in approximate factor models." Econometrica 70.1 (2002): 191-221.

Candes, Emmanuel J., and Yaniv Plan. "Matrix completion with noise." Proceedings of the IEEE 98.6 (2010): 925-936.

Candes, Emmanuel J., and Benjamin Recht. "Exact matrix completion via convex optimization." Foundations of Computational Mathematics 9.6 (2009): 717.

Chamberlain, Gary, and Michael Rothschild. "Arbitrage, factor structure, and mean-variance analysis on large asset markets." Econometrica 51 (1983): 1281-1304.

Chernozhukov, Victor, et al. "Double machine learning for treatment and causal parameters." arXiv preprint arXiv:1608.00060 (2016).

Doudchenko, Nikolay, and Guido W. Imbens. "Balancing, regression, difference-in-differences and synthetic control methods: A synthesis." NBER Working Paper No. 22791 (2016).

Gobillon, Laurent, and Thierry Magnac. "Regional policy evaluation: Interactive fixed effects and synthetic controls." Review of Economics and Statistics 98.3 (2016): 535-551.

Gross, David. "Recovering low-rank matrices from few coefficients in any basis." IEEE Transactions on Information Theory 57.3 (2011).

Keshavan, Raghunandan H., Andrea Montanari, and Sewoong Oh. "Matrix completion from a few entries." IEEE Transactions on Information Theory 56.6 (2010): 2980-2998.

Keshavan, Raghunandan H., Andrea Montanari, and Sewoong Oh. "Matrix completion from noisy entries." Journal of Machine Learning Research 11 (2010): 2057-2078.

Kim, D., and T. Oka. "Divorce law reforms and divorce rates in the USA: An interactive fixed-effects approach." Journal of Applied Econometrics 29.2 (2014): 231-245.

Liang, Dawen, et al. "Modeling user exposure in recommendation." Proceedings of the 25th International Conference on World Wide Web (2016).

Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of Machine Learning Research 11 (2010): 2287-2322.

Moon, Hyungsik Roger, and Martin Weidner. "Linear regression for panel with unknown number of factors as interactive fixed effects." Econometrica 83.4 (2015): 1543-1579.

Negahban, Sahand, and Martin J. Wainwright. "Estimation of (near) low-rank matrices with noise and high-dimensional scaling." Annals of Statistics 39.2 (2011): 1069-1097.

Negahban, Sahand, and Martin J. Wainwright. "Restricted strong convexity and weighted matrix completion: Optimal bounds with noise." Journal of Machine Learning Research 13 (2012): 1665-1697.

Recht, Benjamin. "A simpler approach to matrix completion." Journal of Machine Learning Research 12 (2011): 3413-3430.

Xu, Yiqing. "Generalized synthetic control method: Causal inference with interactive fixed effects models." Political Analysis 25.1 (2017): 57-76.