introduction to econometrics · introduction to econometrics lecture 2: review of basic statistics...
TRANSCRIPT
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Introduction to EconometricsLecture 2: Review of Basic Statistics & Randomized Controlled
Trials(RCTs)
Zhaopeng Qu
Business School,Nanjing University
Sep. 24th, 2020
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 1 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Outlines
1 An Brief Review of Basic Statistics
2 Review: Random Experiment as the Research Design
3 What is an RCT?
4 Assuming Case: the California School
5 Limitations of RCTs
6 An Example of Randomized Controlled Trials
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 2 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics
An Brief Review of Basic Statistics
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 3 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
Population and Sample(总体与样本)
A population is a collection of people, items, or events aboutwhich you want to make inferences.
Population always have a probability distribution.A sample is a subset of population, which draw from populationin a certain way.
The sample could also follow a probability distribution.To represent the population well, a sample should be randomlycollected and adequately large.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 4 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
Random Sample(随机样本) and i.i.d(独立同分布)
DefinitionThe r.v.s are called a random sample of size n from the populationf(x) if X1, ...,Xn are mutually independent and have the samep.d.f/p.m.f f(x). Alternatively, X1, ...,Xn are called independent,and identically distributed random variables(r.v.s) with p.d.f/p.m.f, commonly abbreviated to i.i.d. r.v.s.
eg. Random sample of n respondents in a survey.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 5 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
Statistic(统计量) and Sampling Distribution
DefinitionX1, ...,Xn is a random sample of size n from the population f(x). Astatistic(T) is a real-valued or vector-valued function fully dependedon X1, ...,Xn, thus
T = T(X1, ...,Xn)
A statistic is only a function of the sample (统计量是样本的函数).The probability distribution of a statistic T is called thesampling distribution (抽样分布)of T.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 6 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
Sample Mean(样本均值) and Sample Variance(样本方差)
Two common and important estimators
DefinitionThe sample average or sample mean, X, of the n observationsX1, ...,Xn is
X̄ = 1n(X1 + X2 + ...+ Xn) =1
n
n∑i=1
Xi
Accordingly, the sample variance is
S2 = 1n − 1
n∑i=1
(Xi − X)2
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 7 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
Sample Mean(样本均值) and Sample Variance(样本方差)
As we know that if Xi is a random variable(r.v.), then f(Xi),which is a function of Xi, is also a r.v..(随机变量的函数还是随机变量)So if Xi is a r.v., then
∑Xi is also a r.v..
the sample mean and the sample variance are alsofunctions of sums, therefore they are a r.v.s too.there are some certain probability functions which candescribe distributions of the sample mean and the samplevariance.Then naturally ask a question: what are the expectation,variance or p.d.f./c.d.f. of these distributions?
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 8 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Basic Concepts
A simple case of sample meanLet {X1, ...,Xn} ∈ [1, 100] , assume n = 2, thus only X1and X2
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 9 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
Sampling Distributions
There are two approaches to characterizing samplingdistributions:
exact/finite sample distribution: The samplingdistribution that exactly describes the distribution of X forany n is called the exact/finite sample distribution of X.approximate/asymptotic distribution: when the samplesize n is large, the sample distribution approximates to acertain distribution function.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 10 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
Two Key Tools
Two key tools used to approximate sampling distributions whenthe sample size is large, thus assume that n → ∞
The Law of Large Numbers(L.L.N.): when the sample sizeis large, X will be close to µY , the population mean withvery high probability.The Central Limit Theorem(C.L.T.): when the samplesize is large, the sampling distribution of the standardizedsample average,(Y−µY)/σY , is approximately normal.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 11 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
Convergence in probability(概率收敛)
DefinitionLet X1, ...,Xn be an random variables or sequence, is said to convergein probability to a value b if for every ε > 0,
P(| Xn − b |> ε) → 0
as n → ∞. We denote this as Xnp−→ b or plim(Xn) = b.
It is similar to the concept of a limitation in a probability way.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 12 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
The Law of Large Numbers(大数定律)
TheoremLet X1, ...,Xn be an i.i.d draws from a distribution with mean µ andfinite variance σ2(a population) and X = 1n
∑ni=1 Xi is the sample
mean, thenX p−→ µ
Intuition: the distribution of Xn “collapses”on µ.直观解释:抽样样本量越大,样本平均值越接近总体平均值(抽样分布更紧凑)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 13 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
A simple case
ExampleSuppose X has a Bernoulli distribution if it have a binary valuesX ∈ {0, 1} and its probability mass function is
P(X = x) ={0.78 if x = 10.22 if x = 0
then E(X) = p = 0.78 and Var(X) = p(1− p) = 0.1716.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 14 / 106
-
. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
Convergence in Distribution(分布收敛)DefinitionLet X1,X2,... be a sequence of r.v.s, and for n = 1, 2, .... Let Fn(x)be the c.d.f of Xn. Then it is said that X1,X2, ...converges indistribution to r.v. W with c.d.f, FW, if
limn�∞Fn(x) = FW(x)
which we write as Xn d−→ W.
Basically: when n is big, the distribution of Xn is very similar tothe distribution of W.Standardize: by subtracting its expectation and dividing by itsstandard deviation
Z = X − E[X]√Var[X]
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 16 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
The Central Limit Theorem(中心极限定理)TheoremLet X1, ...,Xn be an i.i.d draws from a distribution with sample size nwith mean µ and 0 < σ2 < ∞, then
Xn − µσ/√n
d∼ N(0, 1)
Because we don’t have to make any specific assumption aboutthe distribution of Xi, so whatever the distribution of Xi, when nis big,
the standardized Xn d∼ N(0, 1)or Xn d∼ N(µ, σ
2
n )
直观理解:选取的样本量越大,样本均值的分布越趋于正态分布。
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 17 / 106
-
. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .
-
. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions
How large is “large enough”?
How large is large enough ?how large must n be for the distribution of Y to beapproximately normal?
The answer: it depends.if Yi are themselves normally distributed, then Y is exactlynormally distributed for all n.if Yi themselves have a distribution that is far from normal,then this approximation can require n = 30 or even more.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 20 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Statistical Inference: from Samples to Population
InferenceWhat is our best guess about some quantity of interest?What are a set of plausible values of the quantity of interest?
Our focus: {Y1,Y2, ...,Yn} are i.i.d. draws from f(y) or F(Y),thus population distribution.Statistical inference is using samples to infer f(y).
Normally, we don’t need to know everything of thepopulation, just some measures (the moment) enough todescribe the characteristics of the population.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 21 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Statistical Inference: Point estimation
Point estimation: providing a single “best guess”as to thevalue of some fixed, unknown quantity of interest, θ, which is is afeature of the population distribution, f(y).
µ =?E[Y]σ2 =?Var[Y]µy − µx? = E[Y]− E[X]
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 22 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Three Characteristics of an EstimatorLet µ̂Y denote the some estimation value of the populationmoment, µY and E(µ̂Y) is the mean of the sampling distributionof µ̂Y,
1 Unbiasedness: the estimator of µY is unbiased ifE(µ̂Y) = µY
2 Consistency:the estimator of µY is consistent ifµ̂Y
p−→ µY3 Efficiency:Let µ̃Y be another estimator of µY and suppose that
both µ̃Y and µ̂Y are unbiased.Then µ̂Y is said to be moreefficient than µ̂Y
var(µ̂Y) < var(µ̃Y)Comparing variances is difficult if we do not restrict ourattention to unbiased estimators because we could alwaysuse a trivial estimator with variance zero that is biased.Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 23 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Three Characteristics of an Estimator
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 24 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Three Characteristics of an Estimator
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 25 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Three Characteristics of an Estimator
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 26 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Properties of the Sample Mean
Let µY and σ2Y denote the mean and variance of Y,(总体的均值和方差)Let Y = 1n
∑ni=1 Yi is the sample mean of Yi(样本均值)
Then the expectation of the sample mean(样本均值的期望) is
E(Y) = 1n
n∑i=1
E(Yi) = µY
so Y is an unbiased estimator of µY.Based on the L.L.N., Y p−→ µY, so Y is also consistent.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 27 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Properties of the Sample Mean
The Variance of sample mean(样本均值的方差)
Var(Y) = var(1
n
n∑i=1
Yi
)=
1
n2n∑
i=1Var(Yi) =
σ2Yn
Then, the Standard Deviation of the sample mean is σY = σY√n
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 28 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Properties of the Sample Mean
Follow the C.L.T, the
Y d∼ N(µY,σ2Yn )
And let Z be the standardized Y, then
Z = Y − µYσY√n
d∼ N(0, 1)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 29 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Properties of the Sample Variance
Let µY and σ2Y denote the mean and variance of Yi , then thesample variance:
S2Y =1
n − 1
n∑i=1
(Yi − Y)2
Then it is easy to prove that1 E(S2Y) = σ2Y, thus S2 is an unbiased estimator of σ2Y which is
also the reason why the average uses the divisor n − 1instead of n.
2 S2YP−→ σ2Y, thus the sample variance is a consistent
estimator of the population variance.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 30 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
The Standard Error
Recall: the standardized sample mean will be approximatelyfollow a standard normal distribution when n is large.
Z = Y − µYσY√n
d∼ N(0, 1)
But in general σY, the standard deviation of population isunknown, so we have to use sample to estimate it.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 31 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
The Standard Error
Let σY = σY√n , because S2Y is an unbiased and consistentestimator of the σ2Y, then we can use SY√n as an estimator of thestandard deviation of the sample mean, σY.It is called the standard error(标准误)of the sample mean
SE[Y] = σ̂Y =SY√
n
Equivalence to the“standard deviation”(标准差)of the sampledistribution which measures the deviations of the sample mean.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 32 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Application: Sample Size and Standard Error(Population)
Population ∼ N(0, SD)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 33 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Sample Size and Standard Error(sample n=250)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 34 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Sample Size and Standard Error(n=500)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 35 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Sample Size and Standard Error(n=1000)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 36 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
Recall: The Chi-Square Distribution
Let Zi(i = 1, 2, ...,m) be independent random variables, eachdistributed as standard normal. Then a new random variablecan be defined as the sum of the squares of Zi :
X =m∑
i=1Z2i
Then X has a chi-squared distribution with m degrees offreedom.Then, it can be prove that a variation of the sample variance willfollow a Chi-Square distribution
(n − 1)S2Yσ2
∼ χ2n−1
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 37 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
The Student-t Distribution
The Student t distribution can be obtained from a standardnormal and a chi-square random variable.Let Z have a standard normal distribution, let X have achi-square distribution with n degrees of freedom and assumethat Z and X are independent. Then the random variable
T = Z√X/n
has has a t-distribution with n degrees of freedom, denoted asT ∼ tn.Then, the Z will follow a student t distribution.
Z = Y − µYSY√n
∼ t(n − 1)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 38 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Statistical Inference: Estimation, Confident Intervals and Testing
The Student-t Distribution
It does not matter alot in the large sample.As the degrees offreedom get largewhich is highlycorrelated with thesample size n, thet-distribution actuallyapproaches thestandard normaldistribution.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 39 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation
A point estimate provides no information about how close theestimate is likely to be to the population parameter.We cannot know how close an estimate for a particular sample isto the population parameter because the population is neverunknown.A different (complementary) approach to estimation is toproduce a range of values that will contain the truth with somefixed probability.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 40 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
What is a Confidence Interval?
DefinitionA 100(1− α)% confidence interval for a population parameter θ is aninterval Cn = (a, b) , where a = a(Y1, ...,Yn) and b = b(Y1, ...,Yn)are functions of the data such that
P(a < θ < b) = 1− α
In general, this confidence level is 1− α (置信水平) ; where αis called significance level.(显著性水平)The key is how to obtain or construct the values of a and b.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 41 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation and Confidence IntervalsSuppose the population has a normal distribution N(µ, σ2) andlet Y1,Y2, ...,Yn be a random sample from the population.
Then the sample mean Y has a normal distribution:Y ∼ N(µ, σ2n )The standardized sample mean Z is given by:Z = Y−µσ/√n ∼ N(0, 1)
Then let θ = Z, then P(a < θ < b) = 1− α turns into
a < Y − µσ/√n
< b
then it follows thatP(Y − aσ/√n < µ < Y + bσ/√n) = 1− α
Thus the random interval contains the population mean with aprobability 1− α.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 42 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation and Confidence IntervalsTwo cases: σ is known and unknownWhen σ is known, for example,σ = 1, thus Y ∼ N(µ, 1),
Y ∼ N(µ, σ2
n =1
n)
From this, we can standardize Y, and, because the standardizedversion of Y has a standard normal distribution, and we let α = 0.05,then we have
P(−1.96 < Y − µ1/√
n< 1.96) = 1− 0.05
The event in parentheses is identical to the eventY − 1.96/√n ≤ µ ≤ Y + 1.96/√n, so
P(Y − 1.96/√n ≤ µ ≤ Y + 1.96/√n) = 0.95The interval estimate of µ may be written as[Y − 1.96/√n,Y + 1.96/√n]
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 43 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation and Confidence Intervals
When σ is unknown, we could use an estimate of σ,thusSE[Y] = σ̂Y = SY√n , the standard error, replacing unknown σthus
Zt =Y − µY
SY√n
=Y − µYSE(Y)
We just suggested that it follows a student t distribution.
Zt ∼ tn−1
DefinitionThe t-statistic or t-ratio:
Y − µYSE(Y)
∼ tn−1
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 44 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation and Confidence Intervals
To construct a 95% confidence interval, let c denote the 97.5thpercentile in the tn−1 distribution.
P(−c < t ≤ c) = 0.95
where cα/2 is the critical value of the t distribution.The condence interval may be written as [Y ± cα/2S/√n]
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 45 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
Interval Estimation and Confidence Intervals
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 46 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Interval Estimation and Confidence Intervals
A simple rule of thumb for a 95% confidence interval
Because as the degrees of freedom get large which is highlycorrelated with the sample size n, the t-distribution approachesthe standard normal distribution.And Φ(1.96) = 0.975, so a rule of thumb for an approximate95% confidence interval is
[Y ± 1.96× SE(Y)]
Or[Y ± 2× SE(Y)]
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 47 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Hypothesis TestingDefinitionA hypothesis is a statement about a population parameter, thus θ.Formally, we want to test whether is significantly different from acertain value µ0
H0 : θ = µ0which is called null hypothesis. The alternativehypothesis(two-sided) is
H1 : θ ̸= µ0
If the value µ0 does not lie within the calculated confidenceinterval, then we reject the null hypothesis.If the value µ0 lie within the calculated confidence interval, thenwe fail to reject the null hypothesis.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 48 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Introduction
In criminal law, institutions in most countries follow the rule:“innocent until proven guilty”(疑罪从无)
The prosecutor wants to prove their hypothesis that theaccused person is guilty.However, the burden is on the prosecutor to show guilt.The jury or judge starts with the “null hypothesis”thatthe accused person is innocent.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 49 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Introduction
In program evaluations,instead of “presumption of innocence,”the rule is: “presumption of insignificance”Policymaker’s hypothesis: the program improves learning.Evaluators approach experiments using the hypothesis:
there is zero impact of the programThen we test this “null hypothesis”
The burden of proof is on the programit should show a statistically significant impact.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 50 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Two Type Errors(两种错误)In both cases, there is a certain risk that our conclusion is wrong
Type I ErrorA Type I error is when we reject the null hypothesis when it is in facttrue.A Type II error is when we fail to reject the null hypothesis when it isfalse.
In criminal trialThe Type I : the judge reject the null hypothesis when thesuspect is actually no guilty.“宁可错杀一千,不能放过一个”
The Type II: the judge fail to reject the null hypothesiswhen the suspect is actually guilty.“宁可放过一千,不能错杀一个”
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 51 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
The Significance level(显著性水平)
DefinitionThe significance level or size of a test is the maximum probabilityfor the Type I Error
P(Type I error) = P(reject H0 | H0is true) = α
Usually, we has to carry the "burden of proof,"We would like to prove that the assertion of H1 is true byshowing that the data rejects H0.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 52 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Testing procedure
The following are the steps of the hypothesis testing:1 Specify H0 and H1.2 Choose the significance level α.3 Define a decision rule (critical value).4 Given the data compute the test statistic and see if it falls
into the critical region.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 53 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Decision Rule
The decision rule that leads us to reject or not to reject H0 isbased on a test statistic, which is a function of the data
Tn = T(Y1, ...,Yn)
Usually, one rejects H0 if the test statistic falls into a criticalregion(rejection region). A critical region is constructed bytaking into account the probability of making wrongdecisions,thus α.By convention, α is chosen to be a small number, for example,α = 0.01, 0.05, or 0.10
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 54 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
P-Value
To provide additional information, we could ask the question:What is the largest significance level at which we could carry outthe test and still fail to reject the null hypothesis?Or in other word, given the data, the smallest significance levelat which the null can be rejected.We can consider the p-value of a test
1 Calculate the t-statistic t2 The largest significance level at which we would fail to reject
H0 is the significance level associated with using t as ourcritical value
p − value = 1− Φ(t)where Φ(t) denotes the standard normal c.d.f.(we assumethat n is large enough)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 55 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
P-ValueSuppose that t = 1.52, then we can find the largest significance levelat which we would fail to reject H0
p − value = P(T > 1.52 | H0) = 1− Φ(1.52) = 0.065
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 56 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Hypothesis Test of of Ȳ
Specify H0 and H1
H0 : E[Y] = µY,0 H1 : E[Y] ̸= µY,0
Choose the significance level α and define a decision rule (criticalregion or critical value)
eg. if we choose α = 0.05, then the critical value is 1.96,then the region is (−∞,−1.96] and [1.96,+∞)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 57 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Hypothesis Test of of Ȳ
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 58 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Brief Review of Basic Statistics Hypothesis Testing(假设检验)
Hypothesis Test of of Ȳ
Given the data compute the test statisticStep1: Compute the sample average ȲStep2: Compute the standard error of Ȳ
SE(Y) = sY√n
Step3: Compute the t-statistic
tact = Ȳ − µY,0SE(Ȳ)
Step4: Reject the null hypothesis if| tact |> critical valueor if p − value < significance level
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 59 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Review: Random Experiment as the Research Design
Review: Random Experiment as the Research Design
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 60 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Review: Random Experiment as the Research Design
Recall the last lecture
The Core of Empirical Studies: Causality v.s. ForecastingThe Central Question of Causality?
Rubin Causal Model: comparing counterfactuals or potentialoutcomes.However, we can never observe both counterfactuals —fundamental problem of causal inference.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 61 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Review: Random Experiment as the Research Design
Recall the last lecture
Random Assignment Solves the Selection Problem.We should treat experimental design as a Benchmark.To construct the counterfactuals, two broad categories ofempirical strategies.
Random Controlled Trials/Experiments:it can eliminates selection bias which is the mostimportant bias arises in empirical research. If we couldobserve the counterfactual directly, then there is noevaluation problem, just simply difference.
Program Evaluation Econometrics:The various approaches using naturally-occurring dataprovide alternative methods of constructing the propercounterfactual.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 62 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
What is an RCT?
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 63 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
Randomized Controlled Trials(RCTs)(随机可控试验)
In essence, an RCT is an experiment carried out on two ormore groups where participants are randomly assigned toreceive an intervention or not.
Participants are randomly assigned to either an treatmentgroup who are given the intervention, or a control groupwho are not..
In RCTs, each group is tested at the end of the trial and theresults from the groups are compared to see if the interventionhas made a difference and achieved its desired outcome. If therandomized groups are large enough, you can be confident thatdifferences observed are due to the intervention and not someother factor.RCTs are considered the gold standard for establishing a causallink between an intervention and change.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 64 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in History
First recorded RCTwas done in 1747 byJames Lind,who wasa Scottish physician inthe Royal Navy.Scurvy(败血症) is aterrible disease causedby Vitamin Cdeficiency. Seriousissue during long seavoyages.Lind took 12 sailorswith scurvy and splitthem into six groups oftwo.Groups were assigned:
(1) 1 qt cider(苹果酒) (2) 25 dropsof vitriol(硫酸)(3) 6 spoonfuls ofvinegar, (4) 1/2 ptof sea water, (5)garlic, mustard(芥末)and barleywater(大麦汤),(6) 2 oranges and1 lemon
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 65 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in History
Only Group 6 (citrus fruit) showed substantial improvement.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 66 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in History
Ronald A.Fisher(1890-1962),Britishstatistician and geneticist whopioneered the application ofstatistical procedures to the design ofscientific experiments.“a genius who almostsingle-handedly created thefoundations for modernstatistical science”.“Rothamsted Experimental
Station”
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 67 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCTs in Economics
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 68 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
Randomized Experimental Methods: Noble Prize 2019
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 69 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCTs in Social Policies
According to Baruch (1978), 245 randomized field experimentshad been conducted in U.S for social policies evaluations up to1978.The huge effort has been prompted by the 1% part of everysocial budget devoted to evaluation.Some of them were ambitious and very costly, and affecteddifferent kind of policies.
the Perry Preschool Program in 1961The Rand Health Insurance Experiment from 1974-1982.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 70 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
Education: the Perry Preschool Program
123 children born between 1958 and 1962 in MichiganHalf of them (drawn at random) entered the perry schoolprogram at 3 or 4 years old.Education by skilled professionals in nurseries and kindergarten.Program duration circle 30 weeksfollow-up survey (age : 14, 15, 19, 27 and 40 years old)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 71 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
Health Care: The Rand Health Insurance Experiment
5809 people randomly assigned in 1974 to different insuranceprograms with 0%, 25%, 50% and 75% sharing.They were followed until 1982.Main results : paying a portion of health cost make people giveup some “superfluous”cares, with little harm on their health.But some heterogeneity : not true for poor people.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 72 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCTs in China
“One egg a day”program in rural China by REAP at Stanford.One egg a day
“Free-lunch”program in primary schools at Western China.Free Lunch
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 73 / 106
https://reap.fsi.stanford.edu/research/egg_programs_and_nutritional_deficiencies_in_gansu_worth_the_efforthttp://www.mianfeiwucan.org/
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in Business
An interesting question: What is the optimal color for taxis?Ho, Chong and Xia(2017), Yellow taxis have fewer accidentsthan blue taxis because yellow is more visible than blue,PNAS.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 74 / 106
https://www.pnas.org/content/early/2017/02/28/1612551114https://www.pnas.org/content/early/2017/02/28/1612551114https://www.pnas.org/content/early/2017/02/28/1612551114
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in Business
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 75 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
RCT in Business
Another Critical Question for business: Is Working at Home isbetter than Working at Office?
Bloom, Liang, Roberts and Ying,(2015), “Does Working from HomeWork? Evidence from a Chinese Experiment”, The Quarterly Journalof Economics
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 76 / 106
https://academic.oup.com/qje/article-abstract/130/1/165/2337855?redirectedFrom=fulltexthttps://academic.oup.com/qje/article-abstract/130/1/165/2337855?redirectedFrom=fulltexthttps://academic.oup.com/qje/article-abstract/130/1/165/2337855?redirectedFrom=fulltext
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
What is an RCT?
Types of RCTs
Lab Experimentseg: students evolves a experiment in a classroom.eg: computer game for gamble in Lab
Field Experimentseg: the role of women in household’s decision or fakeresumes in job application
Quasi-Experiment or Natural Experiments: some unexpectedinstitutional change or natural shock
eg: Germany Reunion(德国统一), Great Famine inChina(1959-1961 年大饥荒)and U.S Bombing inVietnam(美国轰炸越南).
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 77 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School
Assuming Case: the California School
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 78 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School An assuming case: the California School
A Case: the California School
Draw schools (n = 420) randomly from all school in CaliforniaVariables:
5th grade test scores (Stanford-9 achievement test, combined mathand reading), district averageStudent-teacher ratio (STR) = no. of students in the district dividedby no. full-time equivalent teachers
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 79 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School An assuming case: the California School
Summary Table: Descriptive Statistics
Does this table tell us anything about the relationship between testscores and the STR?
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 80 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School An assuming case: the California School
Scatterplot: test score v. student-teacher ratio
What does this figure show? and it may suggest...?
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 81 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School An assuming case: the California School
The California Test Score
We need to get some numerical evidence on whether districts withlow STRs have higher test scores.But how?
1 Compare average test scores in districts with low STRs to those withhigh STRs (“estimation”)
2 Test the “null”hypothesis that the mean test scores in the two typesof districts are the same, against the “alternative”hypothesis thatthey differ (“hypothesis testing”)
3 Estimate an interval for the difference in the mean test scores, high v.low STR districts (“confidence interval”)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 82 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School An assuming case: the California School
The California Test Score
Compare districts with “small”and “large”class sizes:
Small v.s. LargeClass Size Average score(Y) Standard deviation N
Small(STR < 20) 657.4 19.4 238Large(STR ⩾ 20) 650.0 17.9 182
1 Estimation of ∆= difference between group means2 Test the hypothesis that ∆ = 03 Construct a confidence interval for ∆
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 83 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
An Example: Comparing Means from Different Populations
In an RCT, we would like to estimate the average causal effectsover the population
ATE = ATT = E{Yi(1)− Yi(0)}
We only have random samples and random assignment totreatment, then what we can estimate instead
difference in mean = Ytreated − Ycontrol
Under randomization, difference-in-means is a good estimate forthe ATE.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 84 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
Hypothesis Tests for the Difference Between Two Means
To illustrate a test for the difference between two means, let µwbe the mean hourly earning in the population of women recentlygraduated from college and let µm be the population mean forrecently graduated men.Then the null hypothesis and the two-sided alternativehypothesis are
H0 : µm = µwH1 : µm ̸= µw
Consider the null hypothesis that mean earnings for these twopopulations differ by a certain amount, say d0. The nullhypothesis that men and women in these populations have thesame mean earnings corresponds to H0 : H0 : d0 = µm − µw = 0
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 85 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
The Difference Between Two MeansSuppose we have samples of nm men and nw women drawn atrandom from their populations. Let the sample average annualearnings be Ym for men and Yw for women. Then an estimatorof µm − µw is Ym − Yw .Let us discuss the distribution of Ym − Yw .
∼ N(µm − µw,σ2mnm
+σ2wnw
)
if σ2mand σ2w are known, then the this approximate normaldistribution can be used to compute p-values for the test of thenull hypothesis. In practice, however, these population variancesare typically unknown so they must be estimated.Thus the standard error of Ym − Yw is
SE(Ym − Yw) =
√s2mnm
+s2wnw
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 86 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
The Difference Between Two Means
The t-statistic for testing the null hypothesis is constructedanalogously to the t-statistic for testing a hypothesis about asingle population mean, thus t-statistic for comparing two meansis
tact =Ym − Yw − d0SE(Ym − Yw)
If both nmand nm are large, then this t-statistic has a standardnormal distribution when the null hypothesis is true,thusYm − Yw = 0.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 87 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
Confidence Intervals for the Difference Between TwoMeans
the 95% two-sided confidence interval for d consists of thosevalues of d within ±1.96 standard errors of Ym − Yw , thusd = µm − µw is
(Ym − Yw)± 1.96SE(Ym − Yw)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 88 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Assuming Case: the California School Comparing Means from Different Populations
Hypothesis Test of the Difference Between Two Means
Reject the null hypothesis if| tact |=| Ym−Yw−d0SE(Ym−Yw) |> critical valueor if p − value < significance level
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 89 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Limitations of RCTs
Limitations of RCTs
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 90 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Limitations of RCTs
RCT are far from perfect!
High Costs, Long DurationPotential Ethical Problems: “Parachutes reduce the risk ofinjury after gravitational challenge, but their effectiveness has notbeen proved with randomized controlled trials."
Milgram ExperimentStanford Prison ExperimentMonkey Experiment
Limited GeneralizabilityRCTs allow us to gain knowledge about causal effects butwithout knowing the mechanism.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 91 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Limitations of RCTs
Potential Problems in Practice
Small sample: Student EffectHawthorne effect(霍桑效应):The subjects are in an experimentcan change their behavior.Attrition(样本流失):It refers to subjects dropping out of thestudy after being randomly assigned to the treatment or controlgroup.Failure to randomize or failure to follow treatment protocol:People don’t always do what they are told.
Wearing glasses program in Western Rural China.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 92 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Limitations of RCTs
Program Evaluation Econometrics(项目评估计量经济学)Question: How to do empirical research scientifically when wecan not do experiments? It means that we always have selectionbias in our data, or in term of “endogeneity”.Answer: Build a reasonable counterfactual world by naturallyoccurring data to find a proper control group is the core ofeconometrical methods.Here you Furious Seven Weapons in Applied Econometrics(七种盖世武器)
1 Random Controlled Trials (RCT)(随机实验)2 OLS(最小二乘回归)3 Decomposition(分解)4 Instrumental Variable(工具变量)5 Differences in Differences(双差分)6 Matching and Propensity Score(匹配)7 Regression Discontinuity(断点回归)8 Synthetic Control (合成控制法)
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 93 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
An Example of Randomized Controlled Trials
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 94 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Working from Home(WFH) v.s Working from Office
“Does Working from Home Work? Evidence from a ChineseExperiment”,by Nicholas A. Bloom, James Liang, John Roberts,Zhichun Jenny Ying The Quarterly Journal ofEconomics,February 2015, Vol. 130, Issue 1, Pages 165-218.Basic Question: WFH = SFH
SFH(Shirking from Home)?
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 95 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Working from Home(WFH) is a trend internationally
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 96 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Motivations
Working from home is a modern management practice whichappears to be stochastically spreading in the US and Europe20 million people in US report working from home at least onceper weekLittle evidence on the effect of workplace flexibility
productivityemployee satisfactionshirking
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 97 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Ctrip Experiment
Ctrip, China’s largest travel-agent, with16,000 employees, $6bnNASDAQ.Co-founder of Ctrip, James Liang, was an Econ PhD atStanford and decided to run a experiment to test WFH.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 98 / 106
http://www.ctrip.comhttp://www.itb-china.com/speaker/mr-james-jianzhang-liang/
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Ctrip Experiment: A call center in ShanghaiThe experiment runs on airfare & hotel departments in Shanghai.Main Work: Employees take calls and make bookings.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 99 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
The Experimental Design
Treatment: work 4 shifts (days) a week at home and to workthe 5th shift in the office on a fixed day.Control: work in the office on all 5 days.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 100 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
The Experimental Design: Timeline
In early November 2010, employees in the airfare and hotelbooking departments were informed of the WFH program.Of the 994 employees in the airfare and hotel bookingdepartments, 503 (51%) volunteered for the experiment.Among the volunteers, 249 (50%) of the employees met theeligibility requirements and were recruited into the experiment.The treatment and control groups were then determined fromthis group of 249 employees through a public lottery.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 101 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
The Experimental Design
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 102 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Results: the number of receiving calls
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 103 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Results: Working hours
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 104 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Results:Many Outcomes
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 105 / 106
-
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
An Example of Randomized Controlled Trials
Conclusion: Very positive
They found a highly significant 13% increase in employeeperformance from WFH,
of which about 9% was from employees working moreminutes of their shift period (fewer breaks and sick days)and about 4% from higher performance per minute.
Home workers also reported substantially higher work satisfactionand psychological attitude scores, and their job attrition rates fellby over 50%.
Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 106 / 106
An Brief Review of Basic Statistics Basic ConceptsLarge-Sample Approximations to Sampling DistributionsStatistical Inference: Estimation, Confident Intervals and Testing Interval Estimation and Confidence IntervalsHypothesis Testing(假设检验)
Review: Random Experiment as the Research DesignWhat is an RCT?Assuming Case: the California SchoolAn assuming case: the California SchoolComparing Means from Different Populations
Limitations of RCTsAn Example of Randomized Controlled Trials