Basics of Statistics for EconometricsModule 1 of the Course of
Econometrics
Edoardo Otranto(Universita di Messina)
e-mail: [email protected]
International Doctoral Program inEconomics
Scuola Superiore Sant’Anna
November-December 2014
Program
• Basic Probability Concepts
• Random Variables
• Bivariate Random Variables
• Random Vectors
• Functions of Random Variables
• Some Specific Univariate Distributions
• General concepts of estimation theory
• General concepts of hypothesis testing
• Multivariate Normal Distribution
• Tests based on likelihood function

Reference books:
• Mood, Graybill, Boes: Introduction to the Theory of Statistics, McGraw-Hill
• Greene: Econometric Analysis (statistical appendices), Prentice Hall
1. Basic Probability Concepts

• An outcome is the result of an experiment or other situation involving uncertainty.
• The set of all possible outcomes of a probability experiment is called a sample space.
• An event is any collection of outcomes of an experiment. An event consisting of a single outcome in the sample space is called an elementary or simple event; events consisting of more than one outcome are called compound events.
• Set theory is used to represent relationships among events.
• Two events are mutually exclusive (or disjoint) if it is impossible for them to occur together.
• A probability provides a quantitative description of the likely occurrence of a particular event. It ranges in [0, 1].
Different definitions of probability
• Classical or a priori probability: if a random experiment can result in n mutually exclusive and equally likely outcomes, and nA of these outcomes have an attribute A, then the probability of A is nA/n.
• A posteriori or frequency probability: a series of observations (or experiments) is made under quite uniform conditions; the probability of the event is the relative frequency with which the repeated observations satisfy the event.
• Subjective probability: describes an individual's personal judgement about how likely a particular event is to occur.
Conditional Probability

In many situations, once more information becomes available, we are able to revise our estimates of the probability of further outcomes or events. Pr(A|B) denotes the probability that event A will occur given that event B has already occurred. A rule that can be used to determine a conditional probability from unconditional probabilities is:

Pr(A|B) = Pr(A ∩ B) / Pr(B)
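As a concrete check of this rule, a conditional probability can be computed by counting outcomes in a finite sample space. The two-dice events below are an illustrative choice, not part of the slides:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered rolls of two fair dice (36 equally likely outcomes).
omega = list(product(range(1, 7), repeat=2))

A = {w for w in omega if w[0] + w[1] == 8}   # event: the sum is 8
B = {w for w in omega if w[0] == 5}          # event: the first die shows 5

def pr(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

# Pr(A|B) = Pr(A ∩ B) / Pr(B)
pr_A_given_B = pr(A & B) / pr(B)
print(pr_A_given_B)   # 1/6: given the first die is 5, the sum is 8 only if the second is 3
```

Note how conditioning changes the assessment: unconditionally Pr(A) = 5/36, but knowing B has occurred raises it to 1/6.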
Independent events

Two events, A and B, are independent if and only if the probability that they both occur is equal to the product of the probabilities of the two individual events, i.e.

Pr(A ∩ B) = Pr(A)Pr(B)

The idea of independence can be extended to more than two events. For example, A, B and C are mutually independent if each pair of them is independent and:

Pr(A ∩ B ∩ C) = Pr(A)Pr(B)Pr(C)
Multiplication rule

The multiplication rule follows from the definition of conditional probability.

Pr(A ∩ B) = Pr(A|B)Pr(B) = Pr(B|A)Pr(A)

In general:

Pr(A1 ∩ A2 ∩ · · · ∩ An) = Pr(A1)Pr(A2|A1)Pr(A3|A1 ∩ A2) · · · Pr(An|A1 ∩ A2 ∩ · · · ∩ An−1)
Addition rule

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

For mutually exclusive events:

Pr(A ∪ B) = Pr(A) + Pr(B)

If A1, A2, . . . , An are events in the sample space, pairwise disjoint, then:

Pr(A1 ∪ A2 ∪ · · · ∪ An) = ∑_{i=1}^{n} Pr(Ai)
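Both versions of the rule can be verified by counting on a finite sample space; the single-die events below are an illustrative choice:

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die
A = {2, 4, 6}              # even outcome
B = {4, 5, 6}              # outcome greater than 3

def pr(event):
    return Fraction(len(event), len(omega))

# general addition rule: subtract the double-counted intersection
lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)
print(lhs, rhs)

# pairwise disjoint events: the union rule reduces to a plain sum
C1, C2, C3 = {1, 2}, {3, 4}, {5, 6}
assert pr(C1 | C2 | C3) == pr(C1) + pr(C2) + pr(C3) == 1
```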
2. Random Variables

We view the observations as the outcome of a random experiment. The outcomes of the experiment are assigned unique numeric values. The assignment is one-to-one: each outcome gets one value, and no two distinct outcomes receive the same value.

We call the outcome variable X a random variable because, until the experiment is performed, it is uncertain what value X will take. Probabilities Pr(X = x) are associated with outcomes to quantify this uncertainty.

A random variable is discrete if the set of outcomes is either finite in number or countably infinite. A random variable is continuous if the set of outcomes is infinitely divisible and, hence, not countable.
A probability distribution f(x) is a listing of the values x taken by X and their associated probabilities.

For a discrete random variable (probability mass function):

f(x) = Pr(X = x)
0 ≤ Pr(X = x) ≤ 1
∑_x f(x) = 1
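A minimal sketch of a probability mass function satisfying these requirements; the fair die is an illustrative choice, not from the slides:

```python
from fractions import Fraction

# pmf of a fair six-sided die: f(x) = Pr(X = x) = 1/6 for x = 1, ..., 6
f = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(0 <= p <= 1 for p in f.values())   # each probability lies in [0, 1]
assert sum(f.values()) == 1                   # probabilities sum to one
print(f[3])   # 1/6
```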
For the continuous case, the probability associated with any particular point is zero; we can only assign positive probabilities to intervals in the range of X.

The probability density function (pdf) is defined so that:

f(x) ≥ 0
Pr(a ≤ x ≤ b) = ∫_a^b f(x) dx ≥ 0
∫_{−∞}^{+∞} f(x) dx = 1
Pr(a ≤ x ≤ b) = Pr(a ≤ x < b) = Pr(a < x ≤ b) = Pr(a < x < b)
For a discrete random variable, the cumulative distribution function (cdf) F(x) is given by:

F(x) = Pr(X ≤ x) = ∑_{X≤x} f(x)

Note that: f(xi) = F(xi) − F(xi−1)

For a continuous random variable:

F(x) = ∫_{−∞}^{x} f(t) dt

Note that: f(x) = dF(x)/dx
In both the continuous and discrete cases, F(x) must satisfy the following properties:
1. 0 ≤ F (x) ≤ 1
2. If x < y, F (x) ≤ F (y)
3. F (+∞) = 1
4. F (−∞) = 0
From the definition of the cdf:
Pr(a < x ≤ b) = F (b)− F (a)
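These properties are easy to check on a concrete cdf; the Uniform[0, 1] distribution below is an illustrative choice:

```python
def F(x):
    """cdf of the continuous Uniform[0, 1] distribution."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return float(x)

# Pr(a < x <= b) = F(b) - F(a)
a, b = 0.25, 0.75
p = F(b) - F(a)
print(p)   # 0.5

assert F(float("-inf")) == 0.0 and F(float("inf")) == 1.0   # limits at the extremes
assert F(0.2) <= F(0.9)                                     # monotonicity
assert 0.0 <= F(0.4) <= 1.0                                 # bounded in [0, 1]
```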
Expectations of a random variable

The expected value (mean) of a random variable is:

E(X) = µ = ∑_x x f(x)       if X is discrete
E(X) = µ = ∫_x x f(x) dx    if X is continuous

If g(·) is a function of X, the expected value of g(X) is given by:

E[g(X)] = ∑_i g(xi) f(xi)              if X is discrete
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx     if X is continuous

If g(X) = a + bX for constants a and b, then:

E(a + bX) = a + bE(X)
The variance of a random variable is:

Var(X) = E[(X − µ)²] = σ² = ∑_x (x − µ)² f(x)      if X is discrete
Var(X) = E[(X − µ)²] = σ² = ∫_x (x − µ)² f(x) dx   if X is continuous

Another way to compute the variance is:

Var(X) = E(X²) − µ²

If g(X) = a + bX for constants a and b, then:

Var(a + bX) = b²Var(X)

Note: the variance is nonnegative; it equals zero only for a constant random variable.
Note: the mean and the variance are not random variables.
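Both formulas for the variance, and the rule Var(a + bX) = b²Var(X), can be verified exactly for a small discrete distribution; the fair die is an illustrative choice:

```python
from fractions import Fraction

# fair die: an illustrative discrete distribution
f = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in f.items())                       # E(X) = 7/2
var_def = sum((x - mu) ** 2 * p for x, p in f.items())      # E[(X - mu)^2]
var_alt = sum(x ** 2 * p for x, p in f.items()) - mu ** 2   # E(X^2) - mu^2
assert var_def == var_alt == Fraction(35, 12)

# Var(a + bX) = b^2 Var(X): the shift a drops out
a, b = 10, 3
var_lin = sum((a + b * x - (a + b * mu)) ** 2 * p for x, p in f.items())
assert var_lin == b ** 2 * var_def
print(var_def)   # 35/12
```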
3. Bivariate Random Variables

The joint probability function for two random variables, X and Y, denoted f(x, y), is defined so that:

Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = ∑_{a≤x≤b} ∑_{c≤y≤d} f(x, y)     if X and Y are discrete
Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx        if X and Y are continuous

In the discrete case we call f(x, y) a joint probability mass function; in the continuous case, a joint probability density function.
Requirements:

f(x, y) ≥ 0

∑_x ∑_y f(x, y) = 1          if X and Y are discrete
∫_x ∫_y f(x, y) dy dx = 1    if X and Y are continuous
The cumulative probability is likewise the probability of a joint event:

F(x, y) = Pr(X ≤ x, Y ≤ y) = ∑_{X≤x} ∑_{Y≤y} f(x, y)                if X and Y are discrete
F(x, y) = Pr(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(t, s) ds dt    if X and Y are continuous
Marginal Distributions
A marginal probability density (marginal probability distribution) is defined with respect to an individual variable:

fx(x) = ∑_y f(x, y)        if Y is discrete
fx(x) = ∫_y f(x, y) dy     if Y is continuous

and similarly for fy(y).
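A sketch of marginalization for a small joint pmf; the table below is made up for illustration, not from the slides:

```python
from fractions import Fraction

# an illustrative joint pmf f(x, y) on x in {0, 1}, y in {0, 1, 2}
f = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8),
}
assert sum(f.values()) == 1

def fx(x):
    """Marginal of X: sum the joint pmf over y."""
    return sum(p for (xi, y), p in f.items() if xi == x)

def fy(y):
    """Marginal of Y: sum the joint pmf over x."""
    return sum(p for (x, yi), p in f.items() if yi == y)

print(fx(0), fy(2))   # 1/2 3/8
```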
Two random variables are statistically independent if and only if their joint density is the product of the marginal densities:

f(x, y) = fx(x) fy(y)

If (and only if) X and Y are independent, then the cdf factors as well as the pdf:

F(x, y) = Fx(x) Fy(y)
The means and variances of the variables in a joint distribution are defined with respect to the marginal distributions.

Discrete case:

E(X) = ∑_x x fx(x) = ∑_x x [∑_y f(x, y)] = ∑_x ∑_y x f(x, y)

Var(X) = ∑_x [x − E(X)]² fx(x) = ∑_x ∑_y [x − E(X)]² f(x, y)

Continuous case:

E(X) = ∫_x x fx(x) dx = ∫_x ∫_y x f(x, y) dy dx

Var(X) = ∫_x [x − E(X)]² fx(x) dx = ∫_x ∫_y [x − E(X)]² f(x, y) dy dx
Covariance

Cov(X, Y) = E[(X − µx)(Y − µy)] = E(XY) − µx µy = σxy

If X and Y are independent:

σxy = ∑_x ∑_y (x − µx)(y − µy) f(x, y)
    = ∑_x ∑_y (x − µx)(y − µy) fx(x) fy(y)
    = ∑_x (x − µx) fx(x) ∑_y (y − µy) fy(y)
    = E(X − µx) E(Y − µy) = 0

The sign of the covariance indicates the direction of covariation of X and Y, but its magnitude depends on the scales of measurement. In view of this, a preferable measure is ...
...the Correlation Coefficient

r(X, Y) = ρxy = σxy / (σx σy)

where σx and σy are the standard deviations of X and Y, respectively.

ρxy has the same sign as σxy.

−1 ≤ ρxy ≤ 1

Independent variables are also uncorrelated, but uncorrelated variables are not necessarily independent. An important exception is the joint Normal distribution, where zero correlation does imply independence.
Note: functions of independent random variables are independent; functions of uncorrelated random variables could be correlated.
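The made-up joint pmf below illustrates both the computation of σxy and ρxy and the warning above: X and Y are clearly dependent (Y = 1 exactly when X = 1), yet their covariance, and hence their correlation, is zero:

```python
import math

# an illustrative joint pmf concentrated on three points
f = {(0, 0): 0.25, (1, 1): 0.5, (2, 0): 0.25}

mu_x = sum(x * p for (x, y), p in f.items())   # = 1.0
mu_y = sum(y * p for (x, y), p in f.items())   # = 0.5

cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in f.items())
# equivalently: Cov(X, Y) = E(XY) - mu_x * mu_y
cov_alt = sum(x * y * p for (x, y), p in f.items()) - mu_x * mu_y
assert math.isclose(cov, cov_alt, abs_tol=1e-12)

var_x = sum((x - mu_x) ** 2 * p for (x, y), p in f.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in f.items())
rho = cov / math.sqrt(var_x * var_y)   # scale-free measure in [-1, 1]
assert -1 <= rho <= 1
print(cov, rho)   # 0.0 0.0 despite the obvious dependence
```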
Some general results

E(aX + bY + c) = aE(X) + bE(Y) + c

Var(aX + bY + c) = a²Var(X) + b²Var(Y) + 2abCov(X, Y) = Var(aX + bY)

Cov(aX + bY, cX + dY) = acVar(X) + bdVar(Y) + (ad + bc)Cov(X, Y)

E(XY) = E(X)E(Y) + Cov(X, Y)

If X and Y are uncorrelated:

Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)
E(XY) = E(X)E(Y)

For any two functions g1(X) and g2(Y), if X and Y are independent:

E[g1(X)g2(Y)] = E[g1(X)]E[g2(Y)]
Conditioning in a bivariate distribution

In a bivariate distribution there is a conditional distribution over Y for each value of X (and vice versa). Conditional densities:

f(y|x) = f(x, y) / fx(x)
f(x|y) = f(x, y) / fy(y)

If X and Y are independent:

f(y|x) = fy(y)
f(x|y) = fx(x)

Note that:

f(x, y) = f(y|x) fx(x) = f(x|y) fy(y)
The conditional Mean

E(Y|X) = ∑_y y f(y|x)        if Y is discrete
E(Y|X) = ∫_y y f(y|x) dy     if Y is continuous

The conditional Variance

Var(Y|X) = ∑_y [y − E(Y|X)]² f(y|x)        if Y is discrete
Var(Y|X) = ∫_y [y − E(Y|X)]² f(y|x) dy     if Y is continuous

The computation can be simplified by using:

Var(Y|X) = E(Y²|X) − [E(Y|X)]²
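A sketch of the conditional mean and variance, and the shortcut formula, on a small illustrative joint pmf (made up for the example):

```python
from fractions import Fraction

# illustrative joint pmf f(x, y) on x, y in {0, 1}; it sums to one
f = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def f_cond(y, x):
    """Conditional density f(y|x) = f(x, y) / fx(x)."""
    fx = sum(p for (xi, yi), p in f.items() if xi == x)
    return f[(x, y)] / fx

def cond_mean(x):
    return sum(y * f_cond(y, x) for y in (0, 1))

def cond_var(x):
    m = cond_mean(x)
    return sum((y - m) ** 2 * f_cond(y, x) for y in (0, 1))

def cond_var_alt(x):
    # shortcut: Var(Y|X) = E(Y^2|X) - [E(Y|X)]^2
    return sum(y ** 2 * f_cond(y, x) for y in (0, 1)) - cond_mean(x) ** 2

print(cond_mean(1), cond_var(1))   # 2/3 2/9
assert cond_var(0) == cond_var_alt(0) and cond_var(1) == cond_var_alt(1)
```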
4. Random Vectors

Random vectors are vectors whose elements are random variables. Let:

x = (X1, X2, . . . , Xn)′

The joint density is f(x), while the cdf is:

F(x) = ∫_{−∞}^{xn} ∫_{−∞}^{xn−1} · · · ∫_{−∞}^{x1} f(x) dx1 . . . dxn−1 dxn

The marginal distribution of any one (or more) of the n variables is obtained by integrating (or summing) over the other variables.
The expected value of a random vector is the vector of expected values:

E(x) = µ = (E(X1), E(X2), . . . , E(Xn))′ = (µ1, µ2, . . . , µn)′

The covariance matrix is given by:

Σ = E[(x − µ)(x − µ)′] =
    [ σ1²  σ12  · · ·  σ1n ]
    [ σ21  σ2²  · · ·  σ2n ]
    [  ⋮    ⋮    ⋱     ⋮  ]
    [ σn1  σn2  · · ·  σn² ]

with σij = σji for each i ≠ j (i, j = 1, . . . , n). Note that:

E[(x − µ)(x − µ)′] = E(xx′) − µµ′

If X1, X2, . . . , Xn are uncorrelated, Σ is a diagonal matrix; if they also have the same variance σ², Σ is a scalar matrix (Σ = σ²In), where In is the n × n identity matrix.
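A numerical sketch with NumPy: estimating the covariance matrix from simulated draws and checking the identity E[(x − µ)(x − µ)′] = E(xx′) − µµ′. The particular µ and Σ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# draw a large sample of random vectors x = (X1, X2, X3)' with known moments
Sigma_true = np.array([[2.0, 0.5, 0.0],
                       [0.5, 1.0, 0.3],
                       [0.0, 0.3, 1.5]])
mu_true = np.array([1.0, -2.0, 0.0])
x = rng.multivariate_normal(mu_true, Sigma_true, size=100_000)

mu_hat = x.mean(axis=0)
# Sigma = E[(x - mu)(x - mu)'] ...
Sigma_hat = (x - mu_hat).T @ (x - mu_hat) / len(x)
# ... = E(xx') - mu mu'
Sigma_alt = x.T @ x / len(x) - np.outer(mu_hat, mu_hat)

assert np.allclose(Sigma_hat, Sigma_alt)      # the two formulas agree
assert np.allclose(Sigma_hat, Sigma_hat.T)    # a covariance matrix is symmetric
print(np.round(Sigma_hat, 2))                 # close to Sigma_true
```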
Linear transformation of a random vector

Let us consider the linear transformation a′x, where a = (a1, a2, . . . , an)′. Then:

E(a′x) = a′µ

For the variance:

Var(a′x) = E[a′x − E(a′x)]² = E{a′[x − E(x)]}²
         = E[a′(x − µ)(x − µ)′a] = a′E[(x − µ)(x − µ)′]a
         = a′Σa = ∑_i ∑_j ai aj σij

This quadratic form is nonnegative, and the symmetric matrix Σ must be nonnegative definite (positive semidefinite). It is positive definite if and only if the components of x are linearly independent.
Set of linear transformations

Let us consider the set of linear functions y = Ax; the i-th element of y is:

yi = ai x

where ai is the i-th row of A. Therefore:

E(yi) = ai µ

Collecting the results in a vector, we have:

E(Ax) = Aµ

For two row vectors:

Cov(ai x, aj x) = ai Σ aj′

Since ai Σ aj′ is the ij-th element of AΣA′:

Var(Ax) = AΣA′

This will be nonnegative definite or positive definite, depending on the column rank of A.
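The identity Var(Ax) = AΣA′ and the nonnegative definiteness of the result can be checked directly; Σ and A below are illustrative choices:

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [2.0, 0.0]])

# Var(Ax) = A Sigma A'
V = A @ Sigma @ A.T

# each diagonal entry is the quadratic form Var(a_i x) = a_i Sigma a_i'
for i, a in enumerate(A):
    assert np.isclose(V[i, i], a @ Sigma @ a)

# A Sigma A' is nonnegative definite: all eigenvalues are >= 0
eigvals = np.linalg.eigvalsh(V)
assert (eigvals >= -1e-10).all()
print(np.round(V, 2))
```

Note that this 3 × 3 matrix V is only positive semidefinite, not positive definite: A has rank 2, so the three linear combinations are linearly dependent.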
5. Functions of Random Variables

Univariate case: Y = g(X)

Three types of transformation:

1. discrete → discrete. If the function is one-to-one, then Pr[Y = y(x)] = Pr[X = x]. If several values of X yield the same value of Y, then Pr[Y = y] is the sum of the corresponding probabilities for X.

2. continuous → discrete. Example: individual data on income (X) are obtained in a survey and reported categorically; the transformed variable Y is the mean income in the respective interval. Then:

Pr(Y = µ1) = Pr(−∞ < X ≤ a)
Pr(Y = µ2) = Pr(a < X ≤ b)
...
3. continuous → continuous. Y = g(X) is a continuous monotonic function of X. Using the change-of-variable technique to find the cdf of Y:

Pr(Y ≤ b) = ∫_{−∞}^{b} fx(g⁻¹(y)) |dg⁻¹(y)/dy| dy

In practice:

fy(y) = fx(g⁻¹(y)) |dg⁻¹(y)/dy|

|dg⁻¹(y)/dy| is the Jacobian of the inverse transformation from Y to X. The Jacobian must be nonzero for the density of Y to be nonzero. If the Jacobian is zero, the function Y = g(X) is vertical, and hence all values of Y in the given range are associated with the same value of X; this single point must have probability zero.
Bivariate case

Let X1 and X2 be two random variables with joint distribution fx(x1, x2), and let y1 and y2 be two monotonic functions:

Y1 = y1(X1, X2)
Y2 = y2(X1, X2)

Since the functions are monotonic, the inverse transformations exist:

X1 = x1(Y1, Y2)
X2 = x2(Y1, Y2)

Then:

fy(y1, y2) = fx[x1(y1, y2), x2(y1, y2)] |J|

where |J| is the absolute value of the Jacobian of the transformations. J must be nonzero for the transformation to exist; a zero Jacobian implies that the two transformations are functionally dependent. This result can be extended to the n-variate case.
6. Some Specific Univariate Distributions

6.1. Discrete Uniform Distribution

f(x) = 1/N    for x = 1, 2, . . . , N
f(x) = 0      otherwise

E(X) = (N + 1)/2

Var(X) = (N² − 1)/12
6.2. Continuous Uniform Distribution

f(x) = 1/(b − a)    for x ∈ [a, b]
f(x) = 0            otherwise

E(X) = (a + b)/2

Var(X) = (b − a)²/12

The sum of two (or more) Uniform random variables is not Uniform distributed. The Uniform distributions (discrete and continuous) are very useful when we need to generate random numbers.
6.3. Normal Distribution

The general form of a normal distribution with mean µ and variance σ² (X ∼ N(µ, σ²)) is:

f(x|µ, σ²) = (1 / ((2π)^{1/2} σ)) exp(−(x − µ)² / (2σ²))

This pdf is a symmetric, bell-shaped curve, centered at its expected value µ. Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.

If X ∼ N(µ, σ²), then:

(a + bX) ∼ N(a + bµ, b²σ²)
If a = −µ/σ and b = 1/σ, the resulting variable

Z = (X − µ)/σ

has the standard normal distribution N(0, 1), with density:

f(z) = (1 / (2π)^{1/2}) exp(−z²/2)

Moreover:

Pr(a < X < b) = Pr((a − µ)/σ < Z < (b − µ)/σ)

The cdf of the standard normal distribution is denoted Φ(·), the density φ(·).

Φ(−z) = 1 − Φ(z)

We can use the tables of the standard Normal distribution to calculate any probability for every Normal distribution.

Pr(µ − σ < X < µ + σ) ≅ 2/3
Pr(µ − 2σ < X < µ + 2σ) ≅ 0.95
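In code, Φ(·) can be evaluated through the error function, which makes the symmetry property and the two rules of thumb above easy to verify (a sketch, not part of the slides):

```python
import math

def Phi(z):
    """cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# symmetry: Phi(-z) = 1 - Phi(z)
assert math.isclose(Phi(-1.3), 1.0 - Phi(1.3))

# Pr(mu - k*sigma < X < mu + k*sigma) for any N(mu, sigma^2)
# reduces to Phi(k) - Phi(-k) after standardization
p1 = Phi(1.0) - Phi(-1.0)
p2 = Phi(2.0) - Phi(-2.0)
print(round(p1, 4), round(p2, 4))   # 0.6827 0.9545
```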
6.4. Distributions derived from the Normal: Chi-Squared

If Z ∼ N(0, 1) and X = Z², then X ∼ χ²(1).

χ²(1) denotes the chi-squared distribution with 1 degree of freedom (dof). This is a skewed distribution with mean 1 and variance 2.

If X1, X2, . . . , Xn are n independent χ²(1) random variables, then ∑_{i=1}^{n} Xi ∼ χ²(n).

A chi-squared with n dof has mean n and variance 2n.

If Z1, Z2, . . . , Zn are n independent N(0, 1) random variables, then ∑_{i=1}^{n} Zi² ∼ χ²(n).

If X1 ∼ χ²(n1) and X2 ∼ χ²(n2) are independent, then (X1 + X2) ∼ χ²(n1 + n2).
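A Monte Carlo sketch of the construction above: sums of n squared independent N(0, 1) draws should have mean close to n and variance close to 2n. The degrees of freedom, sample size, and seed are arbitrary choices:

```python
import random

random.seed(0)
n_dof = 5
n_draws = 100_000

# a chi-squared(n) draw as the sum of n squared independent N(0, 1) draws
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))
         for _ in range(n_draws)]

mean = sum(draws) / n_draws
var = sum((d - mean) ** 2 for d in draws) / n_draws
print(round(mean, 2), round(var, 2))   # close to n = 5 and 2n = 10
```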
6.5. Distributions derived from the Normal: Student t

If Z ∼ N(0, 1) and X ∼ χ²(n) is independent of Z, the ratio:

Z / √(X/n) ∼ t(n)

follows a Student t distribution with n dof.

The t distribution has the same shape as the normal distribution, but has thicker tails. It has mean 0 (if n > 1) and variance n/(n − 2) (if n > 2). As n grows, the t distribution approaches the standard Normal distribution.
6.6. Distributions derived from the Normal: Fisher F

If X1 ∼ χ²(n1) and X2 ∼ χ²(n2) are independent, the ratio:

(X1/n1) / (X2/n2) ∼ F(n1, n2)

follows an F distribution with n1 numerator dof and n2 denominator dof.

Notice that: t(n)² = F(1, n)
6.7. The Bernoulli Distribution
f(x) = p^x (1 − p)^(1−x) for x = 0, 1, and 0 otherwise,

where 0 ≤ p ≤ 1.

E(X) = p   Var(X) = p(1 − p)

Bernoulli trial: a random experiment whose outcomes have been classified into two categories, called success and failure respectively.
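As a quick numerical sketch (illustrative, not part of the slides), the moments above can be verified by direct summation over the two-point support:

```python
# Minimal check of the Bernoulli pmf and its moments: E(X) and Var(X)
# are computed by summing over the support {0, 1}.
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x) if x in (0, 1) else 0.0

p = 0.3
support = [0, 1]
total = sum(bernoulli_pmf(x, p) for x in support)               # should be 1
mean = sum(x * bernoulli_pmf(x, p) for x in support)            # E(X) = p
var = sum((x - mean)**2 * bernoulli_pmf(x, p) for x in support)  # p(1-p)
```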
6.8. The Binomial Distribution
f(x) = C(n, x) p^x (1 − p)^(n−x) for x = 0, 1, . . . , n, and 0 otherwise

E(X) = np   Var(X) = np(1 − p)

It represents a random experiment consisting of n repeated independent Bernoulli trials, where p is the probability of success at each individual trial.
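The same direct-summation check works for the binomial (a sketch, not from the slides; the binomial coefficient is `math.comb`):

```python
import math

# Verify that the binomial pmf sums to 1 and that E(X) = np and
# Var(X) = np(1-p), by direct summation over the support.
def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 0.4
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean)**2 * binomial_pmf(x, n, p) for x in range(n + 1))
```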
7. General Concepts of Estimation Theory

7.1. Sampling

A sample of n observations on one or more variables, denoted x1, x2, . . . , xn, is a random sample if the n observations are drawn independently from the same population, or probability distribution, f(Xi, θ). The vector θ contains one or more unknown parameters.

A statistic is any function computed from the data in a sample. It depends on n random variables, so that it is also a random variable, with a probability distribution called the sampling distribution.

Sampling distribution of the sample mean: if X1, X2, . . . , Xn are a random sample from a population with mean µ and variance σ², then X̄ = (1/n) ∑ Xi is a random variable with mean µ and variance σ²/n.
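A simulation sketch (not part of the slides; it assumes NumPy) illustrates the result for a non-normal population. For Uniform(0, 1), µ = 0.5 and σ² = 1/12, so the sample mean should have mean 0.5 and variance (1/12)/n:

```python
import numpy as np

# Simulate the sampling distribution of the sample mean for a
# Uniform(0,1) population: mu = 0.5, sigma^2 = 1/12.
rng = np.random.default_rng(42)
n, reps = 30, 100_000

samples = rng.uniform(0, 1, size=(reps, n))
xbar = samples.mean(axis=1)          # one sample mean per replication

mean_of_xbar = xbar.mean()           # should be close to mu = 0.5
var_of_xbar = xbar.var()             # should be close to sigma^2 / n
```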
7.2. Point Estimation
A point estimate is a statistic computed from a sample that gives a single value for the unknown parameter θ. The standard error of the estimate is the standard deviation of the sampling distribution of the statistic; its square is the sampling variance.

An estimator is a rule or strategy for using the data to estimate the parameters. It is defined before the data are drawn. Estimators are compared on the basis of a variety of attributes. We distinguish between finite sample properties and asymptotic properties.
An estimator θ̂ of a parameter θ is unbiased if the mean of its sampling distribution is θ:

E(θ̂) = θ

In other words, if samples of size n are drawn repeatedly and θ̂ is computed for each one, the average value of these estimates will tend to θ.

An unbiased estimator θ̂1 is more efficient than another unbiased estimator θ̂2 if

Var(θ̂1) < Var(θ̂2)

In the multivariate case the comparison is based on the covariance matrices: in practice, the difference between the covariance matrix of θ̂2 and that of θ̂1 must be a nonnegative definite matrix.
7.3. Large-sample distribution theory
In most cases, whether an estimator is exactly unbiased, or what its exact sampling variance is in samples of a given size, will be unknown. But we may be able to obtain approximate results about the behavior of the distribution of an estimator as the sample becomes large. Knowledge about the limiting behavior of the distribution of an estimator can be used to infer an approximate distribution for the estimator in a finite sample.

Let Xn be a random variable indexed by the size of a sample.

Xn converges in probability to a constant c if lim_{n→∞} Pr(|Xn − c| > ε) = 0 for any positive ε.

We write plim Xn = c.

An estimator θ̂ of a parameter θ is a consistent estimator if and only if plim θ̂ = θ.
Weak Law of Large Numbers
It states that the sample mean converges in probability to the true expected value: plim X̄n = µ, i.e.

lim_{n→∞} Pr(|X̄n − µ| > ε) = 0 for any ε > 0

Strong Law of Large Numbers
It states that the sample mean converges almost surely to the expected value:

lim_{n→∞} Pr(|X̄m − µ| < ε ∀ m ≥ n) = 1 ∀ ε > 0
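The weak law can be illustrated numerically (a sketch, not part of the slides; it assumes NumPy): the estimated probability that X̄n misses µ by more than ε shrinks sharply as n grows.

```python
import numpy as np

# Estimate Pr(|xbar_n - mu| > eps) by simulation for a Uniform(0,1)
# population (mu = 0.5); the probability should shrink as n grows.
rng = np.random.default_rng(7)
mu, eps, reps = 0.5, 0.05, 20_000

def miss_prob(n):
    xbar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(xbar - mu) > eps)

p_small = miss_prob(10)     # sizeable miss probability at n = 10
p_large = miss_prob(1000)   # essentially zero at n = 1000
```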
Xn converges in distribution to a random variable X with cdf F(x) if lim_{n→∞} |Fn(x) − F(x)| = 0 at all continuity points of F(x).

If Xn converges in distribution to X and F(x) is the cdf of X, then F(x) is the limiting distribution of Xn.

This is written Xn →d X.

Central Limit Theorem (Lindeberg-Levy): if X1, . . . , Xn are a random sample from a multivariate distribution with finite mean vector µ and finite positive definite covariance matrix Q, and X̄n = (1/n) ∑i Xi, then

√n (X̄n − µ) →d N(0, Q)
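In the univariate case the theorem can be checked by simulation (a sketch, not part of the slides; it assumes NumPy). For a skewed Exponential(1) population (µ = 1, σ = 1), the standardized sample mean should be approximately standard normal, so about 95% of its draws fall inside (−1.96, 1.96):

```python
import numpy as np

# Standardized sample mean of a skewed population: approximately N(0,1)
# by the CLT once n is moderately large.
rng = np.random.default_rng(3)
n, reps = 200, 100_000

xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0) / 1.0    # sqrt(n)(xbar - mu)/sigma

coverage = np.mean(np.abs(z) < 1.96)   # should be close to 0.95
```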
An asymptotic distribution is a distribution that is used to approximate the true finite sample distribution of a random variable.

An estimator θ̂ of the parameter vector θ is asymptotically normal if

√n (θ̂ − θ) →d N(0, V)

The estimator is asymptotically efficient if the variance matrix of any other consistent, asymptotically normally distributed estimator exceeds (1/n)V by a nonnegative definite matrix.
Likelihood Function
In a random sample of n observations, the density of each observation is f(xi, θ). Since the n observations are independent, the joint density is

f(x1, x2, . . . , xn; θ) = ∏_{i=1}^n f(xi, θ) = L(θ | x1, x2, . . . , xn)

L(θ | x1, x2, . . . , xn) is the likelihood function for θ given the data x1, x2, . . . , xn (we will abbreviate it to L(θ)).

Let us call l(θ) the logarithm of the likelihood function.
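For an i.i.d. normal sample, the log-likelihood built observation by observation must equal the familiar closed form; a small sketch (not part of the slides) checks this:

```python
import math

# For N(mu, sigma2) data, sum of per-observation log densities must equal
# l(mu, sigma2) = -n/2 log(2 pi sigma2) - sum((x - mu)^2) / (2 sigma2).
x = [1.2, -0.4, 0.8, 2.1, 0.3]
mu, sigma2 = 0.5, 1.5
n = len(x)

def norm_logpdf(xi, mu, sigma2):
    return -0.5 * math.log(2 * math.pi * sigma2) - (xi - mu) ** 2 / (2 * sigma2)

loglik_sum = sum(norm_logpdf(xi, mu, sigma2) for xi in x)
loglik_closed = (-n / 2 * math.log(2 * math.pi * sigma2)
                 - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2))
```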
Cramer-Rao Lower Bound
Under certain regularity conditions, the variance of an unbiased estimator of a parameter θ will always be at least as large as

[n I(θ)]⁻¹ = (−E[∂²l(θ)/∂θ²])⁻¹ = (E[(∂l(θ)/∂θ)²])⁻¹

n I(θ) is the information number for the sample.

Note: if one estimator attains the variance bound, there is no need to consider any other in order to seek a more efficient estimator. If a given estimator does not attain the variance bound, we do not know whether this estimator is efficient or not.
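As a worked example (a sketch, not part of the slides): for a Bernoulli(p) observation, −E[∂²l/∂p²] = 1/p + 1/(1 − p) = 1/(p(1 − p)), so the bound for a sample of size n is p(1 − p)/n, which is exactly the variance of the sample mean; the sample mean therefore attains the bound.

```python
# Fisher information for one Bernoulli(p) observation, computed as
# -E[d^2 l / dp^2] by taking the expectation over x in {0, 1}.
p, n = 0.3, 50

# second derivative of l(p; x) = x log p + (1-x) log(1-p), with sign flipped
def neg_d2l(x, p):
    return x / p**2 + (1 - x) / (1 - p) ** 2

info_per_obs = p * neg_d2l(1, p) + (1 - p) * neg_d2l(0, p)  # = 1/(p(1-p))
crlb = 1 / (n * info_per_obs)          # Cramer-Rao lower bound for size n
var_xbar = p * (1 - p) / n             # variance of the sample mean
```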
7.4. Maximum Likelihood Estimator (MLE)
The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters.

It consists in maximizing the likelihood function with respect to the unknown parameters θ. It is usually simpler to work with the log of the likelihood function:

l(θ) = ∑_{i=1}^n ln f(xi; θ)

because the logarithm is a monotonic function, and the values that maximize L(θ) are the same as those that maximize l(θ).
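For the normal case the maximization has closed-form solutions, µ̂ = x̄ and σ̂² = (1/n) ∑(xi − x̄)². A sketch (not part of the slides) confirms numerically that the log-likelihood at these values is no smaller than at nearby parameter values:

```python
import math

# Closed-form normal MLEs, checked by perturbing the parameters: the
# log-likelihood should never increase away from (mu_hat, sigma2_hat).
x = [2.0, 1.3, 3.1, 0.7, 2.4, 1.9]
n = len(x)

mu_hat = sum(x) / n
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n

def loglik(mu, sigma2):
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2))

l_max = loglik(mu_hat, sigma2_hat)
worse = [loglik(mu_hat + d, sigma2_hat + e)
         for d in (-0.1, 0.0, 0.1) for e in (-0.1, 0.0, 0.1)]
```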
Properties of MLE
Under some regularity conditions on f(x, θ), the MLE will have the following asymptotic properties (the finite sample properties are usually unknown):

1. It is consistent.

2. It is asymptotically normally distributed:

θ̂ML →d N(θ, I(θ)⁻¹/n)

3. It is asymptotically efficient and achieves the Cramer-Rao lower bound for consistent estimators:

Asy.Var(θ̂ML) = (−E[∂²l(θ)/∂θ∂θ′])⁻¹ = (E[(∂l(θ)/∂θ)(∂l(θ)/∂θ′)])⁻¹

Finally, if g(θ) is a continuous function, its MLE is g(θ̂ML) (property of invariance).
How to estimate the asymptotic variance of an MLE? Three alternatives:

[n Î(θ̂ML)]⁻¹ = (−∂²l(θ̂ML)/∂θ̂ML∂θ̂′ML)⁻¹

i.e., the inverse of the negative Hessian evaluated at the MLE;

[n Î(θ̂ML)]⁻¹ = [∑_{i=1}^n (∂ln f(xi, θ̂ML)/∂θ̂ML)(∂ln f(xi, θ̂ML)/∂θ̂′ML)]⁻¹

i.e., the inverse of the sum of the outer products of the individual scores; and a sandwich form that combines the Hessian-based estimate Ĩ and the outer-product estimate Î,

{n [Ĩ(θ̂ML)] [Î(θ̂ML)]⁻¹ [Ĩ(θ̂ML)]}⁻¹

The last one is known as the quasi-MLE variance estimator.
7.5. Interval Estimation
Regardless of the properties of an estimator, the estimate obtained will vary from sample to sample, and there is some probability that it will be quite erroneous.

The logic behind an interval estimate is that we use the sample data to construct an interval, [lower(θ̂), upper(θ̂)], such that we can expect this interval to contain the true parameter in some specified proportion of samples (with some desired level of confidence).

The theory of interval estimation is based on a pivotal quantity, which is a function of both the parameter and a point estimate that has a known distribution.

In general, a confidence interval is given by:

Pr(θ̂ − e1 < θ < θ̂ + e2) = 1 − α
The theory does not prescribe exactly how to choose the endpoints for the confidence interval. An obvious criterion is to minimize the width of the interval. If the sampling distribution is symmetric, the symmetric interval is the best one. If the sampling distribution is not symmetric, however, this procedure will not be optimal.

Example: confidence interval for the mean

Pr(x̄ − t_{n−1,α/2} s/√n < µ < x̄ + t_{n−1,α/2} s/√n) = 1 − α

where t_{n−1,α/2} is the value of the Student t distribution with n − 1 dof that is exceeded with probability α/2, s is the sample standard deviation, and 1 − α is the confidence level.
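A coverage simulation (a sketch, not part of the slides; it assumes NumPy) illustrates the interpretation: across repeated samples, the interval contains µ in roughly 1 − α of the replications. For n = 50 the t_{49} quantile is close to the normal quantile, so 1.96 is used here as a large-sample approximation.

```python
import numpy as np

# Coverage of the interval xbar +/- 1.96 * s / sqrt(n) over many samples
# from N(mu, sigma^2); 1.96 approximates t_{49, 0.025} for n = 50.
rng = np.random.default_rng(11)
mu, sigma, n, reps = 5.0, 2.0, 50, 50_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)            # sample standard deviation
half = 1.96 * s / np.sqrt(n)

coverage = np.mean((xbar - half < mu) & (mu < xbar + half))
```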
8. General Concepts of Hypothesis Testing

The classical testing procedures are based on constructing a statistic from a random sample that will enable the analyst to decide, with reasonable confidence, whether or not the data in the sample would have been generated by a hypothesized population.

The formal procedure involves a statement of the hypothesis, usually in terms of a null hypothesis, H0, and an alternative hypothesis, H1.

The procedure itself is a rule, stated in terms of the data, that dictates whether the null hypothesis should be rejected or not.

The classical methodology (Neyman-Pearson) involves partitioning the sample space into two regions. If the observed data (i.e., the test statistic) fall in the rejection region (critical region), the null hypothesis is rejected; if they fall in the acceptance region, it is not.
The test procedure can lead to different conclusions in different samples. As such, there are two ways such a procedure can be in error.

• Type I error: the procedure may lead to rejection of the null hypothesis when it is true.

• Type II error: the procedure may fail to reject the null hypothesis when it is false.

The probability of a type I error is the size of the test. This is conventionally denoted α and is also called the significance level. The size of the test is under the control of the analyst: reducing the size increases the probability of a type II error, β.

The power of the test is the probability that it will correctly lead to rejection of a false null hypothesis (1 − β).

For a given α, we would like β to be as small as possible.
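Size and power can be computed in closed form for a simple case (a worked sketch, not part of the slides; the numbers are illustrative). For a one-sided z-test of H0: µ = 0 against H1: µ > 0 with known σ, the power at a true mean µ1 is 1 − β = Φ(√n µ1/σ − z_α):

```python
import math

# Standard normal cdf via the error function
def phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, sigma, mu1 = 25, 1.0, 0.5
z_alpha = 1.6449                    # approximate 5% one-sided critical value
power = phi(math.sqrt(n) * mu1 / sigma - z_alpha)  # 1 - beta at mu = mu1
size = 1 - phi(z_alpha)             # probability of a type I error, ~0.05
```

Raising n or µ1 (or α) raises the power, reflecting the trade-off between α and β described above.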
A test is most powerful if it has greater power than any other test of the same size.

We might require that the test be uniformly most powerful (UMP), that is, have greater power than any other test of the same size for all admissible values of the parameter.

There are few situations in which a UMP test is available. A common and very modest requirement is that the test be unbiased.

A test is unbiased if its power (1 − β) is greater than or equal to its size α for all values of the parameter.

A test is consistent if its power goes to one as the sample size grows to infinity.
9. Multivariate Normal Distribution

Bivariate case

f(x, y) = [1 / (2π σx σy √(1 − ρ²))] exp[−(εx² + εy² − 2ρ εx εy) / (2(1 − ρ²))]

where

εx = (x − µx)/σx   εy = (y − µy)/σy

The density is defined only if ρ is not 1 or −1 (the two variables must not be linearly related).
Important results: if (X, Y) ∼ N(µx, µy, σx², σy², ρ):

• The marginal distributions are Normal:

X ∼ N(µx, σx²)   Y ∼ N(µy, σy²)

• The conditional distributions are Normal:

Y|X = x ∼ N[µy + (σxy/σx²)(x − µx), σy²(1 − ρ²)]

and likewise for X|Y;

• X and Y are independent if and only if ρ = 0. In this case f(x, y) = fx(x) fy(y).
General case

f(x) = (2π)^{−n/2} |Σ|^{−1/2} exp[−(1/2)(x − µ)′ Σ⁻¹ (x − µ)]

Warning: this definition requires Σ to be positive definite. It is possible to define multivariate normal distributions also when Σ is positive semi-definite and singular, but these distributions do not admit an explicit pdf.

If all of the variables are uncorrelated, they are independent (f(x) = ∏_{i=1}^n f_{xi}(xi)).

If all correlations are 0 and µi = 0, σi = 1 for each i = 1, . . . , n, we obtain the multivariate standard Normal distribution (spherical Normal distribution):

f(x) = (2π)^{−n/2} exp(−x′x/2)
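A numerical sketch (not part of the slides; it assumes NumPy) evaluates the general pdf directly from the formula and checks the factorization claim: with a diagonal Σ the joint density equals the product of the univariate normal densities.

```python
import math
import numpy as np

# Multivariate normal pdf from the formula, checked against the product of
# univariate densities for a diagonal Sigma (uncorrelated => independent).
mu = np.array([1.0, -2.0, 0.5])
sigma2 = np.array([0.5, 2.0, 1.0])          # variances on the diagonal
Sigma = np.diag(sigma2)
x = np.array([0.8, -1.5, 0.0])
n = len(x)

d = x - mu
joint = ((2 * math.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
         * math.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))

univ = [math.exp(-(xi - mi) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        for xi, mi, s2 in zip(x, mu, sigma2)]
product = math.prod(univ)
```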
9.1. Uncorrelation ⇒ Independence

If x ∼ N(µ, Σ) and Σ is block-diagonal, the corresponding sub-vectors of x are independent multivariate normal vectors. Thus, uncorrelated sub-vectors are independent; moreover, each sub-vector has a multivariate normal marginal distribution.

In particular, if Σ is diagonal, all the elements of x are independent normal variables.

Warning: the above properties depend on the joint density of the elements of x being normal. If it is only known that the marginal densities of the elements are normal, then the joint density need not be normal and may not even exist. Thus, uncorrelated normal variables need not be independent if they are not jointly multivariate normal.
9.2. Linear functions of a Normal vector (part 1)
If the (k × 1) random vector x ∼ N(µ, Σ) and A is a (k × k) non-singular constant matrix, the (k × 1) random vector y = Ax has a multivariate normal distribution, y ∼ N(Aµ, AΣA′).

Particular case: decomposing the positive definite matrix Σ = P′P, with P square and non-singular, the linear transformation z = P′⁻¹(x − µ) ∼ N(0, Ik) (vector of independent standard normal variables).
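The standardization can be checked with matrix algebra (a sketch, not part of the slides; it assumes NumPy). `np.linalg.cholesky` returns a lower triangular L with Σ = LL′, so taking P′ = L, the covariance of z = L⁻¹(x − µ) is L⁻¹ Σ L′⁻¹ = I:

```python
import numpy as np

# Cholesky factor L of a positive definite Sigma: Sigma = L @ L.T.
# The transformed vector z = L^{-1}(x - mu) then has identity covariance.
Sigma = np.array([[4.0, 1.2, 0.6],
                  [1.2, 2.0, 0.5],
                  [0.6, 0.5, 1.0]])
L = np.linalg.cholesky(Sigma)          # lower triangular

Linv = np.linalg.inv(L)
cov_z = Linv @ Sigma @ Linv.T          # should be the identity matrix
```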
9.3. Marginal distributions
If the (k × 1) random vector x ∼ N(µ, Σ), any sub-vector of x has a multivariate normal marginal distribution, with means, variances and covariances obtained by taking the corresponding elements of µ and Σ.

Particular case: any element of a multivariate normal vector has a univariate normal distribution, whose mean and variance are, respectively, the corresponding element of µ and the corresponding diagonal element of Σ.
9.4. Linear functions of a Normal vector (part 2)
If the (k × 1) random vector x ∼ N(µ, Σ) and D is a (p × k) constant matrix of rank p ≤ k, the (p × 1) random vector y = Dx has a multivariate normal distribution, y ∼ N(Dµ, DΣD′).

Warning: this result is a particular case of a more general result, which holds also when p > k or when the rank of D is less than p: any linear transformation of a multivariate normal is multivariate normal. This property, however, requires dealing also with multivariate normal distributions that do not admit an explicit pdf, due to singularity of the variance-covariance matrix.
9.5. Conditional Distributions

Partition x ∼ N(µ, Σ) into sub-vectors x1 and x2, with µ and Σ partitioned conformably. The conditional distribution of x1|x2 is multivariate normal:

x1|x2 ∼ N[µ1 + Σ12 Σ22⁻¹ (x2 − µ2), Σ11 − Σ12 Σ22⁻¹ Σ21]

Remark: the conditional mean of x1|x2 is a linear function of x2; the conditional variance does not depend on x2.
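A sketch (not part of the slides; it assumes NumPy) applies the block formulas to a bivariate case and checks that they reproduce the earlier bivariate results, E[Y|X = x] = µy + (σxy/σx²)(x − µx) and Var = σy²(1 − ρ²):

```python
import numpy as np

# Block formulas for the conditional distribution, checked against the
# bivariate case. Ordering: x1 = Y (first block), x2 = X (second block).
mu = np.array([1.0, 2.0])                  # (mu_y, mu_x)
sx2, sy2, rho = 4.0, 9.0, 0.5
sxy = rho * np.sqrt(sx2 * sy2)
Sigma = np.array([[sy2, sxy],
                  [sxy, sx2]])

x2 = 3.0                                   # conditioning value of X
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] * Sigma[0, 1] / Sigma[1, 1]

# bivariate formulas for comparison
mean_biv = mu[0] + sxy / sx2 * (x2 - mu[1])
var_biv = sy2 * (1 - rho**2)
```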
10. Tests Based on the Likelihood Function

H0 : c(θ) = q

Likelihood ratio test
If H0 is valid, imposing it should not lead to a large reduction in the log-likelihood function. Therefore, we base the test on the difference between the value of the log-likelihood function at the unconstrained estimate of θ, lU, and the value of the log-likelihood function at the restricted estimate, lR:

LR = −2(lR − lU)

Under H0, LR follows a chi-squared distribution with dof equal to the number of restrictions imposed.

We reject the null hypothesis if LR is significantly different from zero.
Wald test
If the restriction is valid, c(θ̂ML) − q should be close to zero, since the MLE is consistent.

W = (c(θ̂) − q)′ (Var[c(θ̂) − q])⁻¹ (c(θ̂) − q)

where Var[c(θ̂) − q] = (∂c(θ̂)/∂θ̂′) Var(θ̂) (∂c(θ̂)/∂θ̂′)′.

Under H0, W has a chi-squared distribution with dof equal to the number of restrictions.

We reject the null hypothesis if W is significantly different from zero.
Lagrange multiplier test (score test)
If the restriction is valid, the restricted estimator should be near the point that maximizes the log likelihood. Therefore, the slope of the log-likelihood function should be near zero at the restricted estimator. The test is based on the slope of the log-likelihood at the point where the function is maximized subject to the restriction (θ̂R):

LM = (∂l(θ̂R)/∂θ̂R)′ [Var(θ̂R)] (∂l(θ̂R)/∂θ̂R)

Under H0, LM has a chi-squared distribution with dof equal to the number of restrictions.

We reject the null hypothesis if LM is significantly different from zero.
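The three tests can be illustrated on one dataset (a worked sketch, not part of the slides; the counts are illustrative). For H0: p = 0.5 in a Bernoulli model with k = 60 successes in n = 100 trials, all three statistics land near 4 and exceed the chi-squared(1) 5% critical value 3.841, showing their asymptotic equivalence:

```python
import math

n, k, p0 = 100, 60, 0.5
p_hat = k / n                                  # unrestricted MLE

def loglik(p):
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Likelihood ratio: LR = -2 (l_R - l_U)
LR = -2 * (loglik(p0) - loglik(p_hat))

# Wald: (p_hat - p0)^2 / Var(p_hat), with Var(p_hat) = p_hat(1 - p_hat)/n
W = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)

# LM (score): squared score at p0 over the information at p0
score = k / p0 - (n - k) / (1 - p0)
info = n / (p0 * (1 - p0))
LM = score ** 2 / info

crit = 3.841                                   # chi-squared(1), 5% level
```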
Remarks:
• Using the LR test you estimate the model twice: once without the restriction and once with the null hypothesis imposed. Using W you estimate the model only without restrictions. Using LM you estimate the model only with the restriction.

• The Wald test is generally applicable to any estimator that is consistent and asymptotically Normal. The well-known t and F tests are Wald tests.

• The LR test compares two alternative but nested models.

• The LR test is useful to test nonlinear restrictions, and the result is not sensitive to the way in which we formulate these restrictions. In contrast, the Wald test can handle nonlinear restrictions but is sensitive to the way they are formulated. For example, it will matter whether we test θ = 1 or log(θ) = 0.

• The LM approach is particularly suited for misspecification tests, where a chosen specification of the model is tested for misspecification in several directions (heteroskedasticity, non-normality, omitted variables).