
Birkbeck, University of London

Department of Economics, Mathematics and Statistics

Graduate Diplomas in Economics and Finance;

BSc FE, ESP, E&B, ....

2014-2015

Applied Statistics and Econometrics

Notes and Exercises

Ron Smith

Email [email protected]

Page 2: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

CONTENTS

PART I: COURSE INFORMATION
1. Aims, readings and approach
2. Class Exercises
3. Assessment and specimen exam questions
4. How to do your project

PART II: NOTES
5. Introduction
6. Descriptive Statistics
7. Economic and Financial Data I: Numbers
8. Applied Exercise I: Ratios and descriptive statistics
9. Index Numbers
10. Probability
11. Discrete Random Variables
12. Continuous Random Variables
13. Economic and Financial Data II: Interest and other rates
14. Applied Exercise II: Sampling distributions
15. Estimation
16. Confidence Intervals and Hypothesis Tests for the mean
17. Bivariate Least Squares Regression
18. Matrix Algebra & Multiple Regression
19. Properties of Least Squares estimates
20. Regression Confidence Intervals and Tests
21. Economic and Financial Data III: Relationships
22. Applied Exercise III: Running regressions
23. Dynamics
24. Index


1. PART I: Course Information

1.1. Aims

Economists have been described as people who are good with numbers but not creative enough to be accountants. This course is designed to ensure that you are good with numbers; that you can interpret and analyse economic and financial data and develop a critical awareness of some of the pitfalls in collecting, presenting and using data. Doing applied work involves a synthesis of various elements. You must be clear about why you are doing it: what the purpose of the exercise is (e.g. forecasting, policy making, choosing a portfolio of stocks, answering a particular question or testing a hypothesis). You must understand the characteristics of the data you are using and appreciate their weaknesses. You must use theory to provide a model of the process that may have generated the data. You must know the statistical methods, which rely on probability theory, to summarise the data, e.g. in estimates. You must be able to use the software, e.g. spreadsheets, that will produce the estimates. You must be able to interpret the statistics or estimates in terms of your original purpose and the theory. Thus during this course we will be moving backwards and forwards between these elements: purpose, data, theory and statistical methods. It may seem that we are jumping about, but you must learn to do all these different things together.

Part I of this booklet provides background information: reading lists; details of assessment (70% exam, 30% project); and instructions on how to do your project. Part II provides a set of notes. These include notes on the lectures, notes on economic and financial data, and applied exercises.

Not all the material in this booklet will be covered explicitly in lectures, particularly the sections on economic and financial data. But you should be familiar with that material. Lots of the worked examples are based on old exam questions. Sections labelled background contain material that will not be on the exam. If you have questions about these sections raise them in lectures or classes. If you find any mistakes in this booklet please tell me. Future cohorts of students will thank you.


1.2. Rough Lecture Outline

These topics roughly correspond to a lecture each, though in practice it may run a little faster or slower.

AUTUMN (week number)
7. Introduction
8. Descriptive Statistics
9. Index Numbers
10. Probability
11. Random Variables

SPRING (week number)
1. The normal and related distributions
2. Estimation
3. Confidence Intervals and Hypothesis Tests
4. Bivariate Least Squares Regression
5. Matrix Algebra & Multiple Regression
6. Reading week
7. Properties of Least Squares estimates
8. Tests for regressions
9. Dynamics
10. Applications
11. Revision

Tutorial Classes start in week 8 of the autumn term and week 2 of

The sections in the notes on Economic and Financial Data and the Applied Exercises will be used for examples at various points in the lectures. You should work through them where they come in the sequence in the notes. This material will be useful for class exercises, exam questions and your project.

1.3. Learning Outcomes

Students will be able to demonstrate that they can:

• Explain how measures of economic and financial variables such as GDP, unemployment and index numbers such as the CPI, RPI and FTSE are constructed, be aware of the limitations of the data, and be able to calculate derived statistics from the data, e.g. ratios, growth rates, real interest rates etc.

• Use a spreadsheet to graph data and calculate summary statistics, and be able to interpret the graphs and summary statistics.


• Use simple rules of probability involving joint, marginal and conditional probabilities, expected values and variances, and use probabilities from the normal distribution.

• Explain the basic principles of estimation and hypothesis testing.

• Derive the least squares estimator and show its properties using matrix algebra.

• Interpret regression output and conduct tests on coefficients.

• Be familiar with basic non-linear and dynamic models.

• Read and understand articles using economic and financial data at the level of the FT or Economist.

• Conduct and report on a piece of empirical research that uses simple statistical techniques.

1.4. Your input

Achieving the learning outcomes (and passing the exams) requires a lot of independent work by you. We will assume that you know how to learn and that there are things that we do not have to tell you because you can work them out or look them up for yourself. The only way to learn these techniques is by using them.

• Read these notes and any other material put on Moodle.

• Get familiar with economic and financial data by reading newspapers and magazines such as the FT, The Economist, Sunday Business sections and the like. The BBC Radio 4 series More or Less covers statistical aspects of issues in the news.

• In looking at articles in the press note how they present Tables and Graphs; what data they use; how they combine the data with the analysis; how they structure the article. You will need all these skills to do your project, so learn them by careful reading. Download Gapminder, it's free, and see how it presents the data.

• Ensure that you can use a spreadsheet, such as Excel.

• Make sure that you have mastered the mathematics. Help is available from the maths learning co-ordinator at: http://www.bbk.ac.uk/business/current-students/learning-co-ordinators/eva-szatmari/

• Try to attend all lectures and classes; if you have to miss them, make sure that you know what they covered and get copies of notes from other students.


• Do the exercises for the classes in advance. Continuously review the material in lectures, classes and these notes, working in groups if you can. Do the Moodle quizzes.

• Identify gaps in your knowledge and take action to fill them, by asking questions of lecturers or class teachers and by searching in text books. Email questions to us or email to fix appointments to see us.

• Do the applied exercise (section 8 of the notes) during the first term. We will assume that you have done it and base exam questions on it.

• Start work on your project early in the second term; advice on this is in section 4.

1.5. Reading

There are a large number of good text books on introductory statistics, but none that exactly match the structure of this course. This is because we cover in one year material that is usually spread over three years of an undergraduate degree: economic and financial data in the first year, statistics in the second year, and econometrics in the third year. Use the index in any text book to find the topics covered in this course. There is an index to these notes.

If you have not done statistics before, Michael Barrow, Statistics for Economics, Accounting and Business Studies, is a good introduction. Use any edition. There are lots of other similar books. These notes cross-reference introductory statistics to Barrow (2013) and the econometrics to Verbeek (2008), one of the books used on the MSc in Economics econometrics course. There are a large number of other more basic econometrics books, such as Gujarati and Porter (2009) and Stock and Watson (2007).

As background, a classic, which is still relevant, is How to Lie with Statistics, Huff (1954). The history of probability can be found in Bernstein (1996). Other general books on probability are Gigerenzer (2002) and Taleb (2004, 2007). Where economic or financial topics appear in these notes, they are explained. But it is useful to also do some general reading. On economics there are a range of paperbacks aimed at the general reader, such as Kay (2004) and Smith (2003). Similarly, there are lots of paperbacks on finance aimed at the general reader. Mandelbrot and Hudson (2005) is excellent. Mandelbrot, a mathematician who invented fractals, has done fundamental work on finance since the 1960s.


Although he is highly critical of a lot of modern finance theory, he gives an excellent exposition of it. Das (2006) provides an excellent non-technical introduction to derivatives, as well as a lot of funny and often obscene descriptions of what life is actually like in financial markets. MacKenzie (2006) provides a sociological analysis of the development of finance theory and how the models shaped markets. There are lots of books about the role of banking and finance in the crisis that started in 2007.

References
Barrow, Michael (2013) Statistics for Economics, Accounting and Business Studies, 6th edition, Pearson.
Bernstein, Peter L. (1996) Against the Gods: The Remarkable Story of Risk, Wiley.
Das, Satyajit (2006) Traders, Guns and Money, Pearson.
Gigerenzer, Gerd (2002) Reckoning with Risk, Penguin.
Gujarati, D.N. and D.C. Porter (2009) Basic Econometrics, 5th edition, McGraw Hill.
Huff, Darrell (1954) How to Lie with Statistics, Penguin.
Kay, John (2004) The Truth about Markets, Penguin.
Mandelbrot, Benoit and Richard Hudson (2005) The (Mis)Behaviour of Markets, Profile Books.
MacKenzie, Donald (2006) An Engine, Not a Camera: How Financial Models Shape Markets, MIT Press.
Smith, David (2003) Free Lunch, Profile Books.
Stock, J.H. and M.W. Watson (2007) Introduction to Econometrics, 2nd edition, Pearson-Addison Wesley.
Taleb, Nassim Nicholas (2004) Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets, 2nd edition, Thomson.
Taleb, Nassim Nicholas (2007) The Black Swan: The Impact of the Highly Improbable, Penguin.
Verbeek, Marno (2008) A Guide to Modern Econometrics, 3rd edition, Wiley.


2. Class exercises

Many are past exam questions.

AUTUMN

2.1. Week 8

Consider the following table.

Gross Domestic Product, GDP (billions of constant 1995 US$), Population (millions), Military Expenditure (billions of constant 1995 US$) and the number in the armed forces (thousands) in Developed and Developing Countries, 1985 and 1995.

                          1985      1995
GDP
  Rich                   21190     23950
  Poor                    4184      7010
Population
  Rich                  1215.7    1151.5
  Poor                  3620.8    4520.0
Military Expenditure
  Rich                  1100.8     667.8
  Poor                   230.0     196.7
Number in Armed Forces
  Rich                   11920      7667
  Poor                   16150     15120

Source: World Military Expenditures and Arms Transfers, US Arms Control and Disarmament Agency. The Rich group includes 33 high per-capita income countries; the rest of the world is Poor.

Gross Domestic Product is a measure of the output of a country. There are quite large margins of error on all these numbers and many economic measures are only rough orders of magnitude. When doing calculations, you must be careful about units. These variables are all measured in different units and ratios will depend on the units of the numerator and denominator. Expressing the units as powers of 10 is often useful: 0.1 = 10⁻¹; 1 = 10⁰; 10 = 10¹; 100 = 10²; 1,000,000 = 10⁶. The power gives you the number of zeros after the one. For ratios, 10ᵃ/10ᵇ = 10ᵃ⁻ᵇ. (A short Python sketch of this kind of unit arithmetic is given after the questions below.)

(a) Express the units of each of the four variables as powers of 10.


From this table calculate:
(b) Per-capita GDP (divide GDP by population) for rich and poor countries in 1985 and 1995. What units is this measure in?
(c) The growth rate of per-capita GDP 1985-1995 for rich and poor countries. What units is this measure in?
(d) The percentage share of military expenditure in GDP for rich and poor countries in 1985 and 1995. Comment on the change.
(e) The number of people in the armed forces per 1000 population for rich and poor countries in 1985 and 1995.
(f) Military expenditure per member of the armed forces for rich and poor countries in 1985 and 1995.
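As an illustration of keeping track of units, here is a minimal Python sketch for part (b); the variable names are ours, not part of the exercise. GDP is in 10⁹ US$ and population in 10⁶ people, so per-capita GDP comes out in 10³ US$.

    # Minimal sketch of the unit arithmetic for exercise 2.1 (variable names are ours).
    gdp_rich_1985 = 21190      # billions (10^9) of constant 1995 US$
    pop_rich_1985 = 1215.7     # millions (10^6) of people

    # Ratio of units: 10^9 / 10^6 = 10^3, i.e. thousands of US$ per head.
    per_capita = gdp_rich_1985 / pop_rich_1985
    print(f"Per-capita GDP, Rich, 1985: {per_capita:.1f} thousand US$")

The same pattern, dividing one row of the table by another and keeping track of the powers of 10, covers parts (c) to (f).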

2.2. Week 9 Summation 1

Consider the set of observations on an observed variable xi, i = 1, 2, ..., N, and

wi = a + bxi.

Define the mean of xi as

x̄ = ∑ xi / N,  summing over i = 1, ..., N,

and similarly for w̄. Show that each of the following three expressions is true:

∑ wi = Na + b ∑ xi;     w̄ = a + bx̄;     ∑ (xi − x̄) = 0.

2.3. Week 10 Summation 2

Consider the set of observations on observed variables xi and yi, i = 1, 2, ..., N, and define the variance, V(x), and standard deviation, S(x), of xi as

V(x) = ∑ (xi − x̄)²/N;     S(x) = √V(x)


and similarly for yi. Define the covariance between xi and yi as

Cov(x, y) = ∑ (yi − ȳ)(xi − x̄)/N

and the correlation r between xi and yi as

r(x, y) = Cov(x, y) / (S(x)S(y)).

Let

z(yi) = (yi − ȳ)/S(y),     z(xi) = (xi − x̄)/S(x).

Show that each of the following four expressions is true:

∑ z(yi)/N = 0;     ∑ [z(yi)]²/N = 1;     r(x, y) = ∑ z(yi)z(xi)/N;     r(x, x) = 1.

From this show that −1 ≤ r ≤ +1.
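A quick numerical check of these identities, using made-up data (this is not part of the exercise, just a way to see the algebra at work):

    import numpy as np

    # Made-up data purely to check the identities numerically.
    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
    y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

    def S(v):
        # Divide-by-N standard deviation, matching the definition above.
        return np.sqrt(np.sum((v - v.mean()) ** 2) / len(v))

    cov = np.mean((y - y.mean()) * (x - x.mean()))
    r = cov / (S(x) * S(y))
    zx, zy = (x - x.mean()) / S(x), (y - y.mean()) / S(y)

    print(zy.mean())             # approximately 0
    print(np.mean(zy ** 2))      # approximately 1
    print(np.mean(zy * zx), r)   # these two agree, and lie between -1 and +1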

2.4. Week 11 Index Numbers

(1) UK GDP in current market prices in 1995 was £712,548m, while in 1997 it was £801,972m. GDP at constant 1995 market prices in 1997 was £756,144m.
(a) Construct index numbers, 1995=100, for: current price GDP; constant price GDP; and the GDP deflator in 1997.
(b) From these numbers calculate the average annual rate of inflation between 1995 and 1997.
(c) From these numbers calculate the average annual rate of growth between 1995 and 1997.
(d) If the interest rate on two year bonds in 1995 was 10% per annum, what would the real per annum interest rate over this period be?
(e) Explain what Gross Domestic Product measures. What limitations does it have as a measure of the economic wellbeing of a nation?
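A minimal Python sketch of the index-number arithmetic involved (it shows the method only; the interpretation asked for in (e) is not something code can supply):

    # GDP figures in £ million, from the question.
    nominal_1995, nominal_1997 = 712_548, 801_972   # current prices
    real_1997 = 756_144                             # constant 1995 prices

    # Index numbers with 1995 = 100.
    current_price_index = 100 * nominal_1997 / nominal_1995
    constant_price_index = 100 * real_1997 / nominal_1995
    deflator = 100 * nominal_1997 / real_1997

    # Average annual (geometric) rates over the two years, in percent.
    inflation = 100 * ((deflator / 100) ** 0.5 - 1)
    growth = 100 * ((constant_price_index / 100) ** 0.5 - 1)

    print(current_price_index, constant_price_index, deflator, inflation, growth)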


SPRING

2.5. Week 2, Descriptive Statistics and Index Numbers

(1) In a speech, "Why Banks Failed the Stress Test", February 2009, Andrew Haldane of the Bank of England provides the following summary statistics for the "golden era" 1998-2007 and for a long period. Growth is annual percent GDP growth, inflation is annual percent change in the RPI, and for both the long period is 1857-2007. FTSE is the monthly percent change in the all share index and the long period is 1693-2007.

                        Growth           Inflation         FTSE
                     98-07   long      98-07   long     98-07   long
Mean                   2.9    2.0        2.8    3.1       0.2    0.2
SD                     0.6    2.7        0.9    5.9       4.1    4.1
Skew                   0.2   -0.8        0.0    1.2      -0.8    2.6
Excess Kurtosis       -0.8    2.2       -0.3    3.0       3.8   62.3

(a) Explain how the mean; standard deviation, SD; coefficient of skewness; and coefficient of kurtosis are calculated.
(b) What values for the coefficients of skewness and kurtosis would you expect from a normal distribution? Which of the series shows the least evidence of normality?
(c) Haldane says "these distributions suggest that the 'Golden Era' distributions have a much smaller variance and slimmer tails" and "many risk management models developed within the private sector during the golden decade were, in effect, pre-programmed to induce disaster myopia." Explain what he means using these statistics.
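For reference, a minimal sketch of how these four summary statistics can be computed from a series, using the moment definitions that appear in the answers later in this booklet (the data here are arbitrary):

    import numpy as np

    def summary_stats(x):
        """Mean, SD, coefficient of skewness and excess kurtosis of a series."""
        x = np.asarray(x, dtype=float)
        m = x.mean()
        sd = np.sqrt(np.mean((x - m) ** 2))    # divide-by-N standard deviation
        z = (x - m) / sd                       # standardised observations
        skew = np.mean(z ** 3)                 # 0 for a normal distribution
        excess_kurt = np.mean(z ** 4) - 3      # 0 for a normal distribution
        return m, sd, skew, excess_kurt

    print(summary_stats([2.1, 2.9, 3.4, 1.8, 2.6, 3.0]))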

2.6. Week 3 Probability

(1) Show that the variance equals the mean of the squares minus the square of the mean:

N⁻¹ ∑ (xi − x̄)² = N⁻¹ ∑ xi² − x̄²

where x̄ = ∑ xi / N.


(2) Suppose you toss a fair coin three times in a row. What is the probability of:
(a) three heads in a row;
(b) a tail followed by two heads;
(c) at least one tail in the three throws?
Hint: write out the 8 (2³) possible outcomes and count how many are involved in each case.

(3) Students take two exams, A and B. 60% pass A, 80% pass B, 50% pass both.
(a) Fill in the remaining five elements of the joint and marginal distributions below, where PA indicates pass A, FB fail B, etc.
(b) What is the probability of a student passing B given that they passed A?
(c) Are the two events passing A and passing B (i) mutually exclusive (ii) independent?

        PA    FA
 PB     50            80
 FB
        60           100

2.7. Week 4 Probability Continued

(1) Consider the following game. A fair coin is tossed until it comes up heads, and you get paid £2ⁿ if it comes up heads on the n-th throw. If it comes up heads the first time you get £2 and the game stops. If it comes up heads for the first time on the second throw you get £4 = 2² and the game stops; and so on. What is the expected value of this game? How much would you personally pay to play it?

(2) Define P(A) as the probability of event A happening; P(B) the probability of event B happening; P(A ∩ B) the probability of both A and B happening; P(A ∪ B) the probability of either A or B happening; and P(A | B) the probability of A happening conditional on B already having happened.
(a) What is P(A ∩ B) if A and B are mutually exclusive?
(b) What is P(A ∩ B) if A and B are independent?
(c) What is P(A ∪ B)?
(d) Show that

P(A | B) = P(B | A)P(A) / P(B).


(3) You are in a US quiz show. The host shows you three closed boxes, in one of which there is a prize. The host knows which box the prize is in; you do not. You choose a box. The host then opens another box, not the one you chose, and shows that it is empty. He can always do this. You can either stick with the box you originally chose or change to the other unopened box. What should you do: stick or change? What is the probability that the prize is in the other unopened box?

(4) (Optional). Calculate the probability that two people in a group of size N will have the same birthday. What size group do you need for there to be a 50% chance that two people will have the same birthday? Ignore leap years.

Use a spreadsheet for this and work it out in terms of the probability of NOT having the same birthday. In the first row we are going to put values for N (the number of people in the group); in the second row we are going to put the probability that no two people in a group of that size have the same birthday. In A1 put 1, in B1 put =A1+1, and copy this to the right to Z1. In A2 put 1. Now in B2 we need to calculate the probability that two people will NOT share the same birthday. There are 364 possible days, i.e. any day but the first person's birthday, so the probability is 364/365. So put in B2 =A2*(365-A1)/365. Copy this right. Go to C2; the formula will give you 1 × (364/365) × (363/365): the third person has to have a birthday that is different from the first and the second. Follow along until the probability of no two people having the same birthday falls below a half.
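If you prefer, the same recursion can be written in a few lines of Python; this is just the spreadsheet calculation above, not a required part of the exercise:

    # p(n): probability that no two people in a group of n share a birthday,
    # built up recursively as in the spreadsheet: p(n) = p(n-1) * (365 - (n-1)) / 365.
    p_no_match, n = 1.0, 1
    while p_no_match >= 0.5:
        n += 1
        p_no_match *= (365 - (n - 1)) / 365

    print(n, 1 - p_no_match)   # group size at which a shared birthday is more likely than not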

2.8. Week 5 Index numbers etc.

A firm buys two goods A and B in two years. The table below shows the price and quantity of each purchased in year zero and year one.

             Year 0           Year 1
Good      Pi0     Qi0      Pi1     Qi1
A          20      10       10      20
B          10      20       20      10

(a) What was total expenditure, ∑i PitQit, in each year t = 0, 1, where i = A, B?
(b) What would have been its total expenditure in each year if in both years it had bought (i) year zero quantities (ii) year one quantities?
(c) Use the estimates in (b) to calculate two measures of inflation, (i) using year zero quantities (ii) using year one quantities.


(d) Comment on your results.
(e) Consider the three indexes below. Explain which one is a price index, which one a quantity index and which one is an expenditure index. Interpret the relationship between them.

∑i Pi1Qi1 / ∑i Pi0Qi0  =  (∑i Pi1Qi0 / ∑i Pi0Qi0) × (∑i Pi1Qi1 / ∑i Pi1Qi0)
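A short Python sketch of the decomposition in (e), using the prices and quantities from the table (the commentary the question asks for is left to you):

    # Prices and quantities for goods A and B in years 0 and 1 (from the table above).
    p0 = {"A": 20, "B": 10}; q0 = {"A": 10, "B": 20}
    p1 = {"A": 10, "B": 20}; q1 = {"A": 20, "B": 10}
    goods = ["A", "B"]

    def total(p, q):
        # Total expenditure: sum over goods of price times quantity.
        return sum(p[i] * q[i] for i in goods)

    expenditure_index = total(p1, q1) / total(p0, q0)   # value ratio
    laspeyres_price = total(p1, q0) / total(p0, q0)     # prices weighted by year-0 quantities
    paasche_quantity = total(p1, q1) / total(p1, q0)    # quantities weighted by year-1 prices

    print(expenditure_index, laspeyres_price * paasche_quantity)   # equal by construction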

2.9. Week 6, Reading week

2.10. Week 7 Properties of estimators and distributions

(1) Suppose you have a sample of data, Yi, i = 1, 2, ..., N, where Y ∼ IN(µ, σ²).
(a) Explain what Y ∼ IN(µ, σ²) means.
(b) How would you obtain unbiased estimates of µ and σ²? Explain what unbiased means.
(c) How would you estimate the standard error of your estimate of µ?
(d) Suppose that the distribution of your sample was not normal but highly skewed. Explain what this means and discuss what other measures of central tendency you might use.

(2) Marks on an exam are normally distributed with expected value 50 and standard deviation 10.
(a) What proportion of the students get (i) <30; (ii) between 30 and 50; (iii) over 50?
(b) What mark does a student need to get into the top 15%?
(c) In a class of 16, what is the probability that the average mark is greater than 53?
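Exercises like (2) are meant to be done with normal tables, but you can check your answers with a few lines of Python if SciPy is available (the functions below are standard SciPy, nothing course-specific):

    from scipy.stats import norm

    mu, sigma = 50, 10
    print(norm.cdf(30, mu, sigma))                            # proportion below 30
    print(norm.cdf(50, mu, sigma) - norm.cdf(30, mu, sigma))  # between 30 and 50
    print(norm.ppf(0.85, mu, sigma))                          # mark needed for the top 15%
    print(1 - norm.cdf(53, mu, sigma / 16 ** 0.5))            # P(mean of 16 marks > 53)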

2.11. Week 8 Hypothesis testing and Regression

(1) For a sample of data Y1, Y2, ..., YN on a random variable Y ∼ IN(µ, σ²):
(a) You want to test the hypothesis that µ equals a specified value µ0. How would you test this hypothesis?
(b) Explain Type I and Type II errors. How did you deal with these two types of error in your answer to (a)?


(c) Suppose that you wanted to calculate the probability of observing a value of Y greater than a specified value Y0. How would you do this?

(2) Consider the following bivariate regression model:

Yi = α + βXi + ui

estimated on a sample of data i = 1, 2, ..., N, where Yi is an observed dependent variable, Xi is an observed exogenous regressor, ui is an unobserved disturbance, and α and β are unknown parameters.
(a) Derive the least squares estimators for α and β.
(b) Under what assumptions about ui will these least squares estimators be Best Linear Unbiased?
(c) Explain what Best Linear Unbiased means.
(d) Explain what exogenous means.
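For orientation, the estimators that (a) asks you to derive are the standard ones, β̂ = ∑(Xi − X̄)(Yi − Ȳ)/∑(Xi − X̄)² and α̂ = Ȳ − β̂X̄; a minimal numerical sketch with made-up data:

    import numpy as np

    # Made-up data for illustration only.
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    residuals = Y - alpha_hat - beta_hat * X   # least squares residuals

    print(alpha_hat, beta_hat, residuals.sum())   # residuals sum to (about) zero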

2.12. Week 9, Regression

It is believed that an energy demand equation takes the form:

qt = α + βyt + γpt + εt,

where qt is the logarithm of per-capita energy demand in year t; pt the logarithm of real energy prices; yt the logarithm of per-capita real GDP; and εt is a well-behaved disturbance term. The following estimates (with standard errors in parentheses) were obtained using data for the period t = 1974-1990.

              β          γ        R²     SER
India       1.006     −0.068     0.38   0.027
           (0.102)    (0.080)
Indonesia   1.564     −0.488     0.52   0.034
           (0.234)    (0.195)
Korea       1.074     −0.136     0.54   0.031
           (0.125)    (0.189)

SER is the standard error of the regression.
(a) How would you interpret β and γ?
(b) Explain what R² and SER are and what they tell you. How would you interpret the fact that while Korea has the largest R² it does not have the lowest SER?
(c) For Indonesia, test (i) the hypothesis β = 1 and (ii) the hypothesis γ = 0.


(d) Interpret the stochastic component of the model. How would you estimate it?
(e) Suppose that you believed that there was a trend increase in energy efficiency in these countries. How would you adjust the model to allow for this?

2.13. Week 10, Regression

Consider the linear regression model

y = Xβ + u

where y is a T × 1 vector of observations on a dependent variable, X a full rank T × k matrix of observations on a set of exogenous variables, β a k × 1 vector of unknown coefficients, and u an unobserved disturbance with E(u) = 0 and E(uu′) = σ²I.

(a) Derive the least squares estimator β̂.
(b) Derive the variance-covariance matrix of β̂.
(c) Show that β̂ is unbiased.

2.14. Week 11, Regression.

Using US data on company earnings, Et, and the dividends paid out to shareholders, Dt, t = 1872-1986, the following results were obtained (standard errors in parentheses):

Dt = 0.011 + 0.088 Et + 0.863 Dt−1 + û1t
    (0.009)  (0.008)    (0.019)
R² = 0.998, SER = 0.074.

lnDt = −0.136 + 0.312 lnEt + 0.656 lnDt−1 + û2t
       (0.015)  (0.025)      (0.029)
R² = 0.993, SER = 0.085.

SER is the standard error of regression.
(a) Test whether the intercepts in each equation are significantly different from zero at the 5% level and interpret them. Do they have sensible values?
(b) It is suggested that the linear equation is a better equation than the logarithmic because it has a higher R². Do you agree with this?
(c) Interpret the role of the lagged dependent variable and calculate the long-run effect of earnings on dividends in each case.
(d) A test for second order serial correlation had a p value of 0.008 in the linear model and 0.161 in the logarithmic model.


Explain what second order serial correlation is and why it is a problem. Is it a problem in either of these models?

2.15. Answers to selected Spring exercises

2.15.1. Week 3, question 1.

Note that

(xi − x̄)² = xi² + x̄² − 2x̄xi

and that x̄ is a constant which does not vary with i, so ∑ x̄² = Nx̄². Then

N⁻¹ ∑ (xi − x̄)² = N⁻¹ ∑ xi² + N⁻¹Nx̄² − N⁻¹ 2x̄ ∑ xi
                = N⁻¹ ∑ xi² + x̄² − 2x̄²
                = N⁻¹ ∑ xi² − x̄².

2.15.2. Week 4, question 1.

This is the St Petersburg paradox. The expected value is infinite: the n-th outcome pays £2ⁿ with probability 2⁻ⁿ, so each term contributes £1 to the expected value and the sum diverges. But few would pay an infinite amount of money to play the game. The usual explanation is in terms of the diminishing marginal utility of money, which makes the expected utility of the game less than infinite.

2.15.3. Week 4, question 3.

Suppose you chose A to start with. Consider the two strategies: stick, S, or change, C. If the prize is in A, the host can open either box B or C and show it is empty. You win with S, sticking with box A, and lose with C, changing to the box the host left unopened. If the prize is in B, the host has to open box C. You lose with S, win with C, because you have to change to box B (box C is open). If the prize is in C, the host has to open box B. You lose with S, win with C, because you change to box C (box B is open). Changing is the optimal strategy since you win two times out of three, and the probability that the prize is in the other box is 2/3.

It can also be shown by Bayes theorem. Let WA be the event that the prize is in box A, etc. Let HA be the event that the host opens box A, etc. Suppose you choose box A.


The probability that you win the prize if you switch is the probability that the prize is in B and the host opened C, plus the probability that the prize is in C and the host opened B:

P(WB ∩ HC) + P(WC ∩ HB)
= P(WB)P(HC | WB) + P(WC)P(HB | WC)
= (1/3) × 1 + (1/3) × 1 = 2/3.

The second line follows from the definition of conditional probabilities. This formula might seem complicated, but it appears in a very popular work of teenage fiction: The Curious Incident of the Dog in the Night-Time, by Mark Haddon, Random House, 2003.
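If the argument still feels counter-intuitive, a short simulation makes the 2/3 figure easy to check (this is an illustration, not part of the answer):

    import random

    # Simulate the quiz show: switching should win about 2/3 of the time.
    trials, switch_wins = 100_000, 0
    for _ in range(trials):
        prize = random.choice("ABC")
        choice = random.choice("ABC")
        # The host opens an empty box that is not the contestant's choice.
        opened = next(b for b in "ABC" if b != choice and b != prize)
        # Switching means taking the remaining unopened box.
        switched = next(b for b in "ABC" if b != choice and b != opened)
        switch_wins += (switched == prize)

    print(switch_wins / trials)   # close to 2/3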

2.15.4. Week 9.

(a) β is the income elasticity of demand for energy; γ is the price elasticity.

(b) R² = 1 − ∑ ε̂t² / ∑ (qt − q̄)²;  SER = √(∑ ε̂t² / (T − k)), where k = 3 here. R² measures the fit relative to the variance of the dependent variable; the SER just measures the fit. The rankings would only necessarily be the same if all the dependent variables had the same variance.

(e) Include a time trend:

qt = α + βyt + γpt + δt + εt.

2.15.5. Week 11

(a) The intercept of the linear equation has t ratio 0.011/0.009 = 1.22, not significantly different from zero. The intercept in the log equation has t ratio −0.136/0.015 = −9.06, significantly different from zero. The intercepts measure different things in the two equations: see (c) below.
(b) No, the R² cannot be compared because the dependent variables are different.
(c) The lagged dependent variable captures the effect that firms smooth dividends and only adjust them slowly in response to earnings changes. The long-run relations are

D = 0.011/(1 − 0.863) + [0.088/(1 − 0.863)] E = 0.080 + 0.642 E


lnD = −0.136/(1 − 0.656) + [0.312/(1 − 0.656)] lnE = −0.395 + 0.91 lnE.

In the case of the linear model the long-run intercept should be zero (dividends should be zero when earnings are zero); the answer in (a) indicates that the intercept is not significantly different from zero. In the logarithmic case it is a constant of proportionality, exp(−0.395) = 0.67, so the long run is

D = 0.67 E^0.91.

(d) Second order serial correlation is a relation between the residuals of the form

ut = ρ1ut−1 + ρ2ut−2 + et.

It is a problem because it indicates that the model is likely to be misspecified. The linear model p value indicates that there is serial correlation (p < 0.05); the logarithmic model p value indicates that there is probably not serial correlation (p > 0.05) at the 5% level.
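The long-run calculation in (c) is simple enough to script; a minimal sketch for the linear equation (the variable names are ours):

    # Long-run relation implied by D_t = a + b*E_t + c*D_{t-1}:  D = a/(1-c) + (b/(1-c))*E.
    a, b, c = 0.011, 0.088, 0.863      # estimates from the linear dividend equation
    print(a / (1 - c), b / (1 - c))    # roughly 0.080 and 0.642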

3. Assessment and specimen exam.

3.1. Assessment

Assessment is 70% on the exam, 30% on the empirical project submitted in early May. You should read the advice in section 4 on doing the project.

The exam will have six questions. Section A has questions 1-3; Section B questions 4-6. You must do three questions: at least one from each section and one other.

Section A will contain questions covering:
1. Interpretation of statistics associated with regression, e.g. R², standard error of regression, diagnostic tests like the Durbin-Watson statistic, and the effects of the failures of the various assumptions.
2. Probability and distributions, e.g. being able to use the basic rules of probability; given the mean and variance for a normal distribution, calculate the probability of various events happening, etc.
3. Economic and financial data; this will involve calculations, e.g. of index numbers, growth rates, ratios, derived measures, and some interpretation.

Section B


4. More technical regression question.
5. Question using matrix algebra.
6. Final regression question.
Some specimen questions with answers are given below.

3.2. Specimen Questions

SECTION A

1. US data 1971-1999 were used to estimate a relationship between the rate of interest Rt and the rate of inflation πt,

Rt = α + βπt + ut.

The least squares estimates (with standard errors in parentheses) were

Rt = 6.37 + 0.33 πt + ût
    (0.66)  (0.10)

The standard error of regression s = 2.75, the coefficient of determination R² = 0.29, and the Durbin-Watson statistic DW = 0.63.
(a) Explain what ût measures.
(b) Explain what R², the standard error of the regression and the Durbin-Watson statistic measure and what they tell you about this regression.
(c) Interpret the estimates of α and β. What do they tell you about the relationship between inflation and interest rates?
(d) What assumptions are required for least squares to give good estimates?

Answer
(a) The estimated residual ût measures the difference between the actual and predicted value of Rt.
(b) R² = 1 − ∑ ût² / ∑ (Rt − R̄)² gives the proportion of the variation in the dependent variable explained by the regression, 29% in this case, so quite low. s = √(∑ ût² / (T − 2)) is a measure of the average error, 2.75 percentage points in this case, quite a large error in predicting the interest rate. DW = ∑ (∆ût)² / ∑ ût² is a test for serial correlation; it should be close to two, so at 0.63 this regression suffers from severe positive serial correlation.
(c) α is the value the interest rate would take if inflation were zero: interest rates would be 6.37%. β is the effect of inflation on interest rates: a 1 percentage point increase in inflation raises interest rates by 0.33 percentage points.


(d) The regressors should be exogenous, uncorrelated with the errors, E(πtut) = 0; the regressors should not be linearly dependent (the variance of πt not equal to zero, in the case of a single regressor); and the disturbances should have expected value (mean) zero, E(ut) = 0, be serially uncorrelated, E(utut−s) = 0, s ≠ 0, and have constant variance, E(ut²) = σ².

2. Suppose that you have a machine for filling 1 kilogram bags of sugar. The machine is set so that the weight of a bag should be a normally distributed random variable with an expected value of 1005g and a standard deviation of 2g. For a random variable Z with a standard normal distribution, the cumulative probabilities are:

z           0      0.33    0.5      1      1.5      2      2.5
P(Z < z)   0.5    0.6293  0.6915  0.8413  0.9332  0.9772  0.9938

(a) Explain what P(Z < z) tells you.
(b) What is the probability that a bag of sugar will weigh (i) less than 1000g (ii) between 1004g and 1006g?
(c) You take a sample of 16 bags of sugar, which have a mean of 1004g. What is (i) the standard error of the mean and (ii) the probability of getting a sample mean of 1004g or less?
(d) On the basis of this estimate of the mean would you conclude that the machine was still working correctly?
(e) How would you investigate whether the variance of the machine might have increased?

Answer
(a) The probability that a random variable Z takes a value less than a particular value z.
(b) (i) z = (1000 − 1005)/2 = −2.5. P(Z < −2.5) = 1 − P(Z < 2.5) = 1 − 0.9938 = 0.0062, about 0.6%, roughly one chance in 160.
(ii) z = (1004 − 1005)/2 = −0.5 and z = (1006 − 1005)/2 = 0.5. P(−0.5 < Z < 0.5) = 2 × (0.6915 − 0.5) = 0.383 = 38%.
(c) (i) The standard error of the mean is 2/√16 = 2/4 = 0.5.
(ii) z = (1004 − 1005)/0.5 = −2. P(Z < −2) = 1 − P(Z < 2) = 1 − 0.9772 = 0.0228, 2.28%.
(d) From c(ii) the probability of getting this value or less is 2.28%, which is a small number, so it is probably not working correctly.


This is not needed for the answer, but if you wanted to test at the 5% level the null hypothesis that it was working properly (µ = 1005), you would need to be careful whether the alternative was µ ≠ 1005, in which case there would be 2.5% in each tail, or µ < 1005, in which case there would be 5% in the lower tail. Since the probability is less than 2.5%, you would reject either on a two-tailed or a one-tailed test.
(e) Estimate the sample variance from your sample of 16, s² = (16 − 1)⁻¹ ∑ (xi − x̄)², to check whether the variance seemed to have increased from 4. This is not needed for the answer, but you would use a variance ratio F test.

3. The following data were taken from Economic Trends, February 2004.

        NDY       RDY      RPI    CPI     HP    TBY
2000   654,649   654,649   170.3  105.6   87.7  5.69
2001   700,538   685,263   173.3  106.9   95.1  3.87
2002   721,044   696,224   176.2  108.3  111.2  3.92

NDY is nominal household disposable income in current prices; RDY is real household disposable income in constant 2000 prices; RPI is the retail price index, 1987=100; CPI is the consumer price index, 1996=100; HP is a house-price index; TBY is the Treasury Bill Yield, a short-term interest rate expressed as percent per annum.
(a) Calculate the price index (implicit deflator) for disposable income.
(b) Calculate the rate of inflation for 2001-2 for disposable income, RPI, CPI, and HP. Comment on the relation between them.
(c) Suppose that you owned a house whose price increased at the average rate, and had a 100% mortgage paying the Treasury Bill rate on the mortgage. What would be the real return on owning the house over 2001-2?
(d) The RPI includes housing prices in its measure of inflation; the CPI does not. Should house prices be included in inflation measures?
(e) What other differences are there between the RPI and CPI?

Answer
(a) The disposable income deflator is PDY = 100 × NDY/RDY:

        NDY       RDY      PDY
2000   654,649   654,649   100
2001   700,538   685,263   102.2
2002   721,044   696,224   103.6

(b) Inflation, 2001-2: PDY = 1.31%, RPI = 1.67%, CPI = 1.31%, HP = 16.92%. A massive house price boom; RPI slightly higher than CPI or PDY.
(c) The real return is the capital gain on house prices, less the interest cost, less the rate of inflation. Using the CPI (others are acceptable): 16.92 − 3.92 − 1.31 = 11.69%.


(d) Yes: even if you own your own home and have paid off your mortgage, there is an implicit rental cost of home ownership, and this will increase when house prices increase.
(e) The CPI uses a geometric mean, the RPI an arithmetic mean, which means that the CPI tends to be about 0.5% lower.

SECTION B

4. Using the information in question 1, and assuming that the 95% critical value for a t test is ±2:
(a) Test the hypothesis β = 1 at the 5% level and explain why this might be an interesting hypothesis.
(b) Explain what Type I and Type II errors are. What is the probability of Type I error in your test in part (a)?
(c) Give a 95% confidence interval for β. Explain what a confidence interval is.
(d) What indicates that the estimated equation is clearly misspecified? How would you remedy this problem?
(e) An alternative model is suggested:

Rt = α0 + α1Rt−1 + β0πt + β1πt−1 + ut.

How would you test whether the lagged values Rt−1 and πt−1 are significant (i) individually and (ii) jointly?
(f) Using the model in section (e), how would you estimate the long-run effect of inflation on interest rates and test that this long-run effect is equal to one?

Answer
(a) t(β = 1) = (0.33 − 1)/0.10 = −6.7, so reject the hypothesis that β equals one. If the real interest rate (It = Rt − πt) is constant plus a random error (the Fisher hypothesis), It = Ī + ut, then Rt = Ī + πt + ut, so in the regression α = Ī and β = 1.
(b) Type I error is rejecting the null when it is true; Type II error is accepting the null when it is false. The probability of Type I error in (a) is 5%.
(c) The 95% confidence interval is 0.33 ± 2 × 0.10, i.e. the range 0.13 to 0.53. We are 95% confident that this range covers the true value of β.
(d) The very low DW indicates massive serial correlation; respecify by including lags as in the model in (e).
(e) Individually: use the t statistics on the estimated coefficients α1 and β1.


Jointly: first take the restricted residual sum of squares from the equation in question 1, RRSS, and the unrestricted residual sum of squares from the equation in question 4(e), URSS, and calculate the F statistic

F(2, T − 4) = [(RRSS − URSS)/2] / [URSS/(T − 4)].

If this is greater than the relevant critical value, reject the joint hypothesis that the coefficients of both the lagged values are zero.
(f) The long-run effect is measured as θ = (β0 + β1)/(1 − α1). If θ = 1, then (1 − α1) = (β0 + β1), or β0 + β1 + α1 − 1 = 0. We can reparameterise the equation as

∆Rt = α0 + (α1 − 1)Rt−1 + β0∆πt + (β0 + β1)πt−1 + ut
∆Rt = α0 + (α1 − 1)(Rt−1 − πt−1) + β0∆πt + (β0 + β1 + α1 − 1)πt−1 + ut.

So if we estimate this and test that the coefficient on πt−1 equals zero, this is a test of a long-run unit effect. You could also do an F test using the restricted model where πt−1 is excluded.
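As an illustration of the mechanics of such an F test, here is a minimal sketch; the residual sums of squares and sample size are invented for the example, and SciPy is assumed to be available for the p value:

    from scipy.stats import f

    # Hypothetical numbers, purely for illustration.
    RRSS, URSS = 230.0, 190.0   # restricted and unrestricted residual sums of squares
    T, m, k = 29, 2, 4          # observations, restrictions, unrestricted parameters

    F_stat = ((RRSS - URSS) / m) / (URSS / (T - k))
    p_value = f.sf(F_stat, m, T - k)   # upper-tail probability

    print(F_stat, p_value)   # reject the joint hypothesis if the p value is below 0.05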

5. Consider the model:

yt = β1 + β2x2t + ut

t = 1, 2, ..., T, where yt are observations on a dependent variable and x2t observations on a non-stochastic independent variable, ut is an unobserved disturbance with E(ut) = 0, and β1 and β2 are unknown parameters.
(a) Set out the system of T equations in matrix form y = Xβ + u, where y and u are T × 1 vectors, X is a T × 2 matrix, and β is a 2 × 1 vector.
(b) How would you express the sum of squared residuals ∑ ut², t = 1, ..., T, in matrix form? Show that this is a function of β.
(c) Derive the least squares estimator of β, say β̂, show that it is unbiased, i.e. E(β̂) = β, and derive its variance-covariance matrix.
(d) Define û = y − Xβ̂; show that X′û = 0 and that û = My = Mu, where M = I − X(X′X)⁻¹X′.

Answer
(a)

[ y1 ]   [ 1  x21 ]           [ u1 ]
[ :  ] = [ :   :  ] [ β1 ]  + [ :  ]
[ yT ]   [ 1  x2T ] [ β2 ]    [ uT ]


(b) u′u = (y − Xβ)′(y − Xβ) = y′y + β′X′Xβ − 2β′X′y, which is clearly a function of β.
(c)

∂u′u/∂β = 2X′Xβ − 2X′y = 0
X′Xβ̂ = X′y
β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u
E(β̂) = β + (X′X)⁻¹X′E(u) = β

since E(u) = 0 and X is non-stochastic.

V(β̂) = E(β̂ − β)(β̂ − β)′ = E[((X′X)⁻¹X′u)((X′X)⁻¹X′u)′]
      = E[(X′X)⁻¹X′uu′X(X′X)⁻¹] = (X′X)⁻¹X′E(uu′)X(X′X)⁻¹
      = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹

(d)

X′û = X′(y − Xβ̂) = X′(y − X(X′X)⁻¹X′y)
     = X′y − X′X(X′X)⁻¹X′y = X′y − X′y = 0
û = y − Xβ̂ = y − X(X′X)⁻¹X′y = My
My = M(Xβ + u) = MXβ + Mu = Mu
MXβ = (I − X(X′X)⁻¹X′)Xβ = Xβ − X(X′X)⁻¹X′Xβ = 0
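These matrix formulas are easy to verify numerically; a minimal NumPy sketch with simulated data (not part of the exam answer):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 100
    X = np.column_stack([np.ones(T), rng.normal(size=T)])   # a constant and one regressor
    beta = np.array([1.0, 2.0])
    y = X @ beta + rng.normal(size=T)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
    u_hat = y - X @ beta_hat

    print(beta_hat)
    print(X.T @ u_hat)                             # approximately zero: X'u = 0
    s2 = u_hat @ u_hat / (T - 2)
    print(s2 * np.linalg.inv(X.T @ X))             # estimated variance-covariance matrix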

6. You have data on average earnings by age and education in the US in 1991 for 30 groups of men, i = 1, 2, ..., N. Define wi as the logarithm of earnings, Ai age in years (divided by 100), and Ei years of education (divided by 100). It is suggested that an appropriate model is:

wi = α + β1Ai + β2Ei + γ1Ai² + γ2AiEi + ui.

The estimated coefficients, with their standard errors in parentheses, are:

wi = 7.46 + 6.6 Ai + 9.0 Ei − 6.7 Ai² + 4.1 AiEi
    (0.19)  (0.62)   (1.01)   (0.56)    (1.95)


R² = 0.99, SER = 0.05.
(a) What is the stochastic component in this model? How would you interpret it?
(b) Explain the role of the Ai² term.
(c) Comment on these results. Are the signs what you would expect?
(d) Given these estimates of the parameters, at what age do men with zero and twenty years of education earn the maximum amount? Are these sensible numbers?

Answer
(a) The stochastic component is ui; it is the part of log earnings not explained by the regressors, and will reflect unmeasured ability and other factors influencing earnings.
(b) We would expect earnings to rise and then fall with age; the quadratic term captures this feature.
(c) Yes. For earnings to rise and fall with age, we need β1 > 0 and γ1 < 0. You earn more with better education, so β2 > 0. The positive coefficient on the interaction term γ2 makes peak earnings later for more highly educated men, which is likely.
(d) To get the maximum,

∂w/∂A = β1 + 2γ1A + γ2E = 0
A = −(2γ1)⁻¹(β1 + γ2E)
  = 0.0746(6.6 + 4.1E)
  = 0.49 + 0.31E.

Note that Ai and Ei are divided by 100. For Ei = 0 the maximum is at 100 × 0.49 = 49 years; for Ei = 0.2 it is at 100 × (0.49 + 0.31 × 0.2) = 55 years. Although the idea of somebody with zero years of education is unlikely, these are plausible values for peak earning age.

3.3. 2014 Exam

3.3.1. SECTION A

1. Quarterly Colombian data for 1994Q2 to 2005Q3 on the logarithm of the number of guerrilla attacks, at, the amount of illegal drug production, dt, and a dummy variable for quarter 1, Q1t, were used to estimate the following equation by least squares (standard errors in parentheses):


at = −5.2 + 0.33 at−1 + 0.66 dt−1 + 0.27 Q1t + ût
    (2.30)  (0.14)      (0.20)       (0.10)

R² = 0.396, SER = 0.268.
(a) Explain how the coefficient of determination, R², and standard error of regression, SER, are calculated and what they measure.
(b) Which coefficients are significantly different from zero? Explain how you would interpret this equation.
(c) What assumptions are required for least squares to give good estimates?
(d) A test for fourth order serial correlation gave a p value of 0.58 and a test for heteroskedasticity gave a p value of 0.85. Do these p values indicate misspecification?
(e) Explain what fourth order serial correlation and heteroskedasticity are.

Answer
1(a). The coefficient of determination tells you the proportion of the variation in the dependent variable that the equation explains,

R² = 1 − ∑ ût² / ∑ (at − ā)².

The SER tells you the average size of the error (the square root of the average squared residual),

s = √(∑ ût² / (T − k)),

where k is the number of regression coefficients estimated, here 4, and T the number of observations.
1(b). All coefficients except the constant are significant. The interpretation is that attacks are 27% higher in Q1, higher drug production increases attacks with a lag of a quarter, and there is partial adjustment. The long-run effect of drug production on attacks is 0.66/(1 − 0.33) = 0.99, so a one percent increase in drug production causes a 0.99 percent increase in attacks. Attacks are almost proportional to drug production in the long run.
1(c). The errors have mean zero, constant variance and are not correlated with each other; the regressors are either exogenous (independent of the errors) or predetermined (uncorrelated with the errors); and there are no exact linear dependences between the regressors.


1(d). No evidence of misspecification at the 5% or 10% level: since the p values are much bigger than 0.1, we do not reject the null of no misspecification.
1(e). Fourth order serial correlation means the errors in this quarter are related to the errors 4 quarters ago (or up to 4 quarters ago). Heteroskedasticity means the variance of the errors is not constant.

2. Suppose the average real return on equities over 100 years was 5% and its standard deviation 10%, and it appeared independently normally distributed. For a random variable Z following a standard normal distribution:

z           0.5     1      1.5     2      2.5     3
P(Z < z)   0.6915  0.8413  0.9332  0.9772  0.9938  0.9987

(a) Explain what P(Z < z) means. What is P(Z < 0)?
(b) Is the mean return over the 100 years significantly different from zero?
(c) What is the probability of a negative return?
(d) What is the probability of a negative return three years running?
(e) Explain how you would measure skewness and kurtosis. What values should they take for a normal distribution?

Answer
2(a) P(Z < z) is the probability that a random variable Z takes a value less than a specified value z. P(Z < 0) = 0.5.
2(b) The standard error of the mean is 10/√100 = 1, so z = 5/1 = 5. Reject H0: the mean return is significantly different from zero.
2(c) The probability of a negative return: z = (0 − 5)/10 = −0.5; P(Z < −0.5) = P(Z > 0.5) = 1 − 0.6915 = 0.3085.
2(d) The probability of a negative return three years running is (0.3085)³ = 0.0294.
2(e) To measure skewness and kurtosis, standardise the data to, say, zt by subtracting the mean and dividing by the standard deviation. The coefficient of skewness, ∑t zt³/T, should be zero for a normal distribution; the coefficient of kurtosis, ∑t zt⁴/T, should be 3.

3. Consider the table below giving data on the annual rates of inflation of the Consumer Price Index, CPI, and Retail Price Index, RPI, and the sources of the differences between them.

Annual rates of inflation, year to August, and sources of differences between CPI and RPI, percentages.


                                      difference due to
       CPI    RPI    CPI−RPI    Housing   formula   other
2007   1.8    4.09    −2.29      −2.20     −0.57     0.48
2008   4.7    4.77    −0.07      −0.06     −0.49     0.49
2009   1.6   −1.29     2.89       3.06     −0.55     0.38
2010   3.1    4.68    −1.58      −0.72     −0.86     0.00
2011   4.5    5.17    −0.67       0.05     −0.99     0.27
2012   2.5    2.95    −0.45      −0.15     −0.88     0.58
2013   2.7    3.29    −0.59      −0.24     −0.94     0.60

(a) Explain the main differences between the GDP deflator, CPI and RPI as measures of inflation.
(b) The Bank of England has a target range for annual CPI inflation. What is the range, and in which years was inflation outside this range?
(c) Explain the main purposes for which the RPI is used.
(d) Explain the source of the formula effect on the difference.
(e) Comment on the average difference and the movement of the various components of the difference over this period.

Answer
(a) The GDP deflator is the ratio of current price GDP to constant price GDP and covers all goods in the economy; the CPI and RPI only cover consumer goods. The CPI excludes housing and uses a geometric formula; the RPI includes housing and uses an arithmetic formula. The RPI excludes expenditure by the very rich and very poor.
(b) The Bank of England target is 2% CPI inflation and the range is 1% to 3%, outside which the Governor has to write a letter to the Chancellor. On this data it was outside the range in 2008, 2010 and 2011.
(c) The RPI is used to determine the amounts payable on index linked gilts, and as the basis for setting some wages.
(d) In order to aggregate prices at the most basic level to form the elementary aggregates, the RPI uses the (Carli) arithmetic mean of the price relatives (price in year t to price in year zero) whereas the CPI uses the (Jevons) geometric mean. The Carli is

IC = (1/n) ∑j (pjt / pj0)


and the Jevons is

IJ = [∏j (pjt / pj0)]^(1/n) = (∏j pjt)^(1/n) / (∏j pj0)^(1/n).

The arithmetic mean over-estimates relative to the geometric mean, and the upward bias is greater when the variance of inflation is greater.
(e) Over this period the average difference was −0.39: the RPI was roughly 0.4 percentage points higher, but the range was very large, −2.29 to +2.89. The bulk of the volatility came from housing, particularly the house price boom in 2007 and bust in 2009. There is a trend increase in the size of the formula effect, which was assumed to be 0.5 percentage points at the time that the target changed from RPIX to CPI.
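A two-line check of the direction of the formula effect (the price relatives here are invented):

    import numpy as np

    # Invented price relatives p_jt / p_j0 for one elementary aggregate.
    relatives = np.array([1.02, 1.10, 0.97, 1.25, 1.05])

    carli = relatives.mean()                    # arithmetic mean (RPI-style)
    jevons = np.exp(np.log(relatives).mean())   # geometric mean (CPI-style)

    print(carli, jevons)   # the Carli is never below the Jevons; the gap grows with dispersion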

3.3.2. SECTION B

4. Suppose the long-run relationship between desired consumption, C*t, and income, Yt, is C*t = K Yt^θ, and actual consumption is determined by Ct/Ct−1 = (C*t/Ct−1)^λ e^(ut), where ut is a white noise error independent of income.
(a) Explain how you would convert the equation into a form that could be estimated by least squares.
(b) Explain how you would calculate the long-run coefficient, θ, the speed of adjustment, λ, and the constant of proportionality, K.
(c) Interpret θ and explain why one might expect it to take the value one.
(d) Explain how you would test whether the estimate of θ, θ̂, is significantly different from one.
(e) Suppose people adjusted their hours of work to generate the income required to finance their desired consumption. What problems would this cause for least squares estimation of this consumption equation?

Answer
4(a) Take logs to get

ct − ct−1 = λ(k + θyt − ct−1) + ut,
∆ct = β1 + β2yt + β3ct−1 + ut,

which is a linear relationship that can be estimated by least squares.
4(b) Use λ = −β3; θ = −β2/β3; k = −β1/β3; K = exp(k).


4(c) θ is the long-run elasticity of consumption with respect to income; if the elasticity is one, the ratio of consumption to income is constant at K in the long run. Since the consumption-income ratio is quite constant, one might expect θ = 1.
4(d) To test whether θ̂ is significantly different from one, either impose the restriction to give

∆ct = λk + λ(yt − ct−1) + ut

and use an F statistic to test this against the unrestricted model, or reparameterise the equation as

∆ct = λk + λθ(yt − ct−1) + λ(θ − 1)ct−1 + ut

and use a t test on the coefficient of lagged consumption.
4(e) If people adjusted their hours of work to generate the income required to finance their desired consumption, this would make income endogenous, correlated with the error, causing the least squares estimator to be inconsistent.

5. Consider the model y = Xβ + u, where y is a T × 1 vector of observations on the dependent variable; X is a T × k matrix of observations on exogenous variables, with rank k; β a k × 1 vector of unknown parameters; and u a T × 1 vector of unobserved errors with E(u) = 0, E(uu′) = σ²IT.
(a) Define the estimated residuals as û = y − Xβ̂. Derive the estimator that makes X′û = 0.
(b) Show that E(β̂) = β. Explain the role of the assumption that X is exogenous.
(c) Derive the variance-covariance matrix of β̂.
(d) Explain how you would estimate σ² and the standard errors of β̂.
(e) Interpret the assumption E(uu′) = σ²IT.

Answer
(a) X′(y − Xβ̂) = 0; X′y − X′Xβ̂ = 0; X′y = X′Xβ̂; β̂ = (X′X)⁻¹X′y.
(b) β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u = β + (X′X)⁻¹X′u.
E(β̂) = β only if E((X′X)⁻¹X′u) = E((X′X)⁻¹X′)E(u), and for this we need X and u to be independent, i.e. X to be exogenous. If so, then since E(u) = 0, E(β̂) = β.
(c) Since β̂ is unbiased,

V(β̂) = E(β̂ − β)(β̂ − β)′ = E[((X′X)⁻¹X′u)((X′X)⁻¹X′u)′]
      = E[(X′X)⁻¹X′uu′X(X′X)⁻¹] = (X′X)⁻¹X′E(uu′)X(X′X)⁻¹
      = (X′X)⁻¹X′(σ²I)X(X′X)⁻¹ = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹.


(d) Estimate σ² by s² = û′û/(T − k); the standard error of β̂i is the square root of the i-th diagonal element of s²(X′X)⁻¹.
(e) No heteroskedasticity or serial correlation.

6. Quarterly UK data for the change in the percentage unemployment rate, ∆U_t, and the growth rate, at annual percent rates, G_t, were used to estimate the following regressions. Standard errors are given in parentheses, R² is the coefficient of determination and SER the standard error of regression.
For the period 1971Q4-1993Q1, 86 observations:
∆U_t = 0.102 − 0.044 G_t + 0.683 ∆U_{t−1} + 0.010 G_{t−1} + e_t
       (0.030)  (0.009)     (0.076)          (0.010)
R² = 0.83, SER = 0.128.
For the period 1993Q2-2013Q1, 80 observations:
∆U_t = 0.080 − 0.086 G_t + 0.186 ∆U_{t−1} + 0.039 G_{t−1} + e_t
       (0.029)  (0.020)     (0.112)          (0.020)
R² = 0.49, SER = 0.163.
(a) Explain why you would expect these variables to be related and comment on the significance of the coefficients.
(b) For each period calculate (i) the long-run effect of growth on the change in unemployment and (ii) the growth rate at which unemployment is constant.
(c) Comment on the main differences between the two periods, including the speed of adjustment and the error variances.
(d) A Chow breakpoint test for structural stability gave a test statistic of F(4,158) = 7.22, p value 0.0000. Explain the hypothesis tested and whether the relationship is stable.
(e) Explain what the degrees of freedom of the F distribution, (4,158), measure and why they take those values.
Answer
6(a) The variables are related because if output grows the demand for labour increases and unemployment falls, perhaps with a lag. In the first period everything is significant except lagged growth. In the second period the intercept and growth are significant, the lagged change in unemployment is not, and lagged growth is just on the edge of significance.
6(b) (i) Call the long-run effect θ_1 in the first period and θ_2 in the second: θ_1 = (−0.044 + 0.010)/(1 − 0.683) = −0.107; θ_2 = (−0.086 + 0.039)/(1 − 0.186) = −0.0577. (ii) Setting ∆U_t = ∆U_{t−1} = 0, the growth rate at which unemployment is constant is g*_1 = 0.102/0.034 = 3.0 in the first period and g*_2 = 0.080/0.047 = 1.7 in the second.


6(c) The speed of adjustment, 1 − 0.683 = 0.317, was much slower in the first period than in the second, 1 − 0.186 = 0.814. Although the impact effect of growth was larger in the second period, the long-run effect of growth was only about half as large. The unexplained variation in unemployment was greater in the second period (SER of 0.163 compared with 0.128), and this is reflected in a lower R².
6(d) The null hypothesis is that the coefficients in the two periods are the same (conditional on the variances being the same, which may not be the case here). The null is rejected, p value < 0.05, so the relationship is not stable between the two periods.
6(e) The first degree of freedom, 4, is the number of restrictions tested, that the four coefficients are the same in each period; the second is the number of observations minus the number of parameters estimated in the unrestricted model, 2 × 4 coefficients: 158 = 166 − 8.
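Purely as a check on the arithmetic in 6(b), the long-run effect and the constant-unemployment growth rate can be computed from the reported coefficients (illustrative code only):

# Delta U_t = b0 + b1*G_t + b2*Delta U_{t-1} + b3*G_{t-1} + e_t
def long_run(b0, b1, b2, b3):
    theta = (b1 + b3) / (1 - b2)      # long-run effect of growth on the change in unemployment
    g_star = -b0 / (b1 + b3)          # growth rate at which Delta U = 0 in the long run
    return theta, g_star

print(long_run(0.102, -0.044, 0.683, 0.010))   # first period: about (-0.107, 3.0)
print(long_run(0.080, -0.086, 0.186, 0.039))   # second period: about (-0.058, 1.7)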


4. Doing Your Project

To do your project you need to choose a topic and collect some data; do some statistical analysis (e.g. graphs, summary statistics); draw some conclusions and write up your results clearly in a standard academic style in less than 3,000 words. It often helps if the topic is posed as a question, which can be answered in the project, as is done in sub-section 7.5. The project is designed to test your ability to collect and interpret data, not a test of the material covered in this course, so you do not need to do regressions or repeat text-book material in the project. It does not have to be on an economic or financial topic. It will count for 30% of the total marks for the module.

4.1. BASIC RULES

Email a project proposal to Ron Smith by March 3. Your proposal should be less than 100 words. It should have (1) your name; (2) your course; (3) an indicative title; (4) a brief description of the data you are going to use and where you will get it; and (5) a short indication of the questions you might investigate. You must check that the data are actually available; you cannot do your project without data. You can change your topic subsequently if you run into difficulties.
Submit your project to the Department Office by the deadline of May 12. Refer to the College rules on late submission of work.
You must submit a hard-copy version of the project, which must be typed, securely stapled and have page numbers. Do not put it in a binder or folder.
You must attach an electronic copy of the data (e.g. on disk, CD or USB stick) in an envelope labelled with your name and course, firmly attached to your report (e.g. by stapling to the last page). The data must be in a form that allows us to replicate what you have done, e.g. in a file from a standard program.
Maximum length: 3,000 words (excluding graphs, tables, appendices, title page and abstract, but including everything else). Do not exceed this upper limit: part of the exercise is to write up your results briefly.
We will not return your project, which we keep on file for writing references, etc. Make a copy for your own use. We do not show projects to anyone but the examiners. This is to allow students to use confidential data from work.
Keep safe backup copies of your data and drafts of your text as you work (college computers are a safe place). We are very unsympathetic if you lose work because it was not backed up properly.


4.2. TITLE PAGE MUST CONTAIN

• Programme and year (eg GDE ASE Project 2015)

• Title of project

• Your name

• An abstract: maximum length 100 words

• The number of words in the project

• The programs you used.

4.3. YOU MUST

• Graph the data (line graphs, histograms or scatter diagrams)

• Number all graphs and tables, give them titles and specify units and source.

• Have a bibliography

• Detail the sources of the data and provide the data

• Be sure it is your own work. You can discuss it with friends or colleagues, and it is a good idea for you to read and comment on each other's work, but it must be your own work which you submit. Plagiarism is a serious offence (see the section in the course handbook).

4.4. HOW WE WILL ASSESS YOUR WORK

The criteria are listed below. We do not assign fixed weights to them; you can trade them off, but be aware of diminishing returns.
Writing. You will not get any credit unless we can understand what you have done. We look for clear organisation of the piece as a whole; clear writing of individual paragraphs and sentences; logical arguments and careful use of evidence. Check your spelling and grammar. We have set a short word limit, so make every word count.
Scholarly conventions. Are sources of ideas and quotations properly acknowledged? Is there a list of references? Are data sources properly documented? Is the project written in an academic (as opposed to, say, journalistic) style?


If you do it on a topic related to your work, remember the style may have to differ from that appropriate at work. Copy the styles of articles in economics journals.
Originality/interest. Most topics can be made interesting if presented sufficiently well, but it is harder to find something interesting to say about a standard topic than about a slightly more unusual topic. We can get bored reading over 100 projects; try to make yours memorable.
Analysis. Does your work indicate a good understanding of the relevant context? Have you used appropriate concepts? This would include economic or finance theory, if relevant. Can you develop a logical argument and use evidence effectively to support your argument? Did you answer the question you posed? Have you considered the direction of causality?
Data collection/limitations. Have you collected appropriate data (given time limitations)? Have you taken reasonable care to check the raw data and derived variables? Do you understand what your data actually measure? Are you aware of the limitations of your data? You will receive some credit for any unusual amount of work you have put into collecting data; indicate if the data collection required a lot of time. Unless you have experience in designing surveys, do not conduct a survey to collect data.
Data summary and presentation. Have you computed appropriate derived variables? Have you noticed apparently anomalous observations? Do you demonstrate the ability to summarize and present data in a clear and effective way?
Statistical Methods. Have you used appropriate statistical methods? Use the simplest technique that will answer your question. Have you qualified any conclusions that you have drawn? E.g. pointed out that the sample size is small or that you have been unable to control for certain factors, etc. Beware of using advanced statistical techniques that you do not understand; you will be penalised for any mistakes you make in their use.
Interpretation. How well have you interpreted your data? Have you borne its limitations in mind when interpreting it? Have you answered the question you posed?

4.5. CHOOSING A TOPIC

You can do your project on anything that involves interpreting data; it does not have to be narrowly economic or financial. Projects predicting Olympic medal numbers were popular in 2011-12. The topic may come from your work, issues that have come up in the programme, items in the news, or anything that interests you.


Often the choice of topic is prompted by the data available.

4.6. DATA

Barrow Ch. 9 discusses data. You must give us a copy of the data you have used. If you need to use confidential work-related data, we can provide a letter to your employer explaining that it will be kept confidential. You should choose a topic on which you can find data without too much effort. If you cannot make substantial progress in finding data in 2-3 hours of systematic search, either in the library or over the internet, you should probably change your topic. There is a vast amount of statistics available on the web from governments, central banks and international organisations (IMF, OECD or World Bank). Also check the Birkbeck eLibrary statistical databases; Datastream is available in the library. Try Google or other search engines: just type the topic you are interested in and then data, e.g. "Road Traffic Deaths Data" got various sites with international data on road traffic deaths.
Check your data, no matter where it comes from. Check for units, discontinuities and changes in definitions of series, such as from the unification of Germany. Check derived variables as well as the raw data. Calculating the minimum, maximum and mean can help to spot errors. Carry out checks again if you move data from one type of file to another.

4.7. WHAT YOUR REPORT SHOULD LOOK LIKE

Your project report should tell a story, with a beginning, a middle and an end. It is a story about your investigation, not part of your autobiography. The following structure is a suggestion; adapt it to suit your question. Look at the structure used in section 7.5, which describes UK growth and inflation.

4.7.1. ABSTRACT

Here you must summarize your project in 100 words or less. Many journals print abstracts at the start of each paper; copy their form.

4.7.2. INTRODUCTION.

Explain what you are going to investigate, the question you are going to answer, and why it is interesting. Say briefly what sort of data you will be using (e.g. quarterly UK time-series, as in section 7.5).


Finish this section with a paragraph which explains the organization of the rest of your project.

4.7.3. BACKGROUND

Provide context for the analysis to follow, discuss any relevant literature, theory or other background, and explain specialist terms. Do not give standard textbook material; you have to tell us about what we do not know, not what we do know. On some topics there is a large literature; on others there will be very little. The library and a search engine, like Google Scholar, can help you to find literature. In many cases, this section will describe features of the market or industry you are analyzing. In particular, if you are writing about the industry in which you work, you should make sure you explain features of the industry, or technical terms used in it, which may be very well known to everyone in it, but not to outsiders.

4.7.4. DATA

Here you should aim to provide the reader with precise enough information about definitions and sources to follow the rest of the report. Further details can be provided in an appendix. You should discuss any peculiarities of the data, or measurement difficulties. You may need to discuss changes in the definition of a variable over time.

4.7.5. ANALYSIS

The background should guide you in suggesting features of the data to look at, hypotheses to test, and questions to ask. You must have tables and graphs describing the broad features of the data. In the case of time series data these features might include trends, cycles, seasonal patterns and shifts in the mean or variance of the series. In the case of cross-section data they might include tables of means and standard deviations, histograms or cross-tabulations. When interpreting the data, do not draw conclusions beyond those that are warranted by it. Often the conclusions you can draw will be more tentative than you would like. Do not allow your emotional or ethical responses to cloud your interpretation of what you find in the data.
If you run regressions, report: the names of variables (including the dependent variable); the number of observations and the definition of the sample; coefficients and either t-ratios, standard errors or p values; R-squared (or R-bar-squared); the standard error of the regression; and any other appropriate test statistics, such as the Durbin-Watson statistic for time series.

4.7.6. SUMMARY AND CONCLUSIONS.

What are the main findings of your work: the answers to the questions you posed in the introduction? How must your findings be qualified because of the limitations of the data or the methods of analysis you have employed? Do they have policy implications (public or private)? Do you have suggestions for further work?

4.7.7. BIBLIOGRAPHY

You must give a bibliographic citation for any work referred to in the text; follow the Harvard system, as used in section 1.5.

4.7.8. APPENDICES

Extra material, e.g. program output or detailed data definitions, can be put in appendices. These do not count towards the word total, but the mark will be based on the main text.

4.8. Good luck

You can learn a lot by doing your project. The skills you can develop in data analysis, interpretation and presentation are valuable in the labour market, and having a project to show a potential employer can be useful in getting a job. Doing your project can also be a very enjoyable experience. The more care you take with it, the more you will learn, and the more fun you will have.


5. PART II: NOTES

The word Statistics has at least three meanings. Firstly, it is the data themselves, e.g. the numbers that the Office for National Statistics collects. Secondly, it has a technical meaning as measures calculated from the data, e.g. an average. Thirdly, it is the academic subject which studies how we make inferences from the data.
Descriptive statistics provide informative summaries (e.g. averages) or presentations (e.g. graphs) of the data. We will consider this type of statistics first. Whether a particular summary of the data is useful or not depends on what you want it for. You will have to judge the quality of the summary in terms of the purpose for which it is used; different summaries are useful for different purposes.
Statistical inference starts from an explicit probability model of how the data were generated. For instance, an empirical demand curve says quantity demanded depends on income, price and random factors, which we model using probability theory. The model often involves some unknown parameters, such as the price elasticity of demand for a product. We then ask how to get an estimate of this unknown parameter from a sample of observations on the price charged and quantity sold of this product. There are usually lots of different ways to estimate the parameter and thus lots of different estimators: rules for calculating an estimate from the data. Some ways will tend to give good estimates, some bad, so we need to study the properties of different estimators. Whether a particular estimator is good or bad depends on the purpose.
For instance, there are three common measures (estimators) of the typical value (central tendency) of a set of observations: the arithmetic mean or average; the median, the value for which half the observations lie above and half below; and the mode, the most commonly occurring value. These measure different aspects of the distribution and are useful for different purposes. For many economic measures, like income, these measures can be very different. Be careful with averages. If we have a group of 100 people, one of whom has had a leg amputated, the average number of legs is 1.99. Thus 99 out of 100 people have an above average number of legs. Notice, in this case the median and modal number of legs is two.
We often want to know how dispersed the data are, the extent to which they can differ from the typical value. A simple measure is the range, the difference between the maximum and minimum value, but this is very sensitive to extreme values and we will consider other measures below.
Sometimes we are interested in a single variable, e.g. height, and consider its average in a group and how it varies within the group. This is univariate statistics, to do with one variable.


Sometimes we are interested in the association between variables: how does weight vary with height? or how does quantity vary with price? This is multivariate statistics: more than one variable is involved, and the most common models of association between variables are correlation and regression, covered below.
A model is a simplified representation of reality. It may be a physical model, like a model airplane. In economics, a famous physical model is the Phillips Machine, now in the Science Museum, which represented the flow of national income by water going through transparent pipes. Most economic models are just sets of equations. There are lots of possible models and we use theory (interpreted widely to include institutional and historical information) and statistical methods to help us choose the best model of the available data for our particular purpose. The theory also helps us interpret the estimates or other summary statistics that we calculate.
Doing applied quantitative economics or finance, usually called econometrics, thus involves a synthesis of various elements. We must be clear about why we are doing it: the purpose of the exercise. We must understand the characteristics of the data and appreciate their weaknesses. We must use theory to provide a model of the process that may have generated the data. We must know the statistical methods which can be used to summarise the data, e.g. in estimates. We must be able to use the computer software that helps us calculate the summaries. We must be able to interpret the summaries in terms of our original purpose and the theory.

5.1. Example: the purpose of AA guns

The booklet contains a lot of examples, a number of which are not from economics or finance, because the issues are often simpler in other areas. This example is to illustrate the importance of interpreting statistical summaries in terms of purpose. At the beginning of World War II, Britain fitted some merchant ships with anti-aircraft (AA) guns. A subsequent statistical analysis showed that no German planes had ever been hit by merchant AA guns and it was decided to remove them. However, before this was done another statistical analysis showed that almost none of the AA-equipped ships had been hit by bombs from German aircraft, whereas large numbers of those without AA had been hit. This was the relevant statistic and the AA guns were kept on merchant ships. Although the guns did not hit the bombers, they kept the bombers further away from the ships, reducing the probability of the bombers damaging the ships.


Other examples of this sort of use of statistics in World War II can be found in The Pleasures of Counting, T. W. Korner, Cambridge University Press, 1996.

5.2. Example: the Efficient Market model.

A simple and very powerful model in economics and finance is the random walk

y_t = y_{t−1} + ε_t.

This says that the value a variable takes today, time t, is the value that it had yesterday, time t − 1, plus a random shock, ε_t. The shock can be positive or negative, averages zero and cannot be predicted in advance. Such random shocks are often called 'white noise'. To a first approximation, this is a very good description of the logarithm of many asset prices, such as stock market prices and foreign exchange rates, because markets are quite efficient: the change in log price (the growth rate) ∆y_t = y_t − y_{t−1} = ε_t is random, unpredictable. Suppose people knew something that would raise the price of a stock tomorrow; they would buy today, and that would raise the price of the stock today. Any information about the future that can be predicted will be reflected in the price of the stock now. So your best estimate of tomorrow's price is today's price. What will move the price of the stock will be new, unpredicted, information represented by ε_t. Most of our models will involve random shocks like ε_t. Sometimes a firm will report a large loss and its stock price will go up. This is because the market had been expecting even worse losses, which had been reflected in the price. When reported losses were not as bad as expected, the price goes up. Whether the efficient market hypothesis is strictly true is a subject of controversy, but the random walk model is an illuminating first approximation.
If the variable has a trend, this can be allowed for in a random walk with drift

y_t = α + y_{t−1} + ε_t.

Then the variable increases on average by α every period. If the variable is a logarithm, α is the average growth rate. This is a parameter of the model, which we will want to estimate from data on y_t. Parameters like α and random errors or shocks like ε_t will play a big role in our analysis.
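As a simple illustration of these two equations, the sketch below simulates a random walk with drift; the drift, shock standard deviation and sample size are arbitrary illustrative values.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T, alpha = 250, 0.001                    # sample size and drift
eps = rng.normal(0, 0.01, T)             # white noise shocks
y = np.cumsum(alpha + eps)               # y_t = alpha + y_{t-1} + eps_t, with y_0 = 0
dy = np.diff(y)                          # the changes are just alpha + eps_t: unpredictable
print("mean change:", dy.mean(), "vs drift alpha:", alpha)
plt.plot(y)
plt.title("Simulated random walk with drift")
plt.show()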


5.3. Notation

It is very convenient to express models in mathematical notation, but notation is not consistent between books and the same symbol means different things in different disciplines. For instance, Y often denotes the dependent variable, but since it is the standard economic symbol for income, it often appears as an independent variable. It is common to use lower case letters to indicate deviations from the mean, but it is also common to use lower case letters to denote logarithms. Thus y_t could indicate Y_t − Ȳ or it could indicate ln(Y_t). The logarithm may be written ln(Y_t) or log(Y_t), but in empirical work natural logarithms, to the base e, are almost always used. The number of observations in a sample is sometimes denoted T for time series and sometimes N or n for cross sections.
In statistics we often assume that there is some true unobserved parameter and wish to use data to obtain an estimate of it. Thus we need to distinguish the true parameter from the estimate. This is commonly done in two ways. The true parameter, say the standard deviation, is denoted by a Greek letter, say σ, and the estimate is denoted either by putting a hat over it, σ̂, said 'sigma hat', or by using the equivalent Latin letter, s. In many cases we have more than one possible estimator (a formula for generating an estimate from the sample) and we have to distinguish them. This is the case with the standard deviation: there are two formulae for calculating it, denoted in these notes by σ̂ and s. However, books are not consistent about which symbol they use for which formula, so you have to be careful. Since we use it a lot, the Greek alphabet is given below.

5.4. The Greek alphabet.

A α alpha; α often used for the intercept in regression and a measure of performance in finance.
B β beta; β often used for regression coefficients and a measure of the risk of a stock in finance.
Γ γ gamma.
∆ δ delta; ∆ used for changes, ∆y_t = y_t − y_{t−1}; δ often the rate of depreciation.
E ε or ϵ epsilon; ε often an error term.
Z ζ zeta.
H η eta; η often an elasticity.
Θ θ theta; Θ sometimes the parameter space; θ often a general parameter.
I ι iota.
K κ kappa.


Λ λ lambda; λ often a speed of adjustment.
M µ mu; µ often denotes an expected value or mean.
N ν nu.
Ξ ξ xi.
O o omicron.
Π π pi; π (the ratio of the circumference of a circle to its diameter) often used for inflation; Π is the product symbol: ∏ y_i = y_1 × y_2 × ... × y_n.
P ρ rho; ρ often denotes an autocorrelation coefficient.
Σ σ sigma; σ² usually a variance, σ a standard deviation; Σ is the summation operator, also sometimes used for a variance-covariance matrix.
T τ tau.
Υ υ upsilon.
Φ φ ϕ phi; Φ(y) sometimes the normal distribution function; φ(y) the normal density function.
X χ chi; χ² distribution.
Ψ ψ psi.
Ω ω omega; Ω often a variance-covariance matrix.

6. Descriptive statistics

Data tend to come in three main forms:
- time-series, such as observations on annual inflation in the UK over a number of years, denoted y_t, t = 1, 2, ..., T. This indicates we have a sequence of observations on inflation running from t = 1 (say 1961) to t = T (say 1997), so the number of observations is T = 37.
- cross-section, e.g. observations on annual inflation in different countries in a particular year, denoted y_i, i = 1, 2, ..., N, where, if they were arranged alphabetically, i = 1 might correspond to Albania and i = N to Zambia. Whereas time-series data have a natural ordering (1996 comes after 1995), cross-section data do not.
- panels (also called longitudinal data), e.g. observations on inflation in a number of countries, i = 1, 2, ..., N, in a number of years, t = 1, 2, ..., T, denoted y_it.
We will use both T for time-series and N for cross-sections to denote the number of observations in a sample.
Graphs are generally the best way to describe data. There are three types of graph economists commonly use, each illustrated in section 7.5.


(1) For time-series data, we use a line graph, plotting the series against time. We can then look for trends (a general tendency to go up or down); regular seasonal or cyclical patterns; and outliers (unusual events like wars or crises).
(2) We can plot a histogram, which gives the number (or proportion) of observations which fall in a particular range.
(3) We can plot one variable against another to see if they are associated; this is a scatter diagram or X-Y plot. Barrow Chapter 1 has lots of examples.

6.1. Summary Statistics

We will use algebra, particularly the summation operator, to describe operations on data. The formulae may look complicated, but they are just a set of instructions. Suppose we have a series of numbers: 2, 4, 6, 8, which we denote x_1, x_2, x_3, x_4; x_i, i = 1, 2, ..., N, where N = 4. The sum of these is 20, which we denote

∑_{i=1}^N x_i = 2 + 4 + 6 + 8 = 20;

this simply says add together the N elements of x. If we multiply each number by a constant and add a constant to each number to create y_i = a + bx_i, then

∑_{i=1}^N y_i = ∑_{i=1}^N (a + bx_i) = Na + b ∑_{i=1}^N x_i.    (6.1)

In the example above, for a = 1, b = 2, then y_i = 5, 9, 13, 17, with sum 44, which is the same as 4 × 1 + 2 × 20.

6.1.1. The mean, a measure of central tendency or typical value

The arithmetic mean (average) of x_i, usually denoted by a bar over the variable, said 'x bar', is defined as

x̄ = ∑_{i=1}^N x_i / N.

In this example, it is 20/4 = 5. The formula just says add up all the values and divide by the number of observations. There are other sorts of mean. For instance, the geometric mean is the Nth root of the product of the numbers,

GM(x) = (x_1 × x_2 × ... × x_N)^{1/N},


and can be calculated as the exponential (anti-log) of the arithmetic mean of the logarithms of the numbers; see Barrow P54.
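As an illustrative check (not in the original notes), both means can be computed for the 2, 4, 6, 8 example:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
arith = x.mean()                          # (2+4+6+8)/4 = 5
geom = x.prod() ** (1 / len(x))           # fourth root of the product
geom_via_logs = np.exp(np.log(x).mean())  # exponential of the mean of the logs, the same thing
print(arith, geom, geom_via_logs)         # 5.0, about 4.43, about 4.43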

6.1.2. The variance and standard deviation, measures of dispersion

The variance, often denoted σ², and standard deviation (the square root of the variance, σ) measure how dispersed or spread out the observations are. The variance is more convenient for mathematical derivations, but in applied work we use the standard deviation because it is in the same units as the original variable. One estimator of the variance of x_i (sometimes called the population variance) is

σ̂² = ∑_{i=1}^N (x_i − x̄)²/N.

Notice here we distinguish between the true value σ² and our estimate of it, σ̂², said 'sigma hat squared'. This formula gives a set of instructions. It says: take each of the observations and subtract the mean, (x_i − x̄); square the deviations from the mean, (x_i − x̄)²; add the squared deviations together, ∑_{i=1}^N (x_i − x̄)²; and divide the sum of squared deviations by the number of observations, 4 in this case: ∑_{i=1}^N (x_i − x̄)²/N = 20/4 = 5.

i     x_i   x_i − x̄   (x_i − x̄)²
1     2     −3         9
2     4     −1         1
3     6      1         1
4     8      3         9
Sum   20     0         20

In this case both the mean and the variance happen to be 5. The standard deviation, SD(x) = σ̂, is the square root of the variance: 2.24 in this case. Notice that the sum of the deviations from the mean is always zero:

∑_{i=1}^N (x_i − x̄) = ∑_{i=1}^N x_i − N x̄ = ∑_{i=1}^N x_i − N (∑_{i=1}^N x_i)/N = 0.    (6.2)

Notice we use (6.1) in deriving the second term. Another estimator of the variance of x_i (sometimes called the sample variance) is

s² = ∑_{i=1}^N (x_i − x̄)²/(N − 1).


We discuss the difference between σ̂² and s² below. If x has standard deviation σ, then for constants a and b, the new variable y = a + bx has standard deviation |b|σ and thus variance b²σ².
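A quick numerical check of the two variance formulae, using the same 2, 4, 6, 8 example (illustrative only; numpy's ddof argument selects the divisor):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
sigma2_hat = ((x - x.mean()) ** 2).sum() / len(x)       # divisor N: 20/4 = 5
s2 = ((x - x.mean()) ** 2).sum() / (len(x) - 1)         # divisor N-1: 20/3
print(sigma2_hat, np.sqrt(sigma2_hat))                  # 5.0 and about 2.24
print(s2, x.var(ddof=1))                                # both about 6.67: var(ddof=1) uses N-1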

6.1.3. Covariance and correlation, measures of association

The covariance, which is used to measure association between variables, is

Cov(x, y) = ∑_{i=1}^N (x_i − x̄)(y_i − ȳ)/N.

The covariance will be positive if high values of x are associated with high values of y, and negative if high values of x are associated with low values of y. It will be zero if there is no linear relationship between the variables. The covariance can be difficult to interpret, so it is often standardised to give the correlation coefficient, by dividing the covariance by the product of the standard deviations of the two variables:

r = Cov(x, y) / (SD(x) SD(y)).

The correlation coefficient lies between plus and minus one, −1 ≤ r ≤ 1. A correlation coefficient of −1 means that there is an exact negative linear relation between the variables, +1 an exact positive linear relation, and 0 no linear relation. To prove this, note that if y = x, Cov(x, y) = Var(x) and SD(x)SD(y) = Var(x), so r = 1.
Correlation measures linear relationships; there may be a strong non-linear relationship and zero correlation. Correlation does not imply causation. Two variables may be correlated because they are both caused by a third variable.

6.1.4. Standardised data

Data are often standardised by subtracting the mean and dividing by the standard deviation (the square root of the sample variance),

z_i = (x_i − x̄)/σ.

This new variable, z_i, has mean zero (from 6.2) and variance (and standard deviation) of one, since if x has standard deviation σ, z = σ⁻¹x − σ⁻¹x̄ has standard deviation σ⁻¹σ = 1. This follows from our rule above that if x has standard deviation σ, then y = a + bx has standard deviation bσ; in this case a = −σ⁻¹x̄ and b = σ⁻¹.


Notice the correlation coefficient is the covariance between the standardised measures of x and y.

6.1.5. Moments

A distribution is often described by:

• its moments, m, which are ∑_{i=1}^N x_i^k/N, k = 1, 2, .... The mean x̄ = ∑ x_i/N is the first moment, k = 1.

• its centred moments, cm, ∑_{i=1}^N (x_i − x̄)^k/N. The variance, σ² = ∑_{i=1}^N (x_i − x̄)²/N, is the second centred moment, k = 2. The first centred moment, ∑_{i=1}^N (x_i − x̄)/N, equals 0.

• its standardised moments, sm, ∑ z_i^k/N, where z_i = (x_i − x̄)/σ. The third standardised moment, k = 3, is a measure of whether the distribution is symmetrical or skewed. The fourth standardised moment, k = 4, is a measure of kurtosis (how fat the tails of the distribution are). For a normal distribution, the coefficient of skewness ∑ z_i³/N is zero, and the coefficient of kurtosis ∑ z_i⁴/N is 3.

Moments
k    m                      cm                          sm
1    x̄                      0                           0
2    ∑_{i=1}^N x_i²/N       σ²                          1
3    ∑_{i=1}^N x_i³/N       ∑_{i=1}^N (x_i − x̄)³/N      ∑ z_i³/N
4    ∑_{i=1}^N x_i⁴/N       ∑_{i=1}^N (x_i − x̄)⁴/N      ∑ z_i⁴/N

Some distributions do not have moments. The average (mean) time to get a PhD is not defined since some students never finish, though the median is defined: the time it takes 50% to finish.
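A minimal sketch of computing the standardised third and fourth moments, using simulated data purely for illustration:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
z = (x - x.mean()) / x.std()       # standardise: mean zero, standard deviation one
skew = (z ** 3).mean()             # third standardised moment, about 0 for normal data
kurt = (z ** 4).mean()             # fourth standardised moment, about 3 for normal data
print(skew, kurt)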

6.2. Example, averages and reversals

Suppose a firm has two factories, one in the low-wage north, where it employs mainly male staff, and the other in the high-wage south, where it employs mainly females. In both it pays males more than females.


The table below gives the number of male staff, NM, the male wage, WM, the number of female staff, NF, and the female wage, WF.

      NM    WM    NF    WF
N     200   350   50    300
S     50    500   200   450

The average male wage is (200 × 350 + 50 × 500)/(200 + 50) = 380. The average female wage is (50 × 300 + 200 × 450)/(50 + 200) = 420. Despite paying men more than women at both factories, the firm's average female wage is higher than the average male wage, because it employs more women in the high-wage south. This reversal is known as Simpson's paradox, though Simpson was not the first to note it.
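A few lines of code reproduce the reversal from the figures in the table (illustrative only):

# Weighted average wages by sex across the two factories.
nm = {"N": 200, "S": 50};  wm = {"N": 350, "S": 500}    # male numbers and wages
nf = {"N": 50,  "S": 200}; wf = {"N": 300, "S": 450}    # female numbers and wages
avg_male = sum(nm[s] * wm[s] for s in nm) / sum(nm.values())      # 380
avg_female = sum(nf[s] * wf[s] for s in nf) / sum(nf.values())    # 420
print(avg_male, avg_female)   # men are paid more in each factory, yet 380 < 420 overall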

7. Economic and Financial Data I: numbers and graphs

7.1. Tables and Calculations

Data will typically come in a table, either electronic or hard-copy. When constructing your own tables, make sure the table has: a title, full definitions of the variables, the units of measurement and the source of the data. Be clear on the units of measurement and get a feel for the orders of magnitude: what are typical values, what are the maximum and minimum values? When comparing series or graphing them together, make sure they are in comparable units.
Be careful about units. Variables are all measured in different units and ratios will depend on the units of the numerator and denominator. Expressing the units as powers of 10 is often useful, as is done in scientific notation: 1 = 10⁰; 10 = 10¹; 100 = 10²; 1,000,000 = 10⁶. The power gives you the number of zeros after the one. Millions are 10⁶, billions 10⁹, trillions 10¹². UK GDP in 2009 was about £1.4 trillion (10¹²) and its population was about 61 million (10⁶), so its per-capita GDP was about £(1.4 × 10¹²)/(61 × 10⁶) = 0.0229 × 10⁶ = £22,900.
Be careful about coverage. In the UK, statistics are sometimes for England and Wales, sometimes Great Britain (England, Wales & Scotland) and sometimes the UK (England, Wales, Scotland and Northern Ireland). Team GB at the Olympics should have been Team UK.


7.2. Transformations

In many cases, we remove the effects of trends, changes in price levels, etc. by working with either growth rates or ratios. In economics and finance certain ratios tend to be reasonably stable (not trended). An example is the Average Propensity to Consume (the ratio of consumption to income) or the Savings Ratio. In finance, ratios like the Price-Earnings Ratio or the Dividend Yield are used. Notice these ratios can be compared across countries, because the units of currency in the numerator and denominator cancel. These can be expressed either as proportions or multiplied by 100 to give percentages.
Theory will often tell you what variables to construct, e.g.
- the real interest rate, equal to the nominal interest rate minus the (expected) rate of inflation;
- the real exchange rate, the nominal exchange rate times the ratio of foreign to domestic price indexes;
- the velocity of circulation, the ratio of nominal GDP to the money supply.
Very often we work with the logarithms of the variables, for reasons discussed elsewhere in these notes.

7.3. National Accounts

The output of an economy is usually measured by Gross Domestic Product, GDP. This measure is part of a system of National Accounts, which start from the identity that output equals expenditure equals income. Anything produced, output, is sold to somebody, and is then their expenditure, and the money generated by the sale is paid to somebody; it is their income. Expenditure is made up of consumption C_t, plus investment I_t, plus government spending on goods and services (i.e. excluding government expenditure on transfer payments) G_t, plus exports X_t, minus imports M_t. Income is made up of wages W_t plus profits P_t. In any period:

Y_t = C_t + I_t + G_t + X_t − M_t = W_t + P_t.

Output produced but not sold, left in the warehouse, is treated as a change in inventories, part of investment by firms. Investment includes Gross Domestic Fixed Capital Formation and the acquisition of inventories. Although, in principle, output, expenditure and income are identically equal, in practice, because of measurement errors, they are not, and 'balancing items' are added to make them match. Coverage can be national (by the citizens of the country) or domestic (within the boundaries of the country). The difference between Gross National Product and Gross Domestic Product is net income (receipts less payments) from the rest of the world.


The measures can be gross of the depreciation of the capital stock or net of it. Be careful about gross and net measures, since it is often not clear from the name what is being netted out. The measure can be at the prices actually paid (market prices) or what the producers actually receive (factor cost); the difference is indirect taxes less subsidies. The UK national accounts conventions are based on the European System of Accounts 1995 and are published in the Blue Book.
GDP or other national income aggregates measure marketed outputs, and there are lots of activities that are left out. These include domestic activities (household production), environmental impacts, illegal activities, etc. If there is an increase in crime which leads to more security guards being hired and more locks fitted, this increases GDP. There are various attempts to adjust the totals for these effects although, so far, they have not been widely adopted. You should be aware of the limitations of GDP etc. as measures. There is a good discussion of the issues and alternatives in the Report by the Commission on the Measurement of Economic and Social Progress (2009), available on www.stiglitz-sen-fitoussi.fr.
The accounts are divided by sector. The private sector covers firms (the corporate sector, usually divided into financial and non-financial) and households; the public sector covers general government (which may be national or local) and sometimes state-owned enterprises, though they may be included with the corporate sector; the overseas sector covers trade. Corresponding to the output, expenditure and income flows, there are financial flows between sectors. Define T_t as taxes less transfer payments. The total Y_t − T_t, factor income minus taxes plus transfer payments (e.g. state pensions or unemployment benefit), is known as disposable income. Subtract T_t from both sides of the income-expenditure identity

Y_t − T_t = C_t + I_t + G_t − T_t + X_t − M_t

and note that savings S_t = Y_t − T_t − C_t. Move C_t and I_t to the left hand side to give:

(S_t − I_t) = (G_t − T_t) + (X_t − M_t)

(S_t − I_t) + (T_t − G_t) + (M_t − X_t) = 0.

The three terms in brackets represent what each sector - private, public, overseas - needs to borrow or lend, and total borrowing must equal total lending. If savings is greater than investment, the private sector has a surplus of cash which it can lend to the public or overseas sector. They sum to zero because for every borrower there must be a lender.


7.4. Unemployment

We often have a theoretical concept and need to provide an 'operational' definition, a precise set of procedures which can be used by statistical offices to obtain measures. This raises questions like: what is the best operational measure and how well does it correspond to the particular theoretical concept? Unemployment is a case in point. There are a number of different theoretical concepts of unemployment and a number of different ways of measuring it.
One method is the 'claimant count', i.e. the number who are registered unemployed and receiving benefit. But this is obviously very sensitive to exact political and administrative decisions as to who is entitled to receive benefit.
An alternative is a survey, which asks people of working age such questions as:
(i) are you currently employed? if not,
(ii) are you waiting to start a job? if not,
(iii) have you looked for work in the last four weeks?
Those in category (iii) will be counted as unemployed. The unemployment rate is the number unemployed divided by the number in the labour force (employed plus unemployed). Youth unemployment rates are high partly because the denominator, the labour force, is reduced by the number of people studying.

7.5. Example: Were the nineties and noughties NICE?

7.5.1. Introduction

Mervyn King, the previous Governor of the Bank of England, described the UK economic environment at the end of the 20th century and the beginning of the 21st century as NICE: non-inflationary, consistently expanding. Subsequently it became VILE: volatile inflation, less expansion. This example uses descriptive statistics and graphs to compare UK growth and inflation over the period 1992-2007 with their earlier and later behaviour, to see how nice this period was.

7.5.2. Data

The original series, from the Office for National Statistics, are for 1955Q1-2014Q1: Q_t = Gross Domestic Product, chained volume measure, constant 2010 prices, seasonally adjusted (ABMI); E_t = Gross Domestic Product at current prices, seasonally adjusted (YBHA); U_t is the ILO unemployment rate, UK, all aged 16-64, %, seasonally adjusted (LF2Q), which is only available from 1971. The price index, the GDP deflator, is P_t = E_t/Q_t.


Growth (the percentage change in output), g_t, and inflation (the percentage change in prices), inf_t, are measured over the same quarter in the previous year as g_t = 100 × (ln(Q_t) − ln(Q_{t−4})) and inf_t = 100 × (ln(P_t) − ln(P_{t−4})), which are very close to the percentage changes. Such annual differences smooth the series and would remove seasonality if the series were not already seasonally adjusted. Whereas the data for output and prices start in 1955Q1, the data for growth and inflation only start in 1956Q1, because of taking the four-quarter change. The most recent figures are very likely to get revised, and many have questioned the accuracy of the figures.
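If you work with the data in pandas rather than a spreadsheet, the growth and inflation series can be constructed as below; the short artificial series simply stand in for the ONS series, which you would download yourself.

import numpy as np
import pandas as pd

# Q would hold the real GDP series (ABMI) and E the nominal series (YBHA), indexed by quarter.
idx = pd.period_range("1955Q1", periods=12, freq="Q")
Q = pd.Series(np.linspace(100, 106, 12), index=idx)
E = pd.Series(np.linspace(100, 115, 12), index=idx)

P = E / Q                                        # GDP deflator
growth = 100 * (np.log(Q) - np.log(Q.shift(4)))  # g_t = 100*(ln Q_t - ln Q_{t-4})
inflation = 100 * (np.log(P) - np.log(P.shift(4)))
print(growth.dropna().head())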

7.5.3. Line graphs

The broad pattern of UK economic events can be seen from the line graphs for growth, inflation, and unemployment. UK economic policy since World War II can be described as a search for targets. Initially the target was the balance of payments, to support a fixed exchange rate. Then, with the end of the Bretton Woods system of fixed exchange rates in the early 1970s, there was a shift to monetary targets to control inflation, a shift which became more pronounced with the election of Mrs Thatcher in 1979. However, the monetary aggregates proved very unstable and there was a switch to exchange rate targets in the middle 1980s, culminating in joining the European Exchange Rate Mechanism, ERM. With the ejection of sterling from the ERM in September 1992, inflation targets were adopted. In 1997, the newly elected Labour government gave the Bank of England independent responsibility for monetary policy, setting interest rates to target inflation. This history is reflected in the graphs for growth and inflation below. The "stop-go" pattern of growth in the 1950s and 1960s is obvious; then there is a peak when growth reached almost 10% during the 'Barber Boom' of the early 1970s following the collapse of the Bretton Woods system of fixed exchange rates. Anthony Barber was the Conservative Chancellor at the time. The first oil price shock of 1973, following the Arab-Israeli war, sent the economy into deep recession, with growth negative in most quarters between 1974Q1 and 1975Q4. During 1976 the UK had to borrow from the IMF. Growth recovered in the later 1970s, before a further recession in the early 1980s following the second oil price shock after the Iranian revolution and Mrs Thatcher's monetarist policies. Growth recovered in the later 1980s with the boom under Nigel Lawson, the Conservative Chancellor, then sank into recession again in the early 1990s, possibly worsened by the fixed exchange rate required by membership of the ERM. The UK left the ERM in September 1992 and there followed a long period of relative stability, before the effects of the 2007 Credit Crunch began to impact on the economy.


Output fell by about 7% in the year up to 2009Q1, the lowest observed in this sample, and while the economy bounced back, subsequent growth was slow.
Inflation was below 10%, though volatile, during the 1960s and early 1970s. In the mid 1970s it shot up to almost 25%, before falling back to almost 10%, then rising again to over 20% following the election of Mrs Thatcher in 1979. It then came down below 5%, with a burst in the late 1980s and early 1990s, before stabilising at a low level subsequently. There are a number of different measures of inflation, CPI, RPI, etc., and they show slightly different patterns from the GDP deflator used here. The unemployment rate, which is only available on a consistent basis since 1971, shows the effects of the recessions of the mid 70s, early 80s, early 90s and late 00s. The combination of high unemployment and high inflation is sometimes called stagflation.

Figure 7.1: UK growth 1956-2014 (line graph of the growth rate, per cent, against time)


Figure 7.2: UK inflation 1956-2014 (line graph of the inflation rate, per cent, against time)

Figure 7.3: UK Unemployment 1971-2014 (line graph of the unemployment rate, per cent, against time)

7.5.4. Frequency distributions.

The histograms and summary statistics which describe the distribution of growth are shown below. An average growth rate of just over 2% seems to be a long-term feature of the British economy; the mode is between 2% and 2.5%.


The distribution is not normal, with a slight negative skew and fatter tails than a normal distribution. If a random variable is normally distributed, the coefficient of skewness should be zero and the coefficient of kurtosis 3. The Jarque-Bera statistic, which is distributed as χ²(2), tests whether the skewness and kurtosis are significantly different from these values. The p value, the probability of observing such a statistic if the null hypothesis of normality were true, indicates that the hypothesis that growth is normal is rejected. If growth were normally distributed, one would expect roughly two thirds (68%) of the observations to lie in the range mean plus or minus one standard deviation, roughly 0.17 to 4.87. In fact around 73% lie in this range. This is because of the fat tails: if the data were normal, one would be very unlikely to observe a value of -7.03%, which is 4.06 standard deviations below the mean, (-7.03 - 2.52)/2.35. Although the histogram is not shown, inflation is even less normal than growth, with a J-B statistic of 334, and is strongly positively skewed, as both the coefficient of skewness of 1.99 and the difference between the mean at 5.1 and the median at 3.3 show. The mean inflation of around 5% is not really typical; the mode is between 2% and 3%.

Figure 7.4 Histogram of UK growth
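If you want to reproduce the Jarque-Bera calculation yourself, a minimal sketch is given below; it uses simulated fat-tailed data in place of the growth series, and scipy's own jarque_bera function as a cross-check.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g = rng.standard_t(df=5, size=500)            # fat-tailed data standing in for growth
z = (g - g.mean()) / g.std()
S, K = (z ** 3).mean(), (z ** 4).mean()       # coefficients of skewness and kurtosis
JB = len(g) / 6 * (S ** 2 + (K - 3) ** 2 / 4) # compare with a chi-squared(2)
p = 1 - stats.chi2.cdf(JB, df=2)
print(JB, p, stats.jarque_bera(g))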


7.5.5. The relationship between growth and the change in unemployment

Growth is a major factor determining the change in unemployment: if growth is high, unemployment falls; if growth is low, unemployment rises. The relationship between the change in unemployment and growth is known as Okun's Law, after the US economist Arthur Okun. The scatter diagram plots the quarterly change in the unemployment rate, DU_t, against the growth rate over the year for the period 1971-2013. The relationship is clearly negative; the correlation is -0.71. One can fit a line to the data which takes the form DU_t = 0.2 − 0.078 G_t + ε_t, with a standard error of regression of 0.19. Okun estimated the relationship the other way round, with growth a function of the change in unemployment. Unemployment increased less in the recent recession than one would expect from the historical pattern, though there is dispute about the reason for this.

Figure 7.5: Scatter diagram, change in unemployment (DU) against growth.
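A sketch of how such a line could be fitted, assuming you had arrays du and g holding the change in unemployment and the growth rate (simulated stand-ins are used here, so the numbers are only illustrative):

import numpy as np

rng = np.random.default_rng(3)
g = rng.normal(2.5, 2.0, 170)                      # growth, standing in for the 1971-2013 data
du = 0.2 - 0.078 * g + rng.normal(0, 0.19, 170)    # change in unemployment

b, a = np.polyfit(g, du, 1)                        # slope and intercept of the fitted line
corr = np.corrcoef(g, du)[0, 1]
print("DU = %.3f %+.3f G, correlation %.2f" % (a, b, corr))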

7.5.6. Conclusion

Compared to previous (and subsequent) history, the period 1993-2007 was nice. Inflation became low and stable rather than high and volatile, as it was before. Growth was slightly higher and much less volatile. Although there was an economic cycle 1997-2007, it was less pronounced than in earlier years, hence Gordon Brown's claim to have abolished boom and bust.


Whether this was good luck or good policy remains a matter of debate, as does whether the easy-money policy of these years contributed to the subsequent financial crisis. This 'Great Moderation' was not confined to the UK, but seems to have been a general feature of many advanced economies. There were global economic shocks over this period: the Asian crisis of 1997-8; the Russian default and the LTCM crisis of 1998; the dot.com boom and bust of 2001; the gyrations in the oil price, which went from around $10 in 1998 to $147 in 2008; 9/11 and the wars in Iraq and Afghanistan. But despite these shocks, there was smooth non-inflationary growth in the UK, as in many economies. Whether the Great Moderation was merely a transitory interlude of stability in a crisis-prone system remains to be seen. As the warning on financial products says: past performance is not necessarily a guide to the future.

8. Applied exercise I: Ratios and descriptive statistics

This exercise is to be done on your own. We will assume that you have done this exercise and are familiar with this material, so we can set exam questions using it. It will be started in the spreadsheet class. The data are available on the ASE web page. They were constructed by Robert J. Shiller, updating the series in Robert J. Shiller, Market Volatility, MIT Press, 1989. His later book, Irrational Exuberance, Princeton University Press, 2000, is a good description of the stock market bubble at the end of the twentieth century. At the end of this data, January 2000, the S&P was 1425; it peaked in March 2000 at 1527. He got the timing of his publication right, since the index then fell sharply and was around 1000 in late August 2003, with a 2003 low as of then of 800 (taken from the FT World Markets at a glance table). Check what it has done since; bubbles (when prices shoot up to unsustainable levels) and financial crises (when they return to normal) are recurrent features of the system.

8.1. Data

The data are in an Excel file called Shiller.xls, on Moodle. Copy this file to your own directory and, when you have finished the exercise, copy the new file to your own space using a new name.
In the file, rows 2 to 131 contain US data from 1871 to 2000; row 1 contains headings:


A - YEAR
B - NSP: Stock Price Index (Standard & Poor's composite, S&P) in current prices (nominal terms), January figure.
C - ND: Dividends per share in current prices, average for the year.
D - NE: Earnings per share in current prices, average for the year.
E - R: Interest rate, average for the year.
F - PPI: The producer price index, January figure, 1967 average = 100.
The letters A to F indicate the columns in the spreadsheet. Note 2000 data are missing for ND, NE, R.

8.2. Transformations: ratios, growth rates, correcting for inflation, etc

In column G construct the (backwards) Price Earnings Ratio. Type PER in cell G1 as a heading. Put =B3/D2 in cell G3. Copy this down for the rest of the years, ending in G131. The price earnings ratio is a measure of the underlying value of a share: how much you have to pay to buy a stream of earnings.

Highlight the data for PER over G3:G131. Use Chart wizard (on the toolbar), choose a line graph, top left sub-type, choose next and go to next. Comment on the picture. Can you identify the stock market booms of the late 1920s and 1990s? What happened next in both cases? What other features of the data can you see?

In the same way, create the following new variables for the period 1872 to 1999, i.e. rows 3 to 130, with the variable name at the top:
H - Dividend Yield: DY (type DY in cell H1, type the formula =C3/B3 in cell H3 and copy down)
I - Capital Gain: CG (type CG in I1, type the formula =(B4-B3)/B3 in I3 and copy down)
J - Inflation: INF = (F4-F3)/F3 (type the formula in J3)
K - Real Return on equities: RRE = H3+I3-J3
L - Real Interest Rate: RIR = (E3/100)-J3

Notice that these are proportions, e.g. numbers like 0.05, which corresponds to 5%. Why do we subtract the rate of inflation to get the real return, whereas we would divide the stock price index by the price level to get the real stock price index? Why do we divide the interest rate by 100? Plot inflation. Notice how we have matched the dates: e.g. in defining inflation, we have had to take account of the fact that the price index is a January figure.
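The same transformations can be sketched outside the spreadsheet. The snippet below is illustrative only, not part of the exercise; it assumes the sheet has been saved as shiller.csv with the headings listed above (YEAR, NSP, ND, NE, R, PPI) and that pandas is installed.

    # Minimal sketch of the spreadsheet transformations (assumed file and column names).
    import pandas as pd

    df = pd.read_csv("shiller.csv").set_index("YEAR")
    per = df["NSP"] / df["NE"].shift(1)         # backwards price-earnings ratio (=B3/D2)
    dy  = df["ND"] / df["NSP"]                  # dividend yield (=C3/B3)
    cg  = df["NSP"].shift(-1) / df["NSP"] - 1   # capital gain over the coming year (=(B4-B3)/B3)
    inf = df["PPI"].shift(-1) / df["PPI"] - 1   # January-to-January inflation (=(F4-F3)/F3)
    rre = dy + cg - inf                         # real return on equities
    rir = df["R"] / 100 - inf                   # real interest rate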


8.3. Graphing Real Returns on Equities

We now want to analyse the real return on equities (i.e. dividends plus capital gains less inflation) over the period 1872-1999 and compare it with the real interest rate. First we will do it graphically. Highlight K3:L130, click chart wizard, choose line graph and view real returns and the real interest rate. There is a lot of volatility from year to year in the RRE and the range is large: from about -0.5 to +0.5. Over this period, you could have made or lost almost 50% of your investment in exceptional years. The real interest rate is much less volatile. Go to cell J132, type in mean, then K132 and type in =AVERAGE(K3:K130). Go to J133, type in SD, then K133 and type in =STDEV(K3:K130). Copy these two cells to L132 and L133. You now have the mean (average) and the (unbiased estimate of the) standard deviation for the real return on equities and the real interest rate. The mean real return on equities is much higher than that from the interest rate. This is known as the equity premium. But the risk on equities, measured by the standard deviation, is also much higher. Calculate the average real return on equities, 1872-1990 and 1991-1999.

8.4. More Descriptive Statistics

We want to calculate descriptive statistics for the array K3:L130. Following the principles above, construct the following table for RRE and RIR. Only the figures for RRE are given below.

line   J          K                        RRE
132    Mean       =AVERAGE(K3:K130)        0.088
133    SD         =STDEV(K3:K130)          0.184
134    MIN        =QUARTILE(K3:K130,0)     -0.448
135    25%        =QUARTILE(K3:K130,1)     -0.021
136    MEDIAN     =QUARTILE(K3:K130,2)     0.091
137    75%        =QUARTILE(K3:K130,3)     0.219
138    MAX        =QUARTILE(K3:K130,4)     0.514
139    SKEW       =SKEW(K3:K130)           -0.389
140    KURTOSIS   =KURT(K3:K130)           0.296

You can also use the descriptive statistics command in Data Analysis to get most of these.
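If you want to check the spreadsheet values, the sketch below computes the same statistics in Python. It assumes, purely for illustration, that the real return on equities has been saved as a column called RRE in shiller.csv; the column name is an assumption, not part of the original file.

    # Sketch: summary statistics like those in the table above (assumed file and column name).
    import pandas as pd

    rre = pd.read_csv("shiller.csv")["RRE"].dropna()
    print(rre.mean(), rre.std())                    # mean and unbiased SD
    print(rre.quantile([0, 0.25, 0.5, 0.75, 1]))    # min, quartiles, max
    print(rre.skew(), rre.kurt())                   # skewness and excess kurtosis (bias-adjusted, like Excel)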


8.5. Interpretation of spreadsheet summary statistics

Call the series y_t, t = 1, 2, ..., T. Above we calculate the mean, ȳ = ∑_{t=1}^{T} y_t / T, and the unbiased 'sample' standard deviation, SD: s = √( ∑_{t=1}^{T} (y_t − ȳ)² / (T − 1) ). The amount T − 1 is known as the degrees of freedom.

Then we calculate the minimum value; the value which 25% of the observations lie below; the median, the value which 50% of the observations lie below; the value which 75% of the observations lie below; and the maximum. These are known as quartiles. Excel also allows you to calculate percentiles: the value for which x% lie below, for any x. Returns were negative in over 25% of the years. The median is very similar to the mean, which suggests that the distribution is symmetric. The range of real returns is very large, between minus 50% and plus 50%.

Standardise the data so that z_t = (y_t − ȳ)/s. The measure of skewness is roughly ∑_{t=1}^{T} z_t³ / T, the third standardised moment. In fact Excel makes degrees of freedom adjustments, similar to the sample standard deviation above. If the distribution is symmetrical, the measure of skewness should be zero. In this case, it is pretty near zero. The measure of (excess) kurtosis is roughly ∑_{t=1}^{T} z_t⁴ / T − 3. If the distribution is normal the expected value of the first term (the fourth standardised moment) is three, so values around zero indicate a roughly normal distribution. You can get exact definitions from HELP, STATISTICAL FUNCTIONS, SKEW & KURT.

9. Index Numbers

9.1. Introduction

Inflation, the growth rate of the price level, is measured by the percentage change in a price index:

π_t = 100 × (P_t − P_{t−1})/P_{t−1} = 100 × ((P_t/P_{t−1}) − 1) ≈ 100 × (ln P_t − ln P_{t−1})

where P_t is a price index. Distinguish between the price level, P_t, and the rate of inflation, π_t. When the inflation rate falls, but is still positive, prices are still going up, just at a slower rate. If inflation is negative, prices are falling. Suppose the price index was 157 in 1995 and 163 in 1996; then the rate of inflation is 3.82%. Notice that this can also be expressed as a proportion, 0.0382. In many cases, we will calculate the growth rate by the change in the logarithm, which is very close to the proportionate change for small changes, e.g. < 0.1, i.e. 10%. The approximation works because ln(1 + π_t) ≈ π_t if π_t is small.
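A quick numerical check of the example, using only the numbers given above:

    # Price index 157 in 1995 and 163 in 1996.
    import math
    p0, p1 = 157, 163
    print(100 * (p1 - p0) / p0)                  # 3.82... percent
    print(100 * (math.log(p1) - math.log(p0)))   # 3.75..., close, as the approximation suggests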


Be careful with percentages. If the inflation rate rises from 3% to 6% it has risen by three percentage points. It has not risen by three percent; in fact it has risen by 100%. If something falls by 50% and then rises by 50%, it does not get back to where it started. If you started at 100, it would fall to 50, then rise by 50% of 50, i.e. 25, to get to 75.

We usually work with natural logs to the base e, often denoted by LN rather than LOG, which is sometimes used just for base 10. Price indexes are arbitrarily set at 100, or 1, in some base year, so the indexes themselves cannot be compared across countries. The index can be used to compare growth relative to the base year if they all have the same base year, e.g. 1990=100 for all countries.

9.2. Example: deflation after WWI

World War I ended in November 1918. The table below gives data on the percentage unemployment rate, U; the Retail Price Index, 1963=100, RPI; the yield on 3 month Treasury Bills, R (these were not issued during the war); GDP per capita in 1913 prices, Y; and the dollar-sterling exchange rate, $/£ (sterling was not convertible during the war); for 1918-1922. From these we can calculate: the growth rate of per capita income, Growth_t = 100(Y_t/Y_{t−1} − 1); the inflation rate, INF_t = 100(RPI_t/RPI_{t−1} − 1); and the real interest rate, RIR_t = R_t − INF_t, for 1919-1922.

Year   U      RPI   R     Y     $/£     Growth   INF     RIR
1918   0.8    42          54
1919   2.1    56    3.5   48    4.42    −11      33.3    −29.8
1920   2.0    52    6.2   47    3.66    −2       −7.1    13.3
1921   12.9   47    4.6   42    3.85    −11      −9.6    14.2
1922   14.3   38    2.6   44    4.43    5        −19.1   21.7

In 1919 per capita income fell by over 10%, and continued falling till 1921.

This fall produced rising unemployment. Inflation was very high during 1919, but then fell, giving negative inflation, deflation. Between 1919 and 1922 prices fell by almost a third. If you lend £1 at 15% for a year you get £1.15 at the end of the year, but if prices rise at 10% over the year, what you can buy with your £1.15 has fallen; the real rate of return is only 5% = 15% − 10%. When you lend the money you do not know what the rate of inflation will be. In many cases the expected rate of inflation can be approximated by the current rate of inflation, which you know, so the real interest rate is often measured as the nominal interest rate minus the current rate of inflation. In 1919 the real interest rate was negative because inflation was higher than the nominal interest rate; subsequently, with quite high nominal rates and deflation (negative inflation), real interest rates became very high.

The combination of sharply reduced military spending and high real interest rates caused deflation (falling prices), falling output, rising unemployment and, after 1920, a strengthening of the exchange rate. The Chancellor of the Exchequer, Winston Churchill, returned sterling to the gold standard at its pre-war parity in 1925. Keynes blamed this policy for the depression of the early 1920s.
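The derived columns in the table above can be reproduced from the raw series; the sketch below is purely a check, using the formulas just given.

    # Reproduce Growth, INF and RIR for 1919-1922 from the table's raw data.
    RPI = {1918: 42, 1919: 56, 1920: 52, 1921: 47, 1922: 38}
    Y   = {1918: 54, 1919: 48, 1920: 47, 1921: 42, 1922: 44}
    R   = {1919: 3.5, 1920: 6.2, 1921: 4.6, 1922: 2.6}
    for t in range(1919, 1923):
        growth = 100 * (Y[t] / Y[t - 1] - 1)
        inf = 100 * (RPI[t] / RPI[t - 1] - 1)
        rir = R[t] - inf
        print(t, round(growth), round(inf, 1), round(rir, 1))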

9.3. Prices and Quantities, Real and nominal variables

Suppose that a firm buys 2 million barrels of oil in 2003 at $35 a barrel and one million in 2004 at $40 a barrel. We can denote the price in 2003 as P_t and the price in 2004 as P_{t+1}, both measured in dollars. Similarly the quantities are Q_t and Q_{t+1}, both measured in million barrels. Total expenditure on oil in each year is E_t = P_tQ_t and E_{t+1} = P_{t+1}Q_{t+1}, both measured in million dollars.

        P    Q   E
t       35   2   70
t+1     40   1   40

The change in expenditure from $70m to $40m reflects both a 14.3% increase in price and a 50% fall in quantity. Notice that we need only two of the three pieces of information in the table for each year. Above, knowing price and quantity we could work out expenditure. If we knew expenditure and quantity we could always work out price as P_t = E_t/Q_t. If we knew expenditure and price, we could work out quantity as Q_t = E_t/P_t.

Often we work with logarithms, where the proportionate change in expenditure can be decomposed into the sum of proportionate changes in price and quantity:

∆ln E_t = ln E_t − ln E_{t−1}
        = (ln P_t + ln Q_t) − (ln P_{t−1} + ln Q_{t−1})
        = (ln P_t − ln P_{t−1}) + (ln Q_t − ln Q_{t−1})
        = ∆ln P_t + ∆ln Q_t
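A quick check of this decomposition with the oil example above:

    # Log change in expenditure equals the sum of log changes in price and quantity.
    import math
    P0, Q0, P1, Q1 = 35, 2, 40, 1          # $/barrel and million barrels
    E0, E1 = P0 * Q0, P1 * Q1
    print(math.log(E1) - math.log(E0))                                     # about -0.56
    print((math.log(P1) - math.log(P0)) + (math.log(Q1) - math.log(Q0)))   # the same, up to rounding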

Notice that the formulae would be more complicated if we worked with the original values:

∆E_t = P_tQ_t − P_{t−1}Q_{t−1}
     = (P_{t−1} + ∆P_t)(Q_{t−1} + ∆Q_t) − P_{t−1}Q_{t−1}
     = P_{t−1}Q_{t−1} + P_{t−1}∆Q_t + Q_{t−1}∆P_t + ∆P_t∆Q_t − P_{t−1}Q_{t−1}
     = P_{t−1}∆Q_t + Q_{t−1}∆P_t + ∆P_t∆Q_t.

The change in quantity measured at last year's prices, plus the change in prices measured at last year's quantities, plus an interaction term. The easiest way to present this is on a graph with price and quantity on the two axes. Revenue is then the area of the rectangle, price times quantity. Draw the two rectangles for years t and t−1. The difference between their areas will be made up of the three components of the final equation.

Most of the time, we are not dealing with a single good, but with aggregates of goods, so that total expenditure is the sum of the prices times the quantities of the different goods, i = 1, 2, ..., N, whose prices and quantities change over time:

E_t = ∑_{i=1}^{N} p_it q_it.

This is like your supermarket receipt for one week: it lists how much of each item was bought at each price and the total spent. To provide a measure of quantity, we hold prices constant at some base year, 0, say 2000, and then our quantity or constant price measure is

Q_t = ∑_{i=1}^{N} p_i0 q_it.

Monetary series can be either in nominal terms (in the current prices of the time, like expenditures) or in real terms (in the constant prices of some base year, to correct for inflation, to measure quantities). To convert a nominal series into a real series it is divided by a price index. So if we call nominal GDP E_t, real GDP Q_t, and the price index P_t, then E_t = P_tQ_t. So given data on nominal (current price) GDP and a price index we can calculate real (constant price) GDP as Q_t = E_t/P_t, where P_t is the value of a price index. Alternatively, if we have data on current price (nominal) and constant price (real) GDP, we can calculate the price index (usually called the implicit deflator) as the ratio of the current to constant price series: P_t = E_t/Q_t.


Most statistical sources only give two of the three possible series (nominal, real, price), assuming (somewhat implausibly) that users will know how to calculate the third from the other two.

9.4. Price Indexes

Suppose we wish to measure how the prices of a set of goods, i = 1, 2, ..., N, have moved over time, t = 0, 1, 2, ..., T (e.g. 1990, 1991, 1992). We observe the prices, p_it, and quantities, q_it, of each good in each year. Total expenditure on all goods in year t, e.g. current price GDP, is E_t = ∑_{i=1}^{N} p_it q_it. We could also express this as an index, relative to its value in some base year:

EI_t = ∑_{i=1}^{N} p_it q_it / ∑_{i=1}^{N} p_i0 q_i0

here the index would be 1 in the base year; usually they are all multiplied by 100 to make them 100 in the base year. If the base is 100, then EI_t − 100 gives the percentage change between the base year and year t. Index numbers are 'unit free'. This is an expenditure index.

A constant price series would measure quantities all evaluated in the same base year prices. Suppose we used year zero, then the constant price measure of quantity would be

Q_t = ∑_{i=1}^{N} p_i0 q_it.

Constant price GDP was a measure of this form, where the base year was changed every five years or so. Recently this fixed base approach has been replaced by a moving base, called a chain-weighted measure.

We can construct a price index as the ratio of the expenditure series to the constant price series (in the case of GDP, this would be called the GDP deflator):

P¹_t = E_t / Q_t = ∑_{i=1}^{N} p_it q_it / ∑_{i=1}^{N} p_i0 q_it.

It measures prices in year t relative to prices in year zero, using quantities in year t as weights. Where t = 0, P¹_t = 1. The index always equals 1 (or 100) in its base year. This is a price index.


We could also use quantities in year zero as weights, and this would give a different price index:

P²_t = ∑_{i=1}^{N} p_it q_i0 / ∑_{i=1}^{N} p_i0 q_i0.    (9.1)

Notice that these will give different measures of the price change over the period 0 to t. In particular, for goods that go up (down) in price, quantities in year t are likely to be lower (higher) than in year 0. Indexes that use beginning of the period values as weights are called Laspeyres indexes; those that use end of period values are called Paasche indexes. There are a range of other ways we could calculate price indexes; chain indexes use moving weights. Apart from the problem of choosing an appropriate formula, there are also problems of measurement; in particular, measuring the quantities of services supplied, accounting for quality change and the introduction of new goods. Barrow Chapter 10 discusses index numbers.

You will often find that you have overlapping data. For instance, one edition of your source gives a current price series and a constant price series in 1980 prices for 1980 to 1990; the second gives you a current price series and a constant price series in 1985 prices for 1985 to 1995. This raises two problems. Firstly, the current price series may have been revised. Use the later data where it is available and the earlier data where it is not. Secondly, you have to convert the data to a common price basis. To convert them, calculate the ratio in 1985 (the earliest year of the later source) of the 1985 constant price series to the 1980 constant price series; then multiply the earlier 1980 price series by this ratio to convert the 1980 constant price series to 1985 constant prices. If the two estimates of the current price series for 1985 were very different, you would also have to adjust for the ratio of the current price series.

9.4.1. Example Substitution

In 2000 a company bought 10 computers at £2000 each and 20 software licenses at £1000 each. In 2001 it bought 20 computers at £1000 each and 10 software licenses at £2000 each.
(a) What were its total computing costs in each year?
(b) What would have been its total computing costs in each year (i) if it had bought the 2000 quantities (ii) if it had bought the 2001 quantities?
(c) Use the estimates in (b) to calculate two measures of inflation (i) using 2000 quantities as weights and (ii) using 2001 quantities as weights.


(d) Comment on your results.

Answer
This example is a little extreme, to indicate the effects substitution can have on the measurement of inflation.
(a) Total expenditure was £40,000 in both years: 2000 = (10×2000 + 20×1000); 2001 = (20×1000 + 10×2000).
(b) (i) Using 2000 quantities and 2000 prices, expenditure in 2000 would have been £40,000, which it was. Using 2000 quantities and 2001 prices, expenditure in 2001 would have been (10×1000 + 20×2000) = £50,000. (ii) Similarly, using 2001 quantities, 2000 = £50,000 and 2001 = £40,000.
(c) Using 2000 quantities as weights inflation is 25% = 100(50,000/40,000 − 1); using 2001 quantities as weights inflation is −20% = 100(40,000/50,000 − 1).
(d) Because of demand responses to price (the firm bought more hardware, which had fallen in price, and less software, which had risen in price), base weighted measures tend to overestimate inflation (+25%) and terminal weighted measures tend to underestimate it (−20%). The truth lies somewhere in between.
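The two measures in (c) are just the Laspeyres and Paasche indexes of (9.1) and the preceding formula applied to this example; a short sketch, using only the numbers given in the question:

    # Base-weighted (Laspeyres) and end-weighted (Paasche) inflation for the example.
    p0 = {"computer": 2000, "software": 1000}
    p1 = {"computer": 1000, "software": 2000}
    q0 = {"computer": 10, "software": 20}
    q1 = {"computer": 20, "software": 10}
    laspeyres = sum(p1[i] * q0[i] for i in p0) / sum(p0[i] * q0[i] for i in p0)
    paasche   = sum(p1[i] * q1[i] for i in p0) / sum(p0[i] * q1[i] for i in p0)
    print(round(100 * (laspeyres - 1)), round(100 * (paasche - 1)))   # 25 and -20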

9.4.2. CPI and RPI

There are a lot of different price indexes. Currently the Bank of England has a target for annual CPI inflation of 2% and the governor has to write a letter to the Chancellor of the Exchequer if inflation falls below 1% or goes above 3%. Before 2004 the target was for inflation in the Retail Price Index excluding mortgage interest payments (which go up when interest rates are raised), RPIX, of 2.5%. The CPI is the Harmonised Index of Consumer Prices, HICP, the standard European index. In August 2003 RPIX was 2.9%, CPI 1.3%, most of the difference accounted for by the high rate of UK housing inflation, while in May 2009 the CPI was at +2.2% and the RPI at -1.1%, deflation, not inflation, because of falling house prices. The RPI and CPI measure consumer prices; the GDP deflator measures prices in the whole economy.

There are three main differences between the CPI and RPI: CPI excludes owner occupied housing, RPI includes it; they use different formulae; and their target populations are slightly different, the RPI excluding the very rich and very poor. The formula effect was thought to contribute about 0.5% to the difference (hence the replacement of a 2.5% RPI target with a 2% CPI target), though this has increased to almost 1% recently. Consider the Laspeyres price index, P²_t, (9.1) above:


P_t = ∑_{i=1}^{N} p_it q_i0 / ∑_{i=1}^{N} p_i0 q_i0
    = ∑_{i=1}^{N} [ p_it q_i0 / ∑_{i=1}^{N} p_i0 q_i0 ]
    = ∑_{i=1}^{N} (p_it / p_i0)( p_i0 q_i0 / ∑_{i=1}^{N} p_i0 q_i0 )
    = ∑_{i=1}^{N} (p_it / p_i0) w_i0.

The term p_it/p_i0 is called a price relative and w_i0 is the share of good i in total expenditure in year zero. So the price index is a weighted average of price relatives. Here p_it would be a category like clothing, w_i0 the share of expenditure on clothing. But one needs to average the prices of all the different items of clothing (jeans, shirts, ...), where you do not have expenditure weights, to get what are called elementary aggregates, EAs. There are three widely used methods of averaging, named after the people who developed them:

(a) the arithmetic mean of price relatives (Carli), used for 35% of EAs in the RPI:

I_C = ∑_{j=1}^{n} (p_jt / p_j0) / n

(b) the ratio of arithmetic means of prices (Dutot), used for 55% of EAs in the RPI and 30% of EAs in the CPI:

I_D = (∑ p_jt / n) / (∑ p_j0 / n)

(c) the geometric mean of price relatives (Jevons), which is identical to the ratio of geometric mean prices, used for 70% of the EAs in the CPI:

I_J = ( ∏_{j=1}^{n} (p_jt / p_j0) )^{1/n} = (∏ p_jt)^{1/n} / (∏ p_j0)^{1/n}

Methods (b) and (c) tend to give quite similar results; method (a) can give very different ones, particularly when the variance of prices is large. Method (a) is prohibited by the EU regulations that govern construction of the CPI, and most of the formula difference between RPI and CPI comes from the use of method (a) in the RPI. Method (c) tends to deal with substitution, of the sort seen in the previous example, rather better.

The government is trying to shift indexing to the CPI from the RPI, though index linked government debt is linked to the RPI, which raises contractual issues. The ONS is introducing a variant of the RPI which uses the Jevons rather than the Carli procedure to construct elementary aggregates.
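The three elementary-aggregate formulae are easy to compare numerically. The sketch below uses made-up prices (the numbers are hypothetical, chosen only to show that Carli can differ noticeably from Dutot and Jevons):

    # Carli, Dutot and Jevons indexes for a hypothetical set of item prices.
    import math
    p0 = [1.0, 2.0, 4.0]     # base-period prices (hypothetical)
    pt = [1.1, 2.4, 3.6]     # current prices (hypothetical)
    n = len(p0)
    carli  = sum(pt[j] / p0[j] for j in range(n)) / n
    dutot  = (sum(pt) / n) / (sum(p0) / n)
    jevons = math.prod(pt[j] / p0[j] for j in range(n)) ** (1 / n)
    print(carli, dutot, jevons)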


9.4.3. Example House Prices

There are a number of different indexes of house prices, which can show very different trends. A major difference is the stage of the transaction at which they measure the price. Some measure the asking price of houses put on the market. This provides early information, but they may sell for more or less than the initial asking price. The Building Society series are based on mortgage approvals; again this may not reflect the final price, and about 25% of houses are paid for in cash and are not captured in these series since their purchase does not require a mortgage. Other series use surveys of completions (when the sale is completed). The Land Registry figure is based on when the transaction is registered, and covers the final price for all housing transactions. The gap between the house being put on the market and the transaction being registered can be over six months. The indexes also differ in (a) whether they adjust for the mix of transactions; in unadjusted series the average price will jump if there is the sale of a very expensive house; (b) how often they are published and how long it takes to publish the data; (c) whether they are seasonally adjusted. House prices are usually compared to average earnings, with a normal 20th century UK ratio of about 3.5. In the 21st century the ratio rose higher than any previous peak, before dropping back.

9.4.4. Example Index linked weapons contracts

Weapons procurement contracts often cover long periods, because it takes many years to develop weapons and get them into production. Eurofighter/Typhoon, which only entered service in 2004, was started in the early 1980s. In such contracts it is common to link the agreed price to inflation; the issue then becomes which index to use. On the EH101 helicopter, the prime contractor, IBM at the time, proposed a simple materials and fuels index (essentially an output price index). The UK Ministry of Defence insisted on the use of a combined costs index reflecting input costs including labour. Because of productivity growth, output price indexes grow more slowly than input cost indexes. The National Audit Office, in its report Accounting for Inflation in Defence Procurement, para 2.25, December 1993, calculated that had the MOD used the index suggested by IBM, rather than the one it had insisted on, it could have saved itself £95 million, or about 6% of the contract price, over the lifetime of the contract. This particular over-spend got very little publicity because most journalists and MPs, like ASE students, tend to fall asleep once index numbers are mentioned.

To see the relation between prices and wages, write the total value of sales (price times quantity) as a markup on labour costs, wages times the number employed

PtQt = (1 + µ)WtEt

prices are then a mark-up on unit labour costs

Pt = (1 + µ)WtEt/Qt

and noting that productivity is output per worker Qt/Et

lnPt = ln(1 + µ) + lnWt − ln(Qt/Et)

so if mark-ups are constant, output price inflation is the rate of growth of wages minus the rate of growth of productivity:

∆ lnPt = ∆ lnWt −∆ ln(Qt/Et)
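A quick numerical check of the mark-up relation, using hypothetical values for the mark-up, wage, employment and output:

    # The log of the mark-up price equals log(1+mu) + log(W) - log(Q/E).
    import math
    mu, W, E, Q = 0.2, 30.0, 100.0, 500.0     # hypothetical numbers
    P = (1 + mu) * W * E / Q
    assert abs(math.log(P) - (math.log(1 + mu) + math.log(W) - math.log(Q / E))) < 1e-12
    print(P)   # 7.2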

10. Probability

10.1. Introduction

We need to analyse cases where we do not know what is going to happen: where there are risks, randomness, chances, hazards, gambles, etc. Probabilities provide a way of doing this. Some distinguish between (a) risk: the future is unknown but you can assign probabilities to the set of possible events that may happen; (b) uncertainty: you know the set of possible events but cannot assign probabilities to them; and (c) unawareness: you cannot even describe the set of possible events, what US Defense Secretary Donald Rumsfeld called the unknown unknowns, the things you do not even know that you do not know about. People seem to have difficulty with probabilities1 and it is a relatively recent branch of mathematics: nobody seems to have regarded probabilities as things that could be calculated before about 1650 (after calculus), and the axiomatic foundations of probability theory were only provided in the 1930s by the Russian mathematician Kolmogorov.

1 D. Kahneman, Thinking, Fast and Slow, Penguin 2011, discusses these difficulties.

Probabilities are numbers between zero and one, which represent the chance of an event happening. Barrow chapter 2 discusses them. If an event is certain to happen, it has probability one; if an event is certain not to happen, it has probability zero. It is said that only death and taxes are certain, everything else is uncertain. Probabilities can either represent degrees of belief, or be based on relative frequency, the proportion of times an event happens. So if in past horse races the favourite (the horse with the highest probability, the shortest odds offered by bookmakers) won a quarter of the time, you might say the probability of the favourite winning was 0.25; this is a relative frequency estimate. Alternatively you could look at a particular future race, study the history (form) of the horses and guess the probability of the favourite in that race winning; this is a degree of belief estimate. You bet on the favourite if your estimate of the probability of the favourite winning is greater than the bookmaker's estimate, expressed in the odds offered; the odds are the ratio of the probability to one minus the probability. There is a large literature on the economics and statistics of betting. Notice that although the probabilities of the possible events should add up to one (it is certain that some horse will win the race), the implied probabilities in the odds offered by bookmakers do not. That is how they make money on average. There are also systematic biases. For instance, the probability of the favourite winning is usually slightly better than the bookmaker's odds suggest and the probability of an outsider slightly worse. This favourite-longshot bias has been noted for over 60 years in a variety of horse-races, but its explanation is still subject to dispute.

If you throw a dice (one dice is sometimes known as a die) there are six possible outcomes, 1 to 6, and if the die is fair each outcome has an equal chance; so the probability of any particular number is 1/6. On one throw you can only get one number, so the probability of getting both a 3 and a 4 on a single throw is zero: it cannot happen. Events which cannot both happen (where the probability of both happening is zero) are said to be mutually exclusive. For mutually exclusive events, the probability of one or the other happening is just the sum of their probabilities, so the probability of getting either a 3 or a 4 on one throw of a dice is 1/6 + 1/6 = 2/6 = 1/3.

Suppose two people, say A and B, each throw a dice. The number B gets is independent of the number A gets: the result of A's throw does not influence B's throw. The probability of two independent events happening is the product of their probabilities. So the probability of both A and B getting a 3 is 1/6 × 1/6 = 1/36. There are 36 (6²) possible outcomes and each is equally likely. The 36 outcomes are shown in the grid below, with the six cases where A and B get an equal score marked X on the diagonal. So there is a probability of 6/36 = 1/6 of a draw. We can also use the grid to estimate the probability of A getting a higher score than B. These events correspond to the 15 events above the diagonal, so the probability of A winning is 15/36 = 5/12; the probability of B winning is also 5/12 and the probability of them getting an equal score is 1/6 = 2/12. Notice the 3 events (A wins, B wins, a draw) are mutually exclusive and their probabilities sum to one, 12/12.

                A
           1   2   3   4   5   6
      1    X   x   x   x   x   x
      2    x   X   x   x   x   x
 B    3    x   x   X   x   x   x
      4    x   x   x   X   x   x
      5    x   x   x   x   X   x
      6    x   x   x   x   x   X

When events are not mutually exclusive, one has to allow for the probability of both events happening. This seems to have been first pointed out by Bernoulli in his Ars Conjectandi in 1713, with a gruesome example. "If two persons sentenced to death are ordered to throw dice under the condition that the one who gets the smaller number of points will be executed, while he who gets the larger number will be spared, and both will be spared if the number of points are the same, we find that the expectation of one of them is 7/12. It does not follow that the other has an expectation of 5/12, for clearly each of them has the same chance, so the second man has an expectation of 7/12, which would give the two of them an expectation of 7/6 of life, i.e. more than the whole life. The reason is that there is no outcome such that at least one of them is not spared, while there are several in which both are spared."2 A will win 5/12 times, draw 2/12 times, so survives 7/12 times. Similarly for B. The probability of A or B surviving is the sum of the probability of each surviving minus the probability of both surviving: 7/12 + 7/12 − 2/12 = 1. The probability of both has to be subtracted to stop double counting. Check that the probability of getting either a 3 or a 4 on two throws of a dice is 1/3 + 1/3 − 1/9 = 20/36. Notice this is different from the probability of getting either a 3 or a 4 on both throws of the dice, which is (1/3)² = 1/9. You must be careful about exactly how probability events are described.

2 Quoted by Ian Hacking, The Emergence of Probability, p. 144, CUP 1975.


10.2. Some rules

Denote the probability of event A happening as P(A). Then the probability of event A not happening is 1 − P(A). This is called the complement of A, sometimes written Ā. If event A is certain to happen, P(A) = 1. If event A cannot happen, P(A) = 0. Denote the probability of both events A AND B happening as P(A ∩ B) (the intersection); this is often known as the joint probability. If the events are mutually exclusive they cannot happen together, so P(A ∩ B) = 0. Denote the probability of event A OR event B happening as P(A ∪ B) (the union). Then, as we saw above with the dice:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

The probability of A or B equals the probability of A plus the probability of B minus the probability of both A and B happening. Notice that if the events are mutually exclusive, P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B). If the events are independent, the joint probability, i.e. the probability of both happening, is

P(A ∩ B) = P(A) × P(B).

The probability of A happening given that event B has happened is called a conditional probability and is given by:

P(A | B) = P(A ∩ B) / P(B).    (10.1)

Below we will calculate the probability of winning the jackpot in the lottery. Strictly this is a conditional probability: the probability of an event A (winning the jackpot), given event B (buying a lottery ticket). Winning the jackpot and not buying a ticket are mutually exclusive events. Conditional probabilities play a very important role in decision making. They tell you how the information that B happened changes your estimate of the probability of A happening. If A and B are independent, P(A | B) = P(A): knowing that B happened does not change the probability of A happening. Similarly, the probability of B happening given that A happens is:

P(B | A) = P(A ∩ B) / P(A).    (10.2)

Multiply both sides of (10.1) by P(B) and both sides of (10.2) by P(A), and rearrange to give

P(A ∩ B) = P(A | B)P(B) = P(B | A)P(A)


the joint probability is the product of the conditional probability and the marginal probability in each case. Using the two right hand side relations gives Bayes Theorem:

P(A | B) = P(B | A)P(A) / P(B).

This formula is widely used to update probabilities of an event A in the light of new information, B. In this context P(A) is called the prior probability of A, P(B | A) is called the likelihood, and P(A | B) is called the posterior probability.

Statistics tends to divide into the approach this course emphasises, called classical or frequentist, and Bayesian statistics. A very readable account of the development of Bayesian statistics is Sharon Bertsch McGrayne, The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy, Yale University Press 2011.

10.2.1. Example: Screening

There is considerable controversy about the value of screening for diseases like breast cancer, prostate cancer or HIV. These are cases where the disease is not apparent but may be revealed by a test, which may not always give the correct answer. Call D having the disease, N not having the disease, with probabilities P(D) and P(N), where P(D) + P(N) = 1. Call TP the test showing positive (suggesting that you have the disease), TN the test showing negative (suggesting that you do not have the disease), with P(TP) + P(TN) = 1. In medical terminology the probability of the test showing positive given that you have the disease, P(TP | D), is called the sensitivity of the test. The probability of the test showing negative given that you do not have the disease, P(TN | N), is called the specificity of the test. Both are about 0.9 for a mammogram for breast cancer, and the higher they are the better. The probability of testing positive when you do not have the disease, P(TP | N) = 1 − P(TN | N), is called the probability of a false positive. The probability of testing negative when you do have the disease, P(TN | D) = 1 − P(TP | D), is the probability of a false negative. The two sorts of error, called false positives and false negatives here, are given different names in different contexts.

Question
Suppose there is a disease which 1% of the population suffer from, P(D) = 0.01. There is a test which is 99% accurate, i.e. 99% of those with the disease test positive and 99% of those without the disease test negative: P(TP | D) = P(TN | N) = 0.99. Suppose you test positive; what is the probability that you do have the disease?

Answer
It is often simpler and clearer to work with numbers rather than probabilities and present the results as numbers. This is also often more useful for non-specialist audiences. Imagine a population of one hundred thousand: 100,000. Then a thousand (1000 = 0.01 × 100,000) have the disease and 99,000 are healthy. Of those with the disease, 990 (0.99 × 1000) test positive, 10 test negative. Of those without the disease, 990 (0.01 × 99,000) also test positive, 98,010 test negative. Of the 2 × 990 = 1980 people who test positive, half have the disease, so the probability of having the disease given that you tested positive is 50%. Thus you should not worry too much about a positive result. A negative result is reassuring since only 10 out of the 98,020 who test negative have the disease. Positive results are usually followed up with other tests, biopsies, etc.

We could represent the joint and marginal frequencies as a table.

        D        N
TP      990      990       1,980
TN      10       98,010    98,020
        1,000    99,000    100,000

We could also calculate the conditional probability directly using Bayes Theorem and noting that P(TP) = P(TP | D)P(D) + P(TP | N)P(N),

P(D | TP) = P(TP | D)P(D) / P(TP) = (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.5

In practice screening is confined to groups where P(D) is high, to avoid this problem. The decision to establish a screening program depends on a judgement of the balance between (a) the benefits of detecting the disease, for instance if early treatment saves lives, (b) the costs of false positives, such as inappropriate treatment, worry etc., and (c) the cost of testing, including time off work to take the test. Since people disagree about these costs and benefits, screening is controversial. For instance, the PSA test detects prostate specific antigen. The British Medical Association say "two-thirds of men with high PSA do not have prostate cancer, some men with prostate cancer do not have high PSA and no evidence exists to show whether treating localised prostate cancer does more harm than good". There is a profitable private industry in screening tests.
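The screening calculation above is easy to reproduce; the sketch below uses only the probabilities given in the question.

    # Bayes Theorem for the screening example.
    p_d = 0.01                                    # prevalence P(D)
    sens = 0.99                                   # P(TP | D)
    spec = 0.99                                   # P(TN | N)
    p_tp = sens * p_d + (1 - spec) * (1 - p_d)    # P(TP)
    print(round(sens * p_d / p_tp, 3))            # P(D | TP) = 0.5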


10.2.2. Example Innovation

Suppose that there is a survey of 1,000 firms. Of these, 500 report introducing a new product in the previous year, 400 report introducing a new production process and 350 report having introduced both a new product and a new process.
(a) What is the probability that a firm has done no innovation: neither introduced a new product nor a new process?
(b) Are the probabilities of introducing a new product and a new process independent?
(c) What is the conditional probability of introducing a new product given that the firm introduced a new process?
(d) What is the conditional probability of introducing a new process given that the firm introduced a new product?

Answers
(a) There were 550 innovators: 350 did both, 150 just product, 50 just process. Thus 450 did not innovate, so the probability of not innovating was 0.45 = 1 − (0.5 + 0.4 − 0.35). Formally, if event A is making a product innovation, P(A) = 0.5; event B, making a process innovation, P(B) = 0.4; and the probability of doing both, P(A ∩ B) = 0.35. For the event of not making an innovation, P(N):

P (N) = 1− P (A ∪B) = 1− (P (A) + P (B)− P (A ∩B))

Notice the categories innovator, 550, and non-innovator, 450, are mutually exclusive: the probability of being both an innovator and a non-innovator is zero by definition.
(b) If they were independent, the product of the probability of product innovation times the probability of process innovation would give the probability of doing both: P(A)P(B) = P(A ∩ B). In this case 0.5 × 0.4 = 0.2, which is much less than 0.35, so they are not independent. You are more likely to do a second type of innovation if you have already done one type.
(c) The probability of doing product innovation conditional on process innovation is the probability of doing both divided by the probability of doing process:

P(A | B) = P(A ∩ B) / P(B) = 0.35 / 0.4 = 0.875

87.5% of process innovators also introduce a new product.


(d) The probability of doing process conditional on doing product is the probability of doing both divided by the probability of doing product:

P(B | A) = P(A ∩ B) / P(A) = 0.35 / 0.5 = 0.7

70% of product innovators also introduce a new process. Notice that the answers to (c) and (d) are different.
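All four answers follow mechanically from the three probabilities in the question; a short check:

    # Innovation example: no innovation, independence check, and the two conditionals.
    p_a, p_b, p_both = 0.5, 0.4, 0.35
    print(round(1 - (p_a + p_b - p_both), 2))   # (a) 0.45
    print(p_a * p_b, p_both)                    # (b) 0.2 versus 0.35, so not independent
    print(round(p_both / p_b, 3))               # (c) P(A | B) = 0.875
    print(round(p_both / p_a, 3))               # (d) P(B | A) = 0.7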

10.2.3. Background Example The Lottery

In the UK lottery six numbers are randomly chosen from 49 possible numbers. Over the long run the expected value of playing the lottery is −55p: you pay them £1 and they pay out 45p in prizes for every pound they take in. The 55p you lose, on average, goes on tax, good causes and their costs and profits. You win the jackpot if you match all six numbers, though not necessarily in the order drawn. Order the numbers from the smallest to the largest. For the first number you chose there are six chances of getting it (six draws). So the probability of your first number coming up is 6/49. To get your second number, you only have five chances (your first number has been drawn, leaving 48 remaining numbers), so it is 5/48. Similarly the third is 4/47, the fourth 3/46, the fifth 2/45 and the sixth 1/44. The probability of getting all 6 is the product of these probabilities:

(6/49)(5/48)(4/47)(3/46)(2/45)(1/44) = 720/10,068,347,520

this is a 1 in 13,983,816 chance, 1 in 14 million. Notice that low probability events are not necessarily rare; it depends on the population exposed to them. Winning the jackpot is a low probability event for any particular person, but it happens to someone almost every week. Always check the time horizon that the probability applies to. Someone shouting "we are all going to die" is not very worrying, since that is certainly true eventually, though if they mean in the next five minutes, it may be more worrying.

The usual formula for calculating the lottery is the number of ways in which a group of r objects (in this case 6) can be selected from a larger group of n objects (in this case 49), where the order of selection is not important. It is just the inverse of the formula above:

nCr = n! / (r!(n − r)!) = (49 × 48 × 47 × 46 × 45 × 44) / (6 × 5 × 4 × 3 × 2 × 1)
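The same number can be obtained directly from the combinations formula:

    # Number of possible lottery selections and the jackpot probability.
    from math import comb
    print(comb(49, 6))        # 13,983,816
    print(1 / comb(49, 6))    # about 7.15e-08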


The expected value of any particular game depends on whether the jackpot has been increased by being rolled over from previous games where it was not won. Even if the jackpot is over £14m, the expected value may not be positive, because you may have to share the jackpot with other winners who chose the same numbers (unless you are a member of a gang that bought all the available tickets and made sure nobody else could buy any tickets). Choosing an unpopular number, that others would not choose, will not change the probability of winning but may increase the probability of not having to share the jackpot. For instance, people sometimes use birthdays to select numbers, so do not choose numbers over 31. You can choose to buy random numbers to avoid this problem. Optimal design of lotteries raises interesting economic questions.

10.2.4. Background Example: Hit and Run

Two cab companies operate in a city: 85% of cabs are green, 15% are blue. A cab hit a pedestrian at night and drove away. The person who had been hit said they thought the cab was blue. Subsequent tests showed that the person could correctly identify the colour of a cab at night 80% of the time. What is the probability that the person was hit by a blue cab?

Answer
We know the proportions of blue B and green G cabs are P(B) = 0.15, P(G) = 0.85. We know that the probability of the person reporting that it is blue, RB, given that it is blue is P(RB | B) = 0.8 and from this the probability of wrongly reporting that it is blue is P(RB | G) = 0.2. What we need to know is the probability that it was blue given that they report it is blue, P(B | RB). The probability of the person reporting a blue cab is the probability of them seeing a blue cab times the probability of reporting it as blue, plus the probability of seeing a green cab times the probability of wrongly reporting the cab as blue:

P(RB) = P(B)P(RB | B) + P(G)P(RB | G) = 0.15 × 0.8 + 0.85 × 0.2 = 0.29.

We have all the terms needed to apply Bayes Theorem:

P(B | RB) = P(RB | B) × P(B) / P(RB) = (0.8 × 0.15) / 0.29 = 0.41

The report that the cab was blue increases the probability that the cab was blue from the unconditional prior probability of 0.15 to the conditional posterior probability of 0.41, but it is still a lot less than 0.8.

In this case we knew the prior probabilities, the proportion of blue and green cabs, that we used to adjust the report. In other cases where people report events we do not know the prior probabilities, e.g. when 15% of people in California report having been abducted by aliens.

10.2.5. Background Example: Lost H bomb

In 1966 a B52 collided with an air tanker while refuelling at 30,000 feet off the coast of Palomares, Spain, losing its four H bombs. Three were recovered quickly; the fourth was somewhere on the seabed. The US Navy constructed a map of the seabed, then got a group of various experts to bet on different scenarios that might have happened (e.g. the bomb had two parachutes, so the scenarios might be (i) both opened, (ii) one opened, (iii) none opened). Each scenario left the weapon in a different location. They then used Bayes theorem to combine the experts' different subjective estimates of the probability (derived from the bets) to work out the (posterior) probability of the bomb being at each location. The highest probability location was far from where the other three bombs or the wreckage of the B52 were found. Fortunately the bomb was there. This account comes from Blind Man's Bluff, S. Sontag and C. Drew, Harper Paperbacks, 1999, which gives various other examples of the use of Bayes theorem in submarine warfare, though some of the details are disputed by McGrayne, cited above in 10.2.

11. Discrete Random Variables

Above we dealt with events where the outcomes are uncertain; now we want to consider how we apply probability to variables where we are uncertain what values they will take. These are called random variables. Forecasting involves estimating future values of random variables and should provide not only an estimate, "our central forecast of CPI inflation in two years is 2.0%", but also an indication of the likely uncertainty, "and we are 90% certain that it will lie between 1.0% and 3.0%". Inflation is a continuous random variable: it can take any value. We will begin with discrete random variables. Barrow discusses random variables at the beginning of chapter 3.

A discrete random variable, X, can take a number of distinct possible values, say x_1, x_2, ..., x_N, with probabilities p_1, p_2, ..., p_N. The observed values are called the realisations of the random variable. For instance, X, the total obtained from throwing two dice, is a discrete random variable. It can take the values 2 to 12. After you throw the dice, you observe the outcome, the realisation, a particular number, x_i. Associated with the random variable is a probability distribution, p_i = f(x_i), which gives the probability of obtaining each of the possible outcomes the random variable can take. The cumulative probability distribution,

F(x_j) = ∑_{i=1}^{j} f(x_i) = P(X ≤ x_j)

gives the probability of getting a value less than or equal to x_j. So in the dice case:

x_i    f(x_i)   F(x_i)
1      0        0
2      1/36     1/36
3      2/36     3/36
4      3/36     6/36
5      4/36     10/36
6      5/36     15/36
7      6/36     21/36
8      5/36     26/36
9      4/36     30/36
10     3/36     33/36
11     2/36     35/36
12     1/36     36/36

Make sure that you can calculate all the probabilities; use the 6×6 grid in section 10.1 if necessary. Notice f(1) = 0, it is impossible to get 1, and F(12) = 1, you are certain to get a value less than or equal to 12. f(7) = 6/36, because there are six different ways of getting a 7: (1,6), (6,1), (2,5), (5,2), (3,4), (4,3). These are the diagonal elements (running from bottom left to top right) in the grid above in section 10.1.

∑f(xi) = 1. This is always true for a probability distribution.

This probability distribution is symmetric with mean = median = mode = 7.

The mathematical expectation or expected value of a random variable (often denoted by the Greek letter mu) is the sum of each value it can take, x_i, multiplied by the probability of it taking that value, p_i = f(x_i):

E(X) = ∑_{i=1}^{N} f(x_i) x_i = µ.    (11.1)

The expected value of the score from two throws of a dice is seven; calculated as

7 = 2 × 1/36 + 3 × 2/36 + 4 × 3/36 + ... + 12 × 1/36.


If all the values are equally likely, f(x_i) = 1/N, so the expected value is the arithmetic mean.

The variance of a random variable is defined as

V(X) = E(X − E(X))² = ∑_{i=1}^{N} f(x_i)(x_i − µ)² = σ².    (11.2)

If f(x_i) = 1/N this is just the same as the population variance we encountered in descriptive statistics, section 6.1.2. This is the same formula that we used in section 6.2 with f(x_i) = p_i. In the dice example, the variance is 5.8 and the standard deviation 2.4.
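The dice figures quoted here can be checked directly from the distribution in the table above:

    # Mean and variance of the total of two dice.
    from fractions import Fraction
    f = {s: Fraction(0) for s in range(2, 13)}
    for a in range(1, 7):
        for b in range(1, 7):
            f[a + b] += Fraction(1, 36)
    mean = sum(s * p for s, p in f.items())
    var = sum(p * (s - mean) ** 2 for s, p in f.items())
    print(mean, float(var))   # 7 and about 5.83 (SD about 2.4)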

Suppose that there are two random variables X and Y with individual (marginal) probabilities of f(x_i) and f(y_i) and joint probabilities f(x_i, y_i). The joint probability indicates the probability of both X taking a particular value, x_i, and Y taking a particular value, y_i, and corresponds to P(A ∩ B) above. So if X is the number on the first dice and Y is the number on the second dice

f(6, 6) = P (X = 6 ∩ Y = 6) = 1/36

If the random variables are independent, then the joint probability is just the product of the individual probabilities, as we saw above

f(xi, yi) = f(xi)f(yi)

and if they are independent, the expected value of the product is the product of the expected values

E(XY ) = E(X)E(Y ).

Expected values behave like N^{−1}∑, i.e. like averages. So if a is a constant, E(a) = a. If a and b are constants, E(a + bx_i) = a + bE(x_i).

The covariance between two random variables is

Cov(X, Y ) = E[(X − E(X))(Y − E(Y ))].

If f(xi) = 1/N this is

Cov(X, Y) = (1/N) ∑_{i=1}^{N} (x_i − x̄)(y_i − ȳ)

as we saw in section 6.1.3. If the random variables are independent the covariance is zero. However, a covariance of zero does not imply that they are independent; independence is a stronger property.


11.0.6. Background Example: churn

On average a customer is worth £20 a month to a mobile phone company. But the churn rate (the probability of a customer ceasing to subscribe) is 5% a month. The company's discount rate is 2% a month. The company wants to know how much it can subsidise handsets while keeping the present value of a customer positive. What is the expected present value of a customer?

Answer
Call the value per month V, the churn rate p, and the discount rate r. Every month a proportion (1 − p) of customers continue to the next month. The present value is

PV = ∑_{t=0}^{∞} V(1 − p)^t / (1 + r)^t
   = V ∑_{t=0}^{∞} ((1 − p)/(1 + r))^t
   = V (1 − (1 − p)/(1 + r))^{−1}
   = V ((1 + r − (1 − p))/(1 + r))^{−1}
   = (1 + r)V / (r + p)

since this is the sum of a geometric progression in (1 − p)/(1 + r). On these numbers

PV = (1.02 × 20)/0.07 = 291

a customer is worth £ 291.
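The closed-form answer can be checked against a long finite sum of the discounted payments:

    # Present value of a customer: closed form versus a long finite sum.
    V, p, r = 20.0, 0.05, 0.02
    closed_form = (1 + r) * V / (r + p)
    finite_sum = sum(V * (1 - p) ** t / (1 + r) ** t for t in range(10000))
    print(round(closed_form, 2), round(finite_sum, 2))   # both about 291.43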

12. Continuous random variables

Whereas a discrete random variable can only take specified values, x_i, i = 1, 2, ..., N, continuous random variables x (e.g. inflation) can take an infinite number of values. Corresponding to the probabilities f(x_i) for discrete random variables there is a probability density function, pdf, denoted f(x), for continuous random variables. Corresponding to the discrete cumulative distribution function, cdf, F(x_i) = Pr(X ≤ x_i), there is a continuous cdf, F(x) = Pr(X ≤ x), which gives the probability that the random variable will take a value less than or equal to a specified value x. The Bank of England publishes its estimate of the probability density function for inflation as a fan chart. Since there are an infinite number of points on the real line, the probability of any one of those points, P(X = x), is zero, although the pdf will be defined for it, f(x) ≠ 0. But we can always calculate the probability of falling into a particular interval, e.g. that inflation will fall into the range 1.5% to 2.5%. In the definitions of expected value and variance for a continuous random variable we replace the summation signs in (11.1) and (11.2) for the discrete case by integrals, so

E(X) = ∫ x f(x) dx = µ

V(X) = E(X − E(X))² = ∫ (x − µ)² f(x) dx = σ².

12.1. Uniform Distribution

The simplest continuous distribution is the uniform. The probability density function takes equal values over some range (support) a to b. It is zero, f(x) = 0, outside the range and f(x) = 1/(b − a) within the range. By doing the integration you can show that the mean of a uniform random variable is E(x) = (a + b)/2 and its variance is Var(x) = (b − a)²/12. Thus if the range was a = 0 to b = 1, E(x) = 0.5, Var(x) = 1/12, and the standard deviation of x is 0.29. Notice in this case f(x) = 1 over the range of x, but the probabilities sum to unity, ∫ f(x)dx = 1, since the graph of f(x) has height 1 and length 1, so area 1. The uniform distribution is used in section 14.

12.2. The normal distribution

The most common distribution assumed for continuous random variables is the normal or Gaussian distribution. This has a bell shape.

One source of normality comes from the central limit theorem. This says that the distribution of the sample mean will be approximately normal, whatever the distribution of the original variable, and that this approximation to normality will get better the larger the sample size. This is demonstrated numerically in section 14, where random numbers are drawn from a uniform distribution, but the mean of many samples is normally distributed.

The normal distribution is completely defined by a mean (first moment) and variance (second centred moment); it has a coefficient of skewness (third standardised moment) of zero and a coefficient of kurtosis (fourth standardised moment) of three; see section 6. The standard deviation is the square root of the variance.


For a normal distribution roughly two thirds of the observations lie within one standard deviation of the mean and 95% lie within two standard deviations of the mean.

Many economic variables, such as income or firm size, are not normally distributed but are very skewed and not symmetrical. However, the logarithm of the variable is often roughly normal. This is another reason we often work with logarithms of variables in economics.

Suppose that we have a random variable Y ∼ N(α, σ²), which is read "Y is normally distributed with expected value α and variance σ²", where

V(Y) = E(Y − E(Y))² = E(Y − α)² = σ²

If we have an independent sample from this distribution, Y_i, i = 1, 2, ..., n, we write this Y_i ∼ IN(α, σ²), which is read "Y_i is independent normal with expected value α and variance σ²". The expected value of a normal distribution is often denoted µ, but we use α to establish a link with regression below.

If one variable Y ∼ N(α, σ²) then any linear function of Y is also normally distributed:

X = a + bY ∼ N(a + bα, b²σ²)    (12.1)

It is b² because the variance is a squared concept; X has standard deviation bσ. So if temperature over a year measured in centigrade is normally distributed, temperature in Fahrenheit (which is a linear transformation) is also normal. Using this we can write

Yi = α + ui

where u_i = Y_i − E(Y_i) = Y_i − α, and from the rules u_i ∼ N(0, σ²). Decomposing an observed random variable into its expected value and an error, u_i, is very convenient for many purposes.

An important linear function of Y is

z_i = (Y_i − α)/σ ∼ N(0, 1)

This is called the standard normal. It has expected value zero and variance (and standard deviation) of one (like any standardised variable) and is tabulated in most statistics and econometrics books. Barrow Table A2 gives the table of 1 − F(z) = P(Z > z) for values of z > 0. So from the table in Barrow, P(Z > 0.44) = 0.33: read down the first column till 0.4 and then go across the row to the 0.04 column. Since the normal distribution is symmetric, P(Z > z) = P(Z < −z). So P(Z < −0.44) = 0.33 also.

using the formula above and convert from z back to Y using

Yi = σzi + α.

The distribution has a bell shape and is symmetric with mean = median = mode. The formula for the normal distribution is

f(yi) = (1/√(2πσ²)) exp( −(1/2)((yi − α)/σ)² )

f(zi) = (2π)^(-1/2) exp( −zi²/2 )

where zi = (yi − α)/σ is N(0, 1), standard normal. The normal distribution is the exponential of a quadratic. The (2πσ²)^(-1/2) makes it integrate (add up) to unity, which all probability distributions should.

12.2.1. Areas under a normal distribution

For a standard normal distribution, with mean zero and standard deviation one, the probability that the random variable Z is less than a specific value z is given below for various values of z. Note we give P(Z < z) for values of z > 0, whereas Barrow Table A2 gives P(Z > z) for values of z > 0. There is no standard way to present areas under the normal distribution, so check how the table you are using presents it.

z     P(Z < z)
0     0.5
0.5   0.6915
1     0.8413
1.5   0.9332
2     0.9772

Since the normal distribution is symmetric, the probability of being less than the mean (corresponding to z = 0) is 0.5, the same as the probability of being greater than the mean. There is an 84% chance of getting a value less than the mean plus one standard deviation, z = 1. The chance of being within one standard deviation of the mean is P(−1 < Z < +1) = 0.8413 − (1 − 0.8413) = 0.6826. There is a 16% (1 − 0.84) chance of being less than one standard deviation below the mean, and a 16% chance of being more than one standard deviation above the mean. The chance of being more than two standard deviations from the mean is 2(1 − 0.9772) = 0.0456, roughly 5%. Strictly 95% of the normal distribution lies within 1.96 standard deviations of the mean, but 2 is close enough for most practical purposes.
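
A sketch in Python (using the scipy library, which is assumed to be available; it is not needed for the course) that reproduces the probabilities quoted above:

    # Areas under the standard normal distribution.
    from scipy.stats import norm

    for z in [0, 0.5, 1, 1.5, 2]:
        print(z, norm.cdf(z))            # P(Z < z): 0.5, 0.6915, 0.8413, 0.9332, 0.9772

    print(norm.cdf(1) - norm.cdf(-1))    # P(-1 < Z < 1), about 0.68
    print(2 * (1 - norm.cdf(2)))         # P(|Z| > 2), about 0.046
    print(norm.ppf(0.975))               # 1.96, the value leaving 2.5% in the upper tail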

12.2.2. Example; test scores

Suppose that a maths class is made up of an equal number of Blue and Green students. Within each group marks are distributed normally, but Blues are better at maths, with a mean of 60 compared to a mean of 55 for Green students. Blue students are also less erratic, with a standard deviation of 5 compared to a standard deviation of 10 for Green students.
(a) What proportion of Blue students get more than 70?
(b) What proportion of Green students get more than 70?
(c) Of those who get over 70, what proportion are Green and what proportion are Blue?
Answer
We have B ∼ N(60, 5²), G ∼ N(55, 10²).
(a) We want to find the probability that the mark is over 70. For Blue students

z = (70 − 60)/5 = 2

so the probability of a mark over 70 is P(Z > 2) = 1 − P(Z < 2) = 1 − 0.9772 = 0.0228 or 2.28%. The 0.9772 came from the table of areas under a normal distribution.
(b) For Green students

z = (70 − 55)/10 = 1.5

so the probability of a mark over 70 is P(Z > 1.5) = 1 − P(Z < 1.5) = 1 − 0.9332 = 0.0668 or 6.68%. The 0.9332 came from the table of areas under a normal distribution.

(c) In a large class with equal numbers of Blue and Green students, 4.48% of all students, (2.28 + 6.68)/2, would get over 70. The proportion of those that are Blue is 25% (= 2.28/(2.28 + 6.68)); the proportion that are Green is 75% (= 6.68/(2.28 + 6.68)).

Even though the question says Blues are better at maths, and it is true that their average is higher, three quarters of the top group in maths are Green (as are three quarters of the bottom group). The lesson is to think about the whole distribution, not just the averages or parts of the distribution (e.g. the top of the class), and try not to be influenced by value-laden descriptions: 'better' or 'less erratic'.
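
The same answer can be checked with a few lines of Python (scipy assumed; purely illustrative):

    # Blue ~ N(60, 5^2), Green ~ N(55, 10^2); proportions scoring over 70.
    from scipy.stats import norm

    p_blue = 1 - norm.cdf(70, loc=60, scale=5)     # about 0.0228
    p_green = 1 - norm.cdf(70, loc=55, scale=10)   # about 0.0668

    print((p_blue + p_green) / 2)          # share of all students over 70, about 0.045
    print(p_blue / (p_blue + p_green))     # share of the over-70 group who are Blue, about 0.25
    print(p_green / (p_blue + p_green))    # share who are Green, about 0.75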

12.3. Distributions related to the normal

Barrow chapter 6 discusses the χ², t and F distributions, which are widely used functions of normally distributed variables. Suppose we have a sample i = 1, 2, ..., n of standardised data zi ∼ IN(0, 1), independently distributed, standard normal.

12.3.1. Chi-squared

Suppose we form the sum of squares

A = z1² + z2² + ... + zn² ∼ χ²(n)

Then A is said to have a Chi-squared distribution with n degrees of freedom. Notice the Chi-squared is only defined over positive values. As well as being the number of observations less the number of parameters estimated, degrees of freedom are a parameter of the distribution. The normal distribution has two parameters, which determine its shape: the mean and the variance. The mean determines its centre and the variance determines its spread. The Chi-squared distribution has one parameter, its degrees of freedom, that determines its shape. Its expected value equals its degrees of freedom; its variance equals twice its degrees of freedom. For small degrees of freedom the Chi-squared distribution is skewed; for large degrees of freedom it approaches the normal. It arises naturally because we estimate the variance as s² = Σ(xi − x̄)²/(n − 1), so if xi is normal, (n − 1)s²/σ² has a χ² distribution with n − 1 degrees of freedom.
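
These properties are easy to see by simulation. The following Python sketch (numpy assumed; the choice of n = 5 and the number of replications are arbitrary) forms sums of squared standard normals and checks that their mean and variance are close to n and 2n.

    # A sum of n squared N(0,1) variables behaves like chi-squared(n).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    z = rng.standard_normal((100_000, n))   # many samples of n independent N(0,1) draws
    A = (z**2).sum(axis=1)                  # the sum of squares in each sample

    print(A.mean(), A.var())                # close to n and 2n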

12.3.2. t distribution

A standard normal divided by the square root of a Chi-squared distribution divided by its degrees of freedom is called the t distribution with n degrees of freedom:

t(n) = z / √(χ²(n)/n)


We often divide an estimate of the mean or a regression coefficient (which are normally distributed from the central limit theorem) by their standard errors (which are the square root of a χ² divided by its degrees of freedom), and this is the formula for doing this. The t distribution has fatter tails than the normal, but as the sample size gets larger, about 30 is big enough, the uncertainty due to estimating the standard error becomes small and the distribution is indistinguishable from a normal. It is sometimes called the Student's t distribution. W.S. Gosset, who discovered it, worked for Guinness and, because of a company regulation, had to publish it under a pseudonym; he chose Student.

12.3.3. F distribution

Fisher's F distribution is the ratio of two independent Chi-squared variables, each divided by its degrees of freedom:

F(n1, n2) = [χ²(n1)/n1] / [χ²(n2)/n2].

It arises as the ratio of two variances.

13. Economic and Financial Data II: interest rates, growth rates, exchange rates, etc

13.1. Interest rates

There are a set of basic rules that apply to a series of different rates: the rate of interest, the rate of return, the rate of inflation, the rate of growth of GDP or other variables. We will use rates of interest or return as an example, but the same rules apply to the others. Suppose we invest £100 in 2000, and the value of the asset rises to £110 in 2001, £121 in 2002, £133.1 in 2003, etc. We can write this

V0 = 100, V1 = 110, V2 = 121, V3 = 133.1.

For other examples, V might be GDP (growth rates), a price index (inflation), etc. The gross rate of return in the first year is (1 + r1) = V1/V0 = 1.1. The (net) rate of return in the first year is r1 = (V1 − V0)/V0 = V1/V0 − 1 = 0.1; the percentage rate of return is 100r1 = 10%. Be aware of the difference between proportionate, 0.1, and percentage, 10%, rates of interest and return. Interest rates are also often expressed in basis points: 100 basis points is one percentage point.

88

Page 89: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

From the definition of the gross return, (1 + r1) = V1/V0, we can write V1 = (1 + r)V0. The rate of return in this example is constant at 10%, ri = r = 0.1. Check this by calculating r2 and r3. The value of the investment in year 2 is

V2 = (1 + r)V1 = (1 + r)²V0

and for year t

Vt = (1 + r)^t V0.    (13.1)

Notice how interest compounds: you get interest paid on your interest. Interest rates are often expressed at annual rates, per annum, p.a., even when they are for shorter or longer periods. If interest at 10% p.a. was paid out quarterly during the year, you would get 2.5% a quarter, not 10% a quarter. However it would be paid out four times as often, so the formula would be Vt = (1 + r/4)^(4t) V0, or if it is paid out n times a year, Vt = (1 + r/n)^(nt) V0. As n → ∞, continuous compounding, this converges to

Vt = e^(rt) V0.    (13.2)

The irrational number e ≈ 2.718 seems to have been discovered by Italian bankers doing compound interest in the late middle ages. Since ln Vt = rt + ln V0, the continuously compounded return is

d ln V/dt = (1/V)(dV/dt) = r.

For discrete data this can be calculated as

r = (ln Vt − ln V0)/t.

The return in any period is often calculated as rt = ln Vt − ln Vt−1. Notice that the discrete version (13.1) is strictly

r = exp((ln Vt − ln V0)/t) − 1.

In addition

ln Vt − ln Vt−1 = ln(Vt/Vt−1) = ln(1 + rt) ≈ rt

if rt is small, another justification for using the difference of the logarithms. Growth rates and inflation rates are also calculated as differences in the logarithms. Multiply them by 100 if you want percentage rates.
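
The compounding formulas can be checked with a few lines of Python (illustrative only; the figures are the ones used in the text):

    # Compound interest and log returns.
    import math

    V0, r = 100.0, 0.10
    print(V0 * (1 + r)**3)           # annual compounding for 3 years: 133.1
    print(V0 * (1 + r/4)**(4*3))     # the same rate paid quarterly: about 134.5
    print(V0 * math.exp(r*3))        # continuous compounding: about 135.0

    print(math.log(110.0) - math.log(V0))   # log return 0.0953, close to the 10% discrete return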


There are also interest rates at various maturities, depending on how long the money is being borrowed or lent. The pattern of interest rates with respect to maturity is called the term structure of interest rates or yield curve. Typically the term structure slopes upwards: long rates, interest rates on money borrowed for a long period of time, such as 10 year government bonds, are higher than short rates, money borrowed for a short period of time, such as 3 month Treasury Bills. Interest rates are usually expressed at annual rates, whatever the length of the investment. When monetary policy is tight, the term structure may slope downwards, the yield curve is inverted: short rates are higher than long rates. This is often interpreted as a predictor of a forthcoming recession. Monetary policy is operated by the Central Bank through the control of a short overnight interest rate called the policy rate, Repo rate, Bank Rate or, in the US, Federal Funds Rate. Usually other short rates such as LIBOR (London Inter-Bank Offer Rate, the rate at which banks lend to each other) are very close to the policy rate. However, during the credit crunch starting in August 2007, policy rates and LIBOR diverged: banks required a risk premium to lend to other banks and LIBOR was described as the rate at which banks were unwilling to lend to each other. Evidence that LIBOR was manipulated in the interests of the banks led to reform of the process of setting it.

13.2. Exchange Rates

13.2.1. Spot and forward

The spot exchange rate is the rate for delivery now: the exchange takes place immediately. The spot exchange rate is usually quoted as domestic currency per unit of foreign currency, with the dollar being treated as the foreign currency: Swiss Francs per Dollar, for instance. A rise indicates a depreciation of the Swiss Franc: more Swiss Francs are needed to buy a dollar. Some are quoted as foreign currency per unit of domestic, in particular Sterling, which is quoted Dollars per Pound. In this case a rise indicates an appreciation of the Pound: a pound buys more dollars. Forward rates are for delivery at some time in the future. The one year forward rate is for delivery in a year's time, when the exchange takes place at a rate quoted and agreed upon today.

90

Page 91: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

13.2.2. Covered interest parity

Suppose that you have $1 million and know that you are going to need Sterling in a year's time. You can either convert the Dollars into Sterling now, giving you £1/S m, where S is the spot rate (£0.548m at the spot rate on 27/7/4 of 1.8245 $/£). You can invest this for a year at the UK interest rate R£, 5.3375% (one year £ Libor (London Inter Bank Offer Rate) on the same day). Thus in a year's time you will have £(1 + R£)/S m (£0.577m). Alternatively you could invest the Dollars in the US, getting rate R$, 2.43% ($ Libor), so you will have $(1 + R$) m ($1.0243m) in a year's time. You can sell these dollars for delivery in a year at the forward rate F (1.7752). This means that in a year's time you will have £(1 + R$)/F m (£0.577m). These are both risk free (assuming the bank repays your deposit, i.e. no default or counterparty risk), so the returns must be equal, as they are: you get £577,000 by either route. These rates are given every day in the FT. Expressing it as a formula

(1 + R£)/S = (1 + R$)/F

F/S = (1 + R$)/(1 + R£)

Or approximately (F − S)/S ≈ ln F − ln S ≈ R$ − R£, where the approximations are fairly good for small differentials. The term on the left is the forward premium, when positive, or discount, when negative, and the term on the right is the interest differential. This relation is called covered interest parity and follows from arbitrage. If it did not hold, banks could make a riskless return by exploiting the difference. Notice that the forward rate on 27/7/4 indicated that the market expected sterling to fall in value over the next year. In fact, because it is determined by the interest rate differential, the forward rate tends to be a poor predictor of the future spot rate. In this case the rate on 29/7/5 was $1.759, so the forward rate was quite a good predictor. Verbeek section 4.11 gives some empirical examples of the use of these relationships.
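
The arbitrage calculation can be reproduced in Python using the quotes given above (illustrative only):

    # Covered interest parity with the 27/7/4 quotes.
    S, F = 1.8245, 1.7752            # spot and one-year forward rates, $ per £
    R_uk, R_us = 0.053375, 0.0243    # one-year £ and $ Libor

    dollars = 1_000_000
    via_uk = dollars / S * (1 + R_uk)   # convert now and invest in £: about £577,000
    via_us = dollars * (1 + R_us) / F   # invest in $ and sell forward: about £577,000
    print(via_uk, via_us)

    print((F - S) / S, R_us - R_uk)     # forward discount roughly equals the interest differential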

13.2.3. Real and effective rates

Suppose a Big Mac costs $X in the US and £Y in the UK and the spot rate is S, e.g. 1.6 $/£. Then the relative price of Big Macs in the UK and US, in dollars, is

QM = (Y × S)/X

91

Page 92: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

this is the Big Mac real exchange rate, which the Economist publishes. Similar calculations are done for the whole economy using price indexes, to calculate the real exchange rate for the country: Q = P*S/P, where P* is a foreign price index and P a domestic price index. Purchasing Power Parity (PPP) says that trade will equalise prices between countries and the real exchange rate will be constant, in the long run. As Keynes said, 'In the long run, we are all dead', and deviations from PPP can be very persistent.

Any currency has a large number of exchange rates with all the other currencies, so 'trade weighted averages' or effective exchange rates are often calculated, which give an average exchange rate with all the other currencies, the weights reflecting their relative importance in trade. Suppose that we denote the base year as year zero, e.g. 2000; then the index in year t is

It = Σi wi (Sit/Si0)

where the wi are the percentage shares of trade with country i, so that Σ wi = 100, Si0 is the exchange rate with country i in the base year and Sit is the exchange rate in year t. The value of the index in year zero is I0 = 100.
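
As an illustration of the formula, the following Python sketch computes an effective exchange rate index; the countries, weights and exchange rates are invented purely for the example.

    # A made-up trade-weighted (effective) exchange rate index.
    weights = {"US": 50, "euro area": 30, "Japan": 20}        # percentage trade shares, sum to 100
    S_base  = {"US": 1.50, "euro area": 1.20, "Japan": 180.0} # exchange rates in the base year
    S_t     = {"US": 1.60, "euro area": 1.15, "Japan": 190.0} # exchange rates in year t

    I_t = sum(weights[c] * (S_t[c] / S_base[c]) for c in weights)
    print(I_t)    # equals 100 in the base year by construction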

14. Applied Exercise II: Sampling distributions

The remainder of the notes are concerned with estimation and hypothesis testing. These rest on the idea of a sampling distribution: the particular sample we take is just one of a large number of possible samples that we might have taken. This exercise is designed to illustrate the idea of a sampling distribution. You do not need any data for this exercise; you create it yourself.

Go into Excel and in cell A1 type =RAND(). This generates a random number uniformly distributed over the interval zero-one. Each number over that interval has an equal chance of being drawn. Copy this cell right to all the cells till O1. In P1 type =AVERAGE(A1:O1). In Q1 type =STDEVP(A1:O1). In R1 type =STDEV(A1:O1). Copy this row down to line 100.

You now have 100 samples of 15 observations from a uniform distribution and 100 estimates of the mean and 100 estimates for each of two estimators of the standard deviation. An estimator is a formula which tells you how to calculate an estimate from a particular sample; an estimate is the number that results from applying the estimator to a sample.

92

Page 93: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

We can then look at the sampling distribution of the mean and standard deviation. Calculate and draw the histogram for the 100 estimates of the mean. Do the same for the two estimators of the standard deviation. What do you think of their shapes? Are they close to a normal distribution? Go to P101, type in =AVERAGE(P1:P100). Go to P102, type in =STDEV(P1:P100). Go to P103, type in =SKEW(P1:P100). Go to P104, type in =KURT(P1:P100). Copy these to the Q and R columns to give the descriptive statistics for the two estimates of the standard deviation.

If x is uniformly distributed over [a, b] then E(x) = (a + b)/2 and Var(x) = (b − a)²/12. In this case a = 0, b = 1, so the theoretical mean should be 0.5 (compare this with the number in cell P101) and the theoretical variance 1/12, with standard deviation √(1/12) = 0.288675 (compare this with the number in Q101, which should be biased downwards, and with the number in R101, which should be closer). The standard error of the mean from a sample of size N (in this case 15) is

SE(x̄) = √(Var(x)/N)

so we would expect the standard error of our distribution of means to be 0.07453 (compare this with the number in P102). As N, the number of observations in each sample (15 in this case, which is not very large), becomes large, the distribution of the mean tends to normality (the central limit theorem). Do the measures of skewness and excess kurtosis given in P103 and P104 suggest that these means are normal? The values should be close to zero for normality. Is the mean more normally distributed than the standard deviation?

What we have done here is called a 'Monte Carlo' simulation. We have examined the properties of the estimators by generating lots of data randomly and looking at the distributions of the estimates from lots of samples. In practice, 100 replications of the sample is rather small; in Monte Carlo studies many thousands of replications are typically used. Because 100 is quite small, you will get rather different answers (e.g. for the overall mean) from me. However, by making the number of replications sufficiently large, we could make the difference between you and me as small as we like (law of large numbers).

In this case, we did not need to do a Monte Carlo because we can derive the properties of the estimators theoretically. But for more complicated problems this is not the case and we must do it numerically like here. However, doing the Monte Carlo gives you a feel for what we mean when we discuss the distribution of an estimator.


15. Estimation

15.1. Introduction

In the first part of the course we looked at methods of describing data, e.g. using measures like the mean (average) to summarise the typical values the variable took. In the second part of the course, we learned how to make probability statements. Now we want to put the two together and use probability theory to judge how much confidence we have in our summary statistics. The framework that we will use to do this is mathematical: we will make some assumptions and derive some results by deduction. Chapters 4 and 5 of Barrow cover these issues. There are a number of steps.

1. We start with a model of the process that generates the data. For instance, the efficient market theory says that the return on a stock in any period t is Yt = α + ut, where Yt is the return, which we can observe from historical data, α is the expected return, an unknown parameter, and ut is an unpredictable random error that reflects all the new information in period t. We make assumptions about the properties of the errors ut. We say that the error ut is 'well behaved' when it averages zero, E(ut) = 0; is uncorrelated through time, E(ut ut−1) = 0; and has constant variance, E(ut²) = σ².

2. We then ask how we can obtain an estimator of the unknown parameter α. An estimator is a formula for calculating an estimate from any particular sample. We will use two procedures to choose an estimator α̂ (said alpha hat) of α, that gives Yt = α̂ + ût: (1) method of moments, which chooses the estimator that makes our population assumptions, e.g. E(ut) = 0, hold in the sample, so (1/N) Σ ût = 0; (2) least squares, which chooses the estimator that has the smallest variance and minimises Σ ût². We will use these two procedures in three cases: (1) expected values, (2) bivariate regression, (3) multiple regression. In these three cases the two procedures happen to give identical estimators, but this is not generally true.

3. We then ask how good the estimator is. To do this we need to determine what the expected value of the estimator is and the variance of the estimator, or its square root: the standard error. We then need to estimate this standard error. Given our assumptions, we can derive all these things mathematically and they allow us to determine how confident we are in our estimates. Notice the square root of the variance of a variable is called its standard deviation; the square root of the variance of an estimator is called its standard error.

4. We then often want to test hypotheses. For instance, from Applied Exercise I, we could find that the mean real return on equities over the period 1991-1999 was much higher and less volatile than the historical return over 1872-1990. In the year 2000, you might have wanted to ask whether there really was a New Economy, with significantly higher returns (over twice the historical average) and lower risk (a lower standard deviation); or whether you might just get the numbers observed during the 1990s purely by chance. See section 16.1.

5. Since our mathematical derivations depend on our assumptions, we need to check whether our assumptions are true. Once we have estimated α̂ we can estimate ût = Yt − α̂. Then we can ask whether our estimates of the errors are uncorrelated and have constant variance.

We will go through this procedure three times: first for estimating the sample mean or expected value and testing hypotheses about it; then for the bivariate regression model, where the expected value is not a constant but depends on another variable; then for multiple regression, where it depends on many variables.

15.1.1. A warning

The procedures we are going to cover are called classical statistical inference and the Neyman-Pearson approach to testing. When first encountered they may seem counter-intuitive, complicated and dependent on a lot of conventions. But once you get used to them they are quite easy to use. The motivation for learning these procedures is that they provide the standard approach to dealing with quantitative evidence in science and other areas of life, where they have been found useful. However, because they are counter-intuitive and complicated it is easy to make mistakes. It is claimed that quite a large proportion of scientific articles using statistics contain mistakes of calculation or interpretation.

A common mistake is to confuse statistical significance with substantive importance. Significance just measures whether a difference could have arisen by chance; it does not measure whether the size of the difference is important.

There is another approach to statistics based on Bayes Theorem. In many ways Bayesian statistics is more intuitive, since it does not involve imagining lots of hypothetical samples as classical statistics does. It is conceptually more coherent, since it just involves using your new data to update your prior probabilities in the way we did in section 10.2.2. However, it is often mathematically more complex, usually involving integrals, though computers now make this integration easier. Gary Koop, Bayesian Econometrics, Wiley 2003, provides a good introduction.

It is important to distinguish two different things that we are doing. First, in theoretical statistics we are making mathematical deductions: e.g. proving that an estimator has minimum variance in the class of linear unbiased estimators. Second, in applied statistics, we are making inductions, drawing general conclusions from a particular set of observations. Induction is fraught with philosophical difficulties. Even if every swan we see is white, we are not entitled to claim 'all swans are white': we have not seen all swans. But seeing one black swan does prove that the claim 'all swans are white' is false. Given this, it is not surprising that there are heated methodological debates about the right way to do applied statistics and no 'correct' rules. What is sensible depends on the purpose of the exercise. Kennedy, A Guide to Econometrics, chapter 21, discusses these issues.

15.2. Estimating the expected value.

Suppose we have an independent sample of data over time Y1, Y2, ..., YT ,

Yt = α + ut

where ut is a random variable with mean zero and variance σ², and the observations are uncorrelated or independent through time, i.e. E(ut) = 0, E(ut²) = σ², E(ut ut−i) = 0. Notice the number of observations here is T; earlier we used N or n for the number of observations. We wish to choose a procedure for estimating the unknown parameter α from this sample. We will call the estimator α̂ (said alpha hat). We get an estimate by putting the values for a particular sample into the formula. We derive the estimator α̂ in two ways: method of moments, which matches the sample data to our population assumptions, and least squares, which minimises the variance.

15.2.1. Method of moments 1

We assumed that E(Yt) = α, which implies E(ut) = E(Yt − α) = 0. Let us choose an estimator α̂ such that the sample equivalent of the expected value (the mean) of (Yt − α) also equals zero. That is, we replace E(Yt − α) = 0 with

(1/T) Σ (Yt − α̂) = 0.

This implies (1/T)(Σ Yt − Tα̂) = 0, or (1/T) Σ Yt = α̂ (sums run over t = 1, ..., T). So the estimator which makes the sample equivalent of E(Yt − α) = 0 hold is the mean, α̂ = Ȳ. Notice this derivation also implies that Σ (Yt − α̂) = 0: the sum of deviations from the mean is always zero.

15.2.2. Least squares 1

Alternatively, suppose we choose the estimator α̂ that makes the sum of squared deviations, S = Σ (Yt − α̂)², as small as possible. This will also minimise the estimated variance, σ̂² = Σ (Yt − α̂)²/T.

S = Σ (Yt − α̂)² = Σ (Yt² + α̂² − 2α̂Yt) = Σ Yt² + Tα̂² − 2α̂ Σ Yt

To find the α̂ that minimises this, we take the first derivative of S with respect to α̂ and set it equal to zero:

∂S/∂α̂ = 2Tα̂ − 2 Σ Yt = 0.

Dividing through by 2 and moving the −Σ Yt to the other side of the equality gives Tα̂ = Σ Yt, or

α̂ = Σ Yt / T,

so again α̂ = Ȳ.

15.3. Properties of the estimator

We distinguish between the true (or population) parameter α and the estimator α̂ = Σ Yi/n, the formula telling you how to calculate an estimate from a particular sample. A different sample would give a different estimate, so α̂ is a random variable. When different estimators are available (in this case the median might be an alternative estimator), we need criteria to choose between them. One criterion is that the estimator is unbiased: on average (over lots of hypothetical samples) it is equal to the true value; the expected value of the estimator is equal to the true value of the parameter. Another property that is often desirable is that the estimates tend to be close to the true value; for unbiased estimators this implies that the estimator has a small variance.


15.3.1. The expected value of α̂

To find out whether the mean is unbiased we need to calculate the expected value of α̂. This is

α̂ = Σ Yt/T = Σ (α + ut)/T = Tα/T + Σ ut/T

E(α̂) = α + E(Σ ut/T) = α

since E(ut) = 0. So α̂ is unbiased under our assumptions. From this derivation we see that

α̂ − α = Σ ut/T;

while on average over lots of hypothetical samples Σ ut/T may be zero, it will not be zero in any particular sample, so our estimate will differ from the true value. Now let us calculate how large the difference is likely to be.

15.3.2. The variance and standard error of the mean α̂

The variance of α̂ is V(α̂) = E(α̂ − E(α̂))² = E(α̂ − α)², since E(α̂) = α. Since α̂ − α = Σ ut/T,

E(α̂ − α)² = E(Σ ut/T)².

The right hand side can be written

E[(u1/T + u2/T + ... + uT/T)(u1/T + u2/T + ... + uT/T)].

This product will have T² terms. There are T terms with squares like u1², and T² − T terms with cross-products like u1u2. The expectation of each of the squares is E(ut²)/T² = σ²/T², since the variance of the ut, E(ut²) = σ², is assumed constant for all t. There are T terms like this, so the sum is T(σ²/T²) = σ²/T. The expectations of the cross-products are of the form E(ut ut−j)/T². But since the errors are assumed independent, E(ut ut−i) = 0 for i ≠ 0, so the expectation of all the cross-product terms equals zero. Thus we have derived the variance of the mean, which is:

V(α̂) = E(α̂ − E(α̂))² = σ²/T    (15.1)

where T is the number of observations.

The square root of the variance, σ/√T, is called the standard error of the mean. It is used to provide an indication of how accurate our estimate is. Notice that when we take the square root of the variance of a variable we call it a standard deviation; when we take the square root of the variance of an estimator, we call it a standard error. They are both just square roots of variances.

15.3.3. Estimating the variance

There are two common estimators of the variance of Y:

σ̂² = Σ (Yt − α̂)² / T

s² = Σ (Yt − α̂)² / (T − 1)

The first estimator, σ̂², sometimes called the population variance, which divides by T, is a biased estimator of σ²:

E(σ̂²) = ((T − 1)/T) σ² < σ².

The proof is given below as background: you will not be required to prove it in the exam. The second estimator, s², sometimes called the sample variance, is an unbiased estimator. The bias arises because we use an estimate of the mean, and the dispersion around the estimate is going to be smaller than the dispersion around the true value, because the estimated mean is designed to make the dispersion as small as possible. If we used the true value of α there would be no bias. The correction T − 1 is called the degrees of freedom: the number of observations minus the number of parameters estimated, one in this case, α. We estimate the standard error of the mean by

SE(α̂) = s/√T

On the assumptions that we have made, it can be shown that the mean is the minimum variance estimator of the expected value among all estimators which are linear functions of the Yi and are unbiased. This is described as the mean being the Best (minimum variance) Linear Unbiased Estimator (BLUE) of the expected value of Y. This is proved later in a more general context, but it is a natural result because we chose this estimator to minimise the variance.

In many cases we are interested in what happens to the properties of the estimator when T gets large: asymptotic properties.

So although σ̂², which divides by T, is a biased estimator of σ², as T → ∞, (T − 1)/T → 1 and the bias goes away asymptotically. In addition, as T → ∞ the standard error of the mean σ/√T → 0: the distribution of estimates gets closer and closer to the true value, so with T = ∞ there is no dispersion at all, the estimator converges to its true value, and we can estimate it exactly. Estimators which have this property are said to be consistent. Verbeek section 2.6 discusses asymptotic properties.

Background: estimating the variance. Let xt = α + ut, with E(xt) = α; Var(xt) = E(xt − E(xt))² = E(ut²) = σ²; E(uk uh) = 0 for k ≠ h; and x̄ = Σ xt/T (sums run over t = 1, ..., T). Consider

(1/T) Σ (xt − x̄)² = (1/T) Σ (xt² + x̄² − 2x̄xt) = (1/T) Σ xt² + x̄² − (1/T) 2x̄ Σ xt = (1/T) Σ xt² − x̄².

Consider the two terms separately:

A: (1/T) Σ xt² = (1/T) Σ (α + ut)² = α² + (1/T) Σ ut² + 2α (1/T) Σ ut;

B: x̄² = ((1/T) Σ (α + ut))² = (α + (1/T) Σ ut)² = α² + (1/T²)(Σ ut)² + 2α (1/T) Σ ut.

Then take expected values of each term using E(ut) = 0 and E(ut²) = σ²:

A: E(α² + (1/T) Σ ut² + 2α (1/T) Σ ut) = α² + (1/T)(Tσ²) = α² + σ²

B: E(α² + (1/T²)(Σ ut)² + 2α (1/T) Σ ut) = α² + (1/T²)(Tσ²) = α² + σ²/T

since E(Σ ut)² = Tσ², because E(ut²) = σ² and E(uk uh) = 0. So taking the difference of the expected values (A − B):

E((1/T) Σ xt² − x̄²) = (α² + σ²) − (α² + σ²/T) = ((T − 1)/T) σ².

15.3.4. Background Example: proportions

We also often use sample proportions to estimate probabilities. Barrow Chapter 4 covers proportions. For instance, in 10.2.2 we found from a sample of n = 1000 firms that 450 reported doing neither product nor process innovations, so we estimated P(N) = p = 0.45. Had we sampled different firms we would have got a different estimate, and only if we had sampled all firms in the economy, the population, would we be sure that we got the true proportion of non-innovators. We want to know how accurate our estimate is: what is its standard error? In the case of a proportion the standard error is

SE(p) = √(p(1 − p)/n) = √(0.45 × 0.55/1000) = 0.0157

So with a sample of 1000, our standard error on the estimated proportion of 45% is 1.57%. Often the formula is given using q = 1 − p.

In designing surveys it is important to check that your sample is representative (random samples are best) and not biased by non-response (innovators may be more likely to fill out the form); that the questions are clear (in this case firms may differ on what they think an innovation is); and that the sample is large enough. Barrow chapter 9 discusses these issues. Suppose that you thought that a standard error of 1.5% was too large and wanted to reduce it to 1%: how large would the survey need to be? Call our desired SE(p) x. Then we need

x = √(p(1 − p)/n)
x² = p(1 − p)/n
n = p(1 − p)/x²

To get x = 1% we need n = (0.45 × 0.55)/(0.01)² = 2475. You would need to more than double the sample.
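
The same calculations in Python (illustrative only):

    # Standard error of a proportion and the required sample size.
    import math

    p, n = 0.45, 1000
    print(math.sqrt(p * (1 - p) / n))   # about 0.0157, i.e. 1.57 percentage points

    x = 0.01                            # the standard error we want
    print(p * (1 - p) / x**2)           # required sample size, 2475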


15.4. Summary

So far we have (1) found out how to estimate the expected value of Y, E(Y) = α, by the mean; (2) shown that if the expected value of the errors is zero the mean is an unbiased estimator; (3) shown that if the errors also have constant variance σ² and are independent, the variance of the mean is σ²/T, where T is the number of observations; (4) shown that the standard error of the mean can be estimated by s/√T, where s² is the unbiased estimator of the variance; and claimed (5) that the mean has the minimum variance possible among linear unbiased estimators (the Gauss-Markov theorem) and (6) that for large T the distribution of α̂ will be normal whatever the distribution of Y (the central limit theorem).

16. Confidence intervals and Hypothesis Tests

Earlier we noted that if a variable was normally distributed, the mean plus or minus two standard deviations would be expected to cover about 95% of the observations. This range, plus or minus two standard deviations, is called a 95% confidence interval. In addition to constructing confidence intervals for a variable, we also construct confidence intervals for our estimate of the mean, where we use the standard error of the mean instead of the standard deviation. Barrow Chapter 4 discusses these issues.

16.1. Example: was the New Economy different?

We often want to compare our estimate, α̂, with some hypothesised value α0. To illustrate the procedures, suppose the average real return on equities 1991-1999 was 18%, and the average 1872-1990 was 8%. Check the actual numbers from Applied Exercise I. At the end of the 1990s many people believed that there had been a structural change and the arrival of the 'New Economy' produced much higher expected returns than the historical average. In this case α0 = 8%: our hypothesis is that the true expected return during the 1990s is the long-run historical average. Our estimate is the mean during the 1990s, α̂ = 18%, estimated on T = 9 observations. As our estimate of the standard deviation (square root of variance) of returns, we will use the estimate over the whole period, σ = 18.4%, assuming that the variance has not changed and that we know it exactly. This simplifies the calculation a lot; if we used the estimated variance 1991-99 we would have a larger sampling error and have to use the t distribution rather than the normal. Using the formula above, the standard error

102

Page 103: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

of our mean for the 1990s is thus

se(α̂) = σ/√T = 18.4/√9 = 6.13.

Notice that we are using the mean calculated over 9 years, 1991-99, so we divide by √9, not by the number of years that we used to calculate the standard deviation. If our estimate of the mean is normally distributed, with a true expected value α0 = 8%, we would expect there to be a 95% chance of the estimate falling into the range

α0 ± 1.96 se(α̂) = 8 ± 1.96(6.13) = 8 ± 12.

Thus we would be 95% confident that the range −4 to +20 would cover the true value. There is a 2.5% chance that a 9 year average would be above 20% and a 2.5% chance it would be below −4%. The 1990s estimate of 18% falls in this range, so even if the true expected return is equal to its historical value, 8%, we would expect to see 9 year average returns of 18% just by chance, more than 5% of the time. Suppose the true expected value is 8%: what is the probability of observing 18% or more? We can form

z = (α̂ − α0)/se(α̂) = (18 − 8)/6.13 = 10/6.13 = 1.63

Using the tables of the normal distribution we find P(Z ≥ 1.63) = 0.0516. Just over 5% of the time, we will observe periods of 9 years with mean returns of 18% or greater, if the expected return is constant at 8%.

This assumes everything is normally distributed; if the distribution had heavier tails than a normal, these probabilities would be a little larger. We could also centre our confidence interval on α̂ rather than α0 and report α̂ ± 1.96 se(α̂). This would be 18 ± 12, the range 6 to 30. Notice this confidence interval covers the historical mean. We would conclude that, assuming normality, at the 95% level, the 1990s return is not statistically significantly different from the historical mean.
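
The calculation can be reproduced in Python (scipy assumed; the figures are those used above):

    # Was the New Economy different? Comparing the 1990s mean of 18% with the historical 8%.
    from scipy.stats import norm

    alpha_hat, alpha_0 = 18.0, 8.0     # 1990s mean and historical mean, per cent
    sigma, T = 18.4, 9
    se = sigma / T**0.5                # 6.13

    z = (alpha_hat - alpha_0) / se
    print(z, 1 - norm.cdf(z))          # 1.63 and a one-tail probability of about 0.05

    print(alpha_0 - 1.96*se, alpha_0 + 1.96*se)        # roughly -4 to +20, centred on 8
    print(alpha_hat - 1.96*se, alpha_hat + 1.96*se)    # roughly 6 to 30, centred on 18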

16.2. Confidence intervals

Suppose Yt = α + ut. We can get an estimate, α̂, of α from our sample of T observations. If ut is normally distributed, α̂ will also be normally distributed, because α̂ is just a linear function of u. If the sample is large enough, α̂ will also be normally distributed even if ut is not normal. We saw that

α̂ = α + Σ ut/T


So

α̂ ∼ N(α, σ²/T)

s(α̂) = σ/√T is the standard error of α̂ and can be estimated by s(α̂) = s/√T, where

s = √( Σ (Yt − α̂)² / (T − 1) )

From tables of the normal distribution, we know that it is 95% certain that α̂ will be within 1.96 standard errors of its true value α. The range α̂ ± 1.96 s(α̂) is called the 95% confidence interval. The 68% confidence interval is α̂ ± s(α̂): the range covered by the estimate plus and minus one standard error will cover the true value just over two thirds of the time. If that confidence interval covers some hypothesised value α0, then we might be confident that the true value could be α0. If α̂ is more than about 2 standard errors from the hypothesised value α0, we think it unlikely that the difference could have occurred by chance (there is less than a 5% chance) and we say the difference is statistically significant. That is, we calculate the test statistic

τ = (α̂ − α0)/s(α̂)

and reject the null hypothesis that α = α0 at the 5% level if the absolute value of the test statistic is greater than 1.96. We can also calculate the 'p value', the probability of getting a value of τ at least as extreme as the one observed if the hypothesis were true, and reject the hypothesis if the p value is less than 0.05.

16.3. Small samples

Suppose that we have a sample of T observations. If we knew the true standard deviation, the standard error would be s(α̂) = σ/√T and (α̂ − α0)/s(α̂) would have a normal distribution. But when we estimate the variance by s², our estimated standard error is s(α̂) = s/√T. This adds extra uncertainty, from not knowing σ, and (α̂ − α0)/s(α̂) follows another distribution called the t distribution, introduced above, which is more spread out. How much more spread out depends on the degrees of freedom: T − 1. As the number of observations, T, becomes large, the effect of estimating the variance becomes smaller and the t distribution becomes closer to a normal distribution. For a normal distribution 95% of the distribution lies within the range ±1.96 standard errors. For a t distribution with 3 degrees of freedom (n = 4) 95% lies within ±3.182 standard errors; with 10 degrees of freedom it is ±2.228; with 30 degrees of freedom it is ±2.042. Tables of the t distribution are given at the back of statistics textbooks, e.g. Barrow Table A3. The practice in economics is to use 2 as the critical value and not use very small samples. Most computer programs give p values, the probability of getting a test statistic as extreme as the one observed if the null hypothesis were true, and they may be more convenient to use. At the 5% level you reject the null hypothesis if the p value is < 0.05. You have to know what the null hypothesis is.

In testing the New Economy above we used the estimate of the standard deviation from the whole sample, so the number of observations used in estimating the variance was large and we could use the normal distribution. If we had used the estimate of the standard deviation from the 1990s, the number of observations would have been small, 9, and we would have had to use the t distribution.

16.4. Testing

In testing we start with what is called the null hypothesis, H0: α = α0. It is called null because in many cases our hypothesised value is zero, i.e. α0 = 0. We reject it in favour of what is called the alternative hypothesis, H1: α ≠ α0, if there is very strong evidence against the null hypothesis. This is a two sided alternative: we reject if our estimate is significantly bigger or smaller. We could also have one sided alternatives, α < α0 or α > α0. The convention in economics is to use two sided alternatives.

The problem is how do we decide whether to reject the null hypothesis. In criminal trials, the null hypothesis is that the defendant is innocent. The jury can only reject this null hypothesis if the evidence indicates guilt 'beyond reasonable doubt'. Even if you think the defendant is probably guilty (better than 50% chance) you have to acquit; this is not enough. In civil trials juries decide 'on the balance of the evidence': there is no reason to favour one decision rather than another. So when OJ Simpson was tried for murder, a criminal charge, the jury decided that the evidence was not beyond reasonable doubt and he was acquitted. But when the victim's family brought a civil case against him, to claim compensation for the death, the jury decided on the balance of the evidence that he did it. This difference reflects the fact that losing a criminal case and losing a civil case have quite different consequences.

Essentially the same issues are involved in hypothesis testing. We have a null hypothesis, H0: the defendant is innocent. We have an alternative hypothesis, H1: the defendant is guilty. We can never know which is true. There are two


possible decisions: either accept the null hypothesis (acquit the defendant) or reject the null hypothesis (find the defendant guilty). In Scotland the jury has a third possible verdict: not proven. Call the null hypothesis H0; this could be 'defendant innocent' or α = α0. Then the possibilities are

            H0 true        H0 false
Accept H0   Correct        Type II error
Reject H0   Type I error   Correct

In the criminal trial, Type I error is convicting an innocent person; Type II error is acquitting a guilty person. Of course, we can avoid Type I error completely: always accept the null hypothesis, acquit everybody. But we would make a lot of Type II errors, letting villains go. Alternatively we could make Type II errors zero: convict everybody. Since we do not know whether the null hypothesis is true (whether OJ is really innocent), we have to trade off the two risks. Accepting the null hypothesis can only be tentative: this evidence may not reject it, but future evidence may.

Statistical tests design the test procedure so that there is a fixed risk of Type I error: rejecting the null hypothesis when it is true. This probability is usually fixed at 5%, though this is just a convention.

So the procedure in testing is:
1. Specify the null hypothesis, α = α0.
2. Specify the alternative hypothesis, α ≠ α0.
3. Design a test statistic, which is only a function of the observed data and the null hypothesis, not a function of unknown parameters:

τ = (α̂ − α0)/s(α̂)

4. Find the distribution of the test statistic if the null hypothesis is true. In this case the test statistic, τ, has a t distribution in small samples (less than about 30), a normal distribution in large samples.
5. Use the distribution to specify the critical values, so that the probability of the test statistic falling outside the critical values, when the null hypothesis is true, is small, typically 5%.
6. Reject the null if the test statistic is outside the critical values (in this case outside the range ±2); do not reject the null otherwise.
7. Consider the power of the test. The power is the probability of rejecting the null hypothesis when it is false (1 − P(Type II error)), which depends on the true value of the parameters.


In the medical example of screening for a disease that we used in section 11, we also had two types of errors (false positives and false negatives), and we had to balance the two types of error in a similar way. There we did it on the basis of costs and benefits. When the costs and benefits can be calculated, that is the best way to do it. In cases where the costs and benefits are not known, we use significance tests.

Statistical significance and substantive significance can be very different. An effect may be very small, of no importance, but statistically very significant, because we have a very large sample and a small standard error. Alternatively, an effect may be large, but not statistically significant, because we have a small sample and it is imprecisely estimated. Statistical significance asks 'could the difference have arisen by chance in a sample of this size?', not 'is the difference important?'

When we discussed confidence intervals we said that the 68% confidence interval is α̂ ± s(α̂): the range covered by the estimate plus and minus one standard error will cover the true value, α, just over two thirds of the time. There is a strong temptation to say that the probability that α lies within this range is two thirds. Strictly this is wrong: α is fixed, not a random variable, so there are no probabilities attached to α. The probabilities are attached to the random variable α̂, which differs in different samples. Bayesian statistics does treat the parameters as random variables, with some prior probability distribution; uses the data to update the probabilities; and does not use the Neyman-Pearson approach to testing set out above.

16.4.1. Example equities

Suppose the average real return on equities over T = 100 years was α̂ = 10%, the standard deviation of real returns s = 20%, and they appeared normally distributed (in reality equity returns are not quite normally distributed). For a random variable Z following a standard normal distribution:

z          0.5     1       1.5     2       2.5     3
P(Z < z)   0.6915  0.8413  0.9332  0.9772  0.9938  0.9987

(a) Explain what P(Z < z) means. What is P(Z < 0)?
(b) What is the standard error of the mean?
(c) Is the mean return significantly different from zero?
(d) What is the probability of a return less than −50%?
(e) What is the probability of a positive return?

107

Page 108: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

Answer.
(a) P(Z < z) is the probability that the random variable Z takes a value less than a specified value, z. P(Z < 0) = 0.5, since the standard normal distribution is symmetric around zero: there is 50% below zero and 50% above.
(b) The standard error of the mean is s/√T = 20/√100 = 2.
(c) To test the hypothesis, we use the formula τ = (α̂ − α0)/s(α̂), with α̂ = 10, α0 = 0, s(α̂) = 2: τ = (10 − 0)/2 = 5 is greater than 2. So the mean return is significantly different from zero. We reject the null hypothesis that the expected return is zero at (better than) the 5% level.
(d) Probability of a return less than −50%? z = (−50 − 10)/20 = −3. The distribution is symmetrical, so P(Z < −3) = P(Z > 3) = 1 − P(Z < 3) = 1 − 0.9987 = 0.0013 or 0.13%.
(e) Probability of a positive return: z = (0 − 10)/20 = −0.5; P(Z > −0.5) = P(Z < 0.5) = 0.6915 or 69%.
Notice the importance of whether we are using the standard deviation of returns, σ, or the standard error of the mean, σ/√T.
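
A Python check of these answers (scipy assumed; illustrative only):

    # Equity returns example: mean 10%, sd 20%, T = 100 years.
    from scipy.stats import norm

    mean, sd, T = 10.0, 20.0, 100
    se = sd / T**0.5
    print(se)                              # (b) standard error of the mean = 2
    print((mean - 0) / se)                 # (c) test statistic = 5, well above 2
    print(norm.cdf((-50 - mean) / sd))     # (d) P(return < -50%) = 0.0013
    print(1 - norm.cdf((0 - mean) / sd))   # (e) P(return > 0) = 0.69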

16.4.2. Background Example: clinical trials

Clinical trials tend to be done in three phases. Phase I is a small trial to determine toxicity and effective dosage. Phase II is a larger trial to determine effectiveness. Phase III is an even larger trial to compare effectiveness with alternative treatments, if any. If there is no alternative treatment, patients are randomly assigned to a treatment group, who are given the drug, and to a control group, who are given a placebo, made to look as much like the drug as possible. The placebo effect is the fact that any treatment, however ineffective, tends to make patients get better, if they believe in it. Randomisation is important because otherwise the two groups of patients may differ in ways that influence the effect of the drug. The trials are double blind in that neither the patient nor the physician knows whether the patient is getting the drug or the placebo. This is to stop the physician selecting those treated, e.g. giving it to the ones who were more ill, which would bias the result of the trial. Giving some people an ineffective placebo raises ethical issues, but so does giving the others an untried and potentially dangerous drug. Again we are trying to balance two sorts of errors.

Suppose we have 100 patients, 50 in the treatment group, 50 in the control group; 18 of the treatment group die within a time period, 22 of the control group die; is this difference significant?

108

Page 109: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

Answer
As we saw above, the standard error for an estimate of a proportion is se(p) = √(pq/n), where n is the number of observations on which it is based and q = 1 − p. We estimate p = N/n, where N is the number who die. The number of observations in the treatment group is n1 = 50, as is the number in the control group, n2. The estimated proportions who die are p1 = 0.36, p2 = 0.44. If the numbers of observations n1 and n2 are sufficiently large, the difference of the sample proportions p1 and p2 will be approximately normal with mean p1 − p2 and variance

V(p1 − p2) = p1q1/n1 + p2q2/n2

where qi = 1 − pi. Our null hypothesis is p1 − p2 = 0. If the null hypothesis is true, there is no difference, and our best estimate of p = p1 = p2 is (18 + 22)/100 = 0.4, and the standard error is

se(p) = √(0.4 × 0.6/50 + 0.4 × 0.6/50) ≈ 0.1

Our test statistic is then

τ = (p1 − p2)/se(p) = (0.36 − 0.44)/0.1 = −0.8

This is less than two in absolute value, so we would not reject the null hypothesis that the proportion who died was the same in the treatment and control groups. The differences could easily have arisen by chance. To check this we would need to do a larger trial. Barrow chapter 7 discusses these issues.

It should not make a difference, but in practice people can be influenced by how you frame the probabilities, e.g. in terms of the proportion who die or the proportion who survive.
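
The test can be written out in Python (illustrative only; the numbers are those in the question):

    # Difference between two proportions: 18/50 deaths versus 22/50.
    import math

    n1 = n2 = 50
    p1, p2 = 18 / n1, 22 / n2            # 0.36 and 0.44
    p = (18 + 22) / (n1 + n2)            # pooled estimate under the null, 0.4

    se = math.sqrt(p*(1 - p)/n1 + p*(1 - p)/n2)   # about 0.098
    print(se, (p1 - p2) / se)            # test statistic about -0.8, well inside +/- 2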

17. Bivariate Regression

A large part of the use of statistics in economics and finance (econometrics) involves measuring the effect of one variable (e.g. price) on another variable (e.g. quantity demanded). Regression is the statistical tool used to measure the effects. In this case price would be the independent variable or regressor and quantity demanded the dependent variable. Barrow Chapters 7 and 8 discuss this material.


17.1. Examples

17.1.1. CAPM.

Suppose the risk free interest rate over a period is Rt, the return on a particular stock is Rit and the return on the stock market (e.g. the FTSE or S&P index) is Rmt. These returns would usually be measured as the changes in the logarithms of the stock prices. The Capital Asset Pricing Model (CAPM) can be written as a regression

(Rit − Rt) = α + β(Rmt − Rt) + ut

The excess return on stock i is equal to a constant α (which should be zero) plus a coefficient β times the excess return on the market, plus a random error or disturbance, which reflects the factors that shift the return on stock i other than movements of the whole market. The riskiness of a stock is measured by β: if β = 1 it is as volatile as the market; if β > 1, it is more volatile than the market; if β < 1, it is less volatile than the market. The riskier the stock, the higher the return required relative to the market return. Given data on Rit, Rt and Rmt for time periods t = 1, 2, ..., T, we want to estimate α and β for the stock and determine how much of the variation of the stock's returns can be explained by variation in the market. Verbeek, section 2.7, discusses this example in more detail.
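
How the estimation is done is the subject of the rest of this section. Purely as a preview, the Python sketch below simulates excess returns with a known β of 1.2 and α of zero (the data are made up, and numpy is assumed) and recovers the parameters by least squares.

    # A CAPM-style regression on simulated excess returns.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 120
    excess_market = rng.normal(0.005, 0.04, T)                          # simulated market excess returns
    excess_stock = 0.0 + 1.2 * excess_market + rng.normal(0, 0.03, T)   # true alpha 0, beta 1.2

    beta_hat, alpha_hat = np.polyfit(excess_market, excess_stock, 1)    # least squares fit of a line
    print(alpha_hat, beta_hat)             # estimates should be close to 0 and 1.2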

17.1.2. Fisher Relation

The real interest rate, It, is the nominal interest rate Rt less the rate of inflation πt:

It = Rt − πt

Suppose the real interest rate is roughly constant, equal to a constant plus a random error,

It = I + ut

then we can write

Rt = It + πt = I + πt + ut.

Then if we ran a regression

Rt = α + βπt + ut

the theory says α = I and β = 1; the hypothesis β = 1 can be tested. The interpretation of α is that it is the rate of interest that would be expected on average when the rate of inflation is zero. β tells you how much the interest rate rises in response to a rise in inflation of a percentage point.



17.2. Deriving the Least Squares Estimator

In both the examples above there are data, t = 1, 2, ..., T, and a model of the form

Yt = α + βXt + ut,

and we will continue to assume that ut is a random variable with expected value zero and constant variance σ², and that the observations are uncorrelated or independent through time, i.e. E(ut) = 0, E(ut²) = σ², E(utut−i) = 0. We will further assume that the independent variable varies, Var(Xt) ≠ 0, and is independent of the error, so that the covariance between them is zero: E[(Xt − E(Xt))ut] = 0. If we can estimate α and β, by α̂ and β̂, then we can predict Yt for any particular value of X:

Ŷt = α̂ + β̂Xt

these are called the fitted or predicted values of the dependent variable. We can also estimate the error:

ût = Yt − Ŷt = Yt − (α̂ + β̂Xt)

these are called the residuals. Notice we distinguish between the true unobserved errors, ut, and the estimated residuals, ût.
As with the expected value above, there are two procedures that we will use to derive the estimates: method of moments and least squares.

17.2.1. Method of Moments 2

Our two population assumptions (moment conditions) are that the expected value of the errors is zero, E(ut) = 0, and that the covariance of the independent variable and the errors is zero: E[(Xt − E(Xt))ut] = 0. We choose our estimates so that the sample equivalents of these equations are true. The sample equivalent of E(ut) = 0 is that the mean of the estimated residuals is zero:

T⁻¹ Σt ût = T⁻¹ Σt (Yt − α̂ − β̂Xt) = 0
T⁻¹ Σt Yt − α̂ − β̂(T⁻¹ Σt Xt) = 0
α̂ = Ȳ − β̂X̄



Our first moment condition implies that the estimate of α is the mean of Y minus β̂ times the mean of X. We do not yet know what β̂ is, but as long as we define α̂ this way, the residuals will sum to zero, whatever the value of β̂. We can substitute this estimate of α into the estimated equation:

Yt = α̂ + β̂Xt + ût = (Ȳ − β̂X̄) + β̂Xt + ût
Yt − Ȳ = β̂(Xt − X̄) + ût
yt = β̂xt + ût.   (17.1)

where we use lower case letters to denote deviations from the mean: yt = Yt − Ȳ, xt = Xt − X̄.
In terms of deviations from the means, the sample equivalent of our second moment condition, E[(Xt − E(Xt))ut] = 0, is

T⁻¹ Σt xtût = 0
T⁻¹ Σt xt(yt − β̂xt) = T⁻¹ Σt xtyt − β̂ T⁻¹ Σt xt² = 0
β̂ = (T⁻¹ Σt xtyt)/(T⁻¹ Σt xt²) = Σt xtyt / Σt xt².

This says that our estimate of β is the ratio of the sample covariance of Xt and Yt to the sample variance of Xt (remember lower case letters denote deviations from the means).

17.2.2. Least squares 2

As with the expected value, we can also find the β̂ that minimises Σût², where from (17.1)

Σt ût² = Σt (yt − β̂xt)² = Σt yt² + β̂² Σt xt² − 2β̂ Σt xtyt

the derivative of Σût² with respect to β̂ is

∂Σût²/∂β̂ = 2β̂ Σt xt² − 2 Σt xtyt = 0   (17.2)



Writing this as 2β̂ Σt xt² = 2 Σt xtyt and dividing both sides by 2Σt xt² gives β̂ = Σt xtyt / Σt xt², as before.
The second order condition is

∂²Σût²/∂β̂² = 2 Σ xt² > 0   (17.3)

since squares are positive, so this is a minimum.
Our estimates

α̂ = Ȳ − β̂X̄
β̂ = Σt(Xt − X̄)(Yt − Ȳ) / Σt(Xt − X̄)² = Σt xtyt / Σt xt²   (17.4)

(i) make the sum of the estimated residuals zero and the estimated residuals uncorrelated with the explanatory variable and (ii) minimise the sum of squared residuals.
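These formulas translate directly into a few lines of code. A minimal Python sketch on made-up data, which also checks that the residuals sum to zero, are uncorrelated with X, and give the smallest sum of squares:

import numpy as np

# Illustrative data, invented for this sketch.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

x = X - X.mean()                       # deviations from means
y = Y - Y.mean()
beta_hat = (x * y).sum() / (x * x).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()

u_hat = Y - (alpha_hat + beta_hat * X)   # residuals
print(alpha_hat, beta_hat)
print(u_hat.sum())          # (i) residuals sum to zero, up to rounding error
print((x * u_hat).sum())    # (i) residuals uncorrelated with X
# (ii) least squares: any other slope gives a larger sum of squared residuals
print((u_hat ** 2).sum(),
      ((Y - (alpha_hat + (beta_hat + 0.1) * X)) ** 2).sum())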

17.3. Properties of the estimates

Since

β̂ = Σxtyt/Σxt² = Σxt(βxt + ut)/Σxt² = β + Σxtut/Σxt²

then E(β̂) = β and it is unbiased, because by independence

E[Σxtut/Σxt²] = E[Σxt/Σxt²] E(ut)

and E(ut) = 0. To derive the variance of β̂, note that since β̂ is unbiased

V(β̂) = E(β̂ − β)² = E(Σxtut/Σxt²)² = σ²/Σxt²   (17.5)

using the same sort of argument as in deriving the standard error of the mean (15.1) above.


Page 114: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

17.4. Measuring how well the regression fits

Note if we define the covariance between X and Y as Sxy = Σ(Xt − X̄)(Yt − Ȳ)/T, and the variance as Sxx = Σ(Xt − X̄)²/T, then β̂ = Sxy/Sxx as we saw above, and the correlation coefficient is

r = Sxy / √(SxxSyy)

as we saw in section 6. The correlation coefficient lies between -1 (perfect negative relation) and +1 (perfect positive relation), with 0 indicating no linear relationship. Notice that covariance and correlation only measure linear relationships. You could have an exact non-linear relationship (e.g. a circle) and the correlation would be zero. In regression we use the square of the correlation coefficient, r², usually written R² and called the coefficient of determination. This gives you the proportion of the variation in Y that has been explained by the regression.
We measure the dispersion around the line in exactly the same way that we measured the dispersion around the mean, either using the biased estimator

σ̂² = Σût²/T

or the unbiased estimator

s² = Σût²/(T − 2)

where ût = Yt − α̂ − β̂Xt. Now there are T − 2 degrees of freedom because we estimated two parameters, α and β. The estimate s² is called the variance of the regression and its square root s the Standard Error of the Regression (SER), which gives you an idea of the average size of the errors. The SER is measured in the same units as the dependent variable. If the dependent variable is a logarithm, the SER can be multiplied by 100 and interpreted as a percentage error. We can then measure the standard error of β̂ as se(β̂) = s/√(Σxt²), by putting the estimate in (17.5) and taking square roots.
The coefficient of determination, or R squared, the proportion of the variation of Y that has been explained (one minus the proportion of the variation in Y that has not been explained), is defined as

R² = 1 − Σût² / Σ(Yt − Ȳ)²

Show this is the same as r² defined above. If β̂ = 0, then nothing has been explained: ût = Yt − α̂, where α̂ is just the mean, and R² = 0.
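A self-contained sketch (the same sort of made-up data as before) of how s, se(β̂) and R² are computed from these definitions, and a check that R² equals the squared correlation:

import numpy as np

# Invented data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x * x).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
u_hat = Y - (alpha_hat + beta_hat * X)

T = len(Y)
s2 = (u_hat ** 2).sum() / (T - 2)        # unbiased error variance, two parameters
s = np.sqrt(s2)                          # standard error of the regression (SER)
se_beta = s / np.sqrt((x * x).sum())     # se(beta_hat), using (17.5)
R2 = 1 - (u_hat ** 2).sum() / (y ** 2).sum()
r = (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())
print(s, se_beta, R2, r ** 2)            # R2 equals the squared correlation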


Page 115: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

Computer packages also often calculate the adjusted R² or R̄² (R bar squared). This corrects the numerator and denominator for degrees of freedom:

R̄² = 1 − [Σût²/(T − k)] / [Σ(Yt − Ȳ)²/(T − 1)]

where k is the number of explanatory variables, including the intercept. This can be negative.

17.5. Assumptions for Least Squares to give good estimates

We are assuming:

• That we have the correct model for the process, e.g. that it is linear and we have not left any relevant variables out.

• That the expected value of the errors is zero: E(ut) = 0, on average the true errors are zero. Notice that the average of our residuals is always zero (T⁻¹ Σût = 0 by construction), as long as we have an intercept in the equation (which will pick up the average of the error) or work with deviations from the mean.

• That E(ut²) = σ²: the errors have constant variance. This assumption is sometimes expressed as the errors being homoskedastic (same variances); its failure is that the errors are heteroskedastic (different variances).

• That E(utut−i) = 0 for i ≠ 0: the errors are independent, with no serial correlation or autocorrelation.

• That Σ(Xt − X̄)² ≠ 0: the X's must vary over the sample. We cannot calculate β̂ if this is not the case. This would also fail if there were not enough observations to estimate the parameters. We need T > 2. If T = 2, we can fit the observations exactly by a line through the two points.

• That E[(Xt − E(Xt))ut] = 0. This assumption is usually described as the X's being exogenous: not related to the errors. Notice that for our estimates T⁻¹ Σt(Xt − X̄)ût = 0 by construction. For exogeneity, the X's may either be non-stochastic, fixed numbers (this is rare in economics, where our X variables are usually random) or random variables distributed independently of the errors. Independence implies that X and the errors are not correlated, but is a stronger assumption than being uncorrelated.


Page 116: 2014-2015 Applied Statistics and Econometrics Notes …€¦ · Applied Statistics and Econometrics Notes and ... These topics roughly correspond to a lecture each, ... Basic Econometrics,

Properties of estimators were discussed above. With all these assumptions we can show that, among all estimators of α and β that are linear functions of Y and are unbiased, the least squares estimator has the smallest variance. This is the Gauss-Markov theorem: under these assumptions the least squares estimator is the Best (minimum variance) Linear Unbiased Estimator, it is BLUE. If in addition we add another assumption to the model, that the errors are normally distributed, then our estimates will also be normally distributed and we can use this to construct test statistics to test hypotheses about the regression coefficients. Even if the errors are not normally distributed, by the central limit theorem our estimates will be normally distributed in large samples, in the same way that the mean is normally distributed in large samples whatever the distribution of the variable.
We need the assumption that X is exogenous to make causal statements about the effect of X on Y. When we are only interested in predicting Y, we do not need the exogeneity assumption and have the result that the least squares prediction Ŷt is the Best (minimum variance) Linear Unbiased Predictor of Yt.

17.6. Predicted values and residuals.

We assume that there is a true model or 'data generating process', as it is sometimes called: Yt = α + βXt + ut. We estimate Yt = α̂ + β̂Xt + ût, or Yt = Ŷt + ût. Thus the least squares procedure splits Yt into two bits: the explained bit, the expected or predicted value, and the unexplained bit, the residual, ût, the part of Yt left over after we have explained all we can. The predicted value is an estimate of the conditional expectation of Y given X: Ŷt = Ê(Yt | Xt) = α̂ + β̂Xt.
Notice that the predicted values and the residuals are uncorrelated; their covariance is exactly zero:

Σt Ŷtût = Σ(α̂ + β̂Xt)ût = α̂ Σût + β̂ ΣXtût

But our moment conditions made Σût = 0 and ΣXtût = 0, so both terms are zero. One of the main uses of the predicted values is in forecasting: we make an assumption about how X will change in the future, use the equation to forecast Y and calculate a standard error for the forecast.
By construction the residuals have mean zero (if there is an intercept, i.e. α, in the equation, and you should always include an intercept) and they are uncorrelated with the explanatory variables. But we can test whether other assumptions we made about the errors hold for the estimated residuals, see section 20.3. In many cases the best way to check the assumptions is to look at graphs of the residuals (which should look random, with no obvious pattern in them) and of their histogram (which should look roughly normal).

18. Multiple Regression

18.1. Example: logarithmic models

Most of the time we have more than one right hand side variable, so our regression may be a demand function like

lnQt = β1 + β2 ln It + β3 lnPt + β4 lnPt* + ut   (18.1)

where Qt is quantity demanded, It real income, Pt the price of the good, Pt* a measure of the price of all other goods, and ln denotes natural logarithms. Given the log equation, β2 is the income elasticity of demand (the percentage change in demand in response to a one percent change in income), which we would expect to be positive; β3 is the own price elasticity, which we expect to be negative; and β4 is the cross-price elasticity, which for all other goods should be positive.
We often use logarithms of economic variables since
(a) prices and quantities are non-negative, so the logs are defined;
(b) the coefficients can be interpreted as elasticities, the % change in the dependent variable in response to a 1% change in the independent variable, so the units of measurement of the variables do not matter;
(c) in many cases errors are proportional to the variable, so the variance is more likely to be constant in logs;
(d) the logarithms of economic variables are often closer to being normally distributed;
(e) the change in the logarithm is approximately equal to the growth rate; and
(f) lots of interesting hypotheses can be tested in logarithmic models. For instance, in this case the hypothesis H0: β3 = −β4 (homogeneity of degree zero) means only relative prices matter.
Notice the original model is non-linear,

Qt = B It^β2 Pt^β3 (Pt*)^β4 exp(ut)

where B = exp(β1), but can be made linear by taking logs.



Another common logarithmic model is the Cobb-Douglas production function, explaining output at time t, Qt, by capital Kt and labour Lt and an error:

Qt = A Kt^b Lt^c e^(dt+ut)

Notice output will be zero if either capital or labour is zero. We can make this linear by taking logarithms:

lnQt = lnA + b lnKt + c lnLt + dt + ut.   (18.2)

The rate of growth of technical progress is measured by d: it is the amount log output changes between periods if all inputs are constant. The residual ût is often treated as a measure of efficiency, how much higher or lower output is than you would expect.
If b + c = 1 there are constant returns to scale, CRS: if both inputs go up by 10%, output goes up by 10%. We can test this by rewriting (reparameterising) the equation as

lnQt − lnLt = lnA + b[lnKt − lnLt] + (b + c − 1) lnLt + dt + ut   (18.3)

and doing a t test on the coefficient of lnLt, which should not be significantly different from zero if there is CRS. Notice (18.2) and (18.3) are identical statistical equations, e.g. the estimates of the residuals would be identical. The restricted version, which imposes CRS and so reduces the number of parameters estimated by one, is

lnQt − lnLt = lnA′ + b′[lnKt − lnLt] + d′t + ut′   (18.4)

where the primes indicate that the estimates will be different. Another way to test CRS is to use an F test, discussed below, to compare the fit of (18.4) with (18.3).

18.2. Matrix form of the Linear Regression Model

We could write (18.1) as

Yt = β1 + β2X2t + β3X3t + β4X4t + ut

where the dependent variable Yt corresponds to lnQt. X1t = 1 for all t, so we do not need to write it in. X2t corresponds to ln It, X3t to lnPt, and X4t to lnPt*. Again we want to find the estimates of βi, i = 1, 2, 3, 4 that minimise the sum of squared residuals, Σut²:

Σut² = Σ(Yt − β1 − β2X2t − β3X3t − β4X4t)²

we have to multiply out the terms in the brackets, take the summation inside, and derive the first order conditions, the derivatives with respect to the four parameters. These say that the residuals should sum to zero and be uncorrelated with each of the Xit. The formulae, expressed as summations, are complicated. It is much easier to express them in matrix form. Verbeek Appendix A reviews matrix algebra.
We can write this in vector form as

Yt = β′Xt + ut

where β and Xt are 4 × 1 vectors, so the product β′Xt is (1 × 4) × (4 × 1) = 1 × 1, a scalar, just like Yt.
We can also write this in matrix form in terms of y, a T × 1 vector, and X, a T × 4 matrix:

[ Y1 ]   [ 1  X21  X31  X41 ] [ β1 ]   [ u1 ]
[ Y2 ] = [ 1  X22  X32  X42 ] [ β2 ] + [ u2 ]
[ .  ]   [ .   .    .    .  ] [ β3 ]   [ .  ]
[ YT ]   [ 1  X2T  X3T  X4T ] [ β4 ]   [ uT ]

   y    =          X             β    +    u
 (T×1)          (T×4)         (4×1)     (T×1).

This gives us a set of T equations. Notice, in writing Xit, we have departed from the usual matrix algebra convention of having the subscripts go row, column. This generalises to the case where X is a T × k matrix and β a k × 1 vector, whatever k. Notice that for matrix products the inside numbers have to match for them to be conformable, and the dimension of the product is given by the outside numbers.

18.3. Assumptions

We now want to express our assumptions about the errors in matrix form. The assumptions were: (a) that E(ut) = 0, on average the true errors are zero; (b) that E(ut²) = σ², errors have constant variance; and (c) E(utut−i) = 0, for i ≠ 0, different errors are independent. The first is just that the expected value of the random T × 1 vector u is zero: E(u) = 0. To capture the second and third assumptions, we need to specify the variance covariance matrix of the errors, E(uu′), a T × T matrix. u′ is the transpose of u, a 1 × T vector. The transpose operation turns columns into rows and vice versa. Note u′u is a scalar, 1 × 1, the sum of squared errors. Writing out E(uu′) and putting our assumptions in:

          [ E(u1²)    E(u1u2)  ..  E(u1uT) ]   [ σ²  0   ..  0  ]
E(uu′) =  [ E(u1u2)   E(u2²)   ..  E(u2uT) ] = [ 0   σ²  ..  0  ]
          [ ..        ..       ..  ..      ]   [ ..  ..  ..  .. ]
          [ E(u1uT)   E(u2uT)  ..  E(uT²)  ]   [ 0   0   ..  σ² ]

So our assumptions say that E(uu′) = σ²IT, where IT is a T × T identity matrix with ones on the diagonal and zeros off the diagonal.
The assumption that X is exogenous, distributed independently of the errors u, implies E(X′u) = 0, which corresponds to our earlier assumption E[(Xt − E(Xt))ut] = 0. We also assume that X has full rank k. This implies that the different regressors vary independently, are not perfectly correlated, and corresponds to our earlier assumption that Xt varies.

18.4. The bivariate case in matrix algebra

Consider our bivariate model, writing β1 for α and β2 for β:

Yt = β1 + β2Xt + ut

for t = 1, 2, ..., T, that is, a set of T equations. The errors are ut = Yt − β1 − β2Xt. Least squares chooses the βi to minimise Σut².
In matrix form the model is

y = Xβ + u

[ Y1 ]   [ 1  X1 ]          [ u1 ]
[ Y2 ] = [ 1  X2 ] [ β1 ] + [ u2 ]
[ .. ]   [ .. .. ] [ β2 ]   [ .. ]
[ YT ]   [ 1  XT ]          [ uT ]

and we will use X′X, where X′ is the 2 × T transpose of X, so X′X is 2 × 2:

(X′X) = [ T     ΣXt  ]
        [ ΣXt   ΣXt² ]   (18.5)



X′X is a symmetric matrix, so (X′X)′ = (X′X). We will also use the inverse of X′X:

(X′X)⁻¹ = 1/(TΣXt² − (ΣXt)²) [ ΣXt²   −ΣXt ]
                             [ −ΣXt    T   ]   (18.6)

For the inverse to exist the matrix must be non-singular, with a non-zero determinant. The determinant of X′X is TΣXt² − (ΣXt)². Note ΣXt²/T − (ΣXt/T)² is the variance of Xt.
The sum of squared residuals is a scalar (a 1 × 1 matrix), u′u, the product of a (1 × T) vector u′ and a (T × 1) vector u. If A is an n × m matrix and B is an m × k matrix, the product AB is an n × k matrix. The transpose of the product, (AB)′, is B′A′, the product of a k × m matrix B′ with an m × n matrix A′. A′B′ is not conformable: you cannot multiply an m × n matrix by a k × m matrix. Below we set y′Xβ = β′X′y. In general a matrix is not equal to its transpose, but these are both scalars, so are equal to their transposes. Expand the sum of squared residuals as above:

u′u = (y − Xβ)′(y − Xβ) = y′y + β′X′Xβ − 2β′X′y

Σt ut² = ΣYt² + (β1²T + β2² ΣXt² + 2β1β2 ΣXt) − 2(β1 ΣYt + β2 ΣXtYt)

The scalar β′X′Xβ is a quadratic form, i.e. of the form x′Ax, and the βi² appear in it. Quadratic forms play a big role in econometrics. A matrix A is positive definite if for any non-zero a, a′Aa > 0. Matrices with the structure X′X are always positive definite, since they can be written as a sum of squares. Define z = Xa; then z′z = a′X′Xa is the sum of the squared elements of z.

18.5. Differentiation with vectors and matrices

Consider the linear relation

P = x′a

where x′ is 1 × n and a is n × 1. Then the derivative of P with respect to x or x′ is defined as

dP/dx = a   and   dP/dx′ = a′



In the case n = 2, we can write:

P = [x1, x2] [ a1 ] = x1a1 + x2a2
             [ a2 ]

Then

dP/dx1 = a1   and   dP/dx2 = a2

So

dP/dx = [ dP/dx1 ] = [ a1 ] = a
        [ dP/dx2 ]   [ a2 ]

and

dP/dx′ = [dP/dx1, dP/dx2] = [a1, a2] = a′

Consider the quadratic form

Q = x′Ax

where x′ is 1 × n, A is n × n and x is n × 1. Then the derivative of Q with respect to x or x′ is defined as

dQ/dx = 2Ax   and   dQ/dx′ = 2x′A

In the case n = 2, we can write:

Q = [x1, x2] [ a11  a12 ] [ x1 ]
             [ a12  a22 ] [ x2 ]

where for simplicity A is assumed to be symmetric. Expanding this gives:

Q = [x1, x2] [ a11x1 + a12x2 ] = a11x1² + 2a12x1x2 + a22x2²
             [ a12x1 + a22x2 ]

So:

dQ/dx1 = 2a11x1 + 2a12x2   and   dQ/dx2 = 2a12x1 + 2a22x2

Then

dQ/dx = [ dQ/dx1 ] = [ 2a11x1 + 2a12x2 ] = 2 [ a11  a12 ] [ x1 ] = 2Ax
        [ dQ/dx2 ]   [ 2a12x1 + 2a22x2 ]     [ a12  a22 ] [ x2 ]

and

dQ/dx′ = [dQ/dx1, dQ/dx2] = [2a11x1 + 2a12x2, 2a12x1 + 2a22x2]

        = 2 [x1, x2] [ a11  a12 ] = 2x′A
                     [ a12  a22 ]

18.6. Estimating β

As before, we will consider two methods for deriving the estimator, method of moments and least squares, this time for the model y = Xβ + u, where y and u are T × 1 vectors, β is a k × 1 vector and X a T × k matrix.

18.6.1. Method of moments 3

Our exogeneity assumption is E(X′u) = 0; the sample equivalent is X′û = 0, a k × 1 set of equations, which for the case k = 4 above gives

Σût = 0;  ΣX2tût = 0;  ΣX3tût = 0;  ΣX4tût = 0.

So

X′û = X′(y − Xβ̂) = X′y − X′Xβ̂ = 0.

Since X is of rank k, (X′X)⁻¹ exists (X′X is non-singular, its determinant is non-zero), so

β̂ = (X′X)⁻¹X′y.

18.6.2. Least Squares 3

The sum of squared residuals is

u′u = (y − Xβ)′(y − Xβ) = y′y + β′X′Xβ − 2β′X′y



To derive the least squares estimator, we take derivatives and set them equal to zero. If β is a k × 1 vector we get k derivatives; the first order conditions are the k × 1 set of equations

∂u′u/∂β = 2X′Xβ − 2X′y = 0

Compare this to (17.2). So the least squares estimator is β̂ = (X′X)⁻¹X′y, as before. Again our assumptions ensure that (X′X)⁻¹ exists. The second order condition is

∂²u′u/∂β∂β′ = 2X′X

which is a positive definite matrix, ensuring a minimum. Compare this to (17.3). Notice that X′X and (X′X)⁻¹ are symmetric matrices, equal to their transposes, see (18.5) and (18.6).
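The matrix formula β̂ = (X′X)⁻¹X′y translates directly into code. A minimal numpy sketch with made-up data:

import numpy as np

rng = np.random.default_rng(2)
T = 200
# Made-up regressors: a column of ones plus two random variables.
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0.0, 1.0, T)

XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.inv(XtX) @ Xty        # (X'X)^(-1) X'y
# Equivalent, and numerically preferable: solve the normal equations directly.
beta_hat_solve = np.linalg.solve(XtX, Xty)
print(beta_hat, beta_hat_solve)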

19. Properties of Least Squares

19.1. Expected Value

We can derive the expected value of β̂:

β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u)
  = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u
  = β + (X′X)⁻¹X′u   (19.1)

E(β̂) = β + E[(X′X)⁻¹X′u]

since β is not a random variable, and if X and u are independent E[(X′X)⁻¹X′u] = E[(X′X)⁻¹X′]E(u) = 0, since E(u) = 0. Thus E(β̂) = β and β̂ is an unbiased estimator of β.

19.2. Variance Covariance matrix

From (19.1) we have β̂ − β = (X′X)⁻¹X′u.



The variance-covariance matrix of β̂ is a k × k matrix,

V(β̂) = E[(β̂ − E(β̂))(β̂ − E(β̂))′] = E[(β̂ − β)(β̂ − β)′]

since β̂ is unbiased. But from (19.1) we have

β̂ − β = (X′X)⁻¹X′u

so

E[(β̂ − β)(β̂ − β)′] = E[((X′X)⁻¹X′u)((X′X)⁻¹X′u)′]
                    = E[(X′X)⁻¹X′u u′X(X′X)⁻¹]
                    = (X′X)⁻¹X′E(uu′)X(X′X)⁻¹
                    = σ²(X′X)⁻¹X′X(X′X)⁻¹
                    = σ²(X′X)⁻¹

since E(uu′) = σ²I, σ² is a scalar, and (X′X)⁻¹X′X = I. Compare this to (17.5) above.
We estimate V(β̂) by

V̂(β̂) = s²(X′X)⁻¹

where s² = û′û/(T − k). The square root of the ith diagonal element of s²(X′X)⁻¹ gives the standard error of β̂i, the ith element of β̂, which is reported by computer programs.
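In the same spirit, a self-contained sketch of how s²(X′X)⁻¹ and the standard errors reported by packages could be computed, again with made-up data:

import numpy as np

rng = np.random.default_rng(3)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 1.0, T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
s2 = (u_hat @ u_hat) / (T - k)            # s^2 = u'u/(T-k)
V_hat = s2 * np.linalg.inv(X.T @ X)       # estimated variance covariance matrix
se = np.sqrt(np.diag(V_hat))              # standard error of each coefficient
print(beta_hat, se)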

19.3. Normality

If we also assume normality then we can write u ∼ N(0, σ²I). Linear functions of normally distributed variables are also normal, see (12.1). The matrix equivalent is that, for k × 1 vectors Y and M and a k × k matrix Σ (not the summation sign), if Y ∼ N(M, Σ), then for an h × 1 vector A and an h × k matrix B,

X = A + BY ∼ N(A + BM, BΣB′)   (19.2)

Notice the variance covariance matrix of X, say V(X) = BΣB′, is (h × k) × (k × k) × (k × h) = h × h.
Since y is a linear function of u, it follows that y is also normally distributed: y ∼ N(Xβ, σ²I). Since β̂ = (X′X)⁻¹X′y is a linear function of y, applying (19.2) it is normally distributed: β̂ ∼ N(β, σ²(X′X)⁻¹). This says that β̂ is normally distributed with expected value β (and is therefore unbiased) and variance covariance matrix σ²(X′X)⁻¹.



19.4. Predicted Values and residuals

The predicted values are ŷ = Xβ̂; the residuals are

û = y − ŷ = y − Xβ̂ = y − X(X′X)⁻¹X′y = (I − X(X′X)⁻¹X′)y = My

where M = I − Px and Px = X(X′X)⁻¹X′ is called a projection matrix. Both M and Px are idempotent, equal to their own squares (PxPx = Px, MM = M), and MPx = 0. So

û = My = M(Xβ + u) = MXβ + Mu = Mu,

since MXβ = (I − Px)Xβ = Xβ − X(X′X)⁻¹X′Xβ = Xβ − Xβ = 0.

19.5. Gauss-Markov Theorem

We have often claimed that the mean or regression coefficients are the minimum variance estimators in the class of linear unbiased estimators. We now prove it. Note this does not require assuming normality. Verbeek section 2.3 covers this. It applies to the mean when β contains a single element and X is just a column of ones.
Consider any other linear estimator β̃ = Cy, where we assume that X and C are fixed (non-stochastic) matrices:

β̃ = Cy = C(Xβ + u) = CXβ + Cu
E(β̃) = CXβ + CE(u)

so β̃ will be unbiased as long as CX = I. Write β̃ = Cy = ((X′X)⁻¹X′ + W)y, that is, W = C − (X′X)⁻¹X′. Then CX = I implies ((X′X)⁻¹X′ + W)X = I, or (X′X)⁻¹X′X + WX = I, or I + WX = I. This can only be true if WX = 0. This also implies that X′W′ = 0. Assume this is the case, to ensure that β̃ is unbiased.
The variance covariance matrix of β̃ is

E[(β̃ − E(β̃))(β̃ − E(β̃))′] = E[(β̃ − β)(β̃ − β)′]

since β̃ is unbiased by assumption. From above

β̃ = β + Cu = β + ((X′X)⁻¹X′ + W)u
β̃ − β = (X′X)⁻¹X′u + Wu
E[(β̃ − β)(β̃ − β)′] = E[((X′X)⁻¹X′u + Wu)((X′X)⁻¹X′u + Wu)′]



When we multiply out the brackets we have four terms:

E[(X′X)⁻¹X′uu′X(X′X)⁻¹] = σ²(X′X)⁻¹
E[Wuu′W′] = σ²WW′
E[(X′X)⁻¹X′uu′W′] = σ²(X′X)⁻¹X′W′ = 0
E[Wuu′X(X′X)⁻¹] = σ²WX(X′X)⁻¹ = 0

The last two terms are zero since WX = 0 and X′W′ = 0. So the variance of any other linear unbiased estimator is

V(β̃) = E[(β̃ − E(β̃))(β̃ − E(β̃))′] = σ²[(X′X)⁻¹ + WW′] = V(β̂) + σ²WW′

Since WW′ is a positive definite matrix for W ≠ 0, we have shown that in the class of linear unbiased estimators the OLS estimator has the smallest variance. An n × n matrix A is positive definite if the scalar quadratic form b′Ab > 0 for any non-zero n × 1 vector b. In this case we require b′WW′b > 0. Define z = W′b; then b′WW′b = z′z = Σzt², a sum of squares, which must be positive.

19.6. Consequences of failure of the assumptions

E(X′u) = 0. If this does not hold, because of failure of the exogeneity assumption, the estimates are biased and inconsistent.

E(uu′) = σ²IT. If this does not hold, because of serial correlation or heteroskedasticity of the errors, β̂ remains unbiased, but it is no longer minimum variance and the standard errors are wrong: V(β̂) ≠ s²(X′X)⁻¹. Robust standard errors are available. Serial correlation or heteroskedasticity of the estimated residuals may be caused by misspecification rather than serial correlation or heteroskedasticity of the errors.

(X′X) non-singular, rank of X equals k. If this does not hold, because of exact linear dependence in the data, the estimates β̂ cannot be calculated.

20. Tests in regression

20.1. Tests for a single hypothesis on individual coefficients

Suppose we have the model

Yt = α + βXt + γZt + ut



we can test the significance of the individual coefficients using t ratios, exactly as we did for the mean:

t(β = 0) = β̂/se(β̂)

where se(β̂) is the estimated standard error of β̂. This tests the null hypothesis H0: β = 0. If this t ratio is greater than two in absolute value we conclude that β̂ is significant: significantly different from zero at the 5% level. Computers often print out this t ratio automatically. They usually give the coefficient, the standard error, the t ratio and the p value. The p value gives you (roughly) the probability that the null hypothesis is true. If the p value is less than 0.05, we reject the hypothesis at the 5% level.
We can test against other values. Suppose economic theory suggested that γ = 1; the t statistic for testing this would be

t(γ = 1) = (γ̂ − 1)/se(γ̂)

and if this t statistic is greater than two in absolute value we conclude that γ̂ is significantly different from unity at the 5% level.
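A sketch of both kinds of t test on invented data (scipy is used only to convert the statistics into two-sided p values with T − k degrees of freedom):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 120
X_var = rng.normal(size=T)
Z_var = rng.normal(size=T)
# Invented model with true beta = 0 and true gamma = 1.
y = 0.5 + 0.0 * X_var + 1.0 * Z_var + rng.normal(0.0, 1.0, T)

X = np.column_stack([np.ones(T), X_var, Z_var])
k = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ b
s2 = (u @ u) / (T - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

t_beta_zero = b[1] / se[1]              # H0: beta = 0
t_gamma_one = (b[2] - 1.0) / se[2]      # H0: gamma = 1
p_beta = 2 * stats.t.sf(abs(t_beta_zero), T - k)
p_gamma = 2 * stats.t.sf(abs(t_gamma_one), T - k)
print(t_beta_zero, p_beta, t_gamma_one, p_gamma)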

20.2. Tests on joint hypotheses

Suppose that we wanted to test the hypothesis that none of the independent variables had any effect on the dependent variable in

Yt = β1 + β2X2t + β3X3t + β4X4t + ut.

The hypothesis is H0: β2 = β3 = β4 = 0. The test statistic used to test joint hypotheses follows an F distribution, introduced in section 12.2. Just as the t distribution is described by its single degree of freedom, the F distribution is described by its two degrees of freedom: the number of hypotheses being tested, three in this case, and the number of observations less the number of parameters estimated, T − 4 in this case. This would be written F(3, T − 4) and critical values are given in statistics books. The F statistic for this hypothesis (that all the slope coefficients are equal to zero) is often printed out by computers, and they also usually give a p value. With the p value you do not have to look up tables: just reject the null hypothesis at the 5% level if p < 0.05. Notice that the joint hypothesis that all coefficients are equal to zero can give different conclusions from a sequence of individual hypotheses that each is equal to zero. These are different hypotheses. This F statistic can be related to R² by

F(k − 1, T − k) = [R²/(k − 1)] / [(1 − R²)/(T − k)].

Notice the test above is for a very specific hypothesis, that the slopes are all zero. In general we would estimate an unrestricted regression with k coefficients (4 above) and calculate the unrestricted residual sum of squares, URSS, and a restricted regression, where the hypothesis of interest is imposed, with k − m coefficients, where m is the number of restrictions, and calculate the restricted residual sum of squares, RRSS. Then under H0

[(RRSS − URSS)/m] / [URSS/(T − k)] ∼ F(m, T − k).

If there is a single hypothesis, m = 1, the F and t tests are related by the fact that F(1, T − k) = t²(T − k). This follows from the definitions in 12.3.
Another common hypothesis tested is structural stability. In the unrestricted model we allow the parameters to change and estimate the K × 1 vectors β1 over period one, t = 1, 2, ..., T1, with RSS1 = û1′û1, and β2 over period two, t = T1 + 1, T1 + 2, ..., T, with RSS2 = û2′û2, so URSS = RSS1 + RSS2. In the restricted model the parameters do not change and we estimate β0 over the whole period t = 1, 2, ..., T, with RRSS = û0′û0. Then the "Chow" test for parameter constancy, H0: β1 = β2, conditional on the variances in the two periods being equal, is

[û0′û0 − (û1′û1 + û2′û2)]/K / [(û1′û1 + û2′û2)/(T − 2K)] ∼ F(K, T − 2K).

Note the restricted model estimates K parameters, β0; the unrestricted estimates k = 2K, β1 and β2; so there are m = K restrictions.
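Once the restricted and unrestricted residual sums of squares are available the F statistic is easy to compute. A sketch with invented data, testing H0: β3 = β4 = 0, so m = 2:

import numpy as np
from scipy import stats

def rss(y, X):
    """Residual sum of squares from a least squares fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

rng = np.random.default_rng(5)
T = 150
x2, x3, x4 = rng.normal(size=(3, T))
y = 1.0 + 0.8 * x2 + 0.0 * x3 + 0.0 * x4 + rng.normal(size=T)  # H0 is true here

X_u = np.column_stack([np.ones(T), x2, x3, x4])   # unrestricted, k = 4
X_r = np.column_stack([np.ones(T), x2])           # restricted: beta3 = beta4 = 0
URSS, RRSS = rss(y, X_u), rss(y, X_r)

m, k = 2, X_u.shape[1]
F = ((RRSS - URSS) / m) / (URSS / (T - k))
p = stats.f.sf(F, m, T - k)
print(F, p)      # a large p value: do not reject H0 at the 5% level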

20.3. Diagnostic Tests

If our assumptions about the errors are valid, the estimated residuals should be normally distributed and random, without any pattern in them, so our null hypothesis is that the model is well specified and there is no pattern in the residuals, e.g. other variables should not be able to explain them. Our alternative hypothesis is that the model is misspecified in a particular way, and since there are lots of ways that the model could be misspecified (the errors could be serially correlated, heteroskedastic or non-normal, or the model could be non-linear) there are lots of diagnostic tests, each with the same null hypothesis, that the model is well specified, but with different alternatives: the specific form of misspecification. Just as doctors use lots of different diagnostic tests since there are many different ways a person can be sick, there are many different ways a regression can be sick. Often the best diagnostic is to graph the residuals and look for patterns.

20.3.1. Tests for serial correlation and heteroskedasticity

The Durbin-Watson test for serial correlation is given by

DW = Σ_{t=2}^T (ût − ût−1)² / Σ_{t=1}^T ût²

it should be around 2, say 1.5 to 2.5. Below 1.5 there is positive serial correlation: residuals are positively correlated with their previous (lagged) values; above 2.5 there is negative serial correlation. It is only appropriate if (a) you are interested in first order serial correlation; (b) there is an intercept in the equation, so the residuals sum to zero; and (c) there is no lagged dependent variable in the equation. First order (one lag) serial correlation assumes that the errors are related to their values in the previous period by an autoregressive, AR1, model

ut = ρut−1 + εt

but there may be higher order serial correlation. For instance, in quarterly data the errors may be related to errors up to a year ago: the size of the error in the alcohol equation at Christmas (Q4) is related not just to the previous quarter's error but to the size of the error last Christmas:

ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + ρ4ut−4 + εt

this is an AR4 model with fourth order (four lags) serial correlation. Suppose you ran a regression

yt = β′xt + ut

The test involves running a regression of the residuals on the variables included in the original regression and the lagged residuals

ût = b′xt + ρ1ût−1 + ρ2ût−2 + ρ3ût−3 + ρ4ût−4 + εt

then testing the joint hypothesis H0: ρ1 = ρ2 = ρ3 = ρ4 = 0. You reject the null of no serial correlation up to fourth order if the test statistic (Chi-squared or F) is above the critical value for the required size, say 5%, or the p value is below 0.05.

There are many diagnostic tests which involve regressing the estimated residuals or powers of the residuals on particular variables. One test for heteroskedasticity regresses the squared residuals on the regressors,

ût² = a + b′xt + vt

to check whether the variance of the residuals is correlated with the regressors. The null hypothesis is b = 0. Another test, for autoregressive conditional heteroskedasticity (ARCH), regresses the squared residuals on the lagged squared residuals,

ût² = a + bût−1² + vt

and again tests b = 0.
Technically, most of these tests are known as Lagrange Multiplier tests. It is important that you check your equation for various diseases before you regard it as healthy enough to be used. Statistical packages like EViews (section 22.2), Microfit, Stata, etc. have built-in tests of the assumptions that are required for least squares estimates to be reliable. If the assumptions do not hold the estimated standard errors are likely to be wrong, and corrected standard errors that are 'robust' to the failure of the assumptions are available.
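As an illustration, the Durbin-Watson statistic and the fourth-order LM auxiliary regression might be computed along the following lines; the residuals here come from a made-up regression, whereas in practice they would be the residuals of the equation being diagnosed:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 200
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)

X = sm.add_constant(x)
u = sm.OLS(y, X).fit().resid                    # residuals to be diagnosed

dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)   # Durbin-Watson statistic

# LM test for up to fourth-order serial correlation: regress u_t on x_t and
# four lags of u_t, then test that the lag coefficients are jointly zero.
lags = np.column_stack([u[3:-1], u[2:-2], u[1:-3], u[:-4]])  # u_{t-1},...,u_{t-4}
Z = sm.add_constant(np.column_stack([x[4:], lags]))
aux = sm.OLS(u[4:], Z).fit()
print(dw, aux.rsquared)   # (T-4)*R^2 of the auxiliary regression is roughly chi-squared(4) under H0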

20.4. Omitted variables

If you have unhealthy residuals that show serial correlation or heteroskedasticity, some textbooks tend to suggest that you model the disturbances, typically by a procedure called Generalised Least Squares. However, in most cases the problem is not that the true disturbances are heteroskedastic or serially correlated. The problem is that you have got the wrong model, and the error in the way you specified the model shows up in the estimated residuals. Modelling the disturbances may just treat the symptoms; it may be better to cure the disease: specify the model correctly.
Suppose the model is

yt = βxt + γzt + ut   (20.1)

expressed in deviations from means, to remove the constant. But we leave out zt and estimate:

yt = bxt + vt.   (20.2)



There are two estimates of the coefficient on xt: b and β. To understand the relation between them we need to look at the relationship between xt and zt, summarised by the regression equation

zt = dxt + wt   (20.3)

wt is just the part of zt that is not correlated with xt; d may be zero if there is no relationship. Put (20.3) into (20.1) and we get (20.2):

yt = βxt + γ(dxt + wt) + ut
yt = (β + γd)xt + (γwt + ut)

So b = (β + γd). When zt is left out, the coefficient of xt picks up the part of zt that is correlated with xt. Parts of zt that are not correlated with xt end up in the error term (γwt + ut). This is why looking for patterns in the error term is important: it may suggest a variable you have left out. If you add a variable that is not correlated with xt, the coefficient of xt will not change. If you add a variable that is highly correlated with xt, the coefficient of xt will change a lot.
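A small simulation makes the omitted-variable algebra concrete: with invented values β = 1, γ = 2 and d = 0.5, leaving zt out should push the estimated coefficient on xt towards β + γd = 2.

import numpy as np

rng = np.random.default_rng(7)
T = 5000
x = rng.normal(size=T)
z = 0.5 * x + rng.normal(size=T)            # z related to x with d = 0.5
y = 1.0 * x + 2.0 * z + rng.normal(size=T)  # true beta = 1, gamma = 2

# Work in deviations from means, as in (20.1) and (20.2).
xd, zd, yd = x - x.mean(), z - z.mean(), y - y.mean()
W = np.column_stack([xd, zd])
beta_full = np.linalg.solve(W.T @ W, W.T @ yd)   # regression including z
b_short = (xd * yd).sum() / (xd * xd).sum()      # regression leaving z out
print(beta_full)   # close to (1, 2)
print(b_short)     # close to beta + gamma*d = 1 + 2*0.5 = 2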

21. Economic and Financial Data III: Variables and Relationships

Linear regression is very flexible because we can redefine the variables by some transformation, which allows the underlying relationship to be non-linear but the relationship we estimate to be linear, as in 18.1. We can also construct variables, like the trend in 18.1.

21.1. Dummy variables.

Suppose that we had UK annual data on consumption and income for 1930 to 1960 and wanted to estimate a consumption function. This period includes the second world war, 1939-1945, when there was rationing and consumption was restricted. This would shift the consumption function and could be allowed for by estimating an equation

Ct = α + βYt + γDt + ut

where Dt is a 'dummy' variable which takes the value one in 1939-45 and zero in other years. The intercept during the war is then α + γ and we would expect γ < 0. We could also write this as

Ct = δ1Dt + δ2(1 − Dt) + βYt + ut   (21.1)

we would get an identical estimate of β in both equations; δ1 is the estimated intercept for 1939-45, δ2 = α is the intercept for other years and γ = δ1 − δ2, the difference in the intercept between the two periods. Notice that had you included a constant in (21.1) the computer would have refused to estimate it and told you that the data matrix (X′X) was singular. This is known as 'the dummy variable trap'.
A similar technique allows for seasonal effects. Suppose that we had quarterly data on consumption and income, and wanted to allow consumption to differ by quarter (e.g. spending more at Christmas). Define Q1t as a dummy variable that is one in quarter one and zero otherwise; Q2t is one in quarter two and zero otherwise, etc. Then estimate

Ct = α1Q1t + α2Q2t + α3Q3t + α4Q4t + βYt + ut

then the intercept in Q1 is α1, in Q2 it is α2, etc. We could also drop one dummy and include a constant.
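A sketch of how the war dummy might be constructed and used, with hypothetical annual data for 1930-1960 (the consumption and income figures are invented):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
years = np.arange(1930, 1961)
income = 100 + 2.0 * (years - 1930) + rng.normal(0, 1, len(years))
war = ((years >= 1939) & (years <= 1945)).astype(float)   # the dummy D_t
consumption = 10 + 0.8 * income - 15 * war + rng.normal(0, 1, len(years))

# C_t = alpha + beta*Y_t + gamma*D_t + u_t
X = sm.add_constant(np.column_stack([income, war]))
print(sm.OLS(consumption, X).fit().params)   # roughly (10, 0.8, -15)

# Dummy variable trap: a constant plus both D_t and (1 - D_t) would be perfectly
# collinear, so X'X would be singular; drop either the constant or one dummy.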

21.2. Non-linearities

21.2.1. Powers

We can easily allow for non-linearities by transformations of the data, as we saw with logarithms above. As another example, imagine y (say earnings) first rose with x (say age) then fell. We could model this by

yi = a + bxi + cxi² + ui

where we would expect b > 0, c < 0. Although the relationship between y and x is non-linear, the model is linear in parameters, so ordinary least squares can be used: we just include another variable which is the square of the first. Notice that the effect of x on y is given by

∂y/∂x = b + 2cxi

thus it is different at different values of xi, and has a maximum (or minimum) which can be calculated as the value of x that makes the first derivative zero. On average earnings rise with age to a maximum then fall, while self-reported happiness tends to fall with age to a minimum then rise: the middle-aged are wealthy but miserable. We can extend this approach by including the product of two variables as an additional regressor. There is an example of this in the specimen exam, section 3.2 question 5. Verbeek section 3.5 has an extensive discussion.
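For the earnings-age example, a quadratic fit and its implied turning point (the value −b/2c that sets the derivative to zero) might be computed as follows, again on invented data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
age = rng.uniform(20, 65, 400)
# Invented earnings profile peaking around age 50.
earnings = 10 + 2.0 * age - 0.02 * age ** 2 + rng.normal(0, 2, 400)

X = sm.add_constant(np.column_stack([age, age ** 2]))
res = sm.OLS(earnings, X).fit()
a, b, c = res.params
print(b, c)           # expect b > 0, c < 0
print(-b / (2 * c))   # turning point: the age at which fitted earnings peak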

21.2.2. Regressions using Proportions

Suppose our dependent variable is a proportion, pt = Nt/K, where Nt is the number affected and K is the population, or a maximum number or saturation level. Then pt lies between zero and one and the logistic transformation, ln(pt/(1 − pt)), is often used to ensure that the predicted value lies between zero and one. If the proportion is a function of time this gives

ln(pt/(1 − pt)) = a + bt + ut   (21.2)

which is an S shaped curve for pt over time. This often gives a good description of the spread of a disease or, e.g., of the proportion of the population that have a mobile phone. Although this is a non-linear relationship in the variable pt, it is linear in parameters when transformed, so it can be estimated by least squares. The form of the non-linear relationship is

pt = Nt/K = 1/(1 + exp(−(a + bt)))   (21.3)

We could estimate this directly, treating K as an unknown parameter, in a program like EViews which does non-linear least squares. So if Nt is the number of mobile phone owners we would enter this in EViews as

N = C(1)/(1 + exp(C(2) + C(3)*@trend)).   (21.4)

@trend in EViews provides a trend, t. C(1) would be an estimate of K, C(2) of a and C(3) of b. In practice, unless there is data very close to saturation it is difficult to estimate K precisely. Notice that (21.2) and (21.4) imply different assumptions about how the error term enters (21.3), so they are not equivalent.
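Outside EViews, the same kind of non-linear specification could be fitted with, for example, scipy's curve_fit; the adoption numbers below are invented, K, a and b are the parameters being estimated, and the sketch parameterises the exponent as in (21.3).

import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, a, b):
    # N_t = K / (1 + exp(-(a + b*t))), the form of (21.3) scaled by K
    return K / (1.0 + np.exp(-(a + b * t)))

# Invented adoption data approaching a saturation level of about 60.
t = np.arange(30, dtype=float)
rng = np.random.default_rng(10)
N = logistic(t, 60.0, -3.0, 0.3) + rng.normal(0.0, 1.0, t.size)

params, _ = curve_fit(logistic, t, N, p0=[50.0, -1.0, 0.1])
print(params)    # estimates of K, a and b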

22. Applied Exercise III: Running regressions

You can run regressions in Excel but in most cases it is easier to use a specialised package. There are many of them, and if you are familiar with a particular package use that. EViews is an easy package to use and is installed on our machines. gretl is a free open-source program similar to EViews. Microfit is another econometrics package that is easy to use. EViews, gretl and Microfit are menu driven programs; other programs like Stata are command driven (which ensures an audit trail of what you have done). R is a popular open source statistics program. For your projects you can use any program you wish.
This example uses the Shiller.xls file that was used in Applied Exercise I and is on the ASE home page. It has data for 1871-2000 (130 observations) on 5 variables: NSP (nominal stock prices), ND (nominal dividends), NE (nominal earnings, profits), R (interest rates) and PPI (producer price index). Note that there is no 2000 data on three of the variables. Figures are given for NSP and PPI, which are January figures; the other three are averages for the year. Even if you are not using EViews, read the explanation and carry out the exercise on the software you are using. The example below regresses dividends on earnings for the period 1871-1986. First we describe some of the output the computer produces.

22.1. Regression Output

Computer programs will print out a range of information, which may include

• the estimates of the regression coefficients β̂i, i = 1, ..., k, including the constant,

• the standard error of each coefficient, SE(β̂i), which measures how precisely it is estimated,

• the t ratio, t(βi = 0) = β̂i/SE(β̂i), which tests the null hypothesis that that particular coefficient is really zero (the variable should not appear in the regression). If the t ratio is greater than 2 in absolute value, we can reject the null hypothesis that βi = 0 at about the 5% level. In this case the coefficient is said to be significantly different from zero, or significant.

• the p value for the hypothesis that βi = 0. This gives the probability that the null hypothesis is true. If this is less than 0.05 we can again reject the hypothesis that βi = 0.

• the Sum of Squared Residuals, Σût². This is what least squares minimises.



• The standard error of the regression

s = √(Σût²/(T − k))

where k is the number of regression coefficients estimated and T the number of observations. This is an estimate of the square root of the error variance σ² and gives you an idea of the average size of the errors. If the dependent variable is a logarithm, multiply s by 100 and interpret it as the average percent error.

• R squared, which tells you the proportion of the variation in the dependent variable that the equation explains,

R² = 1 − Σût²/Σ(Yt − Ȳ)² = Σ(Ŷt − Ȳ)²/Σ(Yt − Ȳ)²

• R bar squared, which corrects R squared for degrees of freedom,

R̄² = 1 − [Σût²/(T − k)] / [Σ(Yt − Ȳ)²/(T − 1)]

where k is the number of regression coefficients estimated and T is the number of observations. Whereas R² is always positive and increases when you add variables, R̄² can be negative and only increases if the added variables have t ratios greater than unity.

• The Durbin-Watson statistic, a measure of serial correlation of the residuals,

DW = Σ_{t=2}^T (ût − ût−1)² / Σ_{t=1}^T ût²

it measures whether the residuals are correlated. It should be around 2, say 1.5-2.5. It tests the hypothesis ρ = 0 in the autoregression ut = ρut−1 + εt. Roughly, DW = 2(1 − ρ). This statistic depends on the ordering of the data, since it calculates ût − ût−1. In time-series there is a natural ordering of the data; in cross-section there is not. So in cross-section the DW should be interpreted with caution.

• An F statistic, which tests the hypothesis that none of the slope variables (i.e. the right hand side variables other than the constant α) is significant. Notice that in the case of a single slope variable, this will be the square of its t statistic. Usually it also gives the probability of getting that value of the F statistic if the slope variables all had no effect.



22.2. Excel

Go into Excel and load the Shiller.xls file. Click Tools; Data Analysis; choose Regression from the list of techniques. You may have to add in the Data Analysis module. Where it asks you for the Y range enter C2:C117. Where it asks you for the X range enter D2:D117. Click Output Range and enter G1 in the output range box. Alternatively you can leave it at the default, putting the results in a separate sheet. Click OK. It gives you, in the first box, Multiple R, which you can ignore, R squared, Adjusted R squared and the Standard Error of the Regression. Then it gives you an ANOVA box, which you can ignore. Then it gives you estimates of the coefficients (Intercept, X Variable 1, etc.), their standard errors, t statistics, P values, etc., shown in the summary output.
In this case we have run a regression of dividends on earnings and the results for the sample 1871-1986 are:

NDt = 0.169 + 0.456 NEt + ût
      (0.036)  (0.007)
      [4.67]   [61.65]
      {0.000}  {0.000}

R² = 0.971, s = 0.31.

Standard errors of coefficients are given in parentheses, t statistics in brackets, and p values in braces. You would normally report only one of the three, usually just standard errors. The interpretation is that if earnings go up by $10, then dividends will go up by $4.56. If earnings were zero, dividends would be 16.9 cents. Earnings explain 97% of the variation in dividends over this period and the average error in predicting dividends is 0.31. We would expect our predictions to be within two standard errors of the true value 95% of the time. Both the intercept and the coefficient of earnings are significantly different from zero at the 5% level: their t statistics are greater than 2 in absolute value and their p values are less than 0.05.
In Excel, if you click the residuals box it will also give you the predicted values and residuals for every observation. If you use Excel you must graph these residuals to judge how well the least squares assumptions hold. You can have more right hand side, X, variables but they must be contiguous in the spreadsheet, side by side. So for instance we could have estimated

NDt = α + βNEt + γNSPt + ut

by giving the X range as D2:E117.
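If you prefer a script to a spreadsheet, the same regression can be run in a few lines of Python. This is a sketch only: it assumes the pandas and statsmodels packages (plus an Excel reader such as xlrd) are installed, that the worksheet has column headings ND and NE, and that the rows run in order from 1871; any of these may need adjusting to match the actual file.

import pandas as pd
import statsmodels.formula.api as smf

# Assumes columns named ND and NE and one row per year starting in 1871.
data = pd.read_excel("Shiller.xls")
sample = data.iloc[:116]                     # 1871-1986, 116 observations
res = smf.ols("ND ~ NE", data=sample).fit()  # ND_t = alpha + beta*NE_t + u_t
print(res.summary())                         # coefficients, standard errors, R2, DW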



22.3. EViews

22.3.1. Entering Data

Open the EViews program; different versions may differ slightly. Click on File, New, Workfile, accept the default annual data and enter the length of the time series, 1871 2000, in the box. OK. You will now get a box telling you that you have a file with two variables: C (which takes the value unity for each observation) and RESID (which is the variable where estimates of the residuals will be stored).
Click on File, Import, Read Text-Lotus-Excel, then click on the Shiller file. It will open a box. Tell it, in the relevant box, that there are 5 variables. Note the other options, but use the defaults; note that B2 is the right place to start reading this data file. Click Read. You should now also have PPI, ND, NE, NSP and R in your workfile. Double click on NSP and you will see the data and have various other options, including graph.
Highlight NE and ND. Click on Quick, then Graph, OK line graph, and you will see the graphs of these two series. Close the graph. Always graph your data.
Use the Save As command to save the workfile under a new name and keep saving it when you add new data, transformations etc.

22.3.2. Estimating a Regression

Click on Quick, Estimate Equation and you will get a box. Enter ND C NE; set the sample as 1871 1986. OK and you will get a box with equation estimates. Notice that you have menu buttons both on the equation box and the main window. The estimates are the same as given above for Excel.

Dependent Variable: ND
Method: Least Squares
Date: 07/29/04   Time: 14:00
Sample: 1871 1986
Included observations: 116

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.168391      0.035869     4.694672      0.0000
NE         0.456537      0.007328     62.29741      0.0000

R-squared            0.971464    Mean dependent var     1.445948
Adjusted R-squared   0.971214    S.D. dependent var     1.868085
S.E. of regression   0.316949    Akaike info criterion  0.556937
Sum squared resid    11.45203    Schwarz criterion      0.604413
Log likelihood      -30.30234    F-statistic            3880.968
Durbin-Watson stat   0.659872    Prob(F-statistic)      0.000000

Suppose the equation is

Yt = α + βXt + ut, t = 1, 2, ..., T.

The program gives you the dependent variable Yt, in this case ND; the method of estimation, in this case least squares; the date and time; the sample you used, in this case 1871 1986; and the number of observations, 116. Then for each right hand side variable it gives you the estimate of the coefficient, α̂ and β̂; its standard error; its t statistic; and its p value (Prob).
Reading down the first column it gives: R-squared; Adjusted R-squared, usually called R bar squared; the Standard Error of the Regression, which measures the average size of the error in predicting Yt; the Sum of Squared Residuals, Σ_{t=1}^T ût²; the Log likelihood, which you can ignore; and the Durbin-Watson statistic. In this regression the DW of 0.66 shows that there is something badly wrong. The second column gives the mean and standard deviation of the dependent variable, two criteria which you can ignore for now, and the F-statistic and p value for the hypothesis that all of the slope coefficients are zero.
In EViews, when you have the equation box on the screen, click View on the box toolbar and you will see a range of options. Actual Fitted Residuals allows you to graph the actual and fitted (predicted) values of the dependent variable and the residuals. Always look at the graph of predicted values and residuals. Under Residual you can test for normality, serial correlation and heteroskedasticity, and under Stability you can test for non-linearity (RESET) or structural change in the parameters at some point. Use the Chow Breakpoint test if you know when the relationship shifted, or Cusum graphs if you do not. If the graphs go outside the confidence bands there is a problem. In each case the null hypothesis is that the model is well specified (does not have the problem), so small p values (p < 0.05) lead you to reject the hypothesis that this is a healthy equation. Verbeek sections 3.3, 4.4 and 4.7 discuss many of these tests.

22.3.3. A different specification.

Close the equation box. Click on Quick, Generate, enter in the box the equation

LRD = log(ND/PPI)



and click OK. You will see that a new variable, the log of real dividends, has been added to the workfile.
Do the same to generate the log of real earnings: LRE=log(NE/PPI). Graph LRD and LRE. Click on Quick, Estimate Equation, and enter LRD C LRE LRD(-1) in the box. This estimates an equation of the form

Yt = α + βXt + γYt−1 + ut

where the dependent variable is influenced by its value in the previous period. The estimates are:

LRDt = −0.517 + 0.248 LREt + 0.657 LRDt−1      R² = 0.94
       (0.107)  (0.034)      (0.046)            s = 0.108

Although this has an R² of 0.94, it does not mean that it is worse than the previous equation, which had an R² of 0.97, because the two equations have different dependent variables. Above the dependent variable was nominal dividends; here it is log real dividends. The Durbin-Watson statistic for this equation is 1.72, which is much better, though it is not strictly appropriate in a dynamic equation which includes the lagged dependent variable. The long-run elasticity of dividends to earnings is 0.248/(1 − 0.657) = 0.72. A 1% increase in earnings is associated with a 0.72% increase in dividends in the long run.

23. Dynamics

With cross-section data a major issue tends to be getting the functional form correct; with time-series data a major issue tends to be getting the dependence over time, the dynamics, correct.

23.1. Autoregressions

We have already come across autoregressions (AR), regressions of a variable on lagged values of itself, when discussing serial correlation in error terms. They can also be used for observed variables, just as above we ran a regression of dividends on a constant, earnings and the lagged value of dividends. A first order autoregression would take the form

y_t = α_0 + α_1 y_{t−1} + u_t

and the parameters can be estimated by least squares. The random walk with drift, our first example of a model in section 5.2, is the special case where α_1 = 1.


If −1 < α_1 < 1 the process is stable: it will converge back to a long-run equilibrium after shocks. The long-run equilibrium can be obtained by assuming y_t = y_{t−1} = y (as would be true in equilibrium with no shocks), so

y = α_0 + α_1 y

y* = α_0/(1 − α_1),

where the star indicates the long-run equilibrium value. A random walk does not have a long-run equilibrium: it can wander anywhere. A second order (two lags) autoregression takes the form

y_t = α_0 + α_1 y_{t−1} + α_2 y_{t−2} + u_t.

This is stable if −1 < α_1 + α_2 < 1, in which case its long-run expected value is

y* = α_0/(1 − α_1 − α_2).
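To see what this long-run value means in practice, the following illustrative sketch (not taken from the notes, with made-up parameter values) simulates a stable second order autoregression and compares its sample mean with α_0/(1 − α_1 − α_2).

```python
import numpy as np

# Simulate a stable AR(2): y_t = a0 + a1*y_{t-1} + a2*y_{t-2} + u_t,
# with illustrative parameter values satisfying -1 < a1 + a2 < 1.
rng = np.random.default_rng(0)
a0, a1, a2 = 1.0, 0.5, 0.2
T = 100_000
y = np.zeros(T)
for t in range(2, T):
    y[t] = a0 + a1 * y[t - 1] + a2 * y[t - 2] + rng.normal()

print("sample mean (after burn-in):", y[5000:].mean())
print("long-run value a0/(1-a1-a2):", a0 / (1 - a1 - a2))   # = 1/0.3, about 3.33
```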

We may also get slow responses to the effects of the independent variables; these are called distributed lags (DL). A first order distributed lag takes the form

y_t = α + β_0 x_t + β_1 x_{t−1} + u_t

and we could have higher order versions. We can put the first order AR(1) and DL(1) together to get an ARDL(1,1):

y_t = α_0 + α_1 y_{t−1} + β_0 x_t + β_1 x_{t−1} + u_t    (23.1)

Again it can be estimated by least squares. There are often strong theoretical reasons for such forms. We can reparameterise this to an error correction model, ECM, form:

y_t − y_{t−1} = α_0 + (α_1 − 1)y_{t−1} + β_0 x_t − β_0 x_{t−1} + β_0 x_{t−1} + β_1 x_{t−1} + u_t

∆y_t = α_0 + (α_1 − 1)y_{t−1} + β_0 ∆x_t + (β_0 + β_1)x_{t−1} + u_t

∆y_t = a_0 + a_1 y_{t−1} + b_0 ∆x_t + b_1 x_{t−1} + u_t    (23.2)
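The claim that the ECM is just a reparameterisation of the ARDL can be checked numerically. The sketch below uses simulated data with arbitrary parameter values (it is illustrative, not the dividend example above): it estimates both forms by least squares and shows that they imply exactly the same long-run coefficient.

```python
import numpy as np
import statsmodels.api as sm

# Simulate an ARDL(1,1): y_t = 0.5 + 0.6*y_{t-1} + 0.3*x_t + 0.1*x_{t-1} + u_t.
rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.normal(size=T)) * 0.1 + 5
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.6 * y[t - 1] + 0.3 * x[t] + 0.1 * x[t - 1] + rng.normal(scale=0.2)

ylag, xlag = y[:-1], x[:-1]
ycur, xcur = y[1:], x[1:]

# ARDL in levels: y_t on a constant, y_{t-1}, x_t, x_{t-1}   (equation 23.1)
ardl = sm.OLS(ycur, sm.add_constant(np.column_stack([ylag, xcur, xlag]))).fit()
a0, a1, b0, b1 = ardl.params
print("ARDL long run:", (b0 + b1) / (1 - a1))

# ECM: dy_t on a constant, y_{t-1}, dx_t, x_{t-1}            (equation 23.2)
ecm = sm.OLS(ycur - ylag,
             sm.add_constant(np.column_stack([ylag, xcur - xlag, xlag]))).fit()
c0, c1, d0, d1 = ecm.params
print("ECM long run: ", -d1 / c1)   # identical to the ARDL value
```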

These dynamic forms can be interpreted in terms of a long-run equilibrium relationship,

y*_t = θ_0 + θ_1 x_t    (23.3)


and slow adjustment. The simplest form of adjustment is the partial adjustment model (PAM)

∆y_t = λ(y*_t − y_{t−1}) + u_t,    (23.4)

where ∆y_t = y_t − y_{t−1} is the change. For instance, people change their consumption, y_t, to remove part of the difference between equilibrium consumption and consumption in the previous period. The coefficient λ is an adjustment coefficient: it measures the proportion of the deviation from the equilibrium level of consumption (a function of income, x_t) that is made up in a period. We would expect 0 < λ ≤ 1, with λ = 1 indicating instantaneous adjustment and λ = 0 no adjustment. We can write this

∆y_t = λθ_0 + λθ_1 x_t − λy_{t−1} + u_t

or

y_t = λθ_0 + λθ_1 x_t + (1 − λ)y_{t−1} + u_t,

and we would just run a regression of y_t on a constant, x_t and lagged y_t, i.e. lagged consumption y_{t−1}:

y_t = α_0 + βx_t + α_1 y_{t−1} + u_t.

We can recover the theoretical parameters λ, θ_0, θ_1 from the estimated parameters given by the computer, α_0, α_1, β. So we estimate the speed of adjustment as λ = (1 − α_1) and the long-run effect as θ_1 = β/λ = β/(1 − α_1). This is a dynamic equation: it includes lagged values of the dependent variable. Whether we estimate it using the first difference ∆y_t or the level y_t as the dependent variable does not matter; we would get identical estimates of the intercept and of the coefficient of income, x_t. The coefficient of lagged consumption in the levels equation will be exactly equal to the coefficient in the first difference equation plus one. The sums of squared residuals will be identical, though R² will not be, because the dependent variable is different. This is one reason R² is not a good measure of fit.
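The equivalence of the levels and first-difference regressions is easy to verify. This sketch (simulated data with arbitrary parameters, not part of the notes) runs both and checks the relationships described above: same intercept and income coefficient, lagged-consumption coefficients differing by one, identical residuals, different R².

```python
import numpy as np
import statsmodels.api as sm

# Simulate y_t = 0.4 + 0.5*x_t + 0.6*y_{t-1} + u_t, then regress y_t and
# dy_t on the same right-hand side: a constant, x_t and y_{t-1}.
rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T).cumsum() * 0.05 + 10
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 + 0.5 * x[t] + 0.6 * y[t - 1] + rng.normal(scale=0.1)

X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))   # constant, x_t, y_{t-1}
levels = sm.OLS(y[1:], X).fit()                          # dependent variable y_t
diffs = sm.OLS(y[1:] - y[:-1], X).fit()                  # dependent variable dy_t

print(levels.params)   # [intercept, beta, alpha1]
print(diffs.params)    # same intercept and beta; last coefficient = alpha1 - 1
print(levels.rsquared, diffs.rsquared)           # R2 differs
print(np.allclose(levels.resid, diffs.resid))    # residuals identical: True
```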

For more complex adjustment processes, we can keep the long-run relationship given by (23.3) and replace the partial adjustment model (23.4) by the error correction model (ECM), which assumes that people respond both to the change in the target and to the lagged error:

∆y_t = λ_1 ∆y*_t + λ_2(y*_{t−1} − y_{t−1}) + u_t

∆y_t = λ_1 θ_1 ∆x_t + λ_2(θ_0 + θ_1 x_{t−1} − y_{t−1}) + u_t

∆y_t = a_0 + b_0 ∆x_t + b_1 x_{t−1} + a_1 y_{t−1} + u_t.

This is the same as (23.2) above, and the estimated parameters are functions of the theoretical parameters (a_0 = λ_2 θ_0, b_0 = λ_1 θ_1, b_1 = λ_2 θ_1, a_1 = −λ_2), so we can solve for the theoretical parameters from our estimates: for example, the long-run effect is θ_1 = −b_1/a_1.


We can find the equilibrium solution for (23.1), if the model is stable (−1 < α_1 < 1), by setting y_t = y_{t−1} = y and x_t = x_{t−1} = x, so that in long-run equilibrium

y = α_0 + α_1 y + β_0 x + β_1 x

y = α_0/(1 − α_1) + [(β_0 + β_1)/(1 − α_1)] x

y* = θ_0 + θ_1 x.

Economic theory usually makes predictions about the long-run relations rather than the short-run relations. Notice that our estimate of θ_1 will be identical whether we get it from the ECM (23.2) or the ARDL (23.1) equation, or from estimating a non-linear version. A common restriction on dynamic models in logarithms is that the long-run coefficient is unity. This can be tested by estimating

∆y_t = a_0 + b_0 ∆x_t + b_1(x_{t−1} − y_{t−1}) + (a_1 + b_1)y_{t−1} + u_t.

If θ_1 = −b_1/a_1 = 1, then (a_1 + b_1) = 0, so the t statistic on y_{t−1} in this regression provides a test of the restriction.
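A sketch of this test in Python follows. The data are simulated so that the long-run coefficient really is one; the parameter values are illustrative, not estimates from the notes. The t statistic and p value on y_{t−1} provide the test of (a_1 + b_1) = 0.

```python
import numpy as np
import statsmodels.api as sm

# Data generated with a long-run coefficient of one: (0.2 + 0.1)/(1 - 0.7) = 1.
rng = np.random.default_rng(3)
T = 400
x = rng.normal(size=T).cumsum() * 0.05
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.2 + 0.7 * y[t - 1] + 0.2 * x[t] + 0.1 * x[t - 1] + rng.normal(scale=0.1)

dy = y[1:] - y[:-1]
dx = x[1:] - x[:-1]
gap = x[:-1] - y[:-1]                                    # x_{t-1} - y_{t-1}
X = sm.add_constant(np.column_stack([dx, gap, y[:-1]]))  # last column is y_{t-1}
res = sm.OLS(dy, X).fit()
print("t on y_{t-1}:", res.tvalues[-1], " p value:", res.pvalues[-1])
```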

23.2. ARCH

Asset prices tend to show volatility clustering, periods of high volatility followed by periods of low volatility. This is often captured by assuming that the variance of the asset price is positively serially correlated, so a high variance in one period makes it more likely that there will be a high variance in the next period. Suppose the logarithm of the asset price is a random walk

p_t = p_{t−1} + ε_t

∆p_t = ε_t.

Usually we assume that E(ε_t) = 0 and E(ε_t²) = σ², i.e. the error has constant variance, it is homoskedastic. Here we shall assume that the variance changes through time, so E(ε_t²) = σ_t², it is heteroskedastic, and that the variance follows a first order autoregression:

σ_t² = α + ρσ_{t−1}² + v_t.

This is Auto-Regressive Conditional Heteroskedasticity, ARCH. If we can estimate this equation, we can use it to predict the variance in the future.


This is straightforward: our best estimate of σ_t² is ε_t², and since ε_t² = (∆p_t)² we can just run a regression of

(∆p_t)² = α + ρ(∆p_{t−1})² + v_t.

The unconditional variance of returns is α/(1 − ρ), assuming the process is stable, −1 < ρ < 1.
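As an illustration, the sketch below (simulated prices rather than real asset data) runs exactly this regression of squared returns on their own lag.

```python
import numpy as np
import statsmodels.api as sm

# Stand-in log price series; with real data p would be the log of the asset price.
rng = np.random.default_rng(5)
p = np.cumsum(rng.normal(scale=0.01, size=1000))
dp2 = np.diff(p) ** 2                        # squared returns, (dp_t)^2

res = sm.OLS(dp2[1:], sm.add_constant(dp2[:-1])).fit()
alpha, rho = res.params
print("rho:", rho, " t:", res.tvalues[1])    # a significant rho suggests ARCH effects
print("unconditional variance:", alpha / (1 - rho))
```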

Above we could estimate the variance directly; usually, instead, we assume that the error from an estimated regression equation exhibits ARCH. Suppose that we estimate

y_t = β′x_t + ε_t,

where E(ε_t) = 0 and E(ε_t²) = σ_t². The GARCH(1,1), first order Generalised ARCH, model is then

σ_t² = a_0 + a_1 ε_{t−1}² + b_1 σ_{t−1}²;

more lags could be added. Eviews can estimate GARCH models of various forms.
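Outside Eviews, GARCH models can also be estimated in Python with the third-party arch package. The sketch below is illustrative only: it fits a GARCH(1,1) to simulated fat-tailed returns rather than to real data, and it assumes the arch package has been installed (pip install arch).

```python
import numpy as np
from arch import arch_model   # third-party package, not part of the standard library

# Simulated fat-tailed returns standing in for percentage asset returns.
rng = np.random.default_rng(4)
returns = rng.standard_t(df=6, size=1000)

am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.summary())                              # omega, alpha[1], beta[1] are the variance parameters
print(res.forecast(horizon=5).variance.iloc[-1])  # predicted variance one to five periods ahead
```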

23.3. Final thought

Modern statistics and econometrics programs are very powerful: you can easily estimate almost anything you want. The difficult task is interpreting what the program output is telling you. You have a better chance of interpreting the output if you have: (a) graphed the data and know what it really measures; (b) investigated the historical or institutional context of the data; (c) thought about the economics or other theoretical context; (d) understood the statistical technique being used; and (e) known the purpose of doing the statistical analysis. Try to let the data speak for themselves rather than beating a confession out of them.


24. Index

AR (autoregressive) 130, 140
ARCH 133-4
ARDL (autoregressive distributed lag) 141
Asymptotic 99
Average (arithmetic mean) 40, 45
Bayes theorem/Bayesian statistics 73-8, 95
CAPM (capital asset pricing model) 109
Central limit theorem 83, 87, 93, 101, 116
Chow test 129
Cobb Douglas production function 117
Confidence intervals 102-4
Consistent 99
Correlation coefficient 47-8, 57, 113
Covariance 47-8, 81, 111-4, 124
Demand function 117
Dummy variables 132
Durbin Watson statistic 130, 134
Dynamics 140
ECM (error correction model) 141
Efficient market model 42
Exchange rates 90-91
F distribution & tests 88, 128
GARCH 143-4
GDP/GNP, Gross Domestic/National Product 50, 52, 64
Geometric mean 45, 68
Growth rates 52, 54, 88-9
Hypothesis tests 102-106
Index numbers 61-70
Inflation rates 52-55, 61
Interest rates 50, 53, 62, 88-90
Kurtosis 48, 56, 83
Law of large numbers 93
Least squares 96-7, 111-2, 120, 123
Logarithms 43, 63, 84, 89, 110, 117-8
Logistic 134


Mean see average and geometric mean
Median 40
Mode 40
Moments 48
Method of moments 94, 96, 111, 123
Monte Carlo 93
Neyman-Pearson 95
Normal distribution 83-6
Partial adjustment 142
Predicted values 116, 134
Probability 70-
Production function 118
Proportions 101, 109, 134
Random variables 79
Regression, section 17 onwards
Residuals 111, 113, 116, 121, 126
R squared and R bar squared 136
Seasonality 53, 132
Significance 104
Skewness 48, 56, 83
Standard deviation 46, 81, 83
Standard error 94-5, of mean 98-102, of regression coefficients 113-4, of regression 114
Standardised data 47, standard normal 84
t distribution and test 87, 104
Time trend 118
Unemployment 52-57
Variance 46-8, 81-83, 99
Variance-covariance matrix of u 120, of β 124
