# Econometrics notes (introduction, simple linear regression, multiple linear regression)

Post on 05-Dec-2014

DESCRIPTION

Econometrics notes for BS Economics students. Muhammad Ali, Lecturer in Statistics, Higher Education Department, KPK, Pakistan. Email: [email protected]

TRANSCRIPT

- 1. Muhammad Ali, Lecturer in Statistics, GPGC Mardan; M.Sc (Peshawar University), M.Phil (AIOU Islamabad). Econometrics, BS Economics.

  Introduction

  Definition of Econometrics: Literally interpreted, econometrics means "economic measurement". Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena (variables). It can also be defined as the "statistical observation of theoretically formed concepts", or alternatively as mathematical economics working with measured data. Economic theory, in turn, attempts to define the relationships among different economic variables.

  Methodology of Econometrics: The main steps in the methodology of econometrics are:

  1. Specify a mathematical equation to describe the relationship between economic variables.
  2. Design methods and procedures, based on statistical theory, to obtain a representative sample from the real world.
  3. Develop methods for estimating the parameters of the specified relationships.
  4. Develop methods for making economic forecasts for policy purposes, based on the estimated parameters.
- 2. What are the goals of econometrics?

  Econometrics helps us to achieve the following three goals:

  1. Judge the validity of economic theories.
  2. Supply numerical estimates of the coefficients of economic relationships, which may then be used for sound economic policy.
  3. Forecast the future values of economic magnitudes with a certain degree of probability.

  The Nature of the Econometric Approach

  The first step of every econometric study is the specification of the model. A model is simply a set of mathematical equations. If the model has only one equation it is called a single-equation model, whereas if it has more than one equation it is called a multi-equation model. Consider the following model:

  $$Y = \beta_0 + \beta_1 X$$

  where $Y$ = consumption expenditure, $\beta_0$ = intercept, $\beta_1$ = slope or coefficient of regression, and $X$ = income. This is a deterministic model showing the relationship between consumption and income. The non-deterministic or stochastic model can be written as:

  $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
- 3. where $\varepsilon_i$ is known as the disturbance or error term; it is a random variable. Such a model is also called a probabilistic model. The disturbance term represents all those factors that affect consumption but are not explicitly taken into account. This equation is an example of an econometric model, and more specifically of a linear regression model. In this case the response variable $Y$ is linearly related to the predictor variable $X$, but the relationship between the two is not exact.

  The second step is the estimation of the model by an appropriate econometric method. This includes:

  1. Collection of data for the variables included in the model.
  2. Choice of an appropriate econometric technique for estimating the model (the technique used here is regression analysis).

  The third step is to develop suitable criteria for deciding whether the estimates obtained agree with the expectations of the theory being tested, that is, whether the parameter estimates are theoretically meaningful and statistically significant.

  The final step is to use the estimated model to predict future values of the response variable.

  Deterministic and Stochastic Models

  A relation between $X$ and $Y$ is said to be deterministic if for each value of the predictor variable $X$ there is one and only one corresponding value of the response variable $Y$. On the other hand, the relation between $X$ and $Y$ is said to be stochastic or probabilistic if for a given value of $X$ there is a whole probability distribution of values of $Y$. That is, $Y$ is a random variable and $X$ is a fixed mathematical variable measured without error.
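
The difference between the deterministic and the stochastic consumption model can be illustrated with a small simulation. This is only a sketch: the parameter values (`beta0 = 50`, `beta1 = 0.8`, `sigma = 10`) and the assumption of a normally distributed disturbance are chosen for illustration and do not come from the notes.

```python
import random


def deterministic_consumption(income, beta0=50.0, beta1=0.8):
    """Deterministic model: exactly one Y for each X."""
    return beta0 + beta1 * income


def stochastic_consumption(income, rng, beta0=50.0, beta1=0.8, sigma=10.0):
    """Stochastic model: Y = beta0 + beta1*X + eps.

    The disturbance eps is drawn from N(0, sigma^2); normality is an
    illustrative assumption, not something the notes require.
    """
    return beta0 + beta1 * income + rng.gauss(0.0, sigma)


rng = random.Random(42)  # fixed seed so the run is reproducible
income = 1000.0          # X is held fixed

# Evaluate both models five times at the same income level.
det_values = [deterministic_consumption(income) for _ in range(5)]
sto_values = [stochastic_consumption(income, rng) for _ in range(5)]

print("deterministic:", det_values)  # the same value every time
print("stochastic:   ", sto_values)  # a different draw every time
```

At a fixed income the deterministic model returns a single number, while the stochastic model returns draws from a whole distribution centred on that number.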
- 4. Types of Econometrics

  Econometrics can be divided into two main branches:

  1. Theoretical
  2. Applied

  Theoretical econometrics: It is concerned with the development of appropriate methods for measuring the economic relationships specified by econometric models. This type of econometrics depends heavily on mathematical statistics and on single-equation and simultaneous-equation techniques, which are the methods used for measuring economic relationships. For example:

  $$Y = a + bx \quad \text{(simple)}$$
  $$Y = a + b_1 x_1 + b_2 x_2 \quad \text{(simultaneous)}$$

  Applied econometrics: Applied econometrics describes the practical value of economic research. It deals with the application of the econometric methods developed in theoretical econometrics to different fields of economics, such as consumption functions, demand and supply, fraction etc. Applied econometrics has made it possible to obtain numerical results from these studies, which are of great importance to planners.
- 5. The Role of Econometrics: The important role of econometrics is the estimation and testing of economic models. The first step in the process is the specification of the model in mathematical form. The second step is to collect the relevant data from the economy. Third, we use the data to estimate the parameters of the model. Finally, we carry out tests on the estimated model to judge whether it constitutes a sufficiently realistic picture of the economy being studied, or whether a somewhat different specification should be estimated.
- 6. Regression Analysis

  Definition of Regression

  Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The term regression was first introduced by Francis Galton. In his paper, Galton found that the height of the children of unusually tall or unusually short parents tends to move towards the average height of the population. Galton's law of universal regression was confirmed by his friend Karl Pearson, who collected more than a thousand records of heights of members of family groups. He found that the average height of sons of a group of tall fathers was less than their fathers' height, and the average height of sons of a group of short fathers was greater than their fathers' height, thus "regressing" tall and short sons alike toward the average height of all men. In the words of Galton, this was "regression to mediocrity".

  Modern Interpretation of Regression

  Regression analysis is the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.

  Objective of Regression

  The objective of regression analysis is to estimate or predict the average value of the response variable on the basis of the known or fixed values of the predictor variable.
- 7. The Simple Linear Regression Model

  Simple linear regression is the most commonly used technique for determining how one variable of interest (the response variable) is affected by changes in another variable (the explanatory variable). The terms "response" and "explanatory" mean the same thing as "dependent" and "independent", but the former terminology is preferred because the "independent" variable may actually be interdependent with many other variables as well.

  Simple linear regression is used for three main purposes:

  1. To describe the linear dependence of one variable on another.
  2. To predict values of one variable from values of another, for which more data are available.
  3. To correct for the linear dependence of one variable on another, in order to clarify other features of its variability.

  Linear regression determines the best-fit line through a scatter plot of data, such that the sum of squared residuals is minimized; equivalently, it minimizes the error variance. The fit is "best" in precisely that sense: the sum of squared errors is as small as possible. That is why it is also termed "Ordinary Least Squares" regression.

  The simple linear regression model is given by:

  $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

  where $\beta_0$ = intercept, $\beta_1$ = slope or coefficient of regression, and $\varepsilon_i$ = random error.
- 8. An important objective of regression analysis is to estimate the unknown parameters $\beta_0$ and $\beta_1$ in the regression model. This process is also called fitting the model to the data. The parameters $\beta_0$ and $\beta_1$ are usually called regression coefficients. The slope $\beta_1$ is the change in the mean of the distribution of $Y$ produced by a unit change in $X$. The intercept $\beta_0$ is the mean of the distribution of the response variable $Y$ when $X = 0$.

  Estimation of the parameters by OLS (ordinary least squares)

  Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. It minimizes the sum of squared errors (residuals). Writing the intercept as $\alpha$ and the slope as $\beta$, with estimators $\hat\alpha$ and $\hat\beta$, the sum of squared errors is

  $$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\alpha - \hat\beta X_i\right)^2 \tag{I}$$

  Differentiating equation (I) with respect to $\hat\alpha$ and $\hat\beta$ and setting the derivatives equal to zero:

  $$\frac{\partial S}{\partial \hat\alpha} = -2 \sum \left(Y_i - \hat\alpha - \hat\beta X_i\right) = 0$$

  $$\frac{\partial S}{\partial \hat\beta} = -2 \sum X_i \left(Y_i - \hat\alpha - \hat\beta X_i\right) = 0$$

  Simplifying the above equations, we get:

  $$\sum Y_i = n\hat\alpha + \hat\beta \sum X_i \tag{ii}$$

  $$\sum X_i Y_i = \hat\alpha \sum X_i + \hat\beta \sum X_i^2 \tag{iii}$$

  These are called the normal equations. From equation (ii) we get:
- 9. $$\hat\alpha = \bar Y - \hat\beta \bar X$$

  where $\hat\alpha$ and $\hat\beta$ are the OLS estimators of the intercept $\alpha$ and the slope $\beta$. Substituting this value of $\hat\alpha$ into equation (iii), we get:

  $$\sum X_i Y_i = \left(\bar Y - \hat\beta \bar X\right) \sum X_i + \hat\beta \sum X_i^2$$

  $$\sum X_i Y_i = \bar Y \sum X_i - \hat\beta \bar X \sum X_i + \hat\beta \sum X_i^2$$

  $$\sum X_i Y_i - \bar Y \sum X_i = \hat\beta \left(\sum X_i^2 - \bar X \sum X_i\right)$$

  $$\hat\beta = \frac{\sum X_i Y_i - \bar Y \sum X_i}{\sum X_i^2 - \bar X \sum X_i} = \frac{\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sum X_i^2 - \left(\sum X_i\right)^2/n}$$

  It is important to know that $\hat\alpha$ and $\hat\beta$ are not the same as $\alpha$ and $\beta$, because they are based on a single sample rather than the entire population. If you took a different sample, you would get different values for $\hat\alpha$ and $\hat\beta$. We call $\hat\alpha$ and $\hat\beta$ the OLS estimators of $\alpha$ and $\beta$. One of the main goals of econometrics is to analyze the quality of these estimators and to see under what conditions they are good estimators and under which conditions they are not.

  Once we have $\hat\alpha$ and $\hat\beta$, we can construct two more variables. The first is the fitted values, or estimates of $Y$:

  $$\hat Y_i = \hat\alpha + \hat\beta X_i$$

  The second is the estimates of the error terms, which we will call the residuals:

  $$e_i = Y_i - \hat Y_i$$
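
The closed-form OLS formulas for the slope and intercept, together with the fitted values and residuals, can be sketched in a few lines of Python. The income and consumption figures below are hypothetical numbers invented for this illustration, and the function name `ols_simple` is likewise just a label chosen here.

```python
def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) from the raw-sum OLS formulas.

    beta_hat  = (sum(XY) - sum(X)*sum(Y)/n) / (sum(X^2) - (sum(X))^2/n)
    alpha_hat = Y_bar - beta_hat * X_bar
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    beta_hat = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    alpha_hat = sum_y / n - beta_hat * sum_x / n
    return alpha_hat, beta_hat


# Hypothetical data (illustration only): income X and consumption Y.
income = [10.0, 20.0, 30.0, 40.0, 50.0]
consumption = [12.0, 19.0, 29.0, 37.0, 45.0]

alpha_hat, beta_hat = ols_simple(income, consumption)

# Fitted values and residuals, as defined in the notes.
fitted = [alpha_hat + beta_hat * xi for xi in income]
residuals = [yi - fi for yi, fi in zip(consumption, fitted)]

print("alpha_hat =", round(alpha_hat, 4))
print("beta_hat  =", round(beta_hat, 4))
print("residuals =", [round(e, 4) for e in residuals])
```

Because the normal equations hold at the solution, the residuals sum to zero and are uncorrelated with `income`, which is a quick sanity check on any hand computation.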
- 10. Assumptions of the Classical Linear Regression Model

  The following are a few important assumptions of the classical linear regression model, where $u_i$ denotes the disturbance term:

  1. Linearity: The regression model is linear in the parameters, i.e. $Y_i = \beta_0 + \beta_1 X_i + u_i$.
  2. Non-stochastic X: The values of the independent (regressor) variable are assumed to be fixed in repeated sampling.
  3. Zero mean of the error term: The expected value (mean) of the random disturbance term, given the value of $X$, is zero, i.e. $E(u_i \mid X_i) = 0$.
  4. Homoscedasticity: The variance of $u_i$ is the same for all observations, i.e. $\operatorname{Var}(u_1) = \operatorname{Var}(u_2) = \dots = \operatorname{Var}(u_n) = \sigma^2$.
- 11. Assumptions of the classical linear regression model (continued):

  5. No autocorrelation between error terms: There is no correlation between any two error terms. Symbolically, $\operatorname{Cov}(u_i, u_j) = 0$ for $i \neq j$.
  6. No relationship between the predictor variable and the error term: $\operatorname{Cov}(u_i, X_i) = 0$, i.e. $E(u_i X_i) = 0$.
  7. The number of observations $n$ must be greater than the number of parameters to be estimated. Alternatively, the number of observations $n$ must be greater than the number of explanatory variables. The values of the predictor variable $X$ in a given sample must not all be the same. Technically, $\operatorname{Var}(X)$ must be a finite positive number.
- 12. Assumptions of the classical linear regression model (continued):

  8. The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in empirical analysis.
  9. There is no perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables.

  Interpretation of Coefficients/Parameters

  The interpretation of the coefficients from a linear regression model is fairly straightforward. The estimated intercept ($b_0$) tells us the value of $y$ that is expected when $x = 0$. This is often not very useful, because many of our variables don't have true 0 values (or at least not relevant ones, like education or income, which rarely if ever have 0 values). The slope ($b_1$) is more important, because it tells us the relationship between $x$ and $y$. It is interpreted as the expected change in $y$ for a one-unit change in $x$.

  Properties of Least-Squares Estimators

  Property 1: The least-squares estimators are linear functions of the actual observations on $Y$. With $\hat\beta$ the OLS slope estimator and $x_i = X_i - \bar X$, we have

  $$\hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \frac{\sum Y_i (X_i - \bar X) - \bar Y \sum (X_i - \bar X)}{\sum (X_i - \bar X)^2}$$

  $$\hat\beta = \frac{\sum Y_i (X_i - \bar X)}{\sum (X_i - \bar X)^2}, \quad \text{as } \sum (X_i - \bar X) = 0$$

  $$\hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum w_i Y_i$$
- 13. where $w_i = x_i / \sum x_i^2$ and $x_i = X_i - \bar X$. Similarly, for the OLS intercept estimator $\hat\alpha$ we have

  $$\hat\alpha = \bar Y - \hat\beta \bar X = \frac{1}{n}\sum Y_i - \bar X \sum w_i Y_i = \sum \left[\frac{1}{n} - \bar X w_i\right] Y_i$$

  Hence both $\hat\alpha$ and $\hat\beta$ are expressed as linear functions of the $Y$'s.

  Property 2: The least-squares estimators $\hat\alpha$ and $\hat\beta$ are unbiased estimators of $\alpha$ and $\beta$. We have

  $$\hat\beta = \sum w_i Y_i = \sum w_i (\alpha + \beta X_i + \varepsilon_i) = \alpha \sum w_i + \beta \sum w_i X_i + \sum w_i \varepsilon_i \tag{I}$$

  Now

  $$\sum w_i = \frac{\sum x_i}{\sum x_i^2} = 0 \tag{A}$$

  $$\sum w_i^2 = \frac{\sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{1}{\sum x_i^2} \tag{B}$$

  $$\sum w_i x_i = \frac{\sum x_i^2}{\sum x_i^2} = 1 \tag{C}$$
- 14. Substituting these values in equation (I), with $X_i = x_i + \bar X$ and $w_i = x_i / \sum x_i^2$:

  $$\hat\beta = \alpha(0) + \beta \sum w_i (x_i + \bar X) + \sum w_i \varepsilon_i$$

  $$\hat\beta = \beta \left[\sum w_i x_i + \bar X \sum w_i\right] + \sum w_i \varepsilon_i$$

  $$\hat\beta = \beta\,[1 + 0] + \sum w_i \varepsilon_i$$

  $$\hat\beta = \beta + \sum w_i \varepsilon_i$$

  Taking expectations on both sides, we get

  $$E(\hat\beta) = \beta + \sum w_i E(\varepsilon_i) = \beta + 0 = \beta, \quad \text{since } E(\varepsilon_i) = 0$$

  Thus $\hat\beta$ is an unbiased estimator of $\beta$. Now

  $$\hat\alpha = \sum \left[\frac{1}{n} - \bar X w_i\right] Y_i = \sum \left[\frac{1}{n} - \bar X w_i\right] (\alpha + \beta X_i + \varepsilon_i)$$

  $$\hat\alpha = \alpha + \beta \bar X + \frac{1}{n}\sum \varepsilon_i - \alpha \bar X \sum w_i - \beta \bar X \sum w_i X_i - \bar X \sum w_i \varepsilon_i$$

  Using equations (A) and (C), $\sum w_i = 0$ and $\sum w_i X_i = \sum w_i x_i + \bar X \sum w_i = 1$, so

  $$\hat\alpha = \alpha + \sum \left[\frac{1}{n} - \bar X w_i\right] \varepsilon_i$$

  Taking expectations on both sides:
- 15. $$E(\hat\alpha) = \alpha + \frac{1}{n}\sum E(\varepsilon_i) - \bar X \sum w_i E(\varepsilon_i) = \alpha + 0 - 0 = \alpha, \quad \text{since } E(\varepsilon_i) = 0$$

  Thus $\hat\alpha$ is an unbiased estimator of $\alpha$.
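
The unbiasedness result can be checked numerically with a small Monte Carlo experiment. The sketch below holds X fixed in repeated sampling, draws many samples of Y, and averages the OLS estimates; it also verifies the weight identities used in the proof. The true parameter values, the normal disturbance, the sample size and the number of replications are all assumptions made for this illustration.

```python
import random


def ols_simple(x, y):
    """OLS intercept and slope from the deviation-form formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Assumed "true" population parameters (illustrative values).
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0

# X is fixed in repeated sampling (non-stochastic regressor).
x = [float(i) for i in range(1, 21)]

# Weights w_i = x_i / sum(x_i^2), with x_i the deviation from the mean.
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
w = [(xi - x_bar) / sxx for xi in x]
sum_w = sum(w)                                            # identity (A)
sum_w_sq = sum(wi ** 2 for wi in w)                       # identity (B)
sum_w_x = sum(wi * (xi - x_bar) for wi, xi in zip(w, x))  # identity (C)

# Repeated sampling: new disturbances each time, same X.
rng = random.Random(0)
n_samples = 5000
alpha_hats, beta_hats = [], []
for _ in range(n_samples):
    y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in x]
    a_hat, b_hat = ols_simple(x, y)
    alpha_hats.append(a_hat)
    beta_hats.append(b_hat)

mean_alpha = sum(alpha_hats) / n_samples
mean_beta = sum(beta_hats) / n_samples
print("mean of alpha_hat:", round(mean_alpha, 3), "true alpha:", ALPHA)
print("mean of beta_hat: ", round(mean_beta, 3), "true beta: ", BETA)
```

Any single sample gives estimates that differ from the true values, but the averages over many samples settle close to them, which is what unbiasedness means.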
- 16. Multiple Linear Regression Model

  Definition

  A linear regression model that involves more than one predictor variable is called a multiple linear regression model. In this case the response variable is a linear function of two or more predictor variables. A multiple linear regression model with $p$ predictor variables is given by:

  $$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon_i$$

  where $\beta_0, \beta_1, \dots, \beta_p$ are parameters to be estimated from the sample data. These parameters are also called regression coefficients. The parameter $\beta_j$ ($j = 1, 2, 3, \dots, p$) represents the expected change in the response $Y$ per unit change in $X_j$ when all the remaining predictor variables $X_i$ ($i \neq j$) are held constant. For this reason these parameters are often called partial regression coefficients.

  A multiple linear regression model with two predictor variables is given by:

  $$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_i$$

  The corresponding regression model estimated from sample data is:

  $$\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$$

  where $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ are estimates of the parameters $\beta_0$, $\beta_1$, and $\beta_2$.
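- 17. Ordinary Least Squares Criterion to Find the Estimates

  The least-squares function is

  $$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat Y_i\right)^2 = \sum_{i=1}^{n} \left[Y_i - \left(\hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2\right)\right]^2$$

  $$S = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)^2 \tag{I}$$

  The function $S$ is to be minimized with respect to $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$. For this purpose we differentiate equation (I) with respect to $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ and set each derivative equal to zero:

  $$\frac{\partial S}{\partial \hat\beta_0} = \frac{\partial}{\partial \hat\beta_0} \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)^2 = 0 \tag{II}$$

  $$\frac{\partial S}{\partial \hat\beta_1} = \frac{\partial}{\partial \hat\beta_1} \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)^2 = 0 \tag{III}$$

  $$\frac{\partial S}{\partial \hat\beta_2} = \frac{\partial}{\partial \hat\beta_2} \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)^2 = 0 \tag{IV}$$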
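- 18. From (II):

  $$2 \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)(-1) = 0$$

  $$\sum Y_i - n\hat\beta_0 - \hat\beta_1 \sum X_1 - \hat\beta_2 \sum X_2 = 0$$

  $$\sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_1 + \hat\beta_2 \sum X_2 \tag{V}$$

  From (III):

  $$2 \sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)(-X_1) = 0$$

  $$\sum X_1 Y_i - \hat\beta_0 \sum X_1 - \hat\beta_1 \sum X_1^2 - \hat\beta_2 \sum X_1 X_2 = 0$$

  $$\sum X_1 Y_i = \hat\beta_0 \sum X_1 + \hat\beta_1 \sum X_1^2 + \hat\beta_2 \sum X_1 X_2 \tag{VI}$$

  From (IV):

  $$2 \sum \left(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2\right)(-X_2) = 0$$

  $$\sum X_2 Y_i - \hat\beta_0 \sum X_2 - \hat\beta_1 \sum X_1 X_2 - \hat\beta_2 \sum X_2^2 = 0$$

  $$\sum X_2 Y_i = \hat\beta_0 \sum X_2 + \hat\beta_1 \sum X_1 X_2 + \hat\beta_2 \sum X_2^2 \tag{VII}$$

  Equations (V), (VI), and (VII) are called the normal equations. From (V) we get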
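- 19. Dividing (V) by $n$:

  $$\frac{\sum Y_i}{n} = \frac{n\hat\beta_0}{n} + \hat\beta_1 \frac{\sum X_1}{n} + \hat\beta_2 \frac{\sum X_2}{n}$$

  $$\bar Y = \hat\beta_0 + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2$$

  $$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$$

  Substituting this value of $\hat\beta_0$ in equations (VI) and (VII), we get

  From (VI):

  $$\sum X_1 Y_i = \left(\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2\right) \sum X_1 + \hat\beta_1 \sum X_1^2 + \hat\beta_2 \sum X_1 X_2$$

  From (VII):

  $$\sum X_2 Y_i = \left(\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2\right) \sum X_2 + \hat\beta_1 \sum X_1 X_2 + \hat\beta_2 \sum X_2^2$$

  Rearranging:

  $$\sum X_1 Y_i - \bar Y \sum X_1 = \hat\beta_1 \left[\sum X_1^2 - \bar X_1 \sum X_1\right] + \hat\beta_2 \left[\sum X_1 X_2 - \bar X_2 \sum X_1\right] \tag{A}$$

  $$\sum X_2 Y_i - \bar Y \sum X_2 = \hat\beta_1 \left[\sum X_1 X_2 - \bar X_1 \sum X_2\right] + \hat\beta_2 \left[\sum X_2^2 - \bar X_2 \sum X_2\right] \tag{B}$$

  Now, writing lowercase letters for deviations from the mean,

  $$\sum x_1 y = \sum (X_1 - \bar X_1)(Y - \bar Y)$$

  $$\sum x_1 y = \sum X_1 (Y - \bar Y) - \bar X_1 \sum (Y - \bar Y)$$

  $$\sum x_1 y = \sum X_1 (Y - \bar Y), \quad \text{since } \sum (Y - \bar Y) = 0$$

  $$\sum x_1 y = \sum X_1 Y - \bar Y \sum X_1$$

  $$\sum x_1^2 = \sum (X_1 - \bar X_1)^2 = \sum (X_1 - \bar X_1)(X_1 - \bar X_1)$$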
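- 20. $$\sum x_1^2 = \sum X_1 (X_1 - \bar X_1) - \bar X_1 \sum (X_1 - \bar X_1)$$

  $$\sum x_1^2 = \sum X_1^2 - \bar X_1 \sum X_1, \quad \text{since } \sum (X_1 - \bar X_1) = 0$$

  Similarly,

  $$\sum x_2^2 = \sum X_2^2 - \bar X_2 \sum X_2, \quad \text{since } \sum (X_2 - \bar X_2) = 0$$

  $$\sum x_2 y = \sum (X_2 - \bar X_2)(Y - \bar Y) = \sum X_2 (Y - \bar Y) - \bar X_2 \sum (Y - \bar Y)$$

  $$\sum x_2 y = \sum X_2 Y - \bar Y \sum X_2, \quad \text{since } \sum (Y - \bar Y) = 0$$

  By the same argument, $\sum x_1 x_2 = \sum X_1 X_2 - \bar X_2 \sum X_1 = \sum X_1 X_2 - \bar X_1 \sum X_2$. Substituting these results in equations (A) and (B), we obtain the normal equations in deviation form as follows:

  $$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2 \tag{C}$$

  $$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2 \tag{D}$$

  We now solve these normal equations for $\hat\beta_1$ and $\hat\beta_2$. Multiplying equation (C) by $\sum x_2^2$ and equation (D) by $\sum x_1 x_2$ and subtracting, we get the following estimate of $\beta_1$:

  $$\hat\beta_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$

  Similarly, multiplying equation (C) by $\sum x_1 x_2$ and equation (D) by $\sum x_1^2$ and subtracting, we get

  $$\hat\beta_2 = \frac{\sum x_2 y \sum x_1^2 - \sum x_1 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$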
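
The deviation-form solution for the two-predictor model translates directly into code. In the sketch below the data are constructed from a known equation, Y = 1 + 2·X1 + 3·X2 with no disturbance, so that the formulas should recover exactly those coefficients; the data and the function name `ols_two_predictors` are illustrative choices and not part of the notes.

```python
def ols_two_predictors(x1, x2, y):
    """OLS estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2.

    Uses the deviation-form formulas: lowercase sums are built from
    deviations of each variable from its own sample mean.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]

    s11 = sum(a * a for a in d1)               # sum of x1^2
    s22 = sum(b * b for b in d2)               # sum of x2^2
    s12 = sum(a * b for a, b in zip(d1, d2))   # sum of x1*x2
    s1y = sum(a * c for a, c in zip(d1, dy))   # sum of x1*y
    s2y = sum(b * c for b, c in zip(d2, dy))   # sum of x2*y

    # The denominator is zero under perfect multicollinearity.
    denom = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Constructed data (illustration only): Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

b0, b1, b2 = ols_two_predictors(x1, x2, y)
print("b0 =", round(b0, 6), "b1 =", round(b1, 6), "b2 =", round(b2, 6))
```

The common denominator makes the no-perfect-multicollinearity assumption concrete: if X2 were an exact linear function of X1, it would be zero and the estimates would be undefined.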
- 21. Standardized coefficients

  In statistics, standardized coefficients or beta coefficients are the estimates resulting from an analysis carried out on variables that have been standardized so that their variances are 1. Therefore, standardized coefficients refer to how many standard deviations the dependent variable will change per standard deviation increase in the predictor variable. Standardization of the coefficients is usually done to answer the question of which of the independent variables has a greater effect on the dependent variable in a multiple regression analysis when the variables are measured in different units (for example, income measured in dollars and family size measured in number of individuals).

  A regression carried out on the original (unstandardized) variables produces unstandardized coefficients. A regression carried out on standardized variables produces standardized coefficients. Values for standardized and unstandardized coefficients can also be derived subsequent to either type of analysis.

  Before solving a multiple regression problem, all variables (independent and dependent) can be standardized. Each variable is standardized by subtracting its mean from each of its values and then dividing these new values by the standard deviation of the variable. Standardizing all variables in a multiple regression yields standardized regression coefficients that show the change in the dependent variable measured in standard deviations.

  Advantages

  Advocates of standardized coefficients note that the coefficients ignore the independent variable's scale of units, which makes comparisons easy.
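
The two routes to standardized coefficients described above can be compared in a short sketch: standardize every variable and rerun the regression, or rescale each unstandardized slope by the ratio of standard deviations, s(Xj)/s(Y). The income, family-size and spending figures are hypothetical numbers invented for this example, and the helper names are arbitrary.

```python
import statistics


def ols_two_predictors(x1, x2, y):
    """OLS estimates (b0, b1, b2) via the deviation-form formulas."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    denom = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    return my - b1 * m1 - b2 * m2, b1, b2


def standardize(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data (illustration only), measured in different units.
income = [30000.0, 45000.0, 52000.0, 61000.0, 75000.0, 88000.0]  # dollars
family_size = [2.0, 4.0, 3.0, 5.0, 4.0, 6.0]                     # persons
spending = [18000.0, 27000.0, 28500.0, 36000.0, 39000.0, 48000.0]

# Route 1: regression on the original variables, then rescale the slopes.
b0, b1, b2 = ols_two_predictors(income, family_size, spending)
sd_y = statistics.stdev(spending)
beta1_rescaled = b1 * statistics.stdev(income) / sd_y
beta2_rescaled = b2 * statistics.stdev(family_size) / sd_y

# Route 2: regression on the standardized variables.
z0, z1, z2 = ols_two_predictors(
    standardize(income), standardize(family_size), standardize(spending)
)

print("unstandardized slopes:", b1, b2)
print("standardized slopes:  ", z1, z2)
```

Both routes give the same standardized slopes, and the intercept of the standardized regression is zero because every standardized variable has mean zero.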
- 22. Disadvantages

  Critics voice concerns that such standardization can be misleading: a change of one standard deviation in one variable has no reason to be equivalent to a similar change in another predictor. Some variables are easy to affect externally, e.g. the amount of time spent on an action. Weight or cholesterol level are more difficult, and some, like height or age, are impossible to affect externally.

  Goodness of Fit ($R^2$)

  The coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression. The higher the coefficient of determination, the larger the share of the variation in the dependent variable that is explained by the independent variables. The coefficient of determination is the overall measure of the usefulness of a regression.

  For example, if $R^2$ is 0.95, then 95% of the variation in the dependent variable is explained by the independent variables. That is a good fit. If instead $R^2$ is 0.50, only 50% of the variation in the dependent variable is explained by the independent variables. By this measure, that is a much weaker regression.

  Note that $R^2$ lies between 0 and 1. If $R^2 = 1$, the fitted model explains 100% of the variation in the response variable $Y$. On the other hand, if $R^2 = 0$, the model does not explain any of the variation of $Y$.

  The coefficient of determination can be calculated as the regression (explained) sum of squares, RSS, divided by the total sum of squares, TSS:

  $$R^2 = \frac{\text{RSS}}{\text{TSS}}$$

  For the model with two predictor variables, the mathematical formula of the coefficient of determination is:

  $$R^2 = \frac{\hat\beta_1 \sum (X_1 - \bar X_1)(Y_i - \bar Y) + \hat\beta_2 \sum (X_2 - \bar X_2)(Y_i - \bar Y)}{\sum (Y_i - \bar Y)^2}$$
- 23. Problems with the Coefficient of Determination

  First, the coefficient of determination never decreases as more independent variables are added. It does not matter whether those independent variables really help to explain the variation of the dependent variable: $R^2$ will typically rise whenever another independent variable is added. This brings us to the concept of adjusted R squared.

  The adjusted R squared corrects $R^2$ for the number of independent variables in the model. It differs from the coefficient of determination in that it increases only if an added independent variable improves the fit by more than would be expected from the loss of a degree of freedom, that is, only if the variable is genuinely helpful in an explanatory sense. The adjusted R squared may be negative and is never higher than the original $R^2$.
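
The coefficient of determination and its adjusted version can be sketched as follows. The notes do not state a formula for adjusted R squared; the one used here, 1 − (1 − R²)(n − 1)/(n − p − 1) with n observations and p predictors, is the standard textbook definition and is brought in as an assumption. The data are hypothetical, and the names `explained_ss` and `total_ss` correspond to what the notes call RSS and TSS.

```python
def fit_simple(x, y):
    """OLS intercept and slope for one predictor (deviation form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


def r_squared(y, fitted):
    """Explained (regression) sum of squares divided by the total.

    This ratio form is valid for an OLS fit that includes an intercept.
    """
    y_bar = sum(y) / len(y)
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    explained_ss = sum((fi - y_bar) ** 2 for fi in fitted)
    return explained_ss / total_ss


def adjusted_r_squared(r2, n, p):
    """Standard adjustment for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data (illustration only): income X and consumption Y.
income = [10.0, 20.0, 30.0, 40.0, 50.0]
consumption = [12.0, 19.0, 29.0, 37.0, 45.0]

alpha_hat, beta_hat = fit_simple(income, consumption)
fitted = [alpha_hat + beta_hat * xi for xi in income]

r2 = r_squared(consumption, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(income), p=1)

print("R squared:         ", round(r2, 6))
print("adjusted R squared:", round(adj_r2, 6))
```

The adjustment factor (n − 1)/(n − p − 1) grows with every added predictor, which is why the adjusted value can fall when a variable contributes little.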