introduction into stata iii: graphs and regressions

27
Introduction into STATA III: Graphs and Regressions Prof. Dr. Herbert Brücker University of Bamberg minar “Migration and the Labour Market June 27, 2013

Upload: cairbre-troy

Post on 31-Dec-2015

56 views

Category:

Documents


3 download

DESCRIPTION

Introduction into STATA III: Graphs and Regressions. Prof. Dr. Herbert Brücker University of Bamberg Seminar “Migration and the Labour Market” June 27, 2013. 1GRAPHS Present your data graphically - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Introduction into STATA III:  Graphs and Regressions

Introduction into STATA III: Graphs and Regressions

Prof. Dr. Herbert Brücker

University of Bamberg

Seminar “Migration and the Labour Market”June 27, 2013

Page 2: Introduction into STATA III:  Graphs and Regressions

1 GRAPHS

• Present your data graphically• It is usually helpful if you present the main

information /vairables in your data set graphically• There are many graphical commands, use the

Graphics menue• the simplest way is to show the development of your

variable(s) over time• Syntax:

• graph twoway line [variable1] [variable2] if …• graph twoway line wqjt year if ed==1 & ex == 1

• This produces a two-dimensional variable with the wage on the vertical and the year on the horizontal axis for education group 1 and experience group 1

Page 3: Introduction into STATA III:  Graphs and Regressions

Making a graph

Page 4: Introduction into STATA III:  Graphs and Regressions

Graph of mean wage in education 1 and experience 1 group

Page 5: Introduction into STATA III:  Graphs and Regressions

Graph of migration rate in edu 1 and exp 1 group

Page 6: Introduction into STATA III:  Graphs and Regressions

GRAPHS: Two Y-axes

• Two axes: It might be useful to display two variables in different y-axes with different scales (e.g. wages and migration rates)

• Syntax:• graph (twoway line [variable1] [variable2],

yaxis(1)) (twoway line [variable3] [variable2], yaxis(2)) if …

• graph (twoway line wqjt year, yaxis(1)) (twoway line mqjt year, yaxis(2)) if ed==1 & ex == 1

• This produces a two-dimensional graph with the wage on the first vertical axis (y1) and the migration rate on the second vertical axis (y2)

Page 7: Introduction into STATA III:  Graphs and Regressions

GRAPHS: Scatter plots (I/II)

• Scatter plots display the relations between two variables

• Syntax:• graph twoway scatter [variable1] [variable2] if …• graph twoway scatter wqjt mqjt if ed==1

• This produces a two-dimensional scatter plot which shows the relation between the two variables

Page 8: Introduction into STATA III:  Graphs and Regressions

GRAPHS: Scatter plots (II/II)

• You can also add a linear fitted line:• Syntax:

• graph twoway scatter [variable1] [variable2] if …|| lfit [variable1] [variable2] if …

• graph twoway scatter wqjt mqjt if ed==1|| lfit wqjt mqjt if ed==1

Page 9: Introduction into STATA III:  Graphs and Regressions

2 Running regressions

• The standard OLS regression command in STATA is• Syntax

• regress depvar [list of indepvar ] [if], [options] • e.g. regress ln_wijt mijt $D_i $D_j $D_t

Page 10: Introduction into STATA III:  Graphs and Regressions

The multivariate linear regression model

The general econometric model:

γi indicates the dependent (or: endogenous) variable

x1i,ki exogenous variable, explaining the independent variable

β0 constant or the y-axis intercept (if x = 0)

β1,2,k regression coefficient or parameter of regression

εi residual, disturbance term

Page 11: Introduction into STATA III:  Graphs and Regressions

Running a regression model

Regressioncommand

Dependentvariable

Independentvariables

Globals !

Page 12: Introduction into STATA III:  Graphs and Regressions

Running a Regression: Output

Page 13: Introduction into STATA III:  Graphs and Regressions

How to interpret the output of a regression

1. Observations2. fit of the model3. F-Test 4. R-squared5. adjusted R-squared6. Root Mean Standard Error

analysis of significance levels

variance of model

β0

β1

degrees offreedom

95% confidence interval

_cons 4.706176 .017403 270.42 0.000 4.672015 4.740337 mqkt -1.369118 .093913 -14.58 0.000 -1.553464 -1.184772 ln_wqkt Coef. Std. Err. t P>|t| [95% Conf. Interval]

Total 111.329246 799 .139335727 Root MSE = .33192 Adj R-squared = 0.2093 Residual 87.9145738 798 .110168639 R-squared = 0.2103 Model 23.4146717 1 23.4146717 Prob > F = 0.0000 F( 1, 798) = 212.53 Source SS df MS Number of obs = 800

. reg ln_wqkt mqkt

Page 14: Introduction into STATA III:  Graphs and Regressions

Recall the Borjas (2003)-Modell

yijt = β mijt + si + xj + tt + (si ∙ xj) + (si ∙ tt) + (xj ∙ tt) + εijt

This model in STATA Syntax:

regress ln_wqjt mqjt $Di $Dj $Dt $Dij $Dit $Djt

where• ln_wqjt: dependent variable (log wage)• mqjt: migration share in educatipn-experience cell• $Di: global for education dummies• $Dj: global for experience dummies• $Dt: global for time dummies• $Dij: global for interaction education-experience dummies• $Dit: global for education-time interaction dummies• $Djt: global for experience-time interaction dummies

Page 15: Introduction into STATA III:  Graphs and Regressions

What is a global?

• A global defines a vector of variables• Defining a global:

STATA Syntax:global [global name] [variable1] [variable2] …[variablex]global Di Ded1 Ded2 Ded3

Using a global e.g. in a regression:regress [depvariable] [other variable] [$global name]regress ln_wqjt mqjt $Di

This is equivalent to: regress ln_wqjt mqjt Ded1 Ded2 Ded3

Thus, globals are useful shortcuts for lists (vectors) of variables.

Page 16: Introduction into STATA III:  Graphs and Regressions

An alternative to the Borjas (2003) model:

yijkt = β mijt + γk (zk ∙ mijt) + si + xj + zk + tt + (si ∙ xj) + (si ∙ zk) + (xj ∙ zk) + (si ∙ tt) + (xj ∙ tt) + (zk ∙ tt) + εijt

where• zk is a dummy for foreigners (1 if foreigner, 0 if native)• γk is a coefficient, which captures the different impact on foreigners,• k (k= 0, 1) is a subscript for nationality

Idea: the slope coefficient γk is significantly different from zero, if natives and immigrants are imperfect substitutes in the labour market.

Problem: We have to reorganize the dataset such that it delivers the wage and unemployment rates etc. for foreigners and natives.

Page 17: Introduction into STATA III:  Graphs and Regressions

3 Panel Models

• Very often you use panel models, i.e. models which have a group and time series dimension

• There exist special estimators for this, e.g. fixed or random effects models• A fixed effects model is a model where you have a

fixed (constant) effect for each individual/group. This is equivalent to a dummy variable for each group

• A random effects model is a model where you have a random effect for each individual group, which is based on assumptions on the

distribution of individual effects

Page 18: Introduction into STATA III:  Graphs and Regressions

Panel Models

Preparing data for Panel Models:• For running panel models STATA needs to identify the

group(individual) and time series dimension• Therefore you need an index for each group and an

index for each time period• Then use the tsset command to organize you dataset

as a panel data set• Syntax:

• tsset index year• where index is the group/individual index and year

the time index

Page 19: Introduction into STATA III:  Graphs and Regressions

Preparation: Running the tsset command

Page 20: Introduction into STATA III:  Graphs and Regressions

Running Regressions: Panel Models

• Then you can use panel estimators, e.g. the xtreg estimator

• Syntax• xtregress depvar [list of indepvar ] [if], [options] • xtregress ln_wijt m_ijt, fe

• i.e. in the example we run a simple fixed effects panel regression model which is equivalent to include a dummy variable for each group (in this case education-experience group)

Page 21: Introduction into STATA III:  Graphs and Regressions

Running a Panel Regression: command

Page 22: Introduction into STATA III:  Graphs and Regressions

Running a Panel Regression: Output

Page 23: Introduction into STATA III:  Graphs and Regressions

Running Regressions: Panel Models

• There are other features of panel estimators which are helpful

• Heteroscedasticity:• Heteroscedasticity: the variance is not constant,

but varies across groups• xtpcse , p(h) corrects for heteroscedastic standard

errors• xtgls , p(h) corrects coefficient and standard

errors for panel heteroscedasticity, but may produce biased results depending on the group and time dimension of the panel

• Note: p(h) after the comma is a so-called “option” in the STATA syntax

Page 24: Introduction into STATA III:  Graphs and Regressions

Heteroscedasticity within a group

Y

x

Page 25: Introduction into STATA III:  Graphs and Regressions

Heteroscedasticity in panel models across groups

Y

x

Page 26: Introduction into STATA III:  Graphs and Regressions

Running Regressions: Panel Models

• Contemporary correlation across cross-sections• Contemporary correlation: the error terms are

contemporarily correlated across cross-sections, e.g. due to macroeconomic disturbances

• xtgls , p(c) corrects for contemporary correlation and panel heteroscedasticity, but may produce biased results depending on the group and time dimension of the panel.

Page 27: Introduction into STATA III:  Graphs and Regressions

Next Meeting

July 4!

Presentation: July 18

Room RZ 01.02