Research Methods Lecture 5 Advanced STATA IAN WALKER Module Leader S2.109 [email protected]


Post on 01-Feb-2016


Page 1: Research Methods Lecture 5 Advanced  STATA

Research Methods Lecture 5

Advanced STATA

IAN WALKER, Module Leader

S2.109 [email protected]

Page 2: Research Methods Lecture 5 Advanced  STATA

Housekeeping announcement

• Stephen Nickell (MPC and LSE)
  – British Academy Keynes Lecture in Economics
  – "Practical Issues in UK Monetary Policy 2000-2005"
  – Wednesday 2nd November
  – Arts Centre Conference Room at 5.30pm
  – http://www2.warwick.ac.uk/fac/soc/economics/forums/deptsems/keynes_lecture/

Page 3: Research Methods Lecture 5 Advanced  STATA

Stat-Transfer

• Use STAT-TRANSFER to convert data.
• Click on Stat Tran 6.lnk
• Stat-transfer is “point and click”.
• Just tell it the file name and format
• and the format you want it in.
• Click “transfer”.

Page 4: Research Methods Lecture 5 Advanced  STATA

Stat Transfer options

• Useful options for creating a manageable dataset from a large one:
  – Keep or drop variables
  – Change variable format
    • E.g. float to integer
  – Select observations
    • E.g. “where (income + benefits)/famsize < 4500”
• Can be used for reading a large STATA dataset and writing a smaller one
• Avoids doing this in STATA itself

Page 5: Research Methods Lecture 5 Advanced  STATA

Practising

• You can import some of Stata’s own demo files using the sysuse command
  – E.g. . sysuse auto
• Many datasets are available at specific websites
  – E.g. STATA’s own site has all the demo data used in the manual examples
• You can use the webuse command to load the files directly into Stata without copying them locally
. webuse auto /* gets the data from STATA’s own site */
Or point webuse at another site first (webuse set takes the URL of the directory, not of the file):
. webuse set http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes/
. webuse auto /* now gets auto.dta from that directory */

Page 6: Research Methods Lecture 5 Advanced  STATA

More help

• You can search the whole of STATA’s online help using . search xxx
• Michigan’s web-based guide to STATA (for SA)
• UCLA resources to help you learn and use STATA:
  – including movies and “web-books”
• Consult other user-written guides and tutorials
  – Chevalier1, Chevalier2; Princeton; Illinois; Gruhn
• ESDS’s “Stata for LFS”
• Stata’s own resources for learning STATA
  – Stata website, journal, library, archive
  – http://www.stata.com/links/resources1.html

Page 7: Research Methods Lecture 5 Advanced  STATA

Web resources

• STATA is web-aware
  – E.g. . update /* updates from www.stata.com */
• Statalist is an email listserv discussion group
• The Stata Journal is a refereed journal
  – Replaces the old Stata Technical Bulletin (STB)
• SSC Boston College STATA Archive
  – Extensive library of programs by Stata users
  – Files can be downloaded in Stata using . ssc
    • E.g. . ssc install outreg
    • Installs the outreg ado file that makes tables pretty

Page 8: Research Methods Lecture 5 Advanced  STATA

Always (whatever the software)

• Use lowercase
• Open a log file
• Label your data
• Use the do file editor
• Organise your files
  – Separate directories for separate projects
  – Archive (zip) data, do and results files when you’re finished
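A minimal sketch of a do-file that follows these habits is given below. The log file name and the label text are hypothetical, chosen only for illustration; the dataset is Stata's own auto demo file.

```stata
* Sketch of a do-file skeleton (file name and label text are hypothetical)
capture log close                      // close any log left open by an earlier run
log using lecture5_example.log, replace text

sysuse auto, clear                     // Stata's demo dataset
label data "1978 automobile data, practice copy"
label variable mpg "Mileage (miles per gallon)"

summarize price mpg weight             // lowercase variable names throughout

log close
```

Keeping this file, the log and the data in one project directory makes it straightforward to zip everything together when the work is finished.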

Page 9: Research Methods Lecture 5 Advanced  STATA

Customising STATA

• profile.do runs automatically when STATA starts
• Edit it to include commands you want to invoke every time
. set mem 200m
. log using justincase.log, replace
• Define preferences for STATA’s look and feel
  – Click on Prefs in menu
    • Colours, graph scheme, etc.
    • Save window positioning

Page 10: Research Methods Lecture 5 Advanced  STATA

Regression models - I

• Linear regression and related models when the outcome variable is continuous
  – OLS, 2SLS, 3SLS, IV, quantile reg, Box-Cox …
• Binary outcome data
  – the outcome variable is 0 or 1 (or y/n)
    • probit, logit, nested logit...
• Multiple outcome data
  – the outcome variable is 1, 2, ...,
    • conditional logit, ordered probit
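As a rough illustration of how some of these model families map onto commands, the sketch below uses Stata's demo datasets (auto and hsng2). The variable choices are only for illustration and are not taken from the lecture. It also assumes a recent Stata: ivregress and the i. factor-variable notation were introduced in later releases than the one current when these slides were written.

```stata
* Illustrative only: the regressors are chosen for convenience, not for economic sense
sysuse auto, clear

regress price mpg weight               // OLS, continuous outcome
qreg price mpg weight                  // quantile (median) regression
probit foreign mpg weight              // binary outcome (foreign is 0/1)
logit foreign mpg weight
oprobit rep78 mpg weight               // ordered outcome (rep78 takes values 1 to 5)

webuse hsng2, clear
ivregress 2sls rent pcturban (hsngval = faminc i.region)   // IV / 2SLS
```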

Page 11: Research Methods Lecture 5 Advanced  STATA

Regression models - II

• Count data
  – the outcome variable is 0, 1, 2, ..., occurrences
    • Poisson regression, negative binomial
• Choice models
  – multinomial choice
  – A, B or C
    • Multinomial logit, Random utility model, unordered probit, nested logit, ...etc
• Selection models
  – Truncated, censored
    • Tobit, Heckman selection models;
    • linear regression or probit with selection
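The following sketch shows one command from each family above, run on datasets from Stata's own site. The datasets and specifications are assumptions made for illustration (they are not part of the lecture), and the i. factor-variable notation assumes a recent version of Stata.

```stata
* Count data: Poisson regression with an exposure variable
webuse dollhill3, clear
poisson deaths smokes i.agecat, exposure(pyears)

* Choice model: multinomial logit over unordered alternatives
webuse sysdsn1, clear
mlogit insure age male nonwhite i.site

* Censored outcome: tobit with mpg treated as censored from below at 17
sysuse auto, clear
tobit mpg weight, ll(17)

* Selection model: Heckman, with wage observed only for those who work
webuse womenwk, clear
heckman wage educ age, select(married children educ age)
```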

Page 12: Research Methods Lecture 5 Advanced  STATA

Regression models - III

• STATA supports several special data types.
• Once the type is defined, special commands work
• Time series
  – Estimate ARIMA, and ARCH models
  – Estimators for autocorrelation and heteroscedasticity
  – Estimate MA and other smoothers
  – Tests for auto, het, unit roots - h, d, LM, Q, ADF, P-P …..
  – TS graphs
. sysuse tsline2, clear
. tsset day
. tsline calories, ttick(28nov2002 25dec2002, tpos(in)) ttext(3470 28nov2002 "Thanks" 3470 25dec2002 "Xmas", orient(vert))

Page 13: Research Methods Lecture 5 Advanced  STATA

…gives

[Figure: time-series line plot of “Calories consumed” against “Date” (01jan2002 to 01jan2003; y-axis from 3400 to 4400), with vertical text labels “thanks” and “x-mas”.]

Page 14: Research Methods Lecture 5 Advanced  STATA

Special data types: survey

• Non-randomness induces OLS to be inefficient

• STATA can handle non-random survey data
  – see the “svy***” commands
  – Example (stratified sample of medical cases):

. webuse nhanes2f, clear

. svyset psuid [pweight=finalwgt], strata(stratid)

. svy: reg zinc age age2 weight female black orace rural

. reg zinc age age2 weight female black orace rural

Page 15: Research Methods Lecture 5 Advanced  STATA

Number of strata   =        31                 Number of obs      =       9189
Number of PSUs     =        62                 Population size    =  1.042e+08
                                               Design df          =         31
                                               F(   7,     25)    =      62.50
                                               Prob > F           =     0.0000
                                               R-squared          =     0.0698
------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1701161   .0844192    -2.02   0.053    -.3422901     .002058
        age2 |   .0008744   .0008655     1.01   0.320    -.0008907    .0026396
      weight |   .0535225   .0139115     3.85   0.001     .0251499    .0818951
      female |  -6.134161   .4403625   -13.93   0.000    -7.032286   -5.236035
       black |  -2.881813   1.075958    -2.68   0.012    -5.076244    -.687381
       orace |  -4.118051   1.621121    -2.54   0.016    -7.424349   -.8117528
       rural |  -.5386327   .6171836    -0.87   0.390    -1.797387    .7201216
       _cons |   92.47495   2.228263    41.50   0.000     87.93038    97.01952
------------------------------------------------------------------------------

. regress zinc age age2 weight female black orace rural

      Source |       SS       df       MS              Number of obs =    9189
-------------+------------------------------           F(  7,  9181) =   79.72
       Model |  110417.827     7  15773.9753           Prob > F      =  0.0000
    Residual |   1816535.3  9181   197.85811           R-squared     =  0.0573
-------------+------------------------------           Adj R-squared =  0.0566
       Total |  1926953.13  9188  209.724982           Root MSE      =  14.066
------------------------------------------------------------------------------
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.090298   .0638452    -1.41   0.157    -.2154488    .0348528
        age2 |  -.0000324   .0006788    -0.05   0.962    -.0013631    .0012983
      weight |   .0606481   .0105986     5.72   0.000     .0398725    .0814237
      female |  -5.021949   .3194705   -15.72   0.000    -5.648182   -4.395716
       black |  -2.311753   .5073536    -4.56   0.000    -3.306279   -1.317227
       orace |  -3.390879   1.060981    -3.20   0.001    -5.470637   -1.311121
       rural |  -.0966462   .3098948    -0.31   0.755    -.7041089    .5108166
       _cons |   89.49465   1.477528    60.57   0.000     86.59836    92.39093

Page 16: Research Methods Lecture 5 Advanced  STATA

Special data types: duration

• Survival time data
  – See the “st***” commands
. stset failtime /* sets the var that defines duration */
• Estimates a wide variety of models to explain duration
  – E.g. Weibull “hazard” model

Page 17: Research Methods Lecture 5 Advanced  STATA

Weibull example ….

• ST regression supports Weibull (streg), Cox PH (stcox) and other options
. streg load bearings, distribution(weibull)
• After streg you can plot the estimated cumulative hazard with
. stcurve, cumhaz
• STATA allows functions to be plotted by specifying the function (in a do file, with /// continuing the command across lines):
twoway (function y = .5*x^(-.5), range(0 5) yvarlab("a=.5")) ///
       (function y = 1.5*x^(.5), range(0 5) yvarlab("a=1.5")) ///
       (function y = 1*x^(0), range(0 5) yvarlab("a=1")) ///
       (function y = 2*x, range(0 2) yvarlab("a=2")), ///
       saving(weib1, replace) ///
       title("Weibull hazard: lambda=1, alpha varying") ///
       ytitle(hazard) xtitle(t)
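The four functions passed to twoway on this slide appear to be instances of the Weibull hazard. The parameterisation below is an assumption (the slides do not state it), but it reproduces each plotted function when lambda is set to 1:

```latex
\[
  h(t) = \lambda \alpha t^{\alpha - 1}, \qquad \lambda = 1:
  \quad
  \begin{array}{ll}
    \alpha = 0.5: & h(t) = 0.5\, t^{-0.5} \\
    \alpha = 1:   & h(t) = 1 \\
    \alpha = 1.5: & h(t) = 1.5\, t^{0.5} \\
    \alpha = 2:   & h(t) = 2t
  \end{array}
\]
```

Under this form the hazard falls over time when alpha is below 1, is constant when alpha equals 1, and rises when alpha exceeds 1.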

Page 18: Research Methods Lecture 5 Advanced  STATA

gives…..

[Figure: “Weibull hazard: lambda=1, alpha varying”; hazard (0 to 4) plotted against t (0 to 5), with one curve each for a=.5, a=1.5, a=1 and a=2.]

Page 19: Research Methods Lecture 5 Advanced  STATA

Special data types: Panel data

• STATA can handle “panel” data easily
  – see the “xt***” commands
• Common commands are
. xtdes     Describe pattern of xt data
. xtsum     Summarize xt data
. xttab     Tabulate xt data
. xtline    Line plots with xt data
. xtreg     Fixed and random effects

Page 20: Research Methods Lecture 5 Advanced  STATA

Panel data

• An xt dataset looks like this:
   pid   yr_visit    fev   age   sex   height   smokes
  ----------------------------------------------------------
  1071       1991   1.21    25     1       69        0
  1071       1992   1.52    26     1       69        0
  1071       1993   1.32    28     1       68        0
  1072       1991   1.33    18     1       71        1
  1072       1992   1.18    20     1       71        1
  1072       1993   1.19    21     1       71        0

• xt*** commands need to know the variables that identify person and “wave”:
. iis pid
. tis yr_visit
Or use the tsset command
. tsset pid yr_visit, yearly

Page 21: Research Methods Lecture 5 Advanced  STATA

Panel regression

• Once STATA has been told how to read the data it can perform regressions quite quickly:
. xtreg y x, fe

. xtreg y x, re
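A fuller sketch of this workflow is given below, using the nlswork demo dataset from Stata's site. The dataset, regressors and stored-estimate names are assumptions for illustration, not part of the lecture. It also uses xtset, which in more recent versions of Stata replaces iis/tis for declaring panel data.

```stata
* Sketch: declare the panel, describe it, then compare FE and RE
webuse nlswork, clear
xtset idcode year                      // person identifier and wave

xtdescribe                             // pattern of participation across waves
xtsum ln_wage                          // between and within variation

xtreg ln_wage age tenure, fe
estimates store fe_est                 // hypothetical name for the stored FE results

xtreg ln_wage age tenure, re
estimates store re_est                 // hypothetical name for the stored RE results

hausman fe_est re_est                  // Hausman test of FE against RE
```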

Page 22: Research Methods Lecture 5 Advanced  STATA

Further advice

• See Stephen Jenkins’ excellent course on duration modelling in STATA

• See Steve Pudney’s excellent course on panel data modelling in STATA
  – Beware: the dataset is 30MB+