Research Methods Lecture 5: Advanced STATA
Ian Walker, Module Leader, S2.109, i.walker@warwick.ac.uk
Housekeeping announcement
• Stephen Nickell (MPC and LSE)
  – British Academy Keynes Lecture in Economics
  – "Practical Issues in UK Monetary Policy 2000-2005"
  – Wednesday 2nd November
  – Arts Centre Conference Room at 5.30pm
  – http://www2.warwick.ac.uk/fac/soc/economics/forums/deptsems/keynes_lecture/
Stat-Transfer
• Use STAT-TRANSFER to convert data.
• Click on the Stat-Transfer shortcut (Stat Tran 6.lnk)
• Stat-transfer is “point and click”.
• Just tell it the input file name and format, and the format you want it in.
• Click “transfer”.
Stat Transfer options
• Useful options for creating a manageable dataset from a large one:
  – Keep or drop variables
  – Change variable format
    • E.g. float to integer
  – Select observations
    • E.g. “where (income + benefits)/famsize < 4500”
• Can be used for reading a large STATA dataset and writing a smaller one
• Avoids doing this in STATA itself
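For comparison, the same kind of subsetting can be sketched inside Stata with `use ... if ... using`. This is only an illustration: the demo file auto.dta stands in for a "large" dataset, and the choice of variables and the price cut-off are assumptions, not part of the lecture.

```stata
* Sketch only: auto.dta stands in for a large dataset (an assumption)
sysuse auto, clear
tempfile big
save "`big'"

* Read three variables and only the observations that satisfy the condition
use make price mpg if price < 4500 using "`big'", clear

* Store each variable in the smallest type that holds it without loss
compress
describe
```

Only the requested variables and rows are loaded, which is the point of doing the selection before the data reach memory.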
Practising
• You can import some of Stata’s own demo files using the sysuse command
  – E.g. . sysuse auto
• Many datasets are available at specific websites
  – E.g. STATA’s own site has all the demo data used in the manual examples
• You can use the webuse command to load the files directly into Stata without copying locally
  . webuse auto /* gets the data from STATA’s own site */
  Or point webuse at another site first (webuse set takes the web address of the folder, not of the file):
  . webuse set http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes
  . webuse auto /* now gets auto.dta from the address set above */
More help
• You can search the whole of STATA’s online help using . search xxx
• Michigan’s web-based guide to STATA (for SA)
• UCLA resources to help you learn and use STATA:
  – including movies and “web-books”
• Consult other user-written guides and tutorials
  – Chevalier1, Chevalier2; Princeton; Illinois; Gruhn
• ESDS’s “Stata for LFS”
• Stata’s own resources for learning STATA
  – Stata website, journal, library, archive
  – http://www.stata.com/links/resources1.html
Web resources
• STATA is web-aware
  – E.g. . update /* updates from www.stata.com */
• Statalist is an email listserv discussion group
• The Stata Journal is a refereed journal
  – Replaces the old Stata Technical Bulletin (STB)
• SSC Boston College STATA Archive
  – Extensive library of programs by Stata users
  – Files can be downloaded in Stata using . ssc
    • E.g. . ssc install outreg
    • Installs the outreg ado file that makes tables pretty
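A minimal sketch of the SSC workflow, assuming an internet connection. The regression and the output file name auto_table are illustrative choices, not from the lecture, and the exact format of the file that outreg writes depends on the version installed.

```stata
* Look at the package description, then install it from SSC
ssc describe outreg
ssc install outreg, replace

* Illustrative regression on Stata's demo data
sysuse auto, clear
regress price mpg weight

* Write the results table to a file (auto_table is a hypothetical name)
outreg using auto_table, replace
```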
Always (whatever the software)
• Use lowercase
• Open a log file
• Label your data
• Use the do file editor
• Organise your files
  – Separate directories for separate projects
  – Archive (zip) data, do and results files when you’re finished
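The habits above can be sketched as a short do file. Everything named here (the log file myproject.log, the label name origin_lbl, the label text) is a hypothetical choice for illustration, and the demo data stand in for your own.

```stata
* myproject.do: hypothetical skeleton of a do file
capture log close
log using myproject.log, replace text

* Demo data stand in for the project's own dataset
sysuse auto, clear

* Label the dataset, a variable, and the values of a 0/1 variable
label data "1978 automobile data (demo)"
label variable mpg "Mileage (miles per gallon)"
label define origin_lbl 0 "Domestic" 1 "Foreign", replace
label values foreign origin_lbl

* Analysis goes here; everything is recorded in the log
summarize mpg
tabulate foreign

log close
```

Running the whole file from the do file editor reproduces both the results and the log in one step.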
Customising STATA
• profile.do runs automatically when STATA starts
• Edit it to include commands you want to invoke every time
  . set mem 200m
  . log using justincase.log, replace
• Define preferences for STATA’s look and feel
  – Click on Prefs in menu
    • Colours, graph scheme, etc.
    • Save window positioning
Regression models - I
• Linear regression and related models when the outcome variable is continuous
  – OLS, 2SLS, 3SLS, IV, quantile reg, Box-Cox …
• Binary outcome data
  – the outcome variable is 0 or 1 (or y/n)
    • probit, logit, nested logit...
• Multiple outcome data
  – the outcome variable is 1, 2, ...,
    • conditional logit, ordered probit
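A few of these estimators, sketched on Stata's demo data. The model specifications are assumptions chosen only to show the syntax (in particular the instruments in the IV line are purely illustrative), and ivregress is the command name in Stata 10 and later; older releases used ivreg.

```stata
sysuse auto, clear

* Continuous outcome: OLS and median (quantile) regression
regress price mpg weight
qreg price mpg weight

* IV / 2SLS: mpg treated as endogenous, instruments are illustrative only
ivregress 2sls price weight (mpg = displacement gear_ratio)

* Binary outcome (foreign is 0/1)
probit foreign mpg weight
logit foreign mpg weight

* Ordered outcome (rep78 takes the values 1 to 5)
oprobit rep78 mpg weight
```

The pattern is the same throughout: command, dependent variable, then the explanatory variables.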
Regression models - II
• Count data
  – the outcome variable is 0, 1, 2, ..., occurrences
    • Poisson regression, negative binomial
• Choice models
  – multinomial choice
  – A, B or C
    • Multinomial logit, Random utility model, unordered probit, nested logit, ...etc
• Selection models
  – Truncated, censored
    • Tobit, Heckman selection models
    • linear regression or probit with selection
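A sketch of one command from each family above. The datasets and specifications are assumptions: they follow the examples in Stata's own manuals as best recalled, need an internet connection for webuse, and are not taken from the lecture.

```stata
* Count data: Poisson regression (nbreg takes the same basic syntax)
webuse dollhill3, clear
poisson deaths smokes i.agecat, exposure(pyears)

* Unordered choice among several alternatives: multinomial logit
webuse sysdsn1, clear
mlogit insure age male nonwhite i.site

* Censored outcome: tobit with mpg censored from below at 17
sysuse auto, clear
generate wgt = weight/1000
tobit mpg wgt, ll(17)

* Sample selection: Heckman model, wage observed only for those who work
webuse womenwk, clear
heckman wage educ age, select(married children educ age)
```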
Regression models - III
• STATA supports several special data types.
• Once the type is defined, special commands work
• Time series
  – Estimate ARIMA and ARCH models
  – Estimators for autocorrelation and heteroscedasticity
  – Estimate MA and other smoothers
  – Tests for auto, het, unit roots - h, d, LM, Q, ADF, P-P …..
  – TS graphs
  . sysuse tsline2, clear
  . tsset day
  . tsline calories, ttick(28nov2002 25dec2002, tpos(in)) ttext(3470 28nov2002 "Thanks" 3470 25dec2002 "Xmas", orient(vert))
…gives
[Figure: time-series line plot of calories consumed (y-axis, 3400 to 4400) against date (x-axis, 01jan2002 to 01jan2003), with vertical text labels "thanks" and "x-mas" at the two marked dates.]
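The other time-series tools listed above follow the same pattern: declare the data with tsset, then use the ts commands. A minimal sketch, assuming the quarterly wholesale price index data (wpi1) used in Stata's manuals; the model order, lag length and smoothing window are illustrative choices.

```stata
* Quarterly wholesale price index data (needs an internet connection)
webuse wpi1, clear
tsset t, quarterly

* ARIMA(1,1,1) model for the price index
arima wpi, arima(1,1,1)

* Unit-root tests: augmented Dickey-Fuller and Phillips-Perron
dfuller wpi, lags(4)
pperron wpi

* Moving-average smoother: 2 lags, the current value, 2 leads
tssmooth ma wpi_ma = wpi, window(2 1 2)
tsline wpi wpi_ma
```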
Special data types: survey
• Non-random sampling makes unweighted OLS unreliable: it is inefficient and the usual standard errors ignore the sample design
• STATA can handle non-random survey data
  – see the “svy***” commands
  – Example (stratified sample of medical cases):
. webuse nhanes2f, clear
. svyset psuid [pweight=finalwgt], strata(stratid)
. svy: reg zinc age age2 weight female black orace rural
. reg zinc age age2 weight female black orace rural
Number of strata =  31                  Number of obs    =      9189
Number of PSUs   =  62                  Population size  = 1.042e+08
                                        Design df        =        31
                                        F(   7,     25)  =     62.50
                                        Prob > F         =    0.0000
                                        R-squared        =    0.0698
------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1701161   .0844192    -2.02   0.053    -.3422901     .002058
        age2 |   .0008744   .0008655     1.01   0.320    -.0008907    .0026396
      weight |   .0535225   .0139115     3.85   0.001     .0251499    .0818951
      female |  -6.134161   .4403625   -13.93   0.000    -7.032286   -5.236035
       black |  -2.881813   1.075958    -2.68   0.012    -5.076244    -.687381
       orace |  -4.118051   1.621121    -2.54   0.016    -7.424349   -.8117528
       rural |  -.5386327   .6171836    -0.87   0.390    -1.797387    .7201216
       _cons |   92.47495   2.228263    41.50   0.000     87.93038    97.01952
------------------------------------------------------------------------------

. regress zinc age age2 weight female black orace rural

      Source |       SS       df       MS              Number of obs =    9189
-------------+------------------------------           F(  7,  9181) =   79.72
       Model |  110417.827     7  15773.9753           Prob > F      =  0.0000
    Residual |   1816535.3  9181   197.85811           R-squared     =  0.0573
-------------+------------------------------           Adj R-squared =  0.0566
       Total |  1926953.13  9188  209.724982           Root MSE      =  14.066
------------------------------------------------------------------------------
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.090298   .0638452    -1.41   0.157    -.2154488    .0348528
        age2 |  -.0000324   .0006788    -0.05   0.962    -.0013631    .0012983
      weight |   .0606481   .0105986     5.72   0.000     .0398725    .0814237
      female |  -5.021949   .3194705   -15.72   0.000    -5.648182   -4.395716
       black |  -2.311753   .5073536    -4.56   0.000    -3.306279   -1.317227
       orace |  -3.390879   1.060981    -3.20   0.001    -5.470637   -1.311121
       rural |  -.0966462   .3098948    -0.31   0.755    -.7041089    .5108166
       _cons |   89.49465   1.477528    60.57   0.000     86.59836    92.39093
------------------------------------------------------------------------------
Special data types: duration
• Survival time data
  – See the “st***” commands
  . stset failtime /* sets the var that defines duration */
• Estimates a wide variety of models to explain duration
  – E.g. Weibull “hazard” model
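A minimal end-to-end sketch of the st workflow. The dataset kva is an assumption: it is the generator-failure example from Stata's manuals, whose variable names (failtime, load, bearings) match the commands on these slides. It needs an internet connection.

```stata
* Generator failure data (assumed to be the source of the slide example)
webuse kva, clear

* Declare the duration variable; every st command then uses it
stset failtime

* Weibull hazard model for time to failure
streg load bearings, distribution(weibull)

* Plot the fitted hazard, then the fitted cumulative hazard
stcurve, hazard
stcurve, cumhaz
```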
Weibull example ….
• ST regression supports Weibull (streg), Cox PH (stcox) and other options
  . streg load bearings, distribution(weibull)
• After streg you can plot the estimated cumulative hazard with . stcurve, cumhaz (or the hazard itself with . stcurve, hazard)
• STATA allows functions to be plotted by specifying the function:
  . twoway (function y = .5*x^(-.5), range(0 5) yvarlab("a=.5")) (function y = 1.5*x^(.5), range(0 5) yvarlab("a=1.5")) (function y = 1*x^(0), range(0 5) yvarlab("a=1")) (function y = 2*x, range(0 2) yvarlab("a=2")), saving(weib1, replace) title("Weibull hazard: lambda=1, alpha varying") ytitle(hazard) xtitle(t)
gives…..
[Figure: "Weibull hazard: lambda=1, alpha varying". Hazard (y-axis, 0 to 4) plotted against t (x-axis, 0 to 5), with one curve each for a=.5, a=1, a=1.5 and a=2.]
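The four plotted functions are consistent with the standard Weibull hazard with scale λ and shape α. The formula is not written out on the slides, so treat the following as a reading of the plot command rather than the lecturer's own notation:

$$
h(t) = \lambda\,\alpha\,t^{\alpha-1}, \qquad \lambda = 1
\;\Rightarrow\;
h(t) =
\begin{cases}
0.5\,t^{-0.5} & \alpha = 0.5 \quad \text{(falling hazard)}\\
1 & \alpha = 1 \quad \text{(constant hazard)}\\
1.5\,t^{0.5} & \alpha = 1.5 \quad \text{(rising, concave)}\\
2\,t & \alpha = 2 \quad \text{(rising, linear)}
\end{cases}
$$

So α < 1 gives negative duration dependence, α = 1 reduces to the exponential model, and α > 1 gives positive duration dependence. In streg output the shape parameter is reported as p.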
Special data types: Panel data
• STATA can handle “panel” data easily
  – see the “xt***” commands
• Common commands are
  . xtdes     Describe pattern of xt data
  . xtsum     Summarize xt data
  . xttab     Tabulate xt data
  . xtline    Line plots with xt data
  . xtreg     Fixed and random effects
Panel data
• An xt dataset looks like this:
   pid   yr_visit    fev   age   sex   height   smokes
  ----------------------------------------------------
  1071       1991   1.21    25     1       69        0
  1071       1992   1.52    26     1       69        0
  1071       1993   1.32    28     1       68        0
  1072       1991   1.33    18     1       71        1
  1072       1992   1.18    20     1       71        1
  1072       1993   1.19    21     1       71        0
• xt*** commands need to know the variables that identify person and “wave”:
  . iis pid
  . tis yr_visit
  Or use the tsset command
  . tsset pid yr_visit, yearly
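The six rows shown above are enough to try the descriptive xt commands. In this sketch the data are typed in with input, and xtset is used to declare the panel; xtset is the command that replaces iis/tis from Stata 10 onwards and takes the same arguments as the tsset line above.

```stata
* Type in the six observations from the slide
clear
input pid yr_visit fev age sex height smokes
1071 1991 1.21 25 1 69 0
1071 1992 1.52 26 1 69 0
1071 1993 1.32 28 1 68 0
1072 1991 1.33 18 1 71 1
1072 1992 1.18 20 1 71 1
1072 1993 1.19 21 1 71 0
end

* Declare person and wave identifiers
xtset pid yr_visit, yearly

* Pattern of observations, within/between summary, tabulation
xtdescribe
xtsum fev age
xttab smokes
```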
Panel regression
• Once STATA has been told how to read the data it can perform regressions quite quickly:
  . xtreg y x, fe
  . xtreg y x, re
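A concrete version of the two commands, assuming the nlswork panel from Stata's manuals (needs an internet connection). The wage equation is an illustrative specification, not one from the lecture, and xtset again stands in for iis/tis in Stata 10 and later.

```stata
* Panel of young women's wages (dataset is an assumption for illustration)
webuse nlswork, clear
xtset idcode year

* Fixed effects: uses only within-person variation
xtreg ln_wage age tenure ttl_exp, fe

* Random effects: uses within- and between-person variation
xtreg ln_wage age tenure ttl_exp, re
```

Comparing the two sets of coefficients is a quick first check on whether the person effects look correlated with the regressors.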
Further advice
• See Stephen Jenkins’ excellent course on duration modelling in STATA
• See Steve Pudney’s excellent course on panel data modelling in STATA
  – Beware the dataset is 30MB+