advanced stata workshop

Advanced Stata Workshop FHSS Research Support Center

Upload: vonda

Post on 25-Feb-2016


Page 1: Advanced  Stata  Workshop

Advanced Stata Workshop

FHSS Research Support Center

Page 2: Advanced  Stata  Workshop

Presentation Layout

• Visualization and Graphing
• Macros and Looping
• Panel and Survey Data
• Postestimation

Page 3: Advanced  Stata  Workshop

Visualization and Graphing in Stata

[Figure: Life expectancy at birth vs. GNP per capita. Scatter plot of life expectancy at birth against loggnp, with a histogram (Fraction) of each variable along the margins. Source: 1998 data from The World Bank Group]

Page 4: Advanced  Stata  Workshop

Intro To Graphing In Stata

[Figure: scatter plot with Mileage (mpg) on the y-axis and Price on the x-axis]

“graph” is often optional. So is “twoway” in this case.

. sysuse auto, clear

. graph twoway scatter mpg weight //Note that you don't need to type graph or twoway

. scatter mpg weight

Note: Nearly all graphing commands start with “graph”, and “twoway” is a large family of graphs.

Page 5: Advanced  Stata  Workshop

Creating Multiple Graphs with “by()”

. twoway scatter mpg weight, by(foreign)

[Figure: two scatter plots of Mileage (mpg) against Weight (lbs.), one panel titled Domestic and one titled Foreign, with the note “Graphs by Car type”]

Note that the value label is displayed above the graphs, and the variable label is displayed in the bottom right hand corner.

Page 6: Advanced  Stata  Workshop

Overlaying “twoway” graphs

The || tells Stata to put the second graph on top of the first one – order matters! You don’t need to type “twoway” twice; it applies to both.

. twoway scatter mpg weight || lfit mpg weight

[Figure: scatter plot of Mileage (mpg) against Weight (lbs.) with the Fitted values line drawn on top]

This is another way of writing the command; it doesn’t matter which one you use:

. twoway (scatter mpg weight) (lfit mpg weight)

[Figure: the same graph as above]

Page 7: Advanced  Stata  Workshop

"by()" statements with overlaid graphs

“qfitci” is a type of graph which plots the prediction line from a quadratic regression and adds a confidence interval. The “stdf” option specifies that the confidence interval be created on the basis of the standard error of the forecast.

. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)

[Figure: two panels, Domestic and Foreign, each showing Mileage (mpg) against Weight (lbs.) with the 95% CI band and Fitted values curve; note “Graphs by Car type”]

stdf is an option of qfitci. by(foreign) is an option of twoway.

Page 8: Advanced  Stata  Workshop

"by()" statements with overlaid graphs

Another way of writing the previous command is:

. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)

[Figure: the same two-panel graph as on the previous slide]

So:

. twoway (qfitci mpg weight, stdf) (scatter mpg weight), by(foreign)

This way is easier to read.

. twoway qfitci mpg weight, stdf || scatter mpg weight ||, by(foreign)

This way is easier to type.

Page 9: Advanced  Stata  Workshop

Graphs with Many Options and Overlays

You can make pretty impressive graphs just from code if you overlay the graphs and specify options such as multiple axes, notes, titles and subtitles, axis titles and labels, and legends.

Page 10: Advanced  Stata  Workshop

Code for Previous Graph

. use http://www.stata-press.com/data/r12/uslifeexp, clear

. generate diff = le_wm - le_bm

. label var diff "Difference"

. #delimit ;

. twoway line le_wm year, yaxis(1 2) xaxis(1 2)
> || line le_bm year
> || line diff year
> || lfit diff year
> ||,
> ytitle( "", axis(2) )
> xtitle( "", axis(2) )
> xlabel( 1918, axis(2) )
> ylabel( 0(5)20, axis(2) grid gmin angle(horizontal) )
> ylabel( 0 20(10)80, gmax angle(horizontal) )
> ytitle( "Life expectancy at birth (years)" )
> title( "White and black life expectancy" )
> subtitle( "USA, 1900-1999" )
> note( "Source: National Vital Statistics, Vol 50, No. 6"
> "(1918 dip caused by 1918 Influenza Pandemic)" )
> legend(label(1 "White males") label(2 "Black males") );

. #delimit cr

This may look scary, but it is actually fairly straightforward. See the accompanying do-file for explanation of each component.

Page 11: Advanced  Stata  Workshop

Using the Graph Editor

. tsline nci abc

[Figure (graph 1): default tsline plot of “NASDAQ Composite Index” and “ABC.com, Inc. share price” against date, 01oct2009 to 01jul2010]

It is often easier to make changes in the graph editor than to specify all the options in code.

Let’s make graph 1 into graph 2 by using the graph editor tools.

[Figure (graph 2): the same two series after editing. Title “ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index”, subtitle “Sep 24, 2009 - June 7, 2010”, y-axis “Share Price (USD)” from 0 to 16 by 2, monthly x-axis labels from Oct 1, 2009 to Jun 1, 2010, caption “Source: CRSP, Bloomberg”]

Page 12: Advanced  Stata  Workshop

Recording Edits in the Graph Editor

Graph Element     Change
Graph Title       Enter title using quotes to separate lines, color = black
Graph Subtitle    Enter subtitle
Graph Region      Color = bluish-gray
Y-Axis            Range = 0 to 16 by 2, axis line = medium thick, add title, label angle = horizontal, grid lines = off
X-Axis            Title = off, minor ticks = off, suggest # of ticks = 8, alternate spacing of adjacent labels = on, change label format, label size = small, axis line = medium thick
Plot 1            Line color = green, width = thick
Plot 2            Line color = blue, width = thick
Caption           Add caption

Before you start making changes, click the record button. After you are done, click it again, and save your changes as a recording so you can “play” them back later. We will save this recording as advanced_workshop_1.

Page 13: Advanced  Stata  Workshop

Play Your Graph Recording

You can create a graph, open the graph editor, click the green play button, and then play back your recorded edits.

Or, you can play your edits right from the code:

. tsline nci abc, play(advanced_workshop_1)

You can also run all of your recorded edits on a different graph, and just change the title:

. tsline comp_world comp_planet, play(advanced_workshop_1)

[Figure: “Computer World share price” and “Computer Planet share price” plotted with the recorded edits applied. The title still reads “ABC.com Inc. Closing Share Price vs. Nasdaq Composite Index” because it comes from the recording]

You can run your recorded edits on a graph of a different type, though in this case not all of your edits will make sense:

. twoway (scatter nci date) (scatter abc date) ///
> , play(advanced_workshop_1)

[Figure: scatter plots of “NASDAQ Composite Index” and “ABC.com, Inc. share price” against date with the recorded edits applied]

Page 14: Advanced  Stata  Workshop

Storing and Moving Your Recordings

Graph recordings are stored as .grec files in your “personal” folder, under the “grec” folder. Type “personal” to see where this is; normally it is C:\ado\personal. So by default Stata should store your .grec files in C:\ado\personal\grec.

. personal
your personal ado-directory is c:\ado\personal\

. dir c:\ado\personal\grec\
   0.4k   2/21/13  9:12  advanced_workshop_1.grec
   0.7k   3/01/12  9:48  jeff_test_recording_graph_edits.grec
   0.9k   5/17/12 15:47  line..grec
   1.3k  11/21/12 10:12  x grid.grec

Unfortunately, if you are not faculty, you are probably using Stata on lab computers, and when they are re-imaged you will lose the files in your grec folder. To avoid this, store the recordings on your flash drive by clicking the Browse button when you save your recording.

When you are then in the graph editor and click the play button, your recording will not appear in the list, because it is not stored where Stata knows to look for it. Never fear: just click Browse and navigate to where your .grec file is.

If you want your recording to be available right from code, as in play(advanced_workshop_1), you will need to move it (at least temporarily) to the “grec” folder, or write the directory location in the code. Because this path contains a space, it must be enclosed in double quotes: play("E:\flashdrive\Graph Recordings\advanced_workshop_1")

Page 15: Advanced  Stata  Workshop

Using Schemes in Graphing

Recordings are great if you are going to be making the same kind of graph a lot. But a recording for a scatter plot will hardly affect a histogram at all, and might even make it look terrible. If you want to change the look of all graphs that you make, you may want to make a scheme. Schemes are text files which tell Stata how to draw graphs.

. sysuse uslifeexp2, clear

. scatter le year

[Figure: scatter plot of life expectancy against Year, 1900 to 1940, in the default scheme]

. scatter le year, scheme(economist)

[Figure: the same scatter plot drawn with the economist scheme]

Page 16: Advanced  Stata  Workshop

More on Schemes

. graph query, schemes

Available schemes are

    s2color      see help scheme_s2color
    s2mono       see help scheme_s2mono
    s2manual     see help scheme_s2manual
    s2gmanual
    s2gcolor
    s1color      see help scheme_s1color
    s1mono       see help scheme_s1mono
    s1rcolor     see help scheme_s1rcolor
    s1manual     see help scheme_s1manual
    sj           see help scheme_sj
    economist    see help scheme_economist

Schemes are very powerful, because they let you implement a certain look without specifying a long series of options in every graph, or running every graph through the graph editor. However, creating schemes is fairly time consuming.

For more on creating your own schemes, see:

http://www3.eeg.uminho.pt/economia/nipe/2010_Stata_UGM/papers/Rising.pdf

and

http://www.ats.ucla.edu/stat/stata/seminars/stata_graph/graphsem.txt
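A built-in scheme can also be applied to every graph in a session, which is a cheaper alternative to writing your own. A minimal sketch, assuming the s1mono scheme from the list above and the auto dataset that ships with Stata; restoring s2color at the end assumes that is the default in your version of Stata:

```stata
sysuse auto, clear

set scheme s1mono        // every graph drawn from now on uses s1mono
scatter mpg weight
histogram mpg

set scheme s2color       // switch back (assumed default; check c(scheme) first)
```

Adding the permanently option to set scheme keeps the choice across sessions.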

Page 17: Advanced  Stata  Workshop

Manipulating Graphs: Memory vs. Disk

When you draw a graph, it is stored in memory, under the name Graph.

If you draw another graph, it replaces the previous one in memory, and is now called Graph.

If you want to have multiple graphs up at the same time, you can use the name option.

graph save moves your graph from memory to disk, saving it as a .gph file.

graph dir lists all graphs in memory and on disk (in the current directory)

graph drop drops a graph from memory. Graphs contain the data files they represent, so if the dataset is large, they can actually take up quite a bit of memory.

. sysuse auto, clear

. scatter price mpg

. scatter price length

. scatter price mpg, name(scatter1)

. cd C:\Users\nickj22\Downloads\

. graph save scatter1 mygraph1.gph

. graph dir
  Graph  scatter1  mygraph1.gph

. graph drop scatter1

Page 18: Advanced  Stata  Workshop

Manipulating Graphs: Demo

Graph manipulation commands are quite useful for exploratory analysis.

See do file for code.
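The do file is not part of this transcript, so the following is only a sketch of the kind of session such a demo might contain. The graph names g1, g2 and both are hypothetical; the commands themselves are standard Stata graph-manipulation commands.

```stata
sysuse auto, clear

* Draw two named graphs so that neither replaces the other in memory
scatter price mpg,    name(g1, replace)
scatter price weight, name(g2, replace)

graph dir                                  // list graphs in memory and on disk
graph combine g1 g2, name(both, replace)   // put them side by side
graph display g1                           // bring g1 back to the front
graph drop g2                              // free the memory used by g2
```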

Page 19: Advanced  Stata  Workshop

More Example Graphs

Note: Annotated code is in the do file for all of these

Histogram, with overlaid normal distribution

[Figure: histograms of average education level (Percent, with bar labels) and overlaid normal curves, one panel per Census region (NE, N Cntrl, South, West), plus a single overall histogram. Source: US Census, 1980 and 1990]

Page 20: Advanced  Stata  Workshop

More Example Graphs

Use graph bar to make bar graphs

[Figure: bar graph titled “Average July and January temperatures by regions of the United States”. Y-axis: Degrees Fahrenheit, 0 to 80. One July bar and one January bar for each of N.E., N. Central, South and West, labeled with their values. Source: U.S. Census Bureau, U.S. Dept. of Commerce]
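The annotated do file is not included here. As a rough sketch, a graph like this one can be drawn from the citytemp dataset that ships with Stata; the titles are copied from the figure, and the exact options used in the workshop are an assumption.

```stata
sysuse citytemp, clear

* Mean July and January temperature for each Census region
graph bar (mean) tempjuly tempjan, over(region)              ///
    blabel(bar, format(%9.1f))                               ///
    legend(label(1 "July") label(2 "January"))               ///
    ytitle("Degrees Fahrenheit")                             ///
    title("Average July and January temperatures")           ///
    subtitle("by regions of the United States")              ///
    note("Source: U.S. Census Bureau, U.S. Dept. of Commerce")
```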

Page 21: Advanced  Stata  Workshop

More Example Graphs

Use graph combine to combine 3 graphs into one:

[Figure: Life expectancy at birth vs. GNP per capita. A scatter plot of life expectancy at birth against loggnp combined with a histogram (Fraction) of each variable. Source: 1998 data from The World Bank Group]
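A simplified sketch of how three graphs can be combined, using the lifeexp dataset that ships with Stata. The graph names yx, hy and hx are hypothetical, and the axis fine-tuning needed to reproduce the slide exactly is left out.

```stata
sysuse lifeexp, clear
generate loggnp = log10(gnppc)

* Build the three pieces as named graphs in memory
scatter lexp loggnp,                          name(yx, replace)
twoway histogram lexp,   fraction horizontal  name(hy, replace)
twoway histogram loggnp, fraction             name(hx, replace)

* Arrange them on a 2 x 2 grid, leaving the third cell empty
graph combine hy yx hx, hole(3)                           ///
    title("Life expectancy at birth vs. GNP per capita")  ///
    note("Source: 1998 data from The World Bank Group")
```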

Page 22: Advanced  Stata  Workshop

More Example Graphs

Graph matrix is a great alternative to a correlation matrix to investigate relationships between variables

[Figure: scatter plot matrix titled “Correlations among 1998 life-expectancy data” for Avg. annual % growth, Life expectancy at birth, Log GNP per capita, and safewater. Source: The World Bank Group]
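A minimal sketch of a scatter plot matrix on the lifeexp dataset that ships with Stata. The variable lgnppc is created here for illustration, and using the natural log is an assumption about how the slide's “Log GNP per capita” was built.

```stata
sysuse lifeexp, clear
generate lgnppc = ln(gnppc)
label variable lgnppc "Log GNP per capita"

* One scatter plot for every pair of variables
graph matrix popgrowth lexp lgnppc safewater,                  ///
    title("Correlations among 1998 life-expectancy data")     ///
    note("Source: The World Bank Group")
```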

Page 23: Advanced  Stata  Workshop

More Example Graphs

Get data labels (called marker labels in Stata) from the values of another variable

[Figure: scatter plot titled “Life expectancy vs. GNP per capita, North, Central, and South America”. Y-axis: Life expectancy at birth (years); x-axis: GNP per capita (thousands of dollars). Each point is labeled with its country name, from Canada and the United States to Uruguay and Venezuela. Data source: World Bank, 1998]
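Marker labels come from the mlabel() option. The sketch below labels every country in the lifeexp dataset that ships with Stata; the slide restricts the plot to the Americas, and the filter it used is not shown, so none is applied here.

```stata
sysuse lifeexp, clear

* Label each point with the value of the string variable country
scatter lexp gnppc, mlabel(country)                 ///
    title("Life expectancy vs. GNP per capita")     ///
    note("Data source: World Bank, 1998")
```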

Page 24: Advanced  Stata  Workshop

More Example Graphs

Xtline from a panel data set can overlay lines for each value of the panel variable.

[Figure: line plot titled “Calories Consumed by Subject, Jan 1 2002 - Jan 1 2003”. Y-axis: Calories consumed; x-axis: Date; one line each for Tess, Sam and Arnold]
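A graph of this kind can be sketched with the xtline1 example dataset from Stata Press, which appears to be the data behind the figure (an assumption; it needs an internet connection to download).

```stata
webuse xtline1, clear

xtset person day            // declare the panel and time variables
xtline calories, overlay    // one line per person on a single set of axes
```

Without the overlay option, xtline draws a separate small graph for each panel instead.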

Page 25: Advanced  Stata  Workshop

Macros

• Macros come in two general types:
  1. Globals
     • Exist until Stata is closed
  2. Locals
     • Exist until the end of the do file

• Other types of macros exist, but are rarely used

Page 26: Advanced  Stata  Workshop

global vs. local

Creating the global

. global names "Ballav Nick ChongMing Joe David"

Creating the local

. local names2 "Jake Steven Jose Tyrell Martin"

- References to locals have to be enclosed in single quotes: a left quote (`) before the name and a right quote (') after it
- References to globals have to begin with a $

. di "`names2'"
Jake Steven Jose Tyrell Martin

. di "$names"
Ballav Nick ChongMing Joe David

End of the do file

end of do-file

The local no longer exists

. di "`names2'"


Conversely, the global still exists

. di "$names"
Ballav Nick ChongMing Joe David

Page 27: Advanced  Stata  Workshop

When do we need “for” loops?

• If a Stata program involves repetitive actions on a group of variables, files, or other items

• Examples
  • Creating new variables
  • Recoding missing values on a list of variables
  • Merging multiple datasets
  • Labeling variables

Page 28: Advanced  Stata  Workshop

Determining what macros already exist

. macro list
names:          Ballav Nick ChongMing Joe David
S_level:        95
F1:             help advice;
F2:             describe;
F7:             save
F8:             use
S_ADO:          UPDATES;BASE;SITE;.;PERSONAL;PLUS;OLDPLACE
S_StataSE:      SE
S_FLAVOR:       Intercooled
S_OS:           Windows
S_OSDTL:        64-bit
S_MACH:         PC (64-bit x86-64)
_names2:        Jake Steven Jose Tyrell Martin

names: The global we created

S_level through S_MACH: General macros automatically created by Stata

_names2: The local we created

Page 29: Advanced  Stata  Workshop

Foreach

• Syntax of foreach command
– foreach lname {in | of varlist} variables {
      commands referring to `lname'
  }
• The open brace must appear on the same line as the foreach;
• Nothing may follow the open brace except, of course, comments; the first command to be executed must appear on a new line;
• The close brace must appear on a line by itself

Page 30: Advanced  Stata  Workshop

• Differences between the -in- option and the -of varlist- option in the -foreach- command

– foreach i in variable1-variable5 {
      Stata commands
  }
– The list has only one element, the literal text “variable1-variable5”, so the loop runs once

– foreach i of varlist variable1-variable5 {
      Stata commands
  }
– The list is expanded as a varlist, so the loop runs once for each variable from variable1 through variable5 (five if they are adjacent in the dataset)
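The difference is easy to check by counting loop iterations. A small sketch on the auto dataset that ships with Stata, where price-rep78 covers the three adjacent variables price, mpg and rep78; the counters k and m are hypothetical names.

```stata
sysuse auto, clear

* "of varlist" expands price-rep78 into price mpg rep78
local k = 0
foreach v of varlist price-rep78 {
    local ++k
}

* "in" treats price-rep78 as one literal word
local m = 0
foreach v in price-rep78 {
    local ++m
}

display "of varlist: `k' iterations; in: `m' iteration"
```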

Page 31: Advanced  Stata  Workshop

Stata commands in recoding variables

Without using the foreach command

recode v1 (99 = .)
recode v2 (99 = .)
recode v3 (99 = .)
recode v4 (99 = .)

foreach command with "in" option

foreach x in v1 v2 v3 v4 {
    recode `x' (99 = .)
}

foreach command with "of varlist" option

foreach x of varlist v1-v4 {
    recode `x' (99 = .)
}
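To try the loop without a real dataset, the sketch below builds four toy variables named v1 to v4 (made up for illustration), each holding a single 99, and then recodes them in one pass.

```stata
clear
set obs 5

* Toy data: v1-v4 each contain the value 99 in exactly one observation
forvalues j = 1/4 {
    generate v`j' = cond(_n == `j', 99, _n)
}

* Recode 99 to missing in every variable from v1 through v4
foreach x of varlist v1-v4 {
    recode `x' (99 = .)
}

list
```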

Page 32: Advanced  Stata  Workshop

Using macros to store variable names

Global for ind. vars:

. global ind_vars = "age iq gender weight anxiety"

. reg depress $ind_vars

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  5,    85) =   10.64
       Model |  14.1382227     5  2.82764454           Prob > F      =  0.0000
    Residual |   22.587052    85  .265730024           R-squared     =  0.3850
-------------+------------------------------           Adj R-squared =  0.3488
       Total |  36.7252747    90  .408058608           Root MSE      =  .51549

------------------------------------------------------------------------------
     depress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0193698   .0137039    -1.41   0.161    -.0466168    .0078771
          iq |  -.0093197   .0130535    -0.71   0.477    -.0352736    .0166341
      gender |   -.240945   .1438568    -1.67   0.098     -.526971    .0450809
      weight |  -.0188543   .0229981    -0.82   0.415    -.0645807    .0268721
     anxiety |   .5563893   .0831225     6.69   0.000     .3911195    .7216592
       _cons |   2.332141   1.466546     1.59   0.115    -.5837454    5.248027
------------------------------------------------------------------------------

Page 33: Advanced  Stata  Workshop

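Global for ind. vars, followed by two additional regressors:

. reg depress $ind_vars sleep satlife

      Source |       SS       df       MS              Number of obs =      89
-------------+------------------------------           F(  7,    81) =   19.34
       Model |  22.2722042     7  3.18174346           Prob > F      =  0.0000
    Residual |  13.3233014    81  .164485202           R-squared     =  0.6257
-------------+------------------------------           Adj R-squared =  0.5934
       Total |  35.5955056    88  .404494382           Root MSE      =  .40557

------------------------------------------------------------------------------
     depress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0252532   .0109484    -2.31   0.024    -.0470371   -.0034692
          iq |  -.0212878   .0103962    -2.05   0.044     -.041973   -.0006025
      gender |  -.0288896   .1233419    -0.23   0.815    -.2743013     .216522
      weight |   -.017562   .0181686    -0.97   0.337    -.0537117    .0185878
     anxiety |   .3652071    .074345     4.91   0.000     .2172839    .5131304
       sleep |  -.6100973   .1435988    -4.25   0.000    -.8958139   -.3243807
     satlife |  -.4784158   .1009435    -4.74   0.000    -.6792618   -.2775698
       _cons |   4.336996   1.192267     3.64   0.000     1.964759    6.709233
------------------------------------------------------------------------------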
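The workshop's depression dataset is not available here, so this sketch repeats the same pattern on the auto dataset that ships with Stata. The global name xvars and the choice of regressors are hypothetical.

```stata
sysuse auto, clear

* Store the baseline covariates once
global xvars "weight length foreign"

regress price $xvars          // baseline model
regress price $xvars mpg      // same covariates plus one more
```

Because globals persist until Stata is closed, it is tidy to finish with macro drop xvars once the list is no longer needed.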

Page 34: Advanced  Stata  Workshop

Running Parallel lists with macros

. **** running parallel lists ****

. local 1 "cat dog cow pig"

. local 2 "meow woof moo oinkoink"

. local n : word count `1'

. forvalues i = 1/`n' {
  2. local a : word `i' of `1'
  3. local b : word `i' of `2'
  4. di "`a' says `b'"
  5. }
cat says meow
dog says woof
cow says moo
pig says oinkoink

What each line does:

- local 1 ...: Create a local called “1”
- local 2 ...: Create a local called “2”
- local n : word count `1': Create local n = # of words in local “1”
- local a : word `i' of `1': Extract word `i' from local “1”
- local b : word `i' of `2': Extract word `i' from local “2”
- di "`a' says `b'": Use the new locals in a display command with other text
- The four “says” lines are the results

Page 35: Advanced  Stata  Workshop

Creating a program in Stata

. program printit
  1. display "Listing the values of four variables"
  2. list make price mpg foreign
  3. end

- program: Command name
- printit: Program name
- Line 1: First command to be run when the program is implemented
- Line 2: Second command to be run when the program is implemented
- end: Telling Stata that there are no more commands to be used as part of the program

Page 36: Advanced  Stata  Workshop

Invoke the program by typing the program name and running it in Stata.

. printit
Listing the values of four variables

Results

      make                  price   mpg    foreign
  1.  AMC Concord           4,099    22   Domestic
  2.  AMC Pacer             4,749    17   Domestic
  3.  AMC Spirit            3,799    22   Domestic
  4.  Buick Century         4,816    20   Domestic
  5.  Buick Electra         7,827    15   Domestic
  6.  Buick LeSabre         5,788    18   Domestic
  7.  Buick Opel            4,453    26   Domestic
  8.  Buick Regal           5,189    20   Domestic
  9.  Buick Riviera        10,372    16   Domestic
 10.  Buick Skylark         4,082    19   Domestic
 11.  Cad. Deville         11,385    14   Domestic
 12.  Cad. Eldorado        14,500    14   Domestic
 13.  Cad. Seville         15,906    21   Domestic
 14.  Chev. Chevette        3,299    29   Domestic
 15.  Chev. Impala          5,705    16   Domestic
 16.  Chev. Malibu          4,504    22   Domestic
 17.  Chev. Monte Carlo     5,104    22   Domestic
 18.  Chev. Monza           3,667    24   Domestic
 19.  Chev. Nova            3,955    19   Domestic
 20.  Dodge Colt            3,984    30   Domestic
 21.  Dodge Diplomat        4,010    18   Domestic
 22.  Dodge Magnum          5,886    16   Domestic
 23.  Dodge St. Regis       6,342    17   Domestic
 24.  Ford Fiesta           4,389    28   Domestic
 25.  Ford Mustang          4,187    21   Domestic
 26.  Linc. Continental    11,497    12   Domestic
 27.  Linc. Mark V         13,594    12   Domestic
 28.  Linc. Versailles     13,466    14   Domestic
 29.  Merc. Bobcat          3,829    22   Domestic
 30.  Merc. Cougar          5,379    14   Domestic
 31.  Merc. Marquis         6,165    15   Domestic
 32.  Merc. Monarch         4,516    18   Domestic
 33.  Merc. XR-7            6,303    14   Domestic
 34.  Merc. Zephyr          3,291    20   Domestic
 35.  Olds 98               8,814    21   Domestic
 36.  Olds Cutl Supr        5,172    19   Domestic
 37.  Olds Cutlass          4,733    19   Domestic
 38.  Olds Delta 88         4,890    18   Domestic
 39.  Olds Omega            4,181    19   Domestic
 40.  Olds Starfire         4,195    24   Domestic
 41.  Olds Toronado        10,371    16   Domestic
 42.  Plym. Arrow           4,647    28   Domestic
 43.  Plym. Champ           4,425    34   Domestic
 44.  Plym. Horizon         4,482    25   Domestic
 45.  Plym. Sapporo         6,486    26   Domestic
 46.  Plym. Volare          4,060    18   Domestic
 47.  Pont. Catalina        5,798    18   Domestic
 48.  Pont. Firebird        4,934    18   Domestic
 49.  Pont. Grand Prix      5,222    19   Domestic
 50.  Pont. Le Mans         4,723    19   Domestic
 51.  Pont. Phoenix         4,424    19   Domestic
 52.  Pont. Sunbird         4,172    24   Domestic
 53.  Audi 5000             9,690    17    Foreign
 54.  Audi Fox              6,295    23    Foreign
 55.  BMW 320i              9,735    25    Foreign
 56.  Datsun 200            6,229    23    Foreign
 57.  Datsun 210            4,589    35    Foreign
 58.  Datsun 510            5,079    24    Foreign
 59.  Datsun 810            8,129    21    Foreign
 60.  Fiat Strada           4,296    21    Foreign
 61.  Honda Accord          5,799    25    Foreign
 62.  Honda Civic           4,499    28    Foreign
 63.  Mazda GLC             3,995    30    Foreign
 64.  Peugeot 604          12,990    14    Foreign
 65.  Renault Le Car        3,895    26    Foreign
 66.  Subaru                3,798    35    Foreign
 67.  Toyota Celica         5,899    18    Foreign
 68.  Toyota Corolla        3,748    31    Foreign
 69.  Toyota Corona         5,719    18    Foreign
 70.  VW Dasher             7,140    23    Foreign
 71.  VW Diesel             5,397    41    Foreign
 72.  VW Rabbit             4,697    25    Foreign
 73.  VW Scirocco           6,850    25    Foreign
 74.  Volvo 260            11,995    17    Foreign
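One practical wrinkle when a program is defined inside a do file: running the do file a second time stops with an error, because printit is already defined. A common workaround, sketched here rather than taken from the workshop do file, is to drop any earlier definition first.

```stata
* Drop an earlier definition if there is one; capture hides the
* error that would occur when printit does not exist yet
capture program drop printit

program define printit
    display "Listing the values of four variables"
    list make price mpg foreign
end

sysuse auto, clear
printit
```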

Page 37: Advanced  Stata  Workshop

SVYset and SVY Prefix

Page 38: Advanced  Stata  Workshop

Simple vs. Complex Sample

• Many statistical techniques assume a simple random sample

• Simple random sample: each element of the population has an equal probability of being sampled.

Page 39: Advanced  Stata  Workshop

Complex Survey

• Sampling weights
  – inverse probability of being sampled
  – represent the number of population elements each sampled element stands for

• Clustering
  – groups sampled together
  – primary sampling units (PSU): first-level clusters

• Stratification
  – groups of clusters (strata)
  – strata sampled separately

Page 40: Advanced  Stata  Workshop

Example

• States, Counties, Schools, Students
  – sample states in different regions
  – sample counties within each state
  – sample schools within each county
  – sample students from schools

Page 41: Advanced  Stata  Workshop

svyset

• svyset psu? [pweight=?], strata(?) fpc(?) || psu?, fpc(?)

psu = primary sampling unit
pweight = probability weight
fpc = finite population correction (total # of strata or clusters the PSU is sampled from)

|| = next stage

Page 42: Advanced  Stata  Workshop

SVYSET Examples

• use http://www.stata-press.com/data/r12/multistage
• svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)
• save highschool
• use highschool
• svyset

Page 43: Advanced  Stata  Workshop

SVY Prefix Examples

• svy: proportion race sex
• svy: tab race sex, ci
• svy: tab race sex, count ci
• svy, subpop(if sex==1): mean weight height
• svy, subpop(if sex==2): mean weight height, over(race)
• svy: reg weight sex

Note: subpop() is preferred over an “if” qualifier, because Stata then includes all cases when estimating standard errors

Page 44: Advanced  Stata  Workshop

Take-home Message

• Ask what sampling design was used for your data before running any analysis.

• If complex survey data, consider svyset or multilevel modeling.

Page 45: Advanced  Stata  Workshop

xtset and xt Prefix

Page 46: Advanced  Stata  Workshop

xtset—Declare Panel Data

• xtset panelvar                          specify unit observed repeatedly
• xtset panelvar timevar [, tsoptions]    specify time var
• xtset                                   display current xtset
• xtset, clear                            clear xtset

Menu
Statistics > Longitudinal/panel data > Setup and utilities > Declare dataset to be panel data

Page 47: Advanced  Stata  Workshop

Time-Unit Options

• [unitoptions] specify units of time

clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly…

• [deltaoption] specify duration between observations

delta(#)          e.g. delta(2)

delta(exp)        e.g. delta(7*24)

delta(# units)    e.g. delta(10 min) or delta(7 days)

Page 48: Advanced  Stata  Workshop

Xtdescribe—pattern of xt data

• xtdescribe [if] [in] [, options]

  options:
  patterns(#)    e.g. p(10) -- display max. 10 patterns
  width(#)       e.g. w(80) -- display 80 columns

Menu
Statistics > Longitudinal/panel data > Setup and utilities > Describe pattern of xt data

Page 49: Advanced  Stata  Workshop

Examples

• use http://www.stata-press.com/data/r12/nlswork
• xtset
• browse
• xtdes, p(20)
• xtsum hours
• xttab race
• xtreg ln_w grade age ttl_exp tenure south, mle

Page 50: Advanced  Stata  Workshop

Post Estimation in STATA

Page 51: Advanced  Stata  Workshop

Generating variables with fitted values

• After a regression, use the “predict newvar” syntax to create a new variable that contains the fitted values for each observation.
• If the model is fitted only for a limited sample, use the following syntax to get the predicted value for that sample
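The slide's own example is not preserved in this transcript. A minimal sketch of both cases on the auto dataset that ships with Stata; the variable names price_hat_all and price_hat are hypothetical, and restricting with if e(sample) is the usual idiom rather than necessarily the one shown in the workshop.

```stata
sysuse auto, clear

* Fit the model on domestic cars only
regress price weight mpg if foreign == 0

* Fitted values for every observation, in or out of the estimation sample
predict price_hat_all

* Fitted values only for the observations used to fit the model
predict price_hat if e(sample)
```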

Page 52: Advanced  Stata  Workshop

Generating variables with residuals

• After a regression, use the “predict newvar, r” syntax (“r” is short for the residuals option) to create a new variable that contains the residuals for each observation.
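A minimal sketch on the auto dataset that ships with Stata; price_res is a hypothetical variable name, and storing it as a double is a choice made here to keep rounding error small.

```stata
sysuse auto, clear
regress price weight mpg

* Residual = observed price minus fitted price
predict double price_res, residuals

* OLS residuals average to zero when the model includes a constant
summarize price_res
```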

Page 53: Advanced  Stata  Workshop

Reformat and write regression tables to document files

• The ‘outreg’ command can be used to reformat and write regression tables to document files
• Example

• Outreg has lots of options that let us customize the look of the output table.

Page 54: Advanced  Stata  Workshop

Margins

• Margins can be useful to understand regression results
• Example –

• In the above regression, the coefficient on weight is misleading, as an increase in weight affects both weight and weight squared. So, the total effect depends on the starting value of weight.
• The following command will set the variables to their means and find the derivative of expected price with respect to weight at that point.
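The regression and the margins command from the slide are not preserved in this transcript. A sketch of what they plausibly look like on the auto dataset that ships with Stata; the exact regressors are an assumption.

```stata
sysuse auto, clear

* c.weight##c.weight enters weight and weight squared, and tells
* Stata that the two terms belong to the same variable
regress price c.weight##c.weight foreign

* Derivative of expected price with respect to weight,
* evaluated with all covariates set to their means
margins, dydx(weight) atmeans
```

Writing the squared term with factor-variable notation, instead of generating a separate weight-squared variable, is what lets margins account for both terms at once.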

Page 55: Advanced  Stata  Workshop

Marginsplot

• Often, the results from margins can be hard to read, as in the following example.
• The command ‘marginsplot’ can be used to visualize the results and understand them better.
• Example

[Figure: marginsplot titled “Average Marginal Effects”. Y-axis: Effects on Pr(Diabetes), 0 to .08; x-axis: age in years, 20 to 70; one line for 1.black and one for 1.female]
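The commands behind this figure are not shown in the transcript. The axis titles suggest the nhanes2 example dataset from Stata Press, so the sketch below assumes that dataset and a simple logit; it needs an internet connection, and the workshop's model may have differed.

```stata
webuse nhanes2, clear

logit diabetes i.black i.female age

* Average marginal effects of black and female at ages 20, 30, ..., 70
margins, dydx(black female) at(age = (20(10)70))

* Plot the results of the margins command that was just run
marginsplot
```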

Page 56: Advanced  Stata  Workshop

Using saved results

• Stata stores results from a command in various forms: scalars, strings, matrices, etc. Such results are called returned results

• Returned results can be used to make other computations in Stata
• We can type ‘return list’ after we run a command to see the returned results
• Example –

• We can use the returned results as variables and perform computations
• Example – gen range = r(max) - r(min)
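A minimal sketch of the full sequence on the auto dataset that ships with Stata; the variable name range follows the slide.

```stata
sysuse auto, clear

summarize price      // leaves r(N), r(mean), r(min), r(max), ... behind
return list          // show everything summarize returned

* Use the returned scalars before another command overwrites them
generate range = r(max) - r(min)
display "Range of price: " range[1]
```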

Page 57: Advanced  Stata  Workshop

Using saved results contd…

• Results are stored mainly as r() class or e() class, depending on the commands used
• Access r() class results with return list; access e() class results with ereturn list
• Matrices in returned results can be used as regular matrices.
• Example:

• More advanced computations with matrices can be done in Mata, which is a matrix language built into Stata.
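The slide's example is not preserved. A short sketch of pulling matrices out of e() after a regression on the auto dataset that ships with Stata; the matrix names b and V are arbitrary.

```stata
sysuse auto, clear
regress price weight mpg

ereturn list                 // scalars, macros and matrices left by regress

matrix b = e(b)              // 1 x 3 row vector of coefficients
matrix V = e(V)              // 3 x 3 variance-covariance matrix
matrix list b

* Matrix elements can be used in ordinary expressions
display "Coefficient on weight: " b[1,1]
display "Std. err. of weight:   " sqrt(V[1,1])
```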

Page 58: Advanced  Stata  Workshop

Post estimation statistics

• estat ic - displays the AIC and BIC information criteria
  • Available only after commands that report a log likelihood
  • Given two models fitted to the same data, the one with the smaller AIC and BIC values is preferred

• estat vce - displays the covariance matrix estimates
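A minimal sketch comparing two models on the auto dataset that ships with Stata; the two specifications are chosen only for illustration.

```stata
sysuse auto, clear

* Model 1
quietly regress price weight
estat ic

* Model 2: compare its AIC and BIC with those of model 1
quietly regress price weight mpg foreign
estat ic

* Covariance matrix of the coefficient estimates from model 2
estat vce
```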

Page 59: Advanced  Stata  Workshop

Postfile

• Results can be stored in a Stata dataset using the ‘postfile’ command
• This can be useful when we have to run a lot of regressions, for example in Monte Carlo simulations.
• Let’s consider an example from the Stata manual –

Suppose we want the means and variances from 10,000 randomly constructed 100-observation samples of data, and we want to store the results in results.dta.

We could do that as follows (refer to the do file)
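The do file is not part of this transcript. The sketch below follows the pattern of the postfile example in the Stata manual; the lognormal draw for z and the handle name sim are assumptions. Run it as a do file, because the temporary name is a local macro.

```stata
clear
tempname sim                                   // handle for the results file

* Declare the variables to collect and the file that receives them
postfile `sim' mean var using results, replace

quietly {
    forvalues i = 1/10000 {
        drop _all
        set obs 100
        generate z = exp(rnormal())            // one 100-observation sample
        summarize z
        post `sim' (r(mean)) (r(Var))          // append one row to results.dta
    }
}

postclose `sim'                                // close the file before using it
```

After postclose, typing use results, clear loads a dataset with 10,000 observations on mean and var.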