A Brief Introduction to Stata (3)
3. Working with data files: changing dataset
3.1. Generating new variables
3.2. Labeling
3.3. Keeping and Dropping Variables and Observations
3.4. Producing Graphs
3.5. Combining Data Sets
3.1. Generating new variables
The command generate (abbreviated gen) creates new variables, while the command replace changes the values of an existing variable:
. gen oldhead=1 if age>32
(4438 missing values generated)
. replace oldhead=0 if age<=32
(4438 real changes made)
The following points should be made:
If a generate or replace command is issued without any conditions, it applies to all observations in the data file.
When using generate with a condition, take care to handle missing values properly. Stata treats a missing value (.) as larger than any number, so a condition such as age>32 is also true when age is missing.
The right-hand side of the = sign in a generate or replace command can be any expression involving variable names, not just a value.
The replace command does not have to follow a generate command. It can change the values of any existing variable, independently of generate.
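The missing-value point can be checked on a small made-up data set. This is only a sketch: the five ages and the variable name oldhead_naive are invented for illustration and are not part of the survey data used in these slides.

```stata
* Toy data (invented): one household head has a missing age
clear
input age
25
32
40
61
.
end

* Naive version: Stata treats missing (.) as larger than any number,
* so the missing age satisfies age>32 and is coded 1
gen oldhead_naive = 1 if age>32
replace oldhead_naive = 0 if age<=32

* Safer version: exclude missing ages explicitly so they stay missing
gen oldhead = 1 if age>32 & !missing(age)
replace oldhead = 0 if age<=32

list age oldhead_naive oldhead
```

In the listing, the last observation is coded 1 by the naive version but stays missing in the safer one.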
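The following commands calculate the larger of the food and non-food expenditures for each household. They are alternatives: run only one of them, because the second fails if maxexp already exists. (In newer versions of Stata the egen function rmax() is called rowmax().)
. gen maxexp=max(food,nfood)
. egen maxexp=rmax(food nfood)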
A more powerful feature of the egen command is its ability to create statistics involving multiple observations, such as the overall mean of totex or its mean within each region:
. egen avgtotex=mean(totex)
. egen avgtea=mean(totex), by(regn)
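To see what the two egen commands store, here is a minimal sketch on four invented observations. The variable names follow the slides, but the values of regn and totex are made up.

```stata
* Toy data (invented): two regions with two households each
clear
input regn totex
1 100
1 200
2 300
2 500
end

* Overall mean: the same value, (100+200+300+500)/4 = 275,
* is stored in every observation
egen avgtotex = mean(totex)

* Mean within each region: 150 for region 1, 400 for region 2
egen avgtea = mean(totex), by(regn)

list
```

Unlike generate, which works observation by observation, each egen result here depends on several rows at once.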
3.2. Labeling
3.2.1. Labeling variables
. label variable oldhead "HH Head is over 32"
. label var oldhead "HH Head is over 32"
To see the new label, type:
. des oldhead
3.2.2. Labeling Data
To attach a label to the entire data set:
. label data "FIES 2000"
To see this label:
. des
3.2.3. Labeling Values of variables
. gen majisland=1 if regn<=5 | regn==13 | regn==14
. replace majisland=2 if regn>5 & regn<=8
. replace majisland=3 if majisland==.
. tab majisland
. label define majlabel 3 "Mind" 2 "Vis" 1 "Luz"
. label values majisland majlabel
. tab majisland
. tab majisland, nolabel
3.3. Keeping and Dropping Variables and Observations
keep var1 var2 var3 (or keep var1-var3)
drop var4 var5 var6 (or drop var4-var6)
. drop if age>=80
. keep if fsize<=6
. drop in 1/20
You cannot combine a variable list with an if condition in a single drop or keep command:
. keep hcn fsize if fsize<=6
invalid syntax
r(198);
You have to use two commands to do the job:
. keep if fsize<=6
. keep hhcode fsize
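The two-step approach can be tried on a toy data set. This is a sketch with invented values; capture is used only so that the rejected command does not stop the run.

```stata
* Toy data (invented): four households
clear
input hhcode fsize age
1 4 35
2 8 50
3 6 82
4 3 41
end

* A variable list combined with an if condition is rejected;
* capture stores the return code in _rc instead of halting
capture keep hhcode fsize if fsize<=6
display "return code: " _rc

* Two-step version: first select observations, then variables
keep if fsize<=6
keep hhcode fsize

list
```

The order of the two keep commands matters only when the condition uses a variable that the variable list would drop, so selecting observations first is the safer habit.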
3.4. Producing Graphs
The following command shows the distribution of the age of the household head in a bar graph:
. histogram age
The number of bars may be increased, up to a maximum of 50, by adding the option bin(#).
. histogram age, bin(12)
. scatter toinc age, t1(total income by age) saving(incage, replace) s(.)
3.5. Combining Data Sets
3.5.1. Appending Data Sets
. use popproj1
. sort year
. save popproj1, replace
. use popproj2, clear
. sort year
. save popproj2, replace
. use popproj1
. append using popproj2
. save, replace
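Appending stacks the observations of one file underneath another. The sketch below reproduces the idea with two temporary files, so nothing is written to the working directory. The population figures and the macro names proj1 and proj2 are invented. Note that append itself does not require the files to be sorted.

```stata
* First toy file (invented numbers): projections for 2000 and 2005
clear
input year pop
2000 76
2005 85
end
tempfile proj1
save `proj1'

* Second toy file: projections for 2010 and 2015
clear
input year pop
2010 93
2015 102
end
tempfile proj2
save `proj2'

* Stack the second file underneath the first
use `proj1', clear
append using `proj2'
sort year
list
```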
3.5.2. Merging Data Sets
. use popproj1, clear
. sort year
. save, replace
. use mmlaproj, clear
. sort year
. save, replace
. merge year using popproj1
. save, replace
. tab _merge
_merge==1   observation found in the master data only
_merge==2   observation found in the using data only
_merge==3   observation found in both the master and the using data
. keep if _merge==3
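The merge command above uses the older syntax, which needs both files sorted on the key. In Stata 11 and later the same match-merge is usually written merge 1:1, which does not require sorting beforehand. The sketch below uses invented numbers; the variable names pop and mmla and the macro name popproj are hypothetical.

```stata
* Using file (invented numbers): population by year
clear
input year pop
2000 76
2005 85
2010 93
end
tempfile popproj
save `popproj'

* Master file: a second series covering different years
clear
input year mmla
2005 10
2010 11
2015 12
end

* One-to-one match on year; _merge records where each row came from
merge 1:1 year using `popproj'
tab _merge

* Keep only the years present in both files (2005 and 2010)
keep if _merge==3
list year mmla pop
```

Here 2015 appears only in the master data (_merge==1), 2000 only in the using data (_merge==2), and 2005 and 2010 in both (_merge==3).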
Review
. gen oldhead=1 if age>32
. replace oldhead=0 if age<=32
. gen maxexp=max(food,nfood)
. egen maxexp=rmax(food nfood)
. egen avgtotex=mean(totex)
. egen avgtea=mean(totex), by(regn)
. label variable oldhead "HH Head is over 32"
. des oldhead
. label data "FIES 2000"
. des
. label define majlabel 3 "Mind" 2 "Vis" 1 "Luz"
. label values majisland majlabel
. tab majisland
keep var1 var2 var3 (or keep var1-var3)
drop var4 var5 var6 (or drop var4-var6)
. drop if age>=80
. keep if fsize<=6
. drop in 1/20
. histogram age
. histogram age, bin(12)
. scatter toinc age, t1(total income by age) saving(incage, replace) s(.)
. merge
. append
4. Working with .log and .do files
4.1. Keeping track of work
4.2. Batch processing
4.1. Keeping track of work
. log using "c:\intropov\logfiles\log1.log"
. log close
By default, Stata saves a log in SMCL format (extension .smcl), which is a formatted log file. Giving the file the extension .log instead, as with log1.log in the folder c:\intropov\logfiles above, produces an ordinary text log. All commands issued between log using and log close, together with their output, are saved in the log file.
The main advantage of using a .do file instead of typing commands line by line is repeatability. If it takes many steps to obtain the desired output, put the commands in a .do file, because you may need to run them many times.
4.2. Batch processing
dofile.doc
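A do-file is simply a text file of commands, written without the leading dot prompt. The sketch below shows what a small one might contain. It is not the file attached to the slides: the file name example.do, the toy ages, and the choice to write log1.log to the current working directory are all assumptions.

```stata
* example.do (hypothetical file name)
* Close any log left open by an earlier run, then start a text log
capture log close
log using "log1.log", replace text

* Toy data (invented) so that the file runs on its own
clear
input age
25
40
end

* The same recode as in section 3.1, guarded against missing ages
gen oldhead = 1 if age>32 & !missing(age)
replace oldhead = 0 if age<=32
tab oldhead

log close
```

Typing do example at the Stata prompt would run every command in order and leave the commands and their output in log1.log, so the whole analysis can be repeated after any change.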