Session I: How to Use STATA & Basic Data Management Commands
Post on 28-Dec-2015
Session I
How to use STATA & Basic Data Management Commands
What will be covered?
Introduction to STATA Software
General Guidelines in Data Entry
Data Management in STATA
Introduction to STATA
Open & Close the Output File
To open the log file
– log using "directory\path\filename.log"
– log using d:\trials\zinc.log
To close
– log close
zinc.dta
To Open Log (Output) File
To Close the Log File
Append & Replace the Existing Log File
To append the existing log file
log using d:\trials\zinc.log, append
To replace the existing log file
log using d:\trials\zinc.log, replace
Open the Data File
To open the data file
use "directory\path\filename.dta"
use d:\trials\zinc.dta
To save
save zinc.dta
zinc.dta
To Make A New Directory
To Change the Directory
General Guidelines in Data Entry
Rows in the datasheet should contain individual information (a record).
Each column should contain the values of a single entity for all the individuals (a variable).
Variable names should not exceed eight characters (a limit of older STATA releases; current versions allow up to 32).
Variables can be numeric, string or alphanumeric.
A numeric variable must contain only numbers.
Every datasheet must include an identification number.
DATA DESCRIPTION
Data Management using STATA
Inputting Data
Editing Data
Creating and Changing Variables
Saving and Reusing Data
Data Reorganization
Merging and Appending Datasets
Data Management using STATA
Inputting Data
Enter data from keyboard
– input varlist
– input str25 name age str1 sex
– The best way is to copy from Excel and paste the data directly into the STATA editor
– Transfer from other programs
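As a minimal sketch of keyboard entry with the input command shown above, the block below types in three records. The names and values are made up for illustration and are not taken from zinc.dta.

```stata
* Toy records typed at the keyboard (values are hypothetical)
clear                      // empties the data currently in memory
input str25 name age str1 sex
"Asha"  24 "F"
"Ravi"  31 "M"
"Meena" 19 "F"
end
list                       // show what was entered
```

The str25 and str1 prefixes declare string variables of at most 25 and 1 characters; variables without a prefix are stored as numeric.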
Arithmetic Operators
+ (addition)
- (subtraction)
* (multiplication)
/ (division)
^ (raise to power)
Relational Operators
> (greater than)
< (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)
Logical Operators
& (and)
| (or)
! (not)
Expressions
if: used when a command is to be restricted to observations that satisfy a condition
in: used when a command is to be restricted to a range of observations
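The operators and the if and in qualifiers can be tried on a small made-up dataset. This is only a sketch; the variable names follow the slides, but the values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input pcode centre age treatment
1 3 30 1
2 3 20 2
3 4 28 1
4 3 40 2
5 2 26 1
end
* if: only observations where the condition is true are used
count if centre == 3 & age > 25
* | means "or", ! negates a condition
count if centre == 4 | !(age > 25)
* in: restrict the command to observations 1 through 3
list pcode age in 1/3
* arithmetic operators inside an expression (^ is evaluated first)
display (2 + 3) * 4 / 2 ^ 2
```

Note that equality is tested with a double equals sign (==); a single = is used only for assignment.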
Editing Data
Edit using Data Editor
– edit [varlist] [if] [in]
– edit treatment centre age
– edit treatment age if centre==3&age>25
Browsing Data
List using Data Editor
– browse [varlist] [if] [in]
– browse treatment centre age
– browse treatment age if centre==3&age>25
Do this Exercise…
Edit the following:
– pcode, treatment and cough only for centre 4
– browse for the same and feel the difference
zinc.dta
Creating & Changing Variables
Create new variable
– generate newvar = exp [if] [in]
– gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
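A short sketch of generate on toy data follows. The variable names are those used on the slide; the values are hypothetical. It also shows a point worth knowing: if any term in the sum is missing, the result is missing.

```stata
* Toy stool-weight data (hypothetical values)
clear
input pcode s1_tstool_wt s2_tstool_wt s3_tstool_wt
1 120 80 60
2 200 150 .
3 90 70 50
end
* Sum of the three components; patient 2 gets a missing total
gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
list
```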
Do this Exercise……
Generate total stool output from 0-48 hours
zinc.dta
Creating & Changing Variables …contd
Change contents of existing variable
– To replace
replace oldvar = exp [if] [in]
replace sodium1 = . if sodium1==0
– To recode
recode varlist (erule) [(erule) ...] [if] [in]
recode age min/6=1 7/11=2 12/max=3, gen(agecat)

Rule              Example           Meaning
# = #             3 = 1             3 recoded to 1
# # = #           2 4 = 5           2 and 4 recoded to 5
#/# = #           4/8 = 3           4 through 8 recoded to 3
nonmissing = #    nonmissing = 2    all other nonmissing to 2
missing = #       missing = 9       all other missing to 9
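The replace and recode commands above can be sketched on toy data as follows. The values are hypothetical, and the recode rules are written with parentheses, which is the form documented for current Stata releases.

```stata
* Toy data (hypothetical values); 0 stands for "not measured"
clear
input pcode sodium1 age
1 135 4
2 0 9
3 140 14
4 0 6
end
* Turn the zeros into Stata's missing value
replace sodium1 = . if sodium1 == 0
* Group age into three categories and keep the original variable
recode age (min/6 = 1) (7/11 = 2) (12/max = 3), gen(agecat)
tabulate agecat
```

Using gen() leaves age untouched, so the grouping can be checked against the raw values.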
Do this Exercise……
Ex 1: Replace all zeros in serum potassium with missing.
Ex 2: Recode pre-admission diarrhea duration into 0-24h, 25-72h and > 72h
zinc.dta
Creating & Changing Variables …contd
Rename the existing variable
– rename oldvarname newvarname
– ren tlc_t2 tlc2
– ren tlc_t3 tlc3
Creating & Changing Variables …contd
Eliminate the existing variable
– To drop
drop varlist
drop name address
– To keep
keep varlist
keep idno age sodium albumin-tlc
zinc.dta
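A minimal sketch of rename, drop and keep on a toy dataset (hypothetical values; the variable names echo the slide examples):

```stata
* Toy data (hypothetical values)
clear
input idno age tlc_t2 tlc_t3 str10 name
1 12 8 7 "A"
2 20 9 9 "B"
end
rename tlc_t2 tlc2
ren tlc_t3 tlc3            // ren is the abbreviation of rename
drop name                  // remove one variable
keep idno tlc2-tlc3        // keep idno and the range tlc2 through tlc3
describe
```

drop lists what to remove, keep lists what to retain; a hyphen between two names selects every variable stored between them.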
Saving & Reusing Data in Stata Format
To save data
– save filename.dta
– save zinc, replace
– clear
To reuse data
– use filename
– use zinc
zinc.dta
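The save, clear, use cycle can be tried without touching any real file by writing to a temporary file. The tempfile name below is an assumption made for this sketch; with a real dataset you would give a file name such as zinc.

```stata
* Toy data (hypothetical values)
clear
input pcode age
1 10
2 14
end
tempfile zinc_copy              // hypothetical temporary file name
save "`zinc_copy'", replace     // write the data in Stata format
clear                           // memory is now empty
use "`zinc_copy'"               // read the data back
describe
```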
Data Reorganization
Sorting observations and changing variable order
– To sort
sort varlist [in] {ascending}
sort pcode
– Move specified variables to front of dataset
order varlist
– Move one variable to specified position
move varname1 varname2
– Alphabetize specified variables and move to front of dataset
aorder [varlist]
zinc.dta
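A small sketch of sort and order on toy data (hypothetical values). Since Stata 11 the move and aorder commands have been superseded by options of order, so the example uses order with before() in place of move; on an older release the slide's commands apply instead.

```stata
* Toy data (hypothetical values)
clear
input pcode age centre
3 20 1
1 35 2
2 28 1
end
sort pcode                   // ascending order of pcode
order centre age             // variables are now: centre age pcode
order pcode, before(age)     // variables are now: centre pcode age
list
```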
Data Reorganization …contd
Convert data from wide to long
– reshape long stubnames, i(varlist) j(varname)
– reshape long albumin, i(pcode) j(time)
Wide Shape Data → Long Shape Data
Data Reorganization …contd
Convert data from long to wide
– reshape wide stubnames, i(varlist) j(varname)
– reshape wide albumin, i(pcode) j(time)
Long Shape Data → Wide Shape Data
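The two reshape commands can be sketched as a round trip on toy data. The albumin values are hypothetical; the stub name albumin with numeric suffixes 1 to 3 is an assumption about how the wide variables are named.

```stata
* Wide data: one row per patient, one column per time point
clear
input pcode albumin1 albumin2 albumin3
1 31 34 38
2 29 30 33
end
* Wide to long: one row per patient per time point
reshape long albumin, i(pcode) j(time)
list, sepby(pcode)
* Long back to wide
reshape wide albumin, i(pcode) j(time)
list
```

In the long shape the suffix becomes the variable time, and i(pcode) together with j(time) identifies each row uniquely.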
Do this Exercise…
Convert serum zinc from wide to long shape data using zinclab.dta
zinclab.dta
Answer!!!
zinclab.dta
Merging & Appending Datasets
To append datasets
– append using filename
use zinc1.dta
append using zinc2.dta
To merge datasets
– merge [varlist] using filename
use zinclab
sort pcode
save zinclab, replace
use zincprognostic
sort pcode
merge pcode using zinclab
zinclab.dta
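The merge syntax on the slide (merge varlist using filename, with both files sorted first) is the form used by older Stata releases. From Stata 11 onward the match type is stated explicitly, for example merge 1:1. The sketch below uses that newer form on toy data; the file names are temporary and hypothetical, as are the values.

```stata
* First laboratory file: patients 1 and 2
clear
input pcode zinc
1 60
2 75
end
tempfile lab1
save "`lab1'"

* Second laboratory file: patient 3; append stacks the rows
clear
input pcode zinc
3 58
end
append using "`lab1'"
tempfile lab
save "`lab'"

* Prognostic file, merged one-to-one on pcode
clear
input pcode age
1 12
2 20
4 8
end
merge 1:1 pcode using "`lab'"
tabulate _merge
```

The _merge variable records where each row came from: 1 for the file in memory only, 2 for the using file only, 3 for matched rows.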
Do this Exercise…
Merge file 1 (zinclab.dta) with file 2 (zincprognosis.dta)
zinclab.dta
Session II
Data Cleaning & Preparing Data for Analysis
Preparing Data for Analysis
Inclusion criterion: children ≤ 35 months old
Preparing Data for Analysis …contd
Do this Exercise…
The inclusion criterion for the study was pre-admission diarrhea duration < 7 days
Ex 1: Convert pre-admission diarrhea duration from hours to days using zincclean.dta
Ex 2: Find values beyond the expected range
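One possible way to check a variable against an inclusion criterion is sketched below on toy data. The variable name dur_h is hypothetical (the transcript does not give the real name), and the values are made up.

```stata
* Toy data: pre-admission diarrhea duration in hours (hypothetical)
clear
input pcode dur_h
1 48
2 96
3 200
4 12
end
gen dur_d = dur_h / 24                  // hours to days
summarize dur_d                         // min and max reveal odd values
* List the records that violate "duration < 7 days"
list pcode dur_h dur_d if dur_d >= 7 & !missing(dur_d)
count if dur_d >= 7 & !missing(dur_d)
```

The !missing() condition matters because Stata treats missing values as larger than any number, so they would otherwise be listed as out of range.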
zinc.dta
Answer!!!
Preparing Data for Analysis …contd
Do this Exercise…
Do similar exercise for hemoglobin using zinc.dta
zinc.dta
Answer!!!
Preparing Data for Analysis …contd
What do you mean by 1 & 2???
zinc.dta
Preparing Data for Analysis …contd
Label name
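The slide images are not part of this transcript, but codes such as 1 and 2 are usually explained by attaching value labels. A minimal sketch, assuming a treatment variable coded 1 and 2; the label name trtlbl and the label texts are hypothetical, since the transcript does not say which code is which arm.

```stata
* Toy data (hypothetical values)
clear
input pcode treatment
1 1
2 2
3 1
end
* Define a set of value labels under a label name, then attach it
label define trtlbl 1 "Treatment A" 2 "Treatment B"
label values treatment trtlbl
* A variable label describes the variable itself
label variable treatment "Treatment group"
tabulate treatment
```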
Preparing Data for Analysis …contd
What is wrong and how to correct it???
zinc.dta
Preparing Data for Analysis …contd
Do this Exercise…
Generate total stool output for first 48 hrs
zincclean.dta
Preparing Data for Analysis …contd
Do this Exercise…
Draw a boxplot and identify extreme values, if any, for s2_tstool_wt using zincclean.dta
zincclean.dta
Session III
Introduction to Basic Data Analysis
What will be Covered?
Descriptive Statistics
Parametric tests
Non-parametric tests
Analyses
Univariate (one variable at a time)
Bivariate (two variables at a time)
Multivariate (more than two variables at a time)
Descriptive Statistics
Univariate Analysis
Quantitative
– Mean
– Median
– Range/IQ Range
– SD
Categorical
– Frequency
– Percentage
Descriptive Statistics-Categorical Variable
Can we label the variables???
Contingency Table
Immediate commands
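Frequency tables, contingency tables and the immediate form can be sketched as follows. The data and the cell counts are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input treatment withdrawn
1 0
1 0
1 1
2 0
2 1
2 1
end
tabulate treatment                     // one-way frequencies and percentages
tabulate treatment withdrawn, row      // contingency table with row percentages
* Immediate command: type the cell counts instead of using data in memory
tabi 30 10 \ 20 25, chi2
```

In tabi the backslash separates the rows of the table.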
Do this Exercise…
Ex 1: Draw a crosstab between treatment and withdrawn using zinc.dta
Ex 2: Draw a crosstab between treatment and diarr24, diarr48
zinc.dta
Descriptive Statistics-Quantitative Variable
Summary in Detail
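A brief sketch of summarize with and without the detail option, on toy data. The variable name totstl and the values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input pcode totstl
1 100
2 150
3 200
4 250
5 900
end
summarize totstl              // n, mean, SD, min, max
summarize totstl, detail      // adds percentiles, skewness, kurtosis
display "median = " r(p50)
```

With one large value in the data the mean sits well above the median, which is why the detailed summary is useful for skewed variables.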
Do this Exercise…
Calculate summary statistics for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission
zinc.dta
Summary Statistics by Group
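Summary statistics by group can be obtained in more than one way; two common ones are sketched here on toy data. The variable name ors24 and the values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input treatment ors24
1 400
1 500
1 600
2 300
2 350
2 400
end
* Repeat summarize within each treatment group
bysort treatment: summarize ors24
* One compact table with chosen statistics per group
tabstat ors24, by(treatment) statistics(n mean sd median min max)
```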
Do this Exercise…
Calculate summary statistics by “treatment” for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission
zinc.dta
Percentile Values
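Arbitrary percentiles such as the 3rd and 97th can be requested with centile. The sketch below builds artificial data (99 values per group, chosen so that the percentiles are easy to verify); the variable names are hypothetical.

```stata
* Artificial data: 99 observations in each of two groups
clear
set obs 198
gen treatment = cond(_n <= 99, 1, 2)
* Within each group the values are 10, 20, ..., 990
bysort treatment: gen totstl = _n * 10
* 3rd and 97th percentiles separately for each group
bysort treatment: centile totstl, centile(3 97)
```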
Do this Exercise…
Calculate 3rd and 97th percentile value by “treatment” for the following variables:
1. Total stool output 0-48h
2. Total ORS intake 0-24h
zinc.dta
Session IV (A)
Bi-variate Analyses
Analysis of Clinical Trial Data
1. Compare patient characteristics at the time of randomization and baseline measurements between the groups
2. Assess the difference in outcome variable(s) between the groups (adjusting for any imbalance in patient characteristics or baseline outcome variables)
Analysis of Clinical Trial Data
1. Categorical vs Categorical
2. Categorical vs Quantitative
Bi-variate Analyses
1. Categorical vs Categorical
X = 2, Y = 2
Unrelated
– Chi-square test
– Fisher's exact test
Related
– McNemar test
X > 2, Y > 2
Unrelated
– Chi-square test
– Fisher's exact test
X: Group variable
Y: Outcome variable
Chi-square test
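A sketch of the chi-square and Fisher's exact tests on a 2 x 2 table. The data are entered as cell counts with a frequency weight; the variable name ivfluid and all counts are hypothetical.

```stata
* One row per cell of the 2 x 2 table (hypothetical counts)
clear
input treatment ivfluid freq
1 0 40
1 1 10
2 0 30
2 1 20
end
* chi2 requests Pearson's chi-square, exact requests Fisher's exact test
tabulate treatment ivfluid [fweight = freq], chi2 exact row
```

With individual-level data the weight is dropped and the same options are used: tabulate groupvar outcomevar, chi2 exact.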
Do this Exercise…
Is there a difference between the proportion of patients requiring IV fluids in the two treatment groups?
zinc.dta
Chi-square Test/Fisher’s exact Test by Group
Comparison of two proportions
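When only the group sizes and proportions are known, the immediate command prtesti compares two proportions. The numbers below are hypothetical and chosen only to show the argument order (n1, p1, n2, p2).

```stata
* Two-sample test of proportions from summary figures (hypothetical)
* Arguments: n in group 1, proportion 1, n in group 2, proportion 2
prtesti 200 0.90 200 0.80
display "z = " r(z)
```

With individual-level data the equivalent call would take the form prtest outcomevar, by(groupvar).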
Do this Exercise…
1. Is there a difference in the proportion of patients recovered in rota virus negativity between the two treatment groups?
2. 91% of patients recovered in treatment A (n=248) and 95% of patients recovered in treatment B (n=252). Test these proportions and find the p-value.
zinc.dta
McNemar’s Chi-square Test
McNemar’s Chi-square Test …contd
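McNemar's test looks only at the discordant pairs of a paired 2 x 2 table. As a minimal sketch, the immediate command mcci takes the four cell counts a, b, c, d, where b and c are the discordant cells. The counts below are hypothetical.

```stata
* Paired 2 x 2 table entered as cell counts (hypothetical)
* a = 10 (yes, yes)   b = 15 (yes, no)
* c = 5  (no, yes)    d = 20 (no, no)
mcci 10 15 5 20
* McNemar's chi-square is (b - c)^2 / (b + c)
display "chi2 = " r(chi2)
```

With individual-level data, mcc followed by the two paired binary variables gives the same test.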
Do this Exercise…
Is there a shift in zinc deficiency from baseline after giving treatment B?
zinc.dta
2. Categorical vs Quantitative
Parametric (Y: Normal)
X = 2
– Unrelated: Student's t test
– Related: Paired t test
X > 2
– Unrelated: One-way ANOVA
– Related: Repeated measures ANOVA
Non-Parametric (Y: Non-Normal)
X = 2
– Unrelated: Wilcoxon ranksum
– Related: Wilcoxon signrank
X > 2
– Unrelated: Kruskal-Wallis
– Related: Friedman's test
Student’s ‘t’ Test for Independent Groups
Student’s ‘t’ Test for Independent Groups …contd
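A sketch of the independent-groups t test on toy data. The variable name ors24 and the values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input treatment ors24
1 400
1 500
1 600
2 300
2 350
2 400
end
* Two-sample t test assuming equal variances
ttest ors24, by(treatment)
display "t = " r(t)
```

Adding the unequal option to ttest relaxes the equal-variance assumption.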
What is the Difference in the Total ORS Intake in the First 24h between the Two Groups?
Transformations
Transformations …contd
Do this Exercise…
Ex 1: What is the difference in total stool output 0-48 hours between the two groups?
Ex 2: Is there a difference between total duration of diarrhea (in hours) (varname: tot_du_dia_h) between the two treatment groups?
zinc.dta
Geometric Mean if Log Transformation is Used
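The geometric mean is the antilog of the mean of the logs. The sketch below computes it by hand and with the ameans command; the variable name totstl and the values are hypothetical.

```stata
* Toy data chosen so the geometric mean is easy to see (hypothetical)
clear
input pcode totstl
1 10
2 100
3 1000
end
* By hand: log-transform, average, back-transform
gen double ln_totstl = ln(totstl)
summarize ln_totstl
display "geometric mean = " exp(r(mean))
* ameans reports arithmetic, geometric and harmonic means
ameans totstl
```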
Do this Exercise…
Ex: Calculate the geometric mean for stool output 0-48 hours
zinc.dta
Paired t-Test
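A sketch of the paired t test on toy before and after measurements. The variable names zinc_base and zinc_rec and the values are hypothetical.

```stata
* Toy paired data: zinc at baseline and at recovery (hypothetical)
clear
input pcode zinc_base zinc_rec
1 60 70
2 65 72
3 58 66
4 70 75
5 62 74
end
* Paired t test: the two variables are compared within each patient
ttest zinc_rec == zinc_base
display "t = " r(t)
```

To test within one treatment arm only, an if qualifier can be added after the variables.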
Do this Exercise…
Is there a change in zinc value from baseline after giving treatment B?
zinc.dta
Is there a Change in the Serum Zinc from Baseline to Recovery between Two Treatment Groups?
Discuss…
One-way ANOVA*
* Analysis of Variance
Multiple Comparisons
Difference in means of zinc values between age groups of ≤ 6 & > 12
P-value
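A sketch of one-way ANOVA with Bonferroni-adjusted multiple comparisons on toy data. The grouping variable agecat and all values are hypothetical.

```stata
* Toy data: zinc by age category (hypothetical values)
clear
input agecat zinc
1 55
1 60
1 65
2 62
2 68
2 74
3 70
3 78
3 86
end
* tabulate shows group means; bonferroni adds pairwise comparisons
oneway zinc agecat, tabulate bonferroni
display "F = " r(F)
```

The pairwise table reports the difference in means for each pair of groups with its adjusted p-value.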
Non-Parametric Methods
Is there a difference in total stool output in the first 24h between the two treatment groups?
Answer: Wilcoxon Ranksum test
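A sketch of the Wilcoxon rank-sum test on toy data. The variable name totstl24 and the values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input treatment totstl24
1 120
1 150
1 180
1 210
2 90
2 100
2 110
2 130
end
* Compares the two groups using ranks instead of raw values
ranksum totstl24, by(treatment)
```

The output lists the observed and expected rank sums for each group along with the z statistic and p-value.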
Do this Exercise…
Is there a difference in total diarrhea duration between the two groups?
zinc.dta
Is there a Change in zinc from baseline after giving treatment A?
Answer: Wilcoxon signed-rank test
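A sketch of the Wilcoxon signed-rank test on toy paired data. The variable names zinc_base and zinc_rec and the values are hypothetical.

```stata
* Toy paired data: zinc at baseline and at recovery (hypothetical)
clear
input pcode zinc_base zinc_rec
1 60 70
2 65 72
3 58 66
4 70 75
5 62 74
end
* Non-parametric counterpart of the paired t test
signrank zinc_rec = zinc_base
```

Note the single equals sign in signrank, unlike the double equals sign used by the paired ttest.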
Do this Exercise…
1. Is there any difference in zinc from baseline after giving treatment B?
zinc.dta
Is there a difference in total stool output across age groups?
Answer: Kruskal-Wallis Test
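A sketch of the Kruskal-Wallis test across three groups on toy data. The variable names agecat and totstl and the values are hypothetical.

```stata
* Toy data: stool output by age category (hypothetical values)
clear
input agecat totstl
1 55
1 60
1 65
2 62
2 68
2 74
3 70
3 78
3 86
end
* Rank-based comparison of more than two independent groups
kwallis totstl, by(agecat)
display "chi2 = " r(chi2) "  df = " r(df)
```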
Do this Exercise…
1. Is there any difference in serum zinc (at admission) across the age groups ?
zinc.dta