Session I: How to use STATA & Basic Data Management Commands


Upload: berniece-byrd

Post on 28-Dec-2015


Page 1: Session I How to use STATA & Basic Data Management Commands

Session I

How to use STATA & Basic Data Management Commands

Page 2: Session I How to use STATA & Basic Data Management Commands

What will be covered?

– Introduction to STATA Software
– General Guidelines in Data Entry
– Data Management in STATA

Page 3: Session I How to use STATA & Basic Data Management Commands

Introduction to STATA

Page 4: Session I How to use STATA & Basic Data Management Commands
Page 5: Session I How to use STATA & Basic Data Management Commands
Page 6: Session I How to use STATA & Basic Data Management Commands

Open & Close the Output File

To open the log file
log using "directory\path\filename.log"
log using d:\trials\zinc.log

To close
log close

zinc.dta

Page 7: Session I How to use STATA & Basic Data Management Commands

To Open Log (Output) File

Page 8: Session I How to use STATA & Basic Data Management Commands

To Close the Log File

Page 9: Session I How to use STATA & Basic Data Management Commands

Append & Replace the Existing Log File

To append the existing log file

log using d:\trials\zinc.log, append

To replace the existing log file

log using d:\trials\zinc.log, replace
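The log commands above can be tried end to end with the sketch below. It is only an illustration: the file name zinc_demo.log is hypothetical and is written to the current working directory instead of d:\trials.

```stata
* Start a plain-text log; replace overwrites any earlier copy
log using zinc_demo.log, text replace
display "first run"
log close

* Re-open the same file; append adds new output below the old output
log using zinc_demo.log, text append
display "second run, added below the first"
log close

* Show the finished log in the Results window
type zinc_demo.log
```

Without replace or append, log using stops with an error when the file already exists, which is why the two options are introduced on this slide.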

Page 10: Session I How to use STATA & Basic Data Management Commands

Open the Data File

To open the data file

use "directory\path\filename.dta"

use d:\trials\zinc.dta

To save

save zinc.dta

zinc.dta
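A minimal round trip of use and save, assuming Stata's shipped auto dataset as a stand-in for zinc.dta (the file name auto_copy is hypothetical):

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Write it to the working directory; replace allows overwriting
save auto_copy, replace

* Empty the memory, then read the saved file back
clear
use auto_copy
describe, short
```

Note that save without the replace option refuses to overwrite an existing file, so saving back to the name of the file just opened normally needs replace.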

Page 11: Session I How to use STATA & Basic Data Management Commands

To Make A New Directory

Page 12: Session I How to use STATA & Basic Data Management Commands

To Change the Directory
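The slides show these two steps as screenshots. A sketch of the equivalent typed commands, with a hypothetical folder name stata_course:

```stata
* Show the current working directory
pwd

* Create a folder; capture suppresses the error if it already exists
capture mkdir stata_course

* Move into the new folder, confirm, then move back up
cd stata_course
pwd
cd ..
```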

Page 13: Session I How to use STATA & Basic Data Management Commands

General Guidelines in Data Entry

– Rows in the datasheet should contain the information of one individual each (a Record).

– Each column should contain the values of a single entity for all the individuals (a Variable).

– Variable names should not exceed eight characters (a limit from older STATA releases; current releases accept up to 32).

– Variables can be either numeric or string (alphanumeric).

– A numeric variable must contain only numbers.

– In any datasheet, an identification number is a must.

Page 14: Session I How to use STATA & Basic Data Management Commands

DATA DESCRIPTION

Page 15: Session I How to use STATA & Basic Data Management Commands

Data Management using STATA

Page 16: Session I How to use STATA & Basic Data Management Commands

Data Management using STATA

– Inputting Data
– Editing Data
– Creating and Changing Variables
– Saving and Reusing Data
– Data Reorganization
– Merging and Appending Datasets

Page 17: Session I How to use STATA & Basic Data Management Commands

Inputting Data

Enter data from keyboard
– input varlist
– input str25 name age str1 sex
– Best way is to copy from Excel and paste the data directly into the STATA editor
– Transfer from other programs
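A small sketch of keyboard entry with input, using the variable list from the slide. The three records are invented for illustration only.

```stata
clear
* str25 and str1 declare string variables of up to 25 and 1 characters
input str25 name age str1 sex
"Asha Rao"   4 "F"
"Ravi Kumar" 6 "M"
"Meena Das"  3 "F"
end

* Check what was entered
list
```

String values that contain a space must be quoted, and end tells Stata that data entry is finished.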

Page 18: Session I How to use STATA & Basic Data Management Commands

Arithmetic Operators

+ (Addition)
- (Subtraction)
* (Multiplication)
/ (Division)
^ (Raise to power)

Page 19: Session I How to use STATA & Basic Data Management Commands

Relational Operators

> (greater than)
< (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)

Page 20: Session I How to use STATA & Basic Data Management Commands

Logical Operators

& (and)
| (or)
! (not)

Page 21: Session I How to use STATA & Basic Data Management Commands

Expressions

if – used to restrict a command to the observations that satisfy a condition (expression)

in – used to restrict a command to a range of observation numbers
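The operators and the if / in qualifiers can be combined as in the sketch below. It assumes Stata's shipped auto dataset in place of zinc.dta, so the variable names are not those of the course data.

```stata
sysuse auto, clear

* in: observations 1 through 5 only
list make price in 1/5

* if with a relational operator
count if price > 10000

* if with two conditions joined by & (and)
count if foreign == 1 & mpg >= 25

* != (not equal) and ! (not), together with an in range
list make rep78 if rep78 != 3 & !missing(rep78) in 1/20
```

Stata treats a numeric missing value as larger than any number, so a condition such as rep78 > 4 also selects missing values unless !missing(rep78) is added.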

Page 22: Session I How to use STATA & Basic Data Management Commands

Editing Data

Edit using Data Editor
– edit [varlist] [if] [in]
– edit treatment centre age
– edit treatment age if centre==3 & age>25

Page 23: Session I How to use STATA & Basic Data Management Commands

Browsing Data

List using Data Editor
– browse [varlist] [if] [in]
– browse treatment centre age
– browse treatment age if centre==3 & age>25

Page 24: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

Edit the following:
– pcode, treatment and cough only for centre 4
– browse for the same and feel the difference

zinc.dta

Page 25: Session I How to use STATA & Basic Data Management Commands

Creating & Changing Variables

Create new variable
– generate newvar = exp [if] [in]
– gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
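One point worth checking when summing with +: the result is missing whenever any component is missing. The sketch below uses invented values and hypothetical variable names s1 to s3, and shows egen rowtotal() as an alternative that treats missing components as zero.

```stata
clear
input id s1 s2 s3
1 100 150 120
2 200   . 180
end

* Plain addition: id 2 gets a missing total because s2 is missing
generate total_plus = s1 + s2 + s3

* rowtotal() adds the nonmissing parts: id 2 gets 380
egen total_row = rowtotal(s1 s2 s3)

list
```

Which behaviour is appropriate depends on the study; silently counting a missing measurement as zero may not be what the analysis needs.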

Page 26: Session I How to use STATA & Basic Data Management Commands

Do this Exercise……

Generate total stool output from 0-48 hours

zinc.dta

Page 27: Session I How to use STATA & Basic Data Management Commands

Creating & Changing Variables …contd

Change contents of existing variable
– To replace
    replace oldvar = exp [if] [in]
    replace sodium1 = . if sodium1==0
– To recode
    recode varlist (erule) [(erule) ...] [if] [in]
    recode age (min/6=1) (7/11=2) (12/max=3), gen(agecat)

Rule              Example           Meaning
# = #             3 = 1             3 recoded to 1
# # = #           2 4 = 5           2 and 4 recoded to 5
#/# = #           4/8 = 3           4 through 8 recoded to 3
nonmissing = #    nonmissing = 2    all other nonmissing to 2
missing = #       missing = 9       all other missing to 9
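The replace and recode commands can be tried on a tiny invented dataset. The variable names follow the slide; the values are made up.

```stata
clear
input id age sodium1
1  3 135
2  9   0
3 14 140
4  6   0
end

* Zeros are data-entry placeholders here, so set them to missing
replace sodium1 = . if sodium1 == 0

* Three age bands; generate() keeps age and writes the codes to agecat
recode age (min/6 = 1) (7/11 = 2) (12/max = 3), generate(agecat)

tabulate agecat
list
```

Using generate() rather than recoding in place keeps the original variable available for checking the new categories.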

Page 28: Session I How to use STATA & Basic Data Management Commands

Do this Exercise……

Ex 1: Replace all zeros in serum potassium with missing.

Ex 2: Recode pre-admission diarrhea duration into 0-24h, 25-72h and > 72h

zinc.dta

Page 29: Session I How to use STATA & Basic Data Management Commands

Creating & Changing Variables …contd

Rename the existing variable
– rename oldvarname newvarname
– ren tlc_t2 tlc2
– ren tlc_t3 tlc3

Eliminate the existing variable
– To drop
    drop varlist
    drop name address
– To keep
    keep varlist
    keep idno age sodium albumin-tlc

zinc.dta
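A sketch of rename, drop and keep, assuming the shipped auto dataset instead of zinc.dta. The new name repair is hypothetical.

```stata
sysuse auto, clear

* Give rep78 a more readable name
rename rep78 repair

* Remove two variables
drop trunk turn

* Keep make, price and every variable stored from mpg through weight
keep make price mpg-weight

describe
```

A hyphenated range such as mpg-weight (or albumin-tlc on the slide) follows the order in which variables are stored in the dataset, not alphabetical order.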

Page 30: Session I How to use STATA & Basic Data Management Commands

Saving & Reusing Data in Stata Format

To save data
– save filename.dta
– save zinc, replace
– clear

To reuse data
– use filename
– use zinc

zinc.dta

Page 31: Session I How to use STATA & Basic Data Management Commands

Data Reorganization

Sorting observations and changing variable order
– To sort
    sort varlist [in]   {ascending}
    sort pcode
– Move specified variables to front of dataset
    order varlist
– Move one variable to specified position
    move varname1 varname2
– Alphabetize specified variables and move to front of dataset
    aorder [varlist]

zinc.dta
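A sketch of the sorting and ordering commands, assuming the shipped auto dataset. In Stata 11 and later, the options of order shown here are, to the best of my knowledge, the documented replacements for move and aorder, which remain available only for backward compatibility.

```stata
sysuse auto, clear

* Ascending sort, then the three cheapest cars
sort price
list make price in 1/3

* gsort with a minus sign sorts in descending order
gsort -price
list make price in 1/3

* Move variables to the front of the dataset
order foreign make

* Place one variable right after another (in place of move)
order mpg, after(price)

* Alphabetize all variables (in place of aorder)
order _all, alphabetic
describe, simple
```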

Page 32: Session I How to use STATA & Basic Data Management Commands

Data Reorganization …contd

Convert data from wide to long
– reshape long stubnames, i(varlist) j(varname)
– reshape long albumin, i(pcode) j(time)

Wide Shape Data    Long Shape Data

Page 33: Session I How to use STATA & Basic Data Management Commands

Data Reorganization …contd

Convert data from long to wide
– reshape wide stubnames, i(varlist) j(varname)
– reshape wide albumin, i(pcode) j(time)

Long Shape Data    Wide Shape Data
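Both directions of reshape can be seen on a small invented dataset. The names pcode, albumin and time follow the slides; the values are made up.

```stata
clear
input pcode albumin1 albumin2 albumin3
101 3.1 3.4 3.8
102 2.9 3.0 3.5
end

* Wide to long: one row per patient and time point
reshape long albumin, i(pcode) j(time)
list, sepby(pcode)

* Long to wide: back to one row per patient
reshape wide albumin, i(pcode) j(time)
list
```

The stub albumin is the common prefix of the wide variables, and the numeric suffixes 1, 2, 3 become the values of time.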

Page 34: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

Convert serum zinc from wide to long shape data using zinclab.dta

zinclab.dta

Page 35: Session I How to use STATA & Basic Data Management Commands

Answer!!!

zinclab.dta

Page 36: Session I How to use STATA & Basic Data Management Commands

Merging & Appending Datasets

To append datasets
– append using filename
    use zinc1.dta
    append using zinc2.dta

To merge datasets
– merge [varlist] using filename
    use zinclab
    sort pcode
    save zinclab, replace
    use zincprognostic
    sort pcode
    merge pcode using zinclab

zinclab.dta
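The merge syntax on the slide is the older form. A sketch with the merge 1:1 form available from Stata 11 onward, which does not require sorting first, is given below. All file names and values are invented for illustration.

```stata
* A small "lab" file, saved to disk
clear
input pcode zinc
1 60
2 72
3 55
end
save lab_demo, replace

* A small "prognostic" file, kept in memory as the master data
clear
input pcode age
1 10
2 24
4  8
end

* One record per pcode in both files, so the match is 1:1
merge 1:1 pcode using lab_demo

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list

* append stacks records instead of matching them
keep pcode age
save prog_demo, replace
clear
input pcode age
5 12
end
append using prog_demo
list
```

Checking the _merge tabulation after every merge is a quick way to spot patients who are present in only one of the files.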

Page 37: Session I How to use STATA & Basic Data Management Commands

Merge file 1 (zinclab.dta) with file 2 (zincprognosis.dta)

Do this Exercise…

zinclab.dta

Page 38: Session I How to use STATA & Basic Data Management Commands

Session II

Data Cleaning & Preparing Data for Analysis

Page 39: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis

Inclusion criteria: children ≤ 35 months old

Page 40: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 41: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

The inclusion criterion for the study was pre-admission diarrhea duration < 7 days

Ex 1: Convert pre admission diarrhea duration from hours to days using zincclean.dta

Ex 2: Find values beyond expected range

zinc.dta

Page 42: Session I How to use STATA & Basic Data Management Commands

Answer!!!

Page 43: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 44: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 45: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 46: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

Do similar exercise for hemoglobin using zinc.dta

zinc.dta

Page 47: Session I How to use STATA & Basic Data Management Commands

Answer!!!

Page 48: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

What do you mean by 1 & 2???

zinc.dta

Page 49: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Label name
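Codes such as 1 and 2 become readable once a value label is attached. In the sketch below the label name trtlbl and the mapping of 1 to "A" and 2 to "B" are assumptions for illustration; the actual coding of zinc.dta is not given in the slides.

```stata
clear
input id treatment
1 1
2 2
3 1
end

* Describe the variable itself
label variable treatment "Treatment group"

* Define a named set of value labels, then attach it to the variable
label define trtlbl 1 "A" 2 "B"
label values treatment trtlbl

tabulate treatment
```

The stored values stay numeric, so conditions are still written with the codes, for example if treatment == 1.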

Page 50: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

What is wrong and how to correct it??? zinc.dta

Page 51: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 52: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 53: Session I How to use STATA & Basic Data Management Commands

Generate total stool output for first 48 hrs

Do this Exercise…

zinclean.dta

Page 54: Session I How to use STATA & Basic Data Management Commands

Preparing Data for Analysis …contd

Page 55: Session I How to use STATA & Basic Data Management Commands

Draw a boxplot and identify extreme value, if any, for s2_tstool_wt using zincclean.dta

Do this Exercise…

zincclean.dta

Page 56: Session I How to use STATA & Basic Data Management Commands

Session III

Introduction to Basic Data Analysis

Page 57: Session I How to use STATA & Basic Data Management Commands

What will be Covered?

– Descriptive Statistics
– Parametric tests
– Non-parametric tests

Page 58: Session I How to use STATA & Basic Data Management Commands

Analyses

– Univariate (one variable at a time)
– Bivariate (two variables at a time)
– Multivariate (more than two variables at a time)

Page 59: Session I How to use STATA & Basic Data Management Commands

Descriptive Statistics

Page 60: Session I How to use STATA & Basic Data Management Commands

Univariate Analysis

Quantitative
– Mean
– Median
– Range / IQ range
– SD

Categorical
– Frequency
– Percentage

Page 61: Session I How to use STATA & Basic Data Management Commands

Descriptive Statistics-Categorical Variable

Can we label the variables???

Page 62: Session I How to use STATA & Basic Data Management Commands

Contingency Table

Page 63: Session I How to use STATA & Basic Data Management Commands

Contingency Table …contd

Page 64: Session I How to use STATA & Basic Data Management Commands

Contingency Table …contd

Page 65: Session I How to use STATA & Basic Data Management Commands

Contingency Table …contd

Page 66: Session I How to use STATA & Basic Data Management Commands

Contingency Table …contd

Immediate commands
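Immediate commands take numbers typed on the command line instead of variables in memory. A sketch with tabi, using invented cell counts for a 2 x 2 table:

```stata
* Rows are separated by a backslash; the counts here are made up
tabi 30 18 \ 38 14, chi2 exact
```

This is handy for re-checking a table reported in a paper when the raw data are not available.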

Page 67: Session I How to use STATA & Basic Data Management Commands

Ex 1: Draw a crosstab between treatment and withdrawn using zinc.dta

Ex 2: Draw a crosstab between treatment and diarr24, diarr48

Do this Exercise…

zinc.dta

Page 68: Session I How to use STATA & Basic Data Management Commands

Descriptive Statistics-Quantitative Variable

Page 69: Session I How to use STATA & Basic Data Management Commands

Summary in Detail
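A sketch of the detailed summary, assuming the shipped auto dataset rather than zinc.dta:

```stata
sysuse auto, clear

* Basic summary: n, mean, SD, min, max
summarize price

* detail adds percentiles, the median, skewness and kurtosis
summarize price, detail
```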

Page 70: Session I How to use STATA & Basic Data Management Commands

Calculate summary statistics for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission

Do this Exercise…

zinc.dta

Page 71: Session I How to use STATA & Basic Data Management Commands

Summary Statistics by Group
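Two common ways to get summary statistics by group are sketched below, assuming the shipped auto dataset with foreign playing the role of the grouping variable:

```stata
sysuse auto, clear

* Repeat summarize within each level of the grouping variable
bysort foreign: summarize price mpg

* One compact table with chosen statistics per group
tabstat price mpg, by(foreign) statistics(n mean sd median iqr)
```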

Page 72: Session I How to use STATA & Basic Data Management Commands

Calculate summary statistics by “treatment” for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission

Do this Exercise…

zinc.dta

Page 73: Session I How to use STATA & Basic Data Management Commands

Percentile Values
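Arbitrary percentiles can be requested with centile. The sketch assumes the shipped auto dataset and uses the 10th and 90th percentiles purely as an example.

```stata
sysuse auto, clear

* Chosen percentiles for the whole sample
centile price, centile(10 90)

* The same percentiles within one group, selected with if
centile price if foreign == 0, centile(10 90)
centile price if foreign == 1, centile(10 90)
```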

Page 74: Session I How to use STATA & Basic Data Management Commands

Calculate 3rd and 97th percentile value by “treatment” for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h

Do this Exercise…

zinc.dta

Page 75: Session I How to use STATA & Basic Data Management Commands

Session IV (A)

Bi-variate Analyses

Page 76: Session I How to use STATA & Basic Data Management Commands

Analysis of Clinical Trial Data

Page 77: Session I How to use STATA & Basic Data Management Commands

1. Compare patient characteristics at the time of randomization and baseline measurements between the groups

2. Assess the difference in outcome variable(s) between the groups (adjusting for any imbalance in patient characteristics or baseline outcome variables)

Analysis of Clinical Trial Data

Page 78: Session I How to use STATA & Basic Data Management Commands

1. Categorical vs Categorical

2. Categorical vs Quantitative

Bi-variate Analyses

Page 79: Session I How to use STATA & Basic Data Management Commands

1. Categorical vs Categorical

X = 2, Y = 2
    Unrelated: Chi-square test, Fisher's exact test
    Related:   McNemar test

X > 2, Y > 2
    Unrelated: Chi-square test, Fisher's exact test

X: Group variable
Y: Outcome variable

Page 80: Session I How to use STATA & Basic Data Management Commands

Chi-square test
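A sketch of the chi-square and Fisher's exact tests on a 2 x 2 table, assuming the shipped auto dataset. The binary variable highmpg is created only for this illustration.

```stata
sysuse auto, clear

* Binary outcome: 1 if mileage is above 25 mpg, 0 otherwise
generate highmpg = mpg > 25

* Crosstab with row percentages, chi-square and Fisher's exact test
tabulate foreign highmpg, row chi2 exact
```

Fisher's exact test is usually preferred when expected cell counts are small.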

Page 81: Session I How to use STATA & Basic Data Management Commands

Is there a difference between the proportion of patients requiring IV fluids in the two treatment groups?

Do this Exercise…

zinc.dta

Page 82: Session I How to use STATA & Basic Data Management Commands

Chi-square Test/Fisher’s exact Test by Group

Page 83: Session I How to use STATA & Basic Data Management Commands

Comparison of two proportions
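Two proportions can be compared either from data in memory or from summary figures. The sketch assumes the shipped auto dataset, a binary variable highmpg created for the illustration, and invented summary numbers for the immediate form.

```stata
sysuse auto, clear
generate highmpg = mpg > 25

* From data: proportion with highmpg == 1 in each group
prtest highmpg, by(foreign)

* Immediate form: n1 p1 n2 p2 (numbers made up)
prtesti 120 0.60 130 0.72
```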

Page 84: Session I How to use STATA & Basic Data Management Commands

1. Is there a difference in the proportion of patients recovered in rota virus negativity between the two treatment groups?

2. 91% of patients recovered in treatment A (n=248) and 95% of patients recovered in treatment B (n=252). Test these proportions and find out the p-value

Do this Exercise…

zinc.dta

Page 85: Session I How to use STATA & Basic Data Management Commands

McNemar’s Chi-square Test
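For paired binary data, McNemar's test can be run in immediate form with mcci. The four cell counts below are invented; they follow the order a b c d of the paired 2 x 2 table, in which b and c are the discordant pairs.

```stata
* mcci #a #b #c #d
mcci 20 10 5 15
```

With variables in memory, the corresponding command is mcc followed by the two paired binary variables.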

Page 86: Session I How to use STATA & Basic Data Management Commands

McNemar’s Chi-square Test …contd


Page 87: Session I How to use STATA & Basic Data Management Commands

Is there a shift in zinc deficiency from baseline after giving treatment B?

Do this Exercise…

zinc.dta

Page 88: Session I How to use STATA & Basic Data Management Commands

2. Categorical vs Quantitative

Parametric
    X = 2 & Y: Normal
        Unrelated: Student's t test
        Related:   Paired t test
    X > 2 & Y: Normal
        Unrelated: One-way ANOVA
        Related:   Repeated measures ANOVA

Non-Parametric
    X = 2 & Y: Non-Normal
        Unrelated: Wilcoxon ranksum
        Related:   Wilcoxon signrank
    X > 2 & Y: Non-Normal
        Unrelated: Kruskal-Wallis
        Related:   Friedman's test

Page 89: Session I How to use STATA & Basic Data Management Commands

Student’s ‘t’ Test for Independent Groups

Page 90: Session I How to use STATA & Basic Data Management Commands

Student’s ‘t’ Test for Independent Groups …contd
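A sketch of the independent-groups t test, assuming the shipped auto dataset with foreign as the two-level group variable:

```stata
sysuse auto, clear

* Check whether the two group variances look similar
sdtest mpg, by(foreign)

* Student's t test assuming equal variances
ttest mpg, by(foreign)

* Version that does not assume equal variances
ttest mpg, by(foreign) unequal
```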

Page 91: Session I How to use STATA & Basic Data Management Commands

What is the Difference in the Total ORS Intake in the First 24h between the Two Groups?

Page 92: Session I How to use STATA & Basic Data Management Commands

Transformations

Page 93: Session I How to use STATA & Basic Data Management Commands

Transformations …contd
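A sketch of a log transformation and a normality check before and after it, assuming the shipped auto dataset. The new variable name lnprice is hypothetical.

```stata
sysuse auto, clear

* Natural log of a right-skewed variable
generate lnprice = ln(price)

* Shapiro-Wilk test on the raw and the transformed variable
swilk price lnprice

* ladder compares several candidate transformations at once
ladder price
```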

Page 94: Session I How to use STATA & Basic Data Management Commands

Ex 1: What is the difference in total stool output 0-48hours between the two groups?

Ex 2: Is there a difference between total duration of diarrhea (in hours) (varname: tot_du_dia_h) between the two treatment groups?

Do this Exercise…

zinc.dta

Page 95: Session I How to use STATA & Basic Data Management Commands

Geometric Mean if Log Transformation is Used
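The geometric mean is the exponential of the mean of the logged values. A sketch that computes it by hand and with ameans, assuming the shipped auto dataset:

```stata
sysuse auto, clear

* By hand: mean of the logs, then back-transform
generate lnprice = ln(price)
quietly summarize lnprice
display "geometric mean = " exp(r(mean))

* ameans reports arithmetic, geometric and harmonic means
ameans price
```

The two geometric means should agree, which is a useful check that the back-transformation was done on the right scale.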

Page 96: Session I How to use STATA & Basic Data Management Commands

Do this Exercise

Ex: Calculate the geometric mean for stool output 0-48 hours

zinc.dta

Page 97: Session I How to use STATA & Basic Data Management Commands

Paired t-Test
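A sketch of the paired t test on invented before and after measurements. The variable names base and recov are hypothetical.

```stata
clear
input id base recov
1 58 66
2 61 70
3 55 54
4 63 72
5 60 65
6 57 63
end

* Paired test: compares the two measurements within each subject
ttest recov == base
```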

Page 98: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

Is there a change in zinc value from baseline after giving treatment B?

zinc.dta

Page 99: Session I How to use STATA & Basic Data Management Commands

Is there a Change in the Serum Zinc from Baseline to Recovery between Two Treatment Groups?

Discuss………..

Page 100: Session I How to use STATA & Basic Data Management Commands

One-way ANOVA*

* Analysis of Variance

Page 101: Session I How to use STATA & Basic Data Management Commands

Multiple Comparisons

Difference in means of zinc values between age groups of ≤6 & >12

P-value
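A sketch of one-way ANOVA with Bonferroni-adjusted pairwise comparisons, assuming the shipped auto dataset with rep78 as a stand-in for the age group variable:

```stata
sysuse auto, clear

* tabulate prints group means; bonferroni adds pairwise comparisons
oneway price rep78, tabulate bonferroni
```

In the comparison matrix, each cell shows the difference in means between two groups with its adjusted p-value underneath.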

Page 102: Session I How to use STATA & Basic Data Management Commands

Non-Parametric Methods

Page 103: Session I How to use STATA & Basic Data Management Commands

Is there a difference in total stool output in the first 24h between the two treatment groups?

Answer: Wilcoxon Ranksum test
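A sketch of the Wilcoxon rank-sum test for two independent groups, assuming the shipped auto dataset rather than zinc.dta:

```stata
sysuse auto, clear

* Non-parametric comparison of mpg between the two groups
ranksum mpg, by(foreign)
```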

Page 104: Session I How to use STATA & Basic Data Management Commands

Is there a difference in total stool output in the first 24h between the two treatment groups? …contd

Answer: Wilcoxon Ranksum test

Page 105: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

Is there a difference in total diarrhea duration between the two groups?

zinc.dta

Page 106: Session I How to use STATA & Basic Data Management Commands

Is there a Change in zinc from baseline after giving treatment A?

Answer: Wilcoxon signed-rank test
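A sketch of the Wilcoxon signed-rank test on invented paired measurements. The variable names base and recov are hypothetical.

```stata
clear
input id base recov
1 58 66
2 61 70
3 55 54
4 63 72
5 60 65
6 57 63
end

* Non-parametric counterpart of the paired t test
signrank recov = base
```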

Page 107: Session I How to use STATA & Basic Data Management Commands

Is there a change in zinc from baseline after giving treatment A?

Answer: Wilcoxon signed-rank test

Page 108: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

1. Is there any difference in zinc from baseline after giving treatment B?

zinc.dta

Page 109: Session I How to use STATA & Basic Data Management Commands

Is there a difference in total stool output across age groups?

…… Contd

Answer: Kruskal-Wallis Test
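A sketch of the Kruskal-Wallis test across more than two groups, assuming the shipped auto dataset with rep78 standing in for the age group variable:

```stata
sysuse auto, clear

* Non-parametric counterpart of one-way ANOVA
kwallis price, by(rep78)
```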

Page 110: Session I How to use STATA & Basic Data Management Commands

…… Contd

Is there a difference in total stool output across age groups?

Answer: Kruskal-Wallis Test

Page 111: Session I How to use STATA & Basic Data Management Commands

… Contd

Is there a difference in total stool output across age groups?

Answer: Kruskal-Wallis Test

Page 112: Session I How to use STATA & Basic Data Management Commands

…… Contd

Is there a difference in total stool output across age groups?

Answer: Kruskal-Wallis Test

Page 113: Session I How to use STATA & Basic Data Management Commands

Do this Exercise…

1. Is there any difference in serum zinc (at admission) across the age groups ?

zinc.dta