Session I: How to Use STATA & Basic Data Management Commands

Post on 28-Dec-2015


Session I

How to use STATA & Basic Data Management Commands

What will be covered?

– Introduction to STATA Software
– General Guidelines in Data Entry
– Data Management in STATA

Introduction to STATA

Open & Close the Output File

To open the log file

log using "directory\path\filename.log"

log using d:\trials\zinc.log

To close

log close

zinc.dta

To Open Log (Output) File

To Close the Log File

Append & Replace the Existing Log File

To append the existing log file

log using d:\trials\zinc.log, append

To replace the existing log file

log using d:\trials\zinc.log, replace
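The log commands above can be tried end to end with a short do-file. This is a minimal sketch: the file name session1.log is hypothetical and the log is written to the current working directory.

```stata
* Hypothetical log name; written to the current working directory
log using session1.log, replace
display "This line is recorded in the log"
log close

* Reopen the same file and add to the end instead of overwriting it
log using session1.log, append
display "This line is appended"
log close
```

Without replace or append, log using stops with an error if the file already exists, which protects earlier output from being overwritten by accident.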

Open the Data File

To open the data file

use "directory\path\filename.dta"

use d:\trials\zinc.dta

To save

save zinc.dta

zinc.dta

To Make A New Directory

To Change the Directory
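The two directory tasks can be done from the command line as well as from the menus. A small sketch follows; the folder name session1 is an assumption, and d:\trials is the path used elsewhere in these slides.

```stata
* Hypothetical folder name; adjust the path to your own machine
mkdir d:\trials\session1
cd d:\trials\session1
* Show the current working directory
pwd
```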

General Guidelines in Data Entry

– Rows in the datasheet should contain individual information (Record).

– Each column should contain the values of a single entity for all the individuals (Variable).

– Variable names should not exceed eight characters.

– Variables can be numeric, string or alphanumeric.

– A numeric variable must contain only numbers.

– Every datasheet must include an identification number.

DATA DESCRIPTION

Data Management using STATA

– Inputting Data
– Editing Data
– Creating and Changing Variables
– Saving and Reusing Data
– Data Reorganization
– Merging and Appending Datasets

Inputting Data

Enter data from keyboard
– input varlist
– input str25 name age str1 sex
– Best way is to copy from Excel and paste the data directly into the STATA editor
– Transfer from other programs
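As an illustration of keyboard entry, the input command from the slide can be run with a few rows. The names and values below are made up for the example.

```stata
clear
* str25 and str1 declare string variables of width 25 and 1
input str25 name age str1 sex
"Asha" 24 "F"
"Ravi" 30 "M"
"Meena" 27 "F"
end
list
```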

Arithmetic Operators

+ (Addition)
- (Subtraction)
* (Multiplication)
/ (Division)
^ (Raise to power)

Relational Operators

> (greater than)
< (less than)
>= (greater than or equal)
<= (less than or equal)
== (equal)
!= (not equal)

Logical Operators

& (and)
| (or)
! (not)

Expressions

if – restricts a command to the observations that satisfy a condition

in – restricts a command to a range of observation numbers
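A short sketch of the two qualifiers and the operators, using the auto dataset that ships with Stata (not the zinc data from these slides):

```stata
sysuse auto, clear

* in: observations 1 to 5 only
list make price in 1/5

* if: only observations meeting the condition
list make price mpg if price > 10000 & foreign == 1

* Relational and logical operators combined; ! negates the condition
count if !(mpg >= 20 | rep78 == 5)
```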

Editing Data

Edit using Data Editor
– edit [varlist] [if] [in]
– edit treatment centre age
– edit treatment age if centre==3&age>25

Browsing Data

View using Data Editor (read-only)
– browse [varlist] [if] [in]
– browse treatment centre age
– browse treatment age if centre==3&age>25

Do this Exercise…

Edit the following:
– pcode, treatment and cough only for centre 4
– browse for the same and feel the difference

zinc.dta

Creating & Changing Variables

Create new variable
– generate newvar = exp [if] [in]
– gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
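One detail worth seeing in practice is how generate treats missing values. The sketch below uses the variable names from the slide with made-up values; egen with rowtotal() is shown only as a contrast and is not part of the original slides.

```stata
clear
* Made-up stool weights; patient 2 has one missing period
input pcode s1_tstool_wt s2_tstool_wt s3_tstool_wt
1 120 90 60
2 200 . 80
3 150 110 70
end

* A sum with + is missing whenever any component is missing
generate totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt

* rowtotal() treats missing components as zero instead
egen totstl24_rt = rowtotal(s1_tstool_wt s2_tstool_wt s3_tstool_wt)

list
```

Which behaviour is appropriate depends on the study: treating a missing measurement as zero can understate the total.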

Do this Exercise……

Generate total stool output from 0-48 hours

zinc.dta

Creating & Changing Variables …contd

Change contents of existing variable
– To replace
  replace oldvar = exp [if] [in]
  replace sodium1 = . if sodium1==0
– To recode
  recode varlist (erule) [(erule) ...] [if] [in]
  recode age (min/6=1) (7/11=2) (12/max=3), gen(agecat)

Rule              Example           Meaning
# = #             3 = 1             3 recoded to 1
# # = #           2 4 = 5           2 and 4 recoded to 5
#/# = #           4/8 = 3           4 through 8 recoded to 3
nonmissing = #    nonmissing = 2    all other nonmissing to 2
missing = #       missing = 9       all other missing to 9
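The replace and recode commands can be checked on a tiny dataset. The values below are invented; the variable names follow the slides.

```stata
clear
input pcode age sodium1
1 3 135
2 6 0
3 9 140
4 14 0
end

* Treat zero sodium values as missing
replace sodium1 = . if sodium1 == 0

* Three age categories, stored in a new variable so age is kept
recode age (min/6=1) (7/11=2) (12/max=3), generate(agecat)

list
```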

Do this Exercise…

Ex 1: Replace all zeros in serum potassium with missing.

Ex 2: Recode pre-admission diarrhea duration into 0-24h, 25-72h and > 72h

zinc.dta

Rename the existing variable
– rename oldvarname newvarname
– ren tlc_t2 tlc2
– ren tlc_t3 tlc3

Eliminate the existing variable
– To drop
  drop varlist
  drop name address
– To keep
  keep varlist
  keep idno age sodium albumin-tlc
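A quick sketch of rename, drop and keep on the auto dataset shipped with Stata (used here only because the zinc files are not available to the reader):

```stata
sysuse auto, clear

rename mpg mileage

* Remove two variables by name
drop trunk turn

* Keep make plus the block of adjacent variables from price to weight
keep make price-weight

describe
```

The hyphen in price-weight refers to the variables' order in the dataset, the same way albumin-tlc does on the slide.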


zinc.dta

Saving & Reusing Data in Stata Format

To save data
– save filename.dta
– save zinc, replace
– clear

To reuse data
– use filename
– use zinc

zinc.dta

Data Reorganization

Sorting observations and changing variable order

– To sort
  sort varlist [in]   (ascending order)
  sort pcode
– Move specified variables to front of dataset
  order varlist
– Move one variable to specified position
  move varname1 varname2
– Alphabetize specified variables and move to front of dataset
  aorder [varlist]

zinc.dta
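The sketch below shows sorting and reordering on the auto dataset. In Stata 11 and later, move and aorder were superseded by options of order, so the example uses those forms; treat it as an alternative to the slide's commands, not a transcription of them.

```stata
sysuse auto, clear

* Ascending sort, then descending sort with gsort
sort price
gsort -price

* Move variables to the front of the dataset
order foreign make

* Place one variable just before another (in place of move)
order mpg, before(price)

* Alphabetize all variables (in place of aorder)
order _all, alphabetic

describe
```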

Data Reorganization …contd

Convert data from wide to long
– reshape long stubnames, i(varlist) j(varname)
– reshape long albumin, i(pcode) j(time)

Wide Shape Data → Long Shape Data

Convert data from long to wide
– reshape wide stubnames, i(varlist) j(varname)
– reshape wide albumin, i(pcode) j(time)

Long Shape Data → Wide Shape Data
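A round trip between the two shapes makes the roles of i() and j() concrete. The albumin values below are invented; albumin1 to albumin3 stand for three measurement times.

```stata
clear
input pcode albumin1 albumin2 albumin3
1 3.5 3.7 3.9
2 4.0 4.1 4.2
end

* Wide to long: one row per patient per time point
reshape long albumin, i(pcode) j(time)
list

* Long to wide: back to one row per patient
reshape wide albumin, i(pcode) j(time)
list
```

The stub albumin is the part of the variable name shared across time points; the numeric suffix becomes the values of time.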

Do this Exercise…

Convert serum zinc from wide to long shape data using zinclab.dta

zinclab.dta

Answer!!!

zinclab.dta

Merging & Appending Datasets

To append datasets
– append using filename

use zinc1.dta
append using zinc2.dta

To merge datasets
– merge [varlist] using filename

use zinclab
sort pcode
save zinclab, replace
use zincprognostic
sort pcode
merge pcode using zinclab

zinclab.dta

Merge file 1 (zinclab.dta) with file 2 (zincprognosis.dta)
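The merge syntax on the slide is the older form. Stata 11 and later also accept merge 1:1, which does not require sorting first. The sketch below uses that newer form with invented data and a temporary file, so it should be run as a do-file; the variable names zinc and age are assumptions.

```stata
clear
input pcode zinc
1 60
2 75
3 58
end
tempfile lab
save `lab'

clear
input pcode age
1 12
2 30
4 18
end

* One record per pcode in both files
merge 1:1 pcode using `lab'

* _merge shows which file(s) each record came from
tabulate _merge
list
```

Here pcode 3 exists only in the lab file and pcode 4 only in the master data, so _merge flags both as unmatched.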

Do this Exercise…

zinclab.dta

Session II

Data Cleaning & Preparing Data for Analysis

Preparing Data for Analysis

Inclusion criterion: children aged ≤ 35 months
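One way to check an inclusion criterion like this is to list the records that violate it. The sketch assumes a hypothetical variable age_m holding age in months; the values are invented.

```stata
clear
input pcode age_m
1 12
2 40
3 35
4 .
end

* The minimum and maximum give a first look at the range
summarize age_m

* Records outside the criterion; missing counts as larger than any number,
* so it is excluded explicitly
list pcode age_m if age_m > 35 & !missing(age_m)
count if age_m > 35 & !missing(age_m)
```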

Preparing Data for Analysis …contd

Do this Exercise…

The inclusion criterion for the study was pre-admission diarrhea duration < 7 days

Ex 1: Convert pre admission diarrhea duration from hours to days using zincclean.dta

Ex 2: Find values beyond expected range

zinc.dta

Answer!!!


Do this Exercise…

Do similar exercise for hemoglobin using zinc.dta

zinc.dta

Answer!!!

Preparing Data for Analysis …contd

What do you mean by 1 & 2???

zinc.dta

Preparing Data for Analysis …contd

Label name
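Value labels are what turn codes such as 1 and 2 into readable categories. In this sketch the label name trtlbl and the mapping 1 = Treatment A, 2 = Treatment B are assumptions made for illustration, not taken from the zinc data.

```stata
clear
input pcode treatment
1 1
2 2
3 1
end

* Define a label set, then attach it to the variable
label define trtlbl 1 "Treatment A" 2 "Treatment B"
label values treatment trtlbl

* A variable label describes the column itself
label variable treatment "Allocated treatment"

tabulate treatment
```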

Preparing Data for Analysis …contd

What is wrong and how to correct it???

zinc.dta

Generate total stool output for first 48 hrs

Do this Exercise…

zincclean.dta

Preparing Data for Analysis …contd

Draw a boxplot and identify extreme value, if any, for s2_tstool_wt using zincclean.dta
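A sketch of the same idea on the auto dataset shipped with Stata: draw the box plot, then list the values beyond 1.5 times the interquartile range from the quartiles, which is the usual box-plot convention. The scalar names are hypothetical.

```stata
sysuse auto, clear

graph box price

* Fences at 1.5 x IQR below the first and above the third quartile
quietly summarize price, detail
scalar fence_lo = r(p25) - 1.5*(r(p75) - r(p25))
scalar fence_hi = r(p75) + 1.5*(r(p75) - r(p25))

list make price if (price < fence_lo | price > fence_hi) & !missing(price)
```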

Do this Exercise…

zincclean.dta

Session III

Introduction to Basic Data Analysis

What will be Covered?

– Descriptive Statistics
– Parametric tests
– Non-parametric tests

Analyses

– Univariate (one variable at a time)
– Bivariate (two variables at a time)
– Multivariate (more than two variables at a time)

Descriptive Statistics

Univariate Analysis

Quantitative
– Mean
– Median
– Range/IQ Range
– SD

Categorical
– Frequency
– Percentage

Descriptive Statistics-Categorical Variable

Can we label the variables???

Contingency Table


Immediate commands
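A brief sketch of frequency tables, a contingency table, and an immediate command. It uses the auto dataset shipped with Stata; the counts passed to tabi are invented.

```stata
sysuse auto, clear

* One-way frequency table
tabulate foreign

* Two-way contingency table with row and column percentages
tabulate rep78 foreign, row column

* Immediate form: cell counts are typed in, with rows separated by \
tabi 30 18 \ 38 14, chi2
```

Immediate commands such as tabi work from numbers typed on the command line, so they are handy when only a published table is available.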

Ex 1: Draw a crosstab between treatment and withdrawn using zinc.dta

Ex 2: Draw a crosstab between treatment and diarr24, diarr48

Do this Exercise…

zinc.dta

Descriptive Statistics-Quantitative Variable

Summary in Detail
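For a quantitative variable, the detail option adds percentiles, skewness and kurtosis to the basic summary. A sketch on the auto dataset shipped with Stata:

```stata
sysuse auto, clear

* Basic summary: n, mean, SD, min, max
summarize price mpg

* Detailed summary: adds percentiles, variance, skewness, kurtosis
summarize price mpg, detail
```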

Calculate summary statistics for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission

Do this Exercise…

zinc.dta

Summary Statistics by Group
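Two common ways to get summaries by group are the by prefix and tabstat. The sketch uses the auto dataset, with foreign standing in for a treatment variable.

```stata
sysuse auto, clear

* Repeat summarize within each group
bysort foreign: summarize price mpg

* Compact table of chosen statistics by group
tabstat price mpg, by(foreign) statistics(n mean sd median iqr)
```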

Calculate summary statistics by "treatment" for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in 24h before admission
4. Serum zinc at admission

Do this Exercise…

zinc.dta

Percentile Values
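Arbitrary percentiles can be requested with centile. This is a sketch on the auto dataset, with foreign standing in for the grouping variable; it is one possible approach and may not be the command shown on the original slide.

```stata
sysuse auto, clear

* 3rd and 97th percentiles overall
centile price, centile(3 97)

* The same percentiles within each group
bysort foreign: centile price mpg, centile(3 97)
```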

Calculate 3rd and 97th percentile value by “treatment” for the following variables:

1. Total stool output 0-48h
2. Total ORS intake 0-24h

Do this Exercise…

zinc.dta

Session IV (A)

Bi-variate Analyses

Analysis of Clinical Trial Data

1. Compare patient characteristics at the time of randomization and baseline measurements between the groups

2. Assess the difference in outcome variable(s) between the groups (adjusting for any imbalance in patient characteristics or baseline outcome variables)

Analysis of Clinical Trial Data

1. Categorical vs Categorical

2. Categorical vs Quantitative

Bi-variate Analyses

1. Categorical vs Categorical

X = 2, Y = 2
  Unrelated: Chi-square test, Fisher's exact test
  Related: McNemar test

X > 2, Y > 2
  Unrelated: Chi-square test, Fisher's exact test

X: Group variable
Y: Outcome variable

Chi-square test
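A sketch of a chi-square test on a 2 x 2 table, using the auto dataset shipped with Stata. The binary variable highprice and its cut-off of 6000 are invented for the example.

```stata
sysuse auto, clear

* Hypothetical binary outcome derived from price
generate byte highprice = price > 6000 if !missing(price)

* Row percentages, Pearson chi-square and Fisher's exact test
tabulate foreign highprice, row chi2 exact
```

Fisher's exact test is the usual fallback when expected cell counts are small.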

Is there a difference between the proportion of patients requiring IV fluids in the two treatment groups?

Do this Exercise…

zinc.dta

Chi-square Test/Fisher’s exact Test by Group

Comparison of two proportions
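Two proportions can be compared from summary figures alone with the immediate command prtesti, or from raw data with prtest. The sample sizes and proportions below are invented, as is the highprice variable.

```stata
* Immediate form: n1 p1 n2 p2
prtesti 100 0.40 120 0.55

* Raw-data form: a 0/1 outcome compared between two groups
sysuse auto, clear
generate byte highprice = price > 6000 if !missing(price)
prtest highprice, by(foreign)
```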

1. Among rotavirus-negative patients, is there a difference in the proportion recovered between the two treatment groups?

2. 91% of patients recovered in treatment A (n=248) and 95% of patients recovered in treatment B (n=252). Test these proportions and find out the p-value

Do this Exercise…

zinc.dta

McNemar’s Chi-square Test
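For paired binary data, McNemar's test depends on the discordant pairs. The immediate command mcci takes the four cells of the paired 2 x 2 table; the counts below are invented and the original slide may have used a different command.

```stata
* Cells a b c d of the paired table; b and c are the discordant pairs
mcci 20 15 5 60
```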


Is there a shift in zinc deficiency from baseline after giving treatment B?

Do this Exercise…

zinc.dta

2. Categorical vs Quantitative

                          Unrelated             Related
Parametric
  X = 2 & Y: Normal       Student's t test      Paired t test
  X > 2 & Y: Normal       One-way ANOVA         Repeated measures ANOVA
Non-Parametric
  X = 2 & Y: Non-Normal   Wilcoxon ranksum      Wilcoxon signrank
  X > 2 & Y: Non-Normal   Kruskal-Wallis        Friedman's test

Student’s ‘t’ Test for Independent Groups


What is the Difference in the Total ORS Intake in the First 24h between the Two Groups?
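A question of this kind maps onto ttest with the by() option. The sketch uses the auto dataset, with mpg as the outcome and foreign as the two-group variable.

```stata
sysuse auto, clear

* Two-sample t test assuming equal variances
ttest mpg, by(foreign)

* Same comparison without the equal-variance assumption
ttest mpg, by(foreign) unequal
```

The output reports the group means, their difference with a confidence interval, and one-sided and two-sided p-values.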

Transformations
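The slide content for this heading did not survive extraction. Assuming it refers to log transformation of a skewed outcome, a sketch on the auto dataset would be:

```stata
sysuse auto, clear

* Check skewness before transforming
summarize price, detail

* Natural-log transform (price is strictly positive here)
generate lnprice = ln(price)
summarize lnprice, detail

* Compare groups on the log scale
ttest lnprice, by(foreign)
```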


Ex 1: What is the difference in total stool output 0-48hours between the two groups?

Ex 2: Is there a difference between total duration of diarrhea (in hours) (varname: tot_du_dia_h) between the two treatment groups?

Do this Exercise…

zinc.dta

Geometric Mean if Log Transformation is Used
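The geometric mean is the back-transformed mean of the logged values. A sketch on the auto dataset, computing it by hand and then with ameans:

```stata
sysuse auto, clear

* By hand: exponentiate the mean of the logs
generate lnprice = ln(price)
quietly summarize lnprice
display "Geometric mean of price = " exp(r(mean))

* Built-in: arithmetic, geometric and harmonic means
ameans price
```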

Do this Exercise

Ex: Calculate the geometric mean for stool output 0-48 hours

zinc.dta

Paired t-Test
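For two measurements on the same patients, ttest takes both variables. The variable names zinc_base and zinc_rec and the values below are invented for illustration.

```stata
clear
input pcode zinc_base zinc_rec
1 58 66
2 62 70
3 71 69
4 55 63
5 60 68
6 64 65
end

* Paired t test on the within-patient differences
ttest zinc_rec == zinc_base
```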

Do this Exercise…

Is there a change in zinc value from baseline after giving treatment B?

zinc.dta

Is there a Change in the Serum Zinc from Baseline to Recovery between Two Treatment Groups?

Discuss…

One-way ANOVA*

* Analysis of Variance
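A sketch of one-way ANOVA with Bonferroni-adjusted pairwise comparisons, on the auto dataset with rep78 as a grouping variable with more than two levels:

```stata
sysuse auto, clear

* tabulate adds group means and SDs; bonferroni adds pairwise comparisons
oneway mpg rep78, tabulate bonferroni
```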

Multiple Comparisons

Difference in means of zinc values between age groups of ≤6 & > 12

P-value

Non-Parametric Methods

Is there a difference in total stool output in the first 24h between the two treatment groups?

Answer: Wilcoxon Ranksum test
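The rank-sum test has the same layout as the two-sample t test. A sketch on the auto dataset, with foreign as the two-group variable:

```stata
sysuse auto, clear

* Wilcoxon rank-sum (Mann-Whitney) test for two independent groups
ranksum mpg, by(foreign)
```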


Do this Exercise…

Is there a difference in total diarrhea duration between the two groups?

zinc.dta

Is there a Change in zinc from baseline after giving treatment A?

Answer: Wilcoxon signed-rank test
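The signed-rank test is the non-parametric counterpart of the paired t test. The variable names and values below are invented.

```stata
clear
input pcode zinc_base zinc_rec
1 58 66
2 62 70
3 71 69
4 55 63
5 60 68
6 64 65
end

* Wilcoxon signed-rank test on the paired measurements
signrank zinc_rec = zinc_base
```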


Do this Exercise…

1. Is there any difference in zinc from baseline after giving treatment B?

zinc.dta

Is there a difference in total stool output across age groups?


Answer: Kruskal-Wallis Test
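A sketch of the Kruskal-Wallis test on the auto dataset, with rep78 standing in for an age-group variable with more than two categories:

```stata
sysuse auto, clear

* Kruskal-Wallis equality-of-populations rank test
kwallis mpg, by(rep78)
```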


Do this Exercise…

1. Is there any difference in serum zinc (at admission) across the age groups ?

zinc.dta
