TRANSCRIPT
PhD Talk: Regression analysis using Stata: A hands-on approach
By:
Dr. Redhwan Al-Dhamari
Bakr Ali Al-Gamrh
29/5/2014 1K. M. Kura, PhD Thesis Oral Presentation
Presentation Outline
• Introduction
• Data Structure
• Cross-Sectional Data
• Regression Diagnostics
• Other Regression Commands
• Presenting Your Results
• Suggested readings
Introduction
What is Stata?
• Stata is a general-purpose statistical software package created in 1985 by StataCorp. Most of its users work in research, especially in economics, sociology, political science, biomedicine and epidemiology.
Why Stata?
• Stata has been less popular than its market competitors, such as SPSS and SAS, but it is gaining popularity every year.
• It is particularly user-friendly when it comes to analyzing complicated data sets.
Introduction (Cont’d)
How is Stata different?
• Stata commands are much more intuitive and less fussy about punctuation.
• Users can download new commands written by other users to perform specific tasks and run them like built-in commands.
• Handling longitudinal data sets with different file structures is much quicker and easier in Stata.
Introduction (Cont’d)
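Windows in Stata and what they do
• The Command window
• The Review window
• The Variables window
• The Results window
• Do-files
• Log files
• Data Editor and Data Browser
• set mem 50m (needed only in Stata 11 and earlier; newer versions manage memory automatically)
• log using name.log, replace
• log close
• log off / log on (temporarily suspend and resume logging)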
Data structure
• Cross-sectional data
• Panel data
• Time series data
Cross-sectional data
Summary statistics
• summarize var
• summarize var, separator(0)
• summarize var, detail
• tabstat var, statistics(n mean sd min max skewness kurtosis) columns(statistics)
• tabstat var, statistics(n mean sd min max) by(groupvar)
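To show what the tabstat columns contain, the Python sketch below computes the same statistics by hand for a small hypothetical variable (Python is used only to illustrate the arithmetic). It assumes, to our understanding of summarize, detail, an n - 1 denominator for the standard deviation and moment ratios for skewness (m3/m2^1.5) and kurtosis (m4/m2^2), so a normal variable has kurtosis near 3 rather than 0.

```python
import math

def summarize(values):
    """Return n, mean, sd, min, max, skewness and kurtosis of a list of numbers.

    Assumption: sd uses n - 1; skewness and kurtosis use population moments
    (divided by n), so kurtosis is NOT reported as excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    sd = math.sqrt(sum(d ** 2 for d in dev) / (n - 1))
    return {
        "n": n, "mean": mean, "sd": sd,
        "min": min(values), "max": max(values),
        "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical variable
stats = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in stats.items():
    print(f"{name:>8}: {value:.3f}")
```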
Cross-sectional data
Correlations
• Pearson’s product-moment correlation (r): based on deviations from the mean values; used for interval variables. Absolute values below 0.30 suggest there is little association between the variables (Hinkle et al. 1988).
  pwcorr varlist, obs sig star(0.05)
• Spearman’s rank correlation (rho): calculated from ranks; used for ordinal variables.
  spearman varlist, stats(rho obs p) star(0.05)
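The difference between the two coefficients can be seen in a few lines of Python (an illustration only, with hypothetical data): Pearson’s r works on deviations from the means, while Spearman’s rho is Pearson’s r applied to ranks, with tied values given their average rank. For a relationship that is monotonic but not linear, rho is 1 while r stays below 1.

```python
import math

def pearson(x, y):
    """Pearson's r: co-movement of deviations from the means, scaled to [-1, 1]."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]      # hypothetical data
y = [1, 4, 9, 16, 25]    # monotonic in x, but not linear
print(round(pearson(x, y), 4))   # below 1
print(spearman(x, y))            # ranks agree perfectly
```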
Cross-sectional data (Cont’d)
Differences in means and medians
Independent two-sample t-test: useful for spotting mean differences that might be worth pursuing with multivariate analysis. There cannot be more than two groups, so the grouping variable must be dichotomous. sdtest checks whether the two groups have equal variances; if they do not, add the unequal option to ttest.
  ttest var, by(groupvar)
  sdtest var, by(groupvar)
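As a rough illustration of what ttest and sdtest compute, the Python sketch below builds the equal-variance (pooled) t statistic and the variance ratio F = s1²/s2² for two hypothetical groups. It does not compute p-values, and it does not implement the unequal-variance (Welch) version that the unequal option requests.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each group's sample variance by its df.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def variance_ratio(group1, group2):
    """F = s1^2 / s2^2, the statistic behind a test of equal variances."""
    return variance(group1) / variance(group2)

group1 = [1, 2, 3, 4, 5]   # hypothetical values where groupvar == 0
group2 = [3, 4, 5, 6, 7]   # hypothetical values where groupvar == 1
t, df = pooled_t(group1, group2)
print(f"t = {t:.3f} with {df} df; F = {variance_ratio(group1, group2):.3f}")
```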
Mann-Whitney U-test: examines rank differences in a characteristic between two groups.
  ranksum var, by(groupvar)
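The rank-sum logic can be sketched in Python as follows, assuming hypothetical data: pool both groups, rank them (ties get average ranks), sum the ranks of the first group, and compare that sum with its expected value under no group difference. This sketch uses the plain normal approximation and ignores the tie correction that ranksum applies, so its z can differ slightly when there are ties.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def mann_whitney(group1, group2):
    """Rank sum of group1, the U statistic and a normal-approximation z (no tie correction)."""
    n1, n2 = len(group1), len(group2)
    r = average_ranks(list(group1) + list(group2))
    rank_sum1 = sum(r[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return rank_sum1, u1, (rank_sum1 - expected) / sd

group_a = [12, 15, 11, 19]   # hypothetical values
group_b = [22, 18, 25, 21]
rank_sum, u, z = mann_whitney(group_a, group_b)
print(f"rank sum = {rank_sum}, U = {u}, z = {z:.3f}")
```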
Paired t-test and Wilcoxon signed-rank test (two measurements on the same units, here ind07 and ind08):
  ttest ind07 == ind08
  signrank ind07 = ind08
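A paired t-test is a one-sample t-test on the differences. The Python sketch below (hypothetical values, illustration only) takes d = ind07 - ind08 for each unit and divides the mean difference by its standard error.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic: mean of (first - second) over its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

ind07 = [10, 12, 9, 11]   # hypothetical values for the same four units
ind08 = [11, 14, 10, 13]
t, df = paired_t(ind07, ind08)
print(f"t = {t:.3f}, df = {df}")
```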
Cross-sectional data (Cont’d)
Theory of regression analysis
What is linear regression analysis?
• Finding the relationship between a dependent variable and an independent variable:
  Y = α + βX + e
  where α is the intercept, β is the slope and e is the error term.
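For a single regressor, the least-squares estimates of Y = α + βX + e have a closed form: β = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and α = ȳ - βx̄. The Python sketch below applies it to hypothetical data; regress in Stata generalizes this to several regressors.

```python
def fit_line(x, y):
    """Least-squares alpha and beta for Y = alpha + beta*X + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
alpha, beta = fit_line(x, y)
# Residual = actual - predicted; with an intercept they sum to zero.
residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```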
Regression diagnostics
• Normality
• Heteroskedasticity
• Multicollinearity
• Model specification
Regression diagnostics (Cont’d).
Normality refers to the normal distribution of the error terms.
• Testing the residuals for normality (after regress, save them with predict res, residuals):
  Shapiro-Wilk W test: swilk res
  Skewness-kurtosis test: sktest res
• Testing a variable for normality:
  sktest var
  tabstat var, statistics(skewness kurtosis)
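Both sktest and tabstat’s skewness/kurtosis output rest on the third and fourth moments: under normality, skewness is near 0 and kurtosis near 3. The Python sketch below computes these for hypothetical residuals. It also computes the Jarque-Bera statistic, a different, simpler textbook test built on the same two moments. To our understanding, sktest instead uses D’Agostino-type tests with a small-sample adjustment, so its numbers will not match.

```python
import math

def skew_kurt(values):
    """Moment-based skewness (m3/m2**1.5) and kurtosis (m4/m2**2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(values):
    """Jarque-Bera: n/6 * (S**2 + (K - 3)**2 / 4), approx chi2(2) in large samples."""
    s, k = skew_kurt(values)
    return len(values) / 6 * (s ** 2 + (k - 3) ** 2 / 4)

res = [-0.8, 0.6, 1.0, -0.6, -0.2]   # hypothetical residuals
s, k = skew_kurt(res)
jb = jarque_bera(res)
p_value = math.exp(-jb / 2)          # upper tail of chi2 with 2 df
print(f"skewness = {s:.3f}, kurtosis = {k:.3f}, JB = {jb:.3f}, p = {p_value:.3f}")
```

With only five observations the chi-squared approximation is poor; the example shows the arithmetic, not a recommended test for small samples.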
Regression diagnostics (Cont’d).
Outlier detection
Outlier detection involves determining whether a residual (residual = actual - predicted) is an extreme negative or positive value.
• Standardized residuals:
  predict residstd, rstandard
  list residstd if abs(residstd) > 3.5 & !missing(residstd)
  Standardized residuals above 3.5 or below -3.5 indicate outliers.
• Cook’s D:
  predict cook, cooksd
  list cook if cook > 4/e(N) & !missing(cook)
  Observations with Cook’s D greater than 4/n (n = number of observations) are potentially influential.
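For a one-regressor model, both diagnostics can be computed directly from the leverage h = 1/n + (x - x̄)²/Σ(x - x̄)². The Python sketch below, on hypothetical data with one suspicious point, uses the standardized residual e / (s·√(1 - h)) and Cook’s D = r²·h / (p·(1 - h)) with p = 2 parameters, then applies the slide’s two rules of thumb (|r| > 3.5 and D > 4/n). These are the textbook formulas; we assume, without having checked every detail, that rstandard and cooksd follow them.

```python
import math

def influence(x, y):
    """Leverage, standardized residuals and Cook's D for Y = alpha + beta*X + e."""
    n, p = len(x), 2                      # p = number of estimated parameters
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [b - (alpha + beta * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))   # root mean squared error
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]
    rstd = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(rstd, lev)]
    return lev, rstd, cooks

x = list(range(1, 11))                      # hypothetical data
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 40]    # the last point breaks the pattern
lev, rstd, cooks = influence(x, y)
n = len(x)
for i, (r, d) in enumerate(zip(rstd, cooks)):
    flags = []
    if abs(r) > 3.5:
        flags.append("outlier")
    if d > 4 / n:
        flags.append("influential")
    label = " ".join(flags)
    print(f"obs {i + 1}: rstandard = {r:6.2f}, Cook's D = {d:.3f} {label}")
```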
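Winsorization (winsor2 is a user-written command; install it with ssc install winsor2):
  winsor2 var, replace cuts(1 99)
  This replaces values below the 1st percentile and above the 99th percentile with those percentile values.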
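The idea behind cuts(1 99) can be sketched in Python: compute the lower and upper percentiles, then clamp every value into that range. Extreme observations are kept but pulled in, unlike trimming, which drops them. The percentile rule here (linear interpolation between order statistics) is an assumption; winsor2 may use a different convention, so cut-offs can differ slightly in small samples. The example uses 10/90 cuts only because the hypothetical sample is tiny.

```python
import math

def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def winsorize(values, lower=1, upper=99):
    """Clamp values to the [lower, upper] percentile range, keeping every observation."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]   # hypothetical variable with one extreme value
print(winsorize(data, lower=10, upper=90))
```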
Regression diagnostics (Cont’d).
Heteroskedasticity refers to a situation in which the error terms of the model do not have constant variance. It should be addressed because it biases the usual OLS standard errors, so significance tests can mislead: significant variables may appear statistically insignificant, or vice versa.
Testing the residuals for heteroskedasticity (after regress): estat hettest
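To our understanding, estat hettest by default runs the Breusch-Pagan / Cook-Weisberg test against the fitted values. With a single regressor the fitted values are a linear function of X, so regressing on X gives the same statistic. The Python sketch below shows the mechanics on hypothetical data: squared residuals are scaled by SSR/n and regressed on X, and half the explained sum of squares is compared with a chi-squared distribution with 1 df. It is a sketch of the textbook (normal-errors) version, not a reimplementation of Stata’s command.

```python
import math

def breusch_pagan(x, y):
    """Breusch-Pagan statistic and p-value for Y = alpha + beta*X + e (one regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [b - (alpha + beta * a) for a, b in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / n
    g = [e * e / sigma2 for e in resid]          # scaled squared residuals
    mg = sum(g) / n
    sxg = sum((a - mx) * (gi - mg) for a, gi in zip(x, g))
    stat = (sxg ** 2 / sxx) / 2                   # half the explained sum of squares
    p_value = math.erfc(math.sqrt(stat / 2))      # upper tail of chi2 with 1 df
    return stat, p_value

x = list(range(1, 11))                                                          # hypothetical data
y_even = [a + (1 if i % 2 == 0 else -1) for i, a in enumerate(x)]              # constant spread
y_fan = [a + (0.5 * a if i % 2 == 0 else -0.5 * a) for i, a in enumerate(x)]   # spread grows with X
print("constant spread: chi2 = %.3f, p = %.3f" % breusch_pagan(x, y_even))
print("fanning spread:  chi2 = %.3f, p = %.3f" % breusch_pagan(x, y_fan))
```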
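Dealing with heteroskedasticity: regress depvar indepvars, vce(robust)
Robust standard errors leave the coefficients unchanged and correct their standard errors, so inference is valid in large samples even under heteroskedasticity.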
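The Python sketch below contrasts the conventional and heteroskedasticity-robust standard errors of the slope in a one-regressor model, using hypothetical data. The robust version is White’s sandwich estimator Σ(x - x̄)²e² / (Σ(x - x̄)²)², scaled by n/(n - k) (the "HC1" form). We assume this scaling matches regress, vce(robust). The slope itself is the same under both.

```python
import math

def slope_standard_errors(x, y):
    """Slope, conventional SE and HC1 robust SE for Y = alpha + beta*X + e."""
    n, k = len(x), 2                               # k = estimated parameters
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [b - (alpha + beta * a) for a, b in zip(x, y)]
    # Conventional: assumes one common error variance s^2 = SSR / (n - k).
    conventional = math.sqrt(sum(e * e for e in resid) / (n - k) / sxx)
    # Robust: each squared residual enters with its own weight (x - mean)^2.
    hc0 = sum((a - mx) ** 2 * e * e for a, e in zip(x, resid)) / sxx ** 2
    robust = math.sqrt(hc0 * n / (n - k))
    return beta, conventional, robust

beta, se, se_robust = slope_standard_errors([1, 2, 3, 4], [1, 3, 2, 4])   # hypothetical data
print(f"beta = {beta:.2f}, conventional SE = {se:.4f}, robust SE = {se_robust:.4f}")
```

Robust standard errors can be larger or smaller than conventional ones; in this tiny example they happen to be smaller.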
Regression diagnostics (Cont’d).
Multicollinearity refers to a high correlation between two or more independent variables in a regression model. It inflates the standard errors of the affected coefficients and makes the estimates unstable.
• Testing for multicollinearity (after regress): estat vif
• Dealing with multicollinearity: centering or standardizing the variables
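With exactly two regressors, the variance inflation factor of each is 1/(1 - r²), where r is their correlation. The Python sketch below uses a hypothetical regressor X and its square. Centering X at its mean before squaring removes most of the collinearity, which is where the centering approach is most useful: polynomial and interaction terms. Centering does not change the correlation between two distinct variables.

```python
def correlation(u, v):
    """Pearson correlation between two lists of numbers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def vif_two_regressors(u, v):
    """With exactly two regressors, VIF = 1 / (1 - r**2) for each of them."""
    r = correlation(u, v)
    return 1 / (1 - r ** 2)

x = list(range(1, 11))                        # hypothetical regressor
raw = vif_two_regressors(x, [a ** 2 for a in x])
xc = [a - sum(x) / len(x) for a in x]         # X centered at its mean
centered = vif_two_regressors(xc, [a ** 2 for a in xc])
print(f"VIF for X and X^2: {raw:.1f}; after centering: {centered:.1f}")
```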
Regression diagnostics (Cont’d).
Model specification refers to including all relevant variables and excluding all irrelevant ones.
• Testing model specification (after regress): estat ovtest, linktest
Other regression commands
Logistic regression: logistic depvar indepvars
Probit regression is the other main method for analysing binary dependent variables. Whereas logit (logistic) regression is based on the log odds, probit uses the cumulative normal probability distribution.
  probit depvar indepvars
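The two models differ only in the function that maps the linear index xb to a probability. The Python sketch below compares the logistic function, under which xb equals the log odds, with the standard normal cumulative distribution Φ used by probit. The xb values are hypothetical. Both curves pass through 0.5 at xb = 0; the logistic curve has fatter tails.

```python
import math

def logit_probability(xb):
    """Logit: P(Y = 1) = 1 / (1 + exp(-xb)), so xb is the log odds."""
    return 1 / (1 + math.exp(-xb))

def probit_probability(xb):
    """Probit: P(Y = 1) = Phi(xb), the standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(xb / math.sqrt(2)))

def log_odds(p):
    """Inverse of the logistic function."""
    return math.log(p / (1 - p))

for xb in (-2, -1, 0, 1, 2):   # hypothetical values of the linear index
    print(f"xb = {xb:+d}: logit P = {logit_probability(xb):.3f}, "
          f"probit P = {probit_probability(xb):.3f}")
```

Because the two curves have different slopes, logit and probit coefficients are on different scales and should not be compared directly.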
Poisson regression is for a count (non-negative integer) dependent variable.
  poisson depvar indepvars
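In Poisson regression the expected count is μ = exp(xb), and the probability of observing k events is e^(-μ)·μ^k / k!. The Python sketch below evaluates these probabilities for a hypothetical value of xb to show what the model implies about the count outcome.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) = exp(-mu) * mu**k / k! for a count k = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

xb = 0.5                # hypothetical linear index from a fitted model
mu = math.exp(xb)       # log link: the mean count is exp(xb)
for k in range(5):
    print(f"P(Y = {k}) = {poisson_pmf(k, mu):.3f}")
```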
Presenting your results
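• For descriptive and correlation results: select the table in the Results window, choose Edit > Copy Table, paste it into a blank Word document, then use Table > Convert Text to Table.
• For regression results (esttab is part of the user-written estout package; install it with ssc install estout):
  esttab, se ar2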
• The difference between cross-sectional, time series and panel data
• Why panel?
  • More observations mean more information
  • The panel structure of the data allows better use of the data
• Data need to be declared as panel data in Stata, with an individual and a time dimension: xtset panelvar timevar
• Summary statistics for panel data: xtsum, xtdes …
• Fixed effects
• Random effects models
• Pooled OLS
• Hausman test
• Breusch and Pagan Lagrangian Multiplier (LM) test
• Modified Wald test for groupwise heteroskedasticity
• Wooldridge test for autocorrelation in panel data
• Pesaran's test of cross-sectional dependence
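The contrast between pooled OLS and fixed effects can be sketched with the within transformation: subtract each unit’s own mean from x and y, then run OLS on the demeaned data. The point estimate from xtreg depvar indepvars, fe is this within slope. The hypothetical panel below has two firms whose levels differ sharply. Pooled OLS mixes the firm differences into the slope, while fixed effects recovers the common within-firm slope.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(ids, x, y):
    """Fixed effects (within) slope: demean x and y inside each unit, then run OLS."""
    groups = defaultdict(list)
    for unit, a, b in zip(ids, x, y):
        groups[unit].append((a, b))
    xd, yd = [], []
    for rows in groups.values():
        mx = sum(a for a, _ in rows) / len(rows)
        my = sum(b for _, b in rows) / len(rows)
        xd += [a - mx for a, _ in rows]
        yd += [b - my for _, b in rows]
    return slope(xd, yd)

# Hypothetical panel: two firms, each observed in three years.
firm = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [10, 12, 14, 1, 3, 5]
print(f"pooled OLS slope:    {slope(x, y):.3f}")
print(f"fixed effects slope: {within_slope(firm, x, y):.3f}")
```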
Suggested readings
• Gujarati & Porter (2010). Essentials of Econometrics. McGraw-Hill, New York.
• Cameron & Trivedi (2009). Microeconometrics Using Stata. Stata Press, College Station, Texas, USA.
• Pevalin & Robson (2009). The Stata Survival Manual. McGraw-Hill, Two Penn Plaza, New York, USA.
• Wooldridge (2003). Introductory Econometrics: A Modern Approach (2nd ed.). Thomson South-Western, USA.