PhD Talk: Regression analysis using Stata: A hands on approach

By:

Dr. Redhwan Al-Dhamari Bakr Ali Al-Gamrh

29/5/2014 K. M. Kura, PhD Thesis Oral Presentation

Presentation Outline
• Introduction
• Data Structure
• Cross-Sectional Data
• Regression Diagnostics
• Other Regression Commands
• Presenting Your Results
• Suggested Readings

Introduction

What is Stata?
• Stata is a general-purpose statistical software package created in 1985 by StataCorp.
• Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine and epidemiology.

Why Stata?
• Stata has been less popular than its market competitors, such as SPSS and SAS, but it is gaining in popularity every year.
• It is particularly user-friendly when it comes to analyzing complicated data sets.

Introduction (Cont’d)

How is Stata different?
• The commands in Stata are much more intuitive and less fussy regarding punctuation.
• In Stata, it is possible to download user-written programs that perform specific tasks and use them as commands.
• Dealing with longitudinal data sets that have different types of file structures is much quicker and easier in Stata.

Introduction (Cont’d)

Windows in Stata and what they do
• The Command window
• The Review window
• The Variables window
• The Results window
• Do-files
• Log files
• Data Editor and Data Browser

• set mem 50m
• log using name.log, replace
• log close
• log off
• log on
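As a minimal sketch of how the log commands fit together, the lines below can be run from a do-file. The file name mysession.log and the use of the auto example dataset (shipped with Stata) are assumptions made for illustration. Note that since Stata 12 memory is managed automatically, so set mem is no longer needed.

```stata
* Open a plain-text log; replace overwrites an existing file of the same name
log using mysession.log, replace text

* Load an example dataset that ships with Stata
sysuse auto, clear
summarize price

* Suspend logging: the next command is not written to the log
log off
describe

* Resume logging, then close the log file
log on
summarize mpg
log close
```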

Data structure
• Cross-sectional data
• Panel data
• Time series data

Cross-sectional data

Summary statistics
• sum var
• sum var, sep(0)
• sum var, detail
• tabstat var, s(n mean sd min max skewness kurtosis) c(s)
• tabstat var, s(n mean sd min max) by(var)
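The summary commands above can be tried on the auto example dataset that ships with Stata. The variable choices (price, mpg, weight, foreign) are assumptions made for illustration only.

```stata
sysuse auto, clear

* Default summary table, then the same table without separator lines
summarize price mpg weight
summarize price mpg weight, sep(0)

* Percentiles, skewness and kurtosis for one variable
summarize price, detail

* Chosen statistics, one statistic per column
tabstat price mpg weight, statistics(n mean sd min max skewness kurtosis) columns(statistics)

* The same kind of table split by a grouping variable
tabstat price mpg, statistics(n mean sd min max) by(foreign)
```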

Cross-sectional data

Correlations

Pearson’s product-moment correlation (r)
• It focuses on mean values.
• It is used for interval variables.
• Values below 0.30 suggest there is little association between the variables (Hinkle et al. 1988).
• pwcorr var, obs sig star(0.05)

Spearman’s correlation (rho)
• It is calculated based on ranks.
• It is used for ordinal variables.
• spearman var, stats(rho obs p) star(0.05)
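A short sketch of both correlation commands, assuming the auto example dataset and an arbitrary choice of three variables:

```stata
sysuse auto, clear

* Pearson correlations with number of observations and p-values;
* coefficients significant at the 5% level are starred
pwcorr price mpg weight, obs sig star(0.05)

* Spearman rank correlations with the same information
spearman price mpg weight, stats(rho obs p) star(0.05)
```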

Cross-sectional data (Cont’d)

Differences in means and medians

Independent two-sample t-test
• It helps to know if there are mean differences in the data that might be interesting to pursue with multivariate analysis.
• There cannot be more than two groups on which you are comparing the mean value; the grouping variable must be dichotomous.
• ttest var, by(groupvar)
• sdtest var, by(groupvar)

Mann-Whitney U-test
• It is used to examine the rank differences across some characteristic for two groups.
• ranksum var, by(groupvar)

Paired t-test and Wilcoxon signed-rank test
• ttest ind07==ind08
• signrank ind07=ind08
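The tests above might be run as follows. The datasets are assumptions chosen for illustration: auto ships with Stata, while fuel is a Stata example dataset fetched with webuse (internet access required) and stands in for the paired variables ind07 and ind08.

```stata
* Two independent groups: domestic vs foreign cars
sysuse auto, clear
ttest mpg, by(foreign)       // difference in means
sdtest mpg, by(foreign)      // equality of variances
ranksum mpg, by(foreign)     // Mann-Whitney (Wilcoxon rank-sum) test

* Paired observations: the same cars measured twice
webuse fuel, clear
ttest mpg1 == mpg2           // paired t-test
signrank mpg1 = mpg2         // Wilcoxon signed-rank test
```

Running sdtest alongside ttest is useful because the default t-test assumes equal variances; if that looks doubtful, ttest accepts the unequal option.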

Cross-sectional data (Cont’d)

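Theory of regression analysis

What is linear regression analysis?
• Finding the relationship between a dependent and an independent variable.

Y = α + bX + e

where Y is the dependent variable, X is the independent variable, α is the intercept, b is the slope coefficient and e is the error term.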
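As a minimal sketch, the equation can be estimated with regress. The auto dataset and the choice of price as Y and weight as X are assumptions for illustration. In the output, the coefficient on weight corresponds to b and _cons corresponds to α.

```stata
sysuse auto, clear

* Estimate price = alpha + b*weight + e by ordinary least squares
regress price weight

* Fitted values and residuals (estimates of e) for later diagnostics
predict yhat, xb
predict res, residuals
```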

Regression diagnostics
• Normality
• Heteroskedasticity
• Multicollinearity
• Model specification

Regression diagnostics (Cont’d).

Normality
Refers to the normal distribution of the error terms.

Testing the residuals for normality
• Shapiro-Wilk W test: swilk res
• Skewness and kurtosis test: sktest res

Testing the normality of a variable
• sktest var
• tabstat var, s(skewness kurtosis)
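A sketch of the normality checks, assuming the auto dataset and an illustrative model. The residual variable must be created with predict after the regression before it can be tested.

```stata
sysuse auto, clear
regress price weight mpg

* Store the residuals in a new variable called res
predict res, residuals

* Tests on the residuals
swilk res                    // Shapiro-Wilk W test
sktest res                   // skewness and kurtosis test

* The same ideas applied to a single variable
sktest price
tabstat price, statistics(skewness kurtosis)
```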

Regression diagnostics (Cont’d)

Outlier detection
Outlier detection involves determining whether a residual (error = actual - predicted) is an extreme negative or positive value.

Standardized residuals
• predict residstd, rstandard
• list residstd
• Standardized residuals above 3.5 or below -3.5 are treated as outliers.

Cook’s D
• predict cook, cooksd
• list cook if cook > 4/n, where n is the number of observations

Winsorization
• winsor2 var, replace cuts(1 99)
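The outlier steps could look like this in practice. The dataset and model are assumptions for illustration. winsor2 is a user-written command, so the sketch assumes it has been installed first (for example with ssc install winsor2); its cuts() option takes percentiles, so cuts(1 99) winsorizes at the 1st and 99th percentiles.

```stata
sysuse auto, clear
regress price weight mpg

* Standardized residuals: list only the extreme ones
predict residstd, rstandard
list make residstd if abs(residstd) > 3.5 & !missing(residstd)

* Cook's D: flag observations above the 4/n rule of thumb,
* using the estimation sample size stored in e(N)
predict cook, cooksd
list make cook if cook > 4/e(N) & !missing(cook)

* Winsorize price at the 1st and 99th percentiles (user-written command)
winsor2 price, replace cuts(1 99)
```

Listing with an if condition avoids scrolling through every observation, which is what a bare list would print.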

Regression diagnostics (Cont’d).

Heteroskedasticity
Refers to a situation in which the error terms of the model do not have constant variance. This problem should be addressed because it can sometimes make significant variables appear to be statistically insignificant.

Testing the residuals for heteroskedasticity
• hettest

Solving the heteroskedasticity problem
• reg var, robust
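A short sketch of the test and the remedy, assuming the auto dataset and an illustrative model. In current Stata releases the test is documented as estat hettest and the robust option as vce(robust); the shorter forms shown on the slide are older spellings of the same commands.

```stata
sysuse auto, clear
regress price weight mpg

* Breusch-Pagan / Cook-Weisberg test; H0: constant variance
estat hettest

* Re-estimate with heteroskedasticity-robust standard errors
regress price weight mpg, vce(robust)
```

Robust standard errors leave the coefficients unchanged and only adjust the standard errors, so the two regression tables differ in the t statistics and p-values.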

Regression diagnostics (Cont’d).

Multicollinearity
Refers to a high correlation of two or more independent variables in a regression model. This problem may affect the regression estimates.

Testing for multicollinearity
• vif

Solving the multicollinearity problem
• Centering or standardizing approach
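The sketch below shows the VIF check and one way centering might be applied, assuming the auto dataset. The squared-term model is an assumption chosen because centering mainly helps when a variable enters together with its own square or an interaction; it is not taken from the slides.

```stata
sysuse auto, clear

* A variable and its square are highly correlated: expect large VIFs
generate weight2 = weight^2
regress price weight weight2
estat vif

* Center weight at its mean, rebuild the square, and compare the VIFs
summarize weight, meanonly
generate weight_c = weight - r(mean)
generate weight_c2 = weight_c^2
regress price weight_c weight_c2
estat vif

* Standardizing (mean 0, standard deviation 1) is the other option
egen weight_z = std(weight)
```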

Regression diagnostics (Cont’d)

Model specification
Refers to including all relevant variables and excluding all irrelevant ones.

Testing for model specification
• ovtest
• linktest
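Both specification tests are run after a regression. The dataset and model here are assumptions for illustration; estat ovtest is the current documented name for ovtest.

```stata
sysuse auto, clear
regress price weight mpg

* Ramsey RESET test; H0: the model has no omitted variables
estat ovtest

* Link test: a significant _hatsq suggests misspecification
linktest
```

estat ovtest is placed first because linktest fits its own auxiliary regression, which replaces the stored estimation results.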

Other regression commands

Logistic regression
• logistic var

Probit regression is the other main method for analysing binary dependent variables. Whereas logit (or logistic) regression is based on log odds, probit uses the cumulative normal probability distribution.
• probit var

Poisson regression is for a count (non-negative integer) dependent variable.
• poisson var
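A sketch of the three commands on Stata example datasets. The dataset and variable choices are assumptions for illustration; dollhill3 is fetched with webuse and needs internet access.

```stata
* Binary dependent variable: foreign (0 = domestic, 1 = foreign)
sysuse auto, clear
logistic foreign weight mpg      // reports odds ratios
logit foreign weight mpg         // same model, reports coefficients
probit foreign weight mpg

* Count dependent variable: number of deaths, with person-years as exposure
webuse dollhill3, clear
poisson deaths smokes i.agecat, exposure(pyears)
```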

Presenting your results

For descriptive and correlation results
• Edit > Copy table
• Open a blank Word document and press Paste
• Table > Convert > Text to Table

For regression results
• esttab
• esttab, se ar2
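esttab works on stored estimates, so models are usually saved with eststo first. Both commands belong to the user-written estout package, which this sketch assumes is installed (for example with ssc install estout). The models and the file name results.rtf are assumptions for illustration.

```stata
sysuse auto, clear

* Store two models
eststo clear
eststo: regress price weight
eststo: regress price weight mpg

* Side-by-side table with standard errors and adjusted R-squared
esttab, se ar2

* The same table written to an RTF file that Word can open
esttab using results.rtf, se ar2 replace
```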

• The difference between cross-sectional, time series and panel data
• Why panel?
• More observations mean more information
• Certain structure of the data allows better use of the data
• Data need to be set as panel in Stata (time and individual dimensions)
• Summary statistics for panel: xtsum, xtdes …
• Fixed effects
• Random effects models
• Pooled OLS
• Hausman test
• Breusch and Pagan Lagrangian Multiplier (LM) test
• Modified Wald test for groupwise heteroskedasticity
• Wooldridge test for autocorrelation in panel data
• Pesaran's test of cross sectional dependence
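The panel steps listed above can be sketched as follows, assuming the Grunfeld investment data that Stata provides through webuse (internet access required). The last three tests in the list are usually run with user-written commands; they are shown only as comments because the sketch does not assume they are installed.

```stata
webuse grunfeld, clear

* Declare the individual and time dimensions
xtset company year

* Panel structure and panel summary statistics
xtdescribe
xtsum invest mvalue kstock

* Pooled OLS, fixed effects and random effects
regress invest mvalue kstock
xtreg invest mvalue kstock, fe
estimates store fe
xtreg invest mvalue kstock, re
estimates store re

* Breusch-Pagan LM test (random effects vs pooled OLS), run after xtreg, re
xttest0

* Hausman test (fixed vs random effects)
hausman fe re

* User-written commands commonly used for the remaining tests
* (locate and install them first, e.g. with the search command):
*   xttest3                        after xtreg, fe (modified Wald test)
*   xtserial invest mvalue kstock  (Wooldridge test)
*   xtcsd, pesaran                 after xtreg, fe (Pesaran's CD test)
```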

Suggested readings
• Gujarati & Porter (2010) “Essentials of Econometrics”, McGraw-Hill, New York.
• Cameron & Trivedi (2009) “Microeconometrics Using Stata”, A Stata Press Publication, StataCorp LP, College Station, Texas, USA.
• Pevalin & Robson (2009) “The Stata Survival Manual”, Two Penn Plaza, New York, USA.
• Wooldridge (2003) “Introductory Econometrics: A Modern Approach” (2nd ed.), Thomson South-Western, USA.

Thank you for Listening
