Mgt 540 Research Methods
Data Analysis
Additional "sources"
Compilation of sources: http://lrs.ed.uiuc.edu/tse-portal/datacollectionmethodologies/jin-tselink/tselink.htm
http://web.utk.edu/~dap/Random/Order/Start.htm
Data Analysis Brief Book (glossary): http://rkb.home.cern.ch/rkb/titleA.html
Exploratory Data Analysis: http://www.itl.nist.gov/div898/handbook/eda/eda.htm
Statistical Data Analysis: http://obelia.jde.aca.mmu.ac.uk/resdesgn/arsham/opre330.htm
FIGURE 12.1. Copyright © 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
Data Analysis
Get the "feel" for the data
Get the mean, variance, and standard deviation of each variable.
See if, for all items, responses range all over the scale and are not restricted to one end of the scale alone.
Obtain Pearson correlations among the variables under study.
Get frequency distributions for all the variables.
Tabulate your data.
Describe your sample's key characteristics (demographic details of sex composition, education, age, length of service, etc.).
See histograms, frequency polygons, etc.
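As a rough illustration, the "feel for the data" checks above can be run in a few lines of Python. The survey responses below are invented for the example (a 1-5 satisfaction scale and years of service):

```python
# Sketch of first-pass data screening: mean/variance/SD, scale coverage,
# Pearson correlation, and a frequency distribution. Data are made up.
import numpy as np

satisfaction = np.array([4, 5, 3, 4, 2, 5, 4, 3, 4, 5])  # 1-5 scale
tenure_years = np.array([2, 8, 3, 6, 1, 9, 5, 2, 7, 8])

mean = satisfaction.mean()
variance = satisfaction.var(ddof=1)   # sample variance
std_dev = satisfaction.std(ddof=1)    # sample standard deviation

# Do responses use the whole scale, or cluster at one end?
lo, hi = satisfaction.min(), satisfaction.max()

# Pearson correlation between the two variables under study
r = np.corrcoef(satisfaction, tenure_years)[0, 1]

# Frequency distribution for one variable
values, counts = np.unique(satisfaction, return_counts=True)
print(mean, std_dev, lo, hi, round(r, 2),
      dict(zip(values.tolist(), counts.tolist())))
```

In practice these checks would run over every variable in the data set before any hypothesis testing.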
Quantitative Data
Each type of data requires different analysis method(s):
Nominal
Labeling
No inherent "value" basis
Categorization purposes only
Ordinal
Ranking, sequence
Interval
Relationship basis (e.g., age)
Descriptive Statistics
Describing key features of data
Central tendency: mean, median, mode
Spread: variance, standard deviation, range
Distribution (shape): skewness, kurtosis
Descriptive Statistics
Describing key features of data
Nominal
Identification / categorization only
Ordinal (example on pg. 139)
Non-parametric statistics
Do not assume equal intervals
Frequency counts
Averages (median and mode)
Interval
Parametric
Mean, standard deviation, variance
Testing "Goodness of Fit"
Reliability
Validity
Internal Consistency
Split Half
Discriminant
Convergent
Factorial
Involves correlations and factor analysis
Testing Hypotheses
Use appropriate statistical analysis
t-test (one- or two-tailed)
Test the significance of the difference between the means of two groups
ANOVA
Test the significance of differences among the means of more than two groups, using the F test
Regression (simple or multiple)
Establish the variance explained in the DV by the variance in the IVs
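The three analyses named above can be sketched with scipy; all samples and group labels below are invented for illustration:

```python
# Sketch of the three hypothesis tests: t-test (two groups), ANOVA
# (three groups), and simple regression. All numbers are made up.
import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.4])
group_b = np.array([4.2, 4.5, 4.1, 4.7, 4.4, 4.3])
group_c = np.array([3.9, 4.0, 3.7, 4.1, 3.8, 4.2])

# t-test: significance of the difference between two group means
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# ANOVA: F test across more than two groups
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

# Simple regression: variance explained in the DV by the IV
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.0, 2.9, 4.1, 4.8, 6.2, 7.1])
slope, intercept, r, p, se = stats.linregress(x, y)

print(round(t_stat, 2), round(f_stat, 2), round(slope, 2))
```

Which test applies is driven by the design: two groups, more than two groups, or a continuous predictor.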
Statistical Power
Claiming a significant difference
Errors in methodology
Type 1 error
Reject the null hypothesis when you should not. Called an "alpha" error.
Type 2 error
Fail to reject the null hypothesis when you should. Called a "beta" error.
Statistical power refers to the ability to detect true differences, avoiding Type 2 errors.
Statistical Power (see discussion at http://my.execpc.com/4A/B7/helberg/pitfalls/)
Depends on 4 issues:
Sample size
The effect size you want to detect
The alpha (Type 1 error rate) you specify
The variability of the sample
Too little power: overlook a real effect
Too much power: any difference is significant
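The sample-size issue can be made concrete with a small Monte Carlo sketch: holding the effect size, alpha, and variability fixed, a larger sample detects a true difference far more often. The effect size (0.5 SD) and sample sizes are illustrative assumptions:

```python
# Monte Carlo estimate of statistical power for an independent t-test.
# Power = proportion of simulated studies that correctly reject H0.
import numpy as np
from scipy import stats

def estimated_power(n, effect=0.5, alpha=0.05, sd=1.0, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, sd, n)      # control group
        b = rng.normal(effect, sd, n)   # group with a true difference
        _, p = stats.ttest_ind(a, b)
        if p < alpha:                   # correctly rejected H0
            hits += 1
    return hits / trials

small = estimated_power(n=15)    # underpowered: effect often overlooked
large = estimated_power(n=100)   # well powered for this effect size
print(small, large)
```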
Parametric vs. Nonparametric
Parametric (characteristics referring to specific population parameters)
Parametric assumptions:
Independent samples
Homogeneity of variance
Data normally distributed
Interval or better scale
Nonparametric assumptions:
Sometimes independence of samples
t-tests (Look at t tables; p. 435)
Used to compare two means, or one observed mean against a guess about a hypothesized mean.
For large samples, t and z can be considered equivalent.
Calculate t = (x̄ − μ) / S_x̄
where S_x̄ is the standard error of the mean, S_x̄ = S/√n, and df = n − 1
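The formula can be checked by hand against scipy's one-sample t-test; the sample values and hypothesized mean (μ = 50) are invented:

```python
# Worked check of t = (x_bar - mu) / (S / sqrt(n)), df = n - 1.
import math
from scipy import stats

sample = [52.0, 48.5, 51.2, 53.1, 49.8, 50.9, 52.4, 51.5]
mu = 50.0                      # hypothesized population mean
n = len(sample)
x_bar = sum(sample) / n
# Sample standard deviation S (denominator n - 1)
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
t_by_hand = (x_bar - mu) / (s / math.sqrt(n))

t_scipy, p = stats.ttest_1samp(sample, mu)
print(round(t_by_hand, 4), round(t_scipy, 4))  # the two should match
```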
t-tests
Statistical programs will give you a choice between a matched-pair and an independent t-test. Your sample and research design determine which you will use.
z-test for Proportions (Look at t tables; p. 435)
When data are nominal
Describe by counting occurrences of each value
From counts, calculate proportions
Compare the proportion of occurrence in the sample to the proportion of occurrence in the population
Hypothesis testing allows only one of two outcomes: success or failure
z-test for Proportions (Look at t tables; p. 435)
H0: π = k, where k is a value between 0 and 1
H1: π ≠ k
z = (p − π) / √(π(1 − π)/n)
Equivalent to χ² for df = 1
Compares the sample proportion to the population proportion
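A worked example of the statistic: suppose the hypothesized population proportion is π = 0.35 and we observe 46 successes in 100 cases (all numbers invented for illustration):

```python
# z = (p - pi) / sqrt(pi * (1 - pi) / n), and its chi-square equivalence.
import math

pi0 = 0.35           # hypothesized population proportion (H0)
n = 100
successes = 46
p = successes / n    # observed sample proportion

z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

# Equivalence noted above: for df = 1, chi-square equals z squared
chi_square = z ** 2
print(round(z, 3), round(chi_square, 3))
```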
Chi-Square Test (sampling distribution)
One sample
Measures sample variance
Squared deviations from the mean, based on the normal distribution
Nonparametric
Compare expected with observed proportions
H0: observed proportion = expected proportion
df = number of data-point categories, or cells (k), minus 1
χ² = Σ (O − E)² / E
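A short goodness-of-fit example: observed counts across four categories (invented) against an equal-proportions null, with df = k − 1 = 3:

```python
# One-sample chi-square: chi2 = sum((O - E)^2 / E), df = k - 1.
from scipy import stats

observed = [48, 35, 27, 10]                         # four categories, made up
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # H0: equal proportions

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_scipy, p = stats.chisquare(observed, expected)
print(round(chi2, 3), round(p, 4))                  # df = 4 - 1 = 3
```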
Univariate z Test
Test a guess about a proportion against an observed sample; e.g., MBAs constitute 35% of the managerial population.
H0: π = .35
H1: π ≠ .35 (two-tailed test suggested)
Univariate Tests
Some univariate tests are different in that they are among the statistical procedures where you, the researcher, set the null hypothesis. In many other statistical tests the null hypothesis is implied by the test itself.
Contingency Tables
Relationship between nominal variables
http://www.psychstat.smsu.edu/introbook/sbk28m.htm
Relationship between subjects' scores on two qualitative or categorical variables (early childhood intervention)
If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is called the chi-square test of independence. The null hypothesis is that there is no relationship between row and column frequencies.
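The test of independence is one scipy call on the table of counts; the 2x2 table below (intervention by outcome) is invented for illustration:

```python
# Chi-square test of independence on a 2x2 contingency table.
import numpy as np
from scipy.stats import chi2_contingency

# rows: intervention vs. no intervention; columns: outcome yes / no
table = np.array([[30, 10],
                  [15, 25]])

chi2, p, df, expected = chi2_contingency(table)
print(round(chi2, 3), round(p, 4), df)  # small p: reject independence
```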
Correlations
A statistical summary of the degree and direction of association between two variables
Correlation itself does not distinguish between independent and dependent variables
Most common: Pearson's r
Correlations
You believe that a linear relationship exists between two variables
The range is from −1 to +1
R², the coefficient of determination, is the % of variance explained in each variable by the other
Correlations
r = S_xy / (S_x S_y), or the covariance between x and y divided by their standard deviations
Calculations needed:
The means, x-bar and y-bar
Deviations from the means, (x − x-bar) and (y − y-bar), for each case
The squares of the deviations from the means for each case, (x − x-bar)² and (y − y-bar)², to ensure positive distance measures when added
The cross product for each case, (x − x-bar) times (y − y-bar)
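The calculation steps can be followed literally in code; the two small samples below are invented:

```python
# Pearson's r built from means, deviations, squared deviations, and
# cross products: r = S_xy / (S_x * S_y).
import math

x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

dx = [xi - x_bar for xi in x]              # deviations from the means
dy = [yi - y_bar for yi in y]
s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross products
s_xx = sum(a * a for a in dx)              # sum of squared deviations
s_yy = sum(b * b for b in dy)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))
```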
Correlations
The null hypothesis for correlations is
H0: ρ = 0
and the alternative is usually
H1: ρ ≠ 0
However, if you can justify it prior to analyzing the data, you might also use
H1: ρ > 0 or H1: ρ < 0, a one-tailed test
Correlations
Alternative measures
Spearman rank correlation, r_ranks
r_ranks and r are nearly always equivalent measures for the same data (and even when they are not, the differences are trivial)
Phi coefficient, r_Φ, when both variables are dichotomous; again, it is equivalent to Pearson's r
Correlations
Alternative measures
Point-biserial, r_pb, when correlating a dichotomous with a continuous variable
If a scatterplot shows a curvilinear relationship there are two options:
A data transformation, or
Use the correlation ratio, η² (eta-squared):
η² = 1 − SS_within / SS_total
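The correlation ratio formula can be computed directly from the within-group and total sums of squares; the two groups of continuous scores below are invented:

```python
# eta-squared = 1 - SS_within / SS_total for a grouped continuous variable.
groups = {
    "low":  [2.0, 3.0, 2.5, 3.5],
    "high": [6.0, 7.0, 6.5, 7.5],
}
all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Total sum of squares: deviations from the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Within-group sum of squares: deviations from each group's own mean
ss_within = sum(
    sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values()
)

eta_squared = 1 - ss_within / ss_total
print(round(eta_squared, 4))  # near 1: the grouping explains most variance
```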
ANOVA
For two groups only, the t-test and ANOVA yield the same results
You must do paired comparisons when working with three or more groups to know where the means lie
Multivariate Techniques
Dependent variable:
Regression in its various forms
Discriminant analysis
MANOVA
Classificatory or data reduction:
Cluster analysis
Factor analysis
Multidimensional scaling
Linear Regression
We would like to be able to predict y from x
Simple linear regression with raw scores:
y = dependent variable
x = independent variable
b = regression coefficient = r_xy (s_y / s_x)
c = a constant term
The general model is y = bx + c (+ e)
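The slope identity b = r_xy (s_y / s_x) can be verified numerically against a least-squares fit; the data points are invented:

```python
# Check that the regression slope equals r_xy * (s_y / s_x).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_xy = np.corrcoef(x, y)[0, 1]
b_from_r = r_xy * (y.std(ddof=1) / x.std(ddof=1))

b_fit, c_fit = np.polyfit(x, y, 1)  # least-squares slope b and constant c
print(round(b_from_r, 4), round(b_fit, 4))  # the two slopes should match
```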
Linear Regression
The statistic for assessing the overall fit of a regression model is R², the overall % of variance explained by the model
R² = 1 − (unpredictable variance / total variance)
   = predictable variance / total variance
   = 1 − (s²_e / s²_y), where s²_e is the variance of the error or residual
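A worked check of R² = 1 − (s²_e / s²_y): fit a line, compute the residual variance, and compare against the squared correlation, which must agree for simple linear regression. The data are invented:

```python
# R-squared two ways: 1 - residual variance / total variance, and r^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.7, 12.3])

b, c = np.polyfit(x, y, 1)          # least-squares fit
residuals = y - (b * x + c)         # error term e for each case

r_squared = 1 - residuals.var() / y.var()  # unexplained over total variance
r = np.corrcoef(x, y)[0, 1]
print(round(r_squared, 4), round(r ** 2, 4))  # should agree
```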
Linear Regression
Multiple regression: more than one predictor
y = b1x1 + b2x2 + c
Each regression coefficient b is assessed independently for its statistical significance; H0: b = 0
So, in a statistical program's output, a statistically significant b rejects the notion that the variable associated with b contributes nothing to predicting y
Linear Regression
Multiple regression
R² still tells us the amount of variation in y explained by all of the predictors (x) together
The F-statistic tells us whether the model as a whole is statistically significant
Several other types of regression models are available for data that do not meet the assumptions needed for least-squares models (such as logistic regression for dichotomous dependent variables)
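A sketch of multiple regression y = b1x1 + b2x2 + c via least squares, with the overall R² and F statistic computed by hand from the standard formula F = (R²/k) / ((1 − R²)/(n − k − 1)). The data are simulated with known coefficients so the fit can be sanity-checked:

```python
# Multiple regression with two predictors, plus overall R^2 and F.
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)  # known truth

X = np.column_stack([x1, x2, np.ones(n)])  # design matrix with constant
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2, c = coef

residuals = y - X @ coef
r_squared = 1 - residuals.var() / y.var()
k = 2                                       # number of predictors
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(b1, 2), round(b2, 2), round(r_squared, 3), round(f_stat, 1))
```

A statistics package (e.g. SPSS or statsmodels) would additionally report a p-value for each b and for the overall F.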
Regression by SPSS & Other Programs
Methods for developing the model:
Stepwise: lets the computer try to fit all chosen variables, leaving out those not significant and re-examining variables in the model at each step
Enter: the researcher specifies that all variables will be used in the model
Forward, backward: begin with all (backward) or none (forward) of the variables and automatically add or remove variables without reconsideration of variables already in the model
Multicollinearity
The best regression model has uncorrelated IVs
Model stability is low with excessively correlated IVs
Collinearity diagnostics identify problems, suggesting variables to be dropped
High tolerance and a low variance inflation factor are desirable
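As a sketch of the diagnostic: the variance inflation factor for one IV is 1 / (1 − R²) from regressing it on the other IVs (tolerance is the 1 − R² part). With invented data where two IVs are nearly copies of each other, the VIF flags the problem:

```python
# Variance inflation factor computed by hand: VIF_i = 1 / (1 - R_i^2),
# where R_i^2 comes from regressing IV i on the remaining IVs.
import numpy as np

def vif(X, idx):
    """VIF for column idx of X, regressing it on the other columns."""
    y = X[:, idx]
    others = np.delete(X, idx, axis=1)
    A = np.column_stack([others, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r_squared = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r_squared)

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)  # nearly a copy of x1: collinear
x3 = rng.normal(size=50)                  # independent of the others
X = np.column_stack([x1, x2, x3])

print(round(vif(X, 0), 1), round(vif(X, 2), 2))  # large VIF vs. near 1
```

A common rule of thumb treats VIF above roughly 10 as a sign that a variable should be dropped or combined.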
Discriminant Analysis
Regression requires the DV to be interval or ratio
If the DV is categorical (nominal), you can use discriminant analysis
IVs should be interval or ratio scaled
The key result is the number of cases classified correctly
MANOVA
Compare means on two or more DVs (ANOVA is limited to one DV)
Pure MANOVA is available in SPSS only from command syntax
You can use the general linear model, though
Factor Analysis
A data reduction technique: a large set of variables can be reduced to a smaller set while retaining the information from the original data set
Data must be on an interval or ratio scale
E.g., a variable called socioeconomic status might be constructed from variables such as household income, educational attainment of the head of household, and average per capita income of the census block in which the person resides
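Factor analysis proper needs a dedicated routine, but the data-reduction idea can be sketched with a principal-components pass over the correlation matrix of three invented SES-style indicators driven by one latent factor:

```python
# Data-reduction sketch: eigendecomposition of a correlation matrix.
# One dominant eigenvalue suggests one underlying factor.
import numpy as np

rng = np.random.default_rng(0)
ses = rng.normal(size=200)                        # latent "status" factor
income    = ses + rng.normal(scale=0.4, size=200)  # three observed
education = ses + rng.normal(scale=0.4, size=200)  # indicators, each a
block_inc = ses + rng.normal(scale=0.4, size=200)  # noisy copy of ses

data = np.column_stack([income, education, block_inc])
corr = np.corrcoef(data, rowvar=False)   # the correlation matrix seed
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order

explained = eigvals[-1] / eigvals.sum()  # share from the first component
print(round(explained, 3))
```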
Cluster Analysis
Cluster analysis seeks to group cases rather than variables; it too is a data reduction technique
Data must be on an interval or ratio scale
E.g., a marketing group might want to classify people into psychographic profiles regarding their tendencies to try or adopt new products: pioneers or early adopters, early majority, late majority, laggards
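A minimal k-means sketch shows cluster analysis grouping cases: two well-separated blobs of invented "customers" (say, early adopters vs. laggards on two interval-scaled attitude scores) split cleanly into two clusters:

```python
# Tiny k-means by hand: assign cases to nearest center, recompute centers.
import numpy as np

rng = np.random.default_rng(0)
early_adopters = rng.normal([8.0, 8.0], 0.5, size=(20, 2))
laggards = rng.normal([2.0, 2.0], 0.5, size=(20, 2))
data = np.vstack([early_adopters, laggards])   # 40 cases, 2 variables

centers = np.array([[0.0, 0.0], [10.0, 10.0]])  # initial guesses
for _ in range(10):
    # distance of every case to every center, shape (40, 2)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)               # nearest-center assignment
    centers = np.array([data[labels == k].mean(axis=0) for k in range(2)])

print(labels.tolist())  # first 20 cases in one cluster, last 20 in the other
```

Statistics packages offer hierarchical and k-means clustering with many distance options; this is only the core idea.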
Factor vs. Cluster Analysis
Factor analysis focuses on creating linear composites of variables
The number of variables with which we must work is then reduced
The technique begins with a correlation matrix to seed the process
Cluster analysis focuses on cases
Potential Biases
Asking the inappropriate or wrong research questions
Insufficient literature survey and hence an inadequate theoretical model
Measurement problems
Samples not being representative
Problems with data collection: researcher biases, respondent biases, instrument biases
Data analysis biases: coding errors, data punching & input errors, inappropriate statistical analysis
Biases (subjectivity) in interpretation of results
FIGURE 11.2. Copyright © 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
Questions to ask (adapted from Robert Niles):
Where did the data come from?
How (and by whom) was the data reviewed, verified, or substantiated?
How were the data collected?
How is the data presented?
What is the context? Cherry-picking?
Be skeptical when dealing with comparisons
Spurious correlations