1 correlation and regression analysis lecture 11
•1
Correlation and Regression Analysis
Lecture 11
•2
Concepts About Relationships
Presence
Nature
Direction
Strength of Association
•3
Relationship Presence
Relationship presence assesses whether a systematic relationship exists between two or more variables. If we find statistical significance between the variables, we say a relationship is present.
•4
Nature of Relationships
Relationships between variables typically are described as either linear or nonlinear.
Linear relationship = a “straight-line association” between two or more variables.
Nonlinear relationship = often referred to as curvilinear; it is best described by a curve instead of a straight line.
•5
Direction of Relationship
The direction of a relationship can be either positive or negative.
Positive relationship = when one variable increases, e.g., loyalty to employer, then so does another related one, e.g., effort put forth for employer.
Negative relationship = when one variable increases, e.g., satisfaction with job, then a related one decreases, e.g., likelihood of searching for another job.
•6
Strength of Association
When a consistent and systematic relationship is present, the researcher must determine the strength of association. The strength ranges from very strong to slight.
•7
Covariation
Covariation exists when one variable consistently and systematically changes relative to another variable. The correlation coefficient is used to assess this linkage.
•8
Correlation Coefficients: What do they mean?
+1.0   Positive Correlation = when the value of X increases, the value of Y also increases. When the value of X decreases, the value of Y also decreases.
 0.0   Zero Correlation = the value of Y does not increase or decrease with the value of X.
-1.0   Negative Correlation = when the value of X increases, the value of Y decreases. When the value of X decreases, the value of Y increases.
•9
Exhibit 11-1 Rules of Thumb about Correlation Coefficient Size

Coefficient Range       Strength of Association
+/– .91 to +/– 1.00     Very Strong
+/– .71 to +/– .90      High
+/– .41 to +/– .70      Moderate
+/– .21 to +/– .40      Small
+/– .01 to +/– .20      Slight
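The rules of thumb in Exhibit 11-1 can be read as a simple lookup on the absolute value of the coefficient. The sketch below is only an illustration: the function name `strength_of_association` is hypothetical, and the label returned for a coefficient of exactly .00 is an assumption, since the exhibit does not name that case.

```python
def strength_of_association(r: float) -> str:
    """Label a correlation coefficient using the Exhibit 11-1 rules of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    # The exhibit states its bands to two decimals, so compare at that precision.
    size = round(abs(r), 2)
    if size >= 0.91:
        return "Very Strong"
    if size >= 0.71:
        return "High"
    if size >= 0.41:
        return "Moderate"
    if size >= 0.21:
        return "Small"
    if size >= 0.01:
        return "Slight"
    return "None"  # assumption: the exhibit does not label a coefficient of .00


print(strength_of_association(-0.585))
print(strength_of_association(-0.801))
```

The sign is ignored on purpose: it carries the direction of the relationship, not its strength.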
•10
Pearson Correlation
The Pearson correlation coefficient measures the linear association between two metric variables. It ranges from –1.00 to +1.00, with zero representing no linear association. The larger the absolute value of the coefficient, the stronger the linkage; the closer it is to zero, the weaker the relationship.
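As a rough illustration of how a Pearson coefficient is obtained, the sketch below computes it from the covariation of two variables around their means. The function name `pearson_r` and the two short lists of ratings are made up for the example; they are not data from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("the coefficient is undefined when a variable is constant")
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical ratings: cooperation rises while intention to search falls.
cooperation = [1, 2, 3, 4, 5]
intention_to_search = [5, 4, 4, 2, 1]

print(f"r = {pearson_r(cooperation, intention_to_search):.3f}")
```

With these invented ratings the coefficient is strongly negative, which matches the direction (though not the size) of the relationship reported for the real survey data.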
•11
Coefficient of Determination
The coefficient of determination is the square of the correlation coefficient, or r². It ranges from 0.00 to 1.00 and is the proportion of variation in one variable explained by one or more other variables.
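A small worked example may help here. The sketch below squares the correlation of -.585 that the lecture reports between Work Group Cooperation and Intention to Search (Exhibit 11-3); the function name is hypothetical.

```python
def coefficient_of_determination(r: float) -> float:
    """Square a correlation coefficient to get the share of variation explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r * r


# -.585 is the Pearson correlation reported in Exhibit 11-3.
r_squared = coefficient_of_determination(-0.585)
print(f"r squared = {r_squared:.3f}")
```

Squaring removes the sign, so r² says how much variation is shared but not whether the relationship is positive or negative.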
•12
Exhibit 11-3 Bivariate Correlation Between Work Group Cooperation and Intention to Search for another Job

Descriptive Statistics
Variables                      Mean    Standard Deviation    N
X4 – Work Group Cooperation    3.89    1.345                 63
X16 – Intention to Search      4.27    1.807                 63
•13
Exhibit 11-3 Bivariate Correlation Between Work Group Cooperation and Intention to Search for another Job

Correlations
                                                     X4 – Work Group    X16 – Intention
                                                     Cooperation        to Search
X4 – Work Group Cooperation    Pearson Correlation   1.00               -.585*
                               Sig. (2-tailed)       .                  .000
                               N                     63                 63
X16 – Intention to Search      Pearson Correlation   -.585*             1.00
                               Sig. (2-tailed)       .000               .
                               N                     63                 63
* Coefficient is significant at the 0.01 level (2-tailed).
•14
Exhibit 11-5 Bar Charts for Rankings for Food Quality and Atmosphere
[Two bar charts: X13 -- Food Quality Ranking and X14 -- Atmosphere Ranking. Each plots Frequency on the vertical axis for the categories Very Important, Somewhat Important, and Slightly Important.]
•15
Exhibit 11-4 Correlation of Food Quality and Atmosphere Using Spearman’s rho

Spearman’s rho                                            X13 – Food         X14 – Atmosphere
                                                          Quality Ranking    Ranking
X13 – Food Quality Ranking    Correlation Coefficient     1.000              -.801*
                              Sig. (2-tailed)             .                  .000
                              N                           200                200
X14 – Atmosphere Ranking      Correlation Coefficient     -.801*             1.000
                              Sig. (2-tailed)             .000               .
                              N                           200                200
* Coefficient is significant at the 0.01 level (2-tailed).
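Spearman's rho is the Pearson correlation applied to ranks rather than raw values, which is why it suits ordinal data such as these rankings. The sketch below shows that idea, with tied values sharing the average of their rank positions. The function names and the six pairs of rankings are invented for illustration and are not the survey data behind Exhibit 11-4.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical importance rankings from six customers.
food_quality = [4, 4, 3, 4, 2, 3]
atmosphere = [2, 3, 3, 2, 4, 4]
print(f"rho = {spearman_rho(food_quality, atmosphere):.3f}")
```

Because only the ordering matters, any monotonic relationship gives a rho of +1 or -1 even when it is not a straight line.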
•16
Exhibit 11-6 Customer Rankings of Restaurant Selection Factors

Statistics
                X13 – Food         X14 – Atmosphere    X15 – Prices    X16 – Employees
                Quality Ranking    Ranking             Ranking         Ranking
N    Valid      200                200                 200             200
     Missing    0                  0                   0               0
Median          4.00               3.00                2.00            1.00
Minimum         2                  2                   1               1
Maximum         4                  4                   3               4
•17
Exhibit 11-7 Classification of Statistical Techniques

Number of Dependent Variables
  None: Interdependence Techniques
    • Factor Analysis
    • Cluster Analysis
    • Perceptual Mapping
  One: Dependence Techniques
    Dependent Variable Level of Measurement
      Nominal (Nonmetric)
        • Discriminant Analysis
        • Conjoint
        • Logistic Regression
      Ordinal (Nonmetric)
        • Spearman’s Correlation
      Interval or Ratio (Metric)
        • Correlation Analysis, Bivariate Regression and Multiple Regression
        • ANOVA and MANOVA
        • Conjoint Analysis
•18
Exhibit 11-8 Definitions of Statistical Techniques
ANOVA (analysis of variance) is used to examine statistical differences between the means of two or more groups. The dependent variable is metric and the independent variable(s) is nonmetric. One-way ANOVA has a single nonmetric independent variable, and n-way ANOVA (two-way in the simplest case) has two or more nonmetric independent variables.
Bivariate regression has a single metric dependent variable and a single metric independent variable.
Cluster analysis enables researchers to place objects (e.g., customers, brands, products) into groups so that objects within the groups are similar to each other. At the same time, objects in any particular group are different from objects in all other groups.
Correlation examines the association between two metric variables. The strength of the association is measured by the correlation coefficient.
Conjoint analysis enables researchers to determine the preferences individuals have for various products and services, and which product features are valued the most.
•19
Exhibit 11-8 Definitions of Statistical Techniques (continued)
Discriminant analysis enables the researcher to predict group membership using two or more metric independent variables. The group membership variable is a nonmetric dependent variable.
Factor analysis is used to summarize the information from a large number of variables into a much smaller number of variables or factors. This technique is used to combine variables, whereas cluster analysis is used to identify groups with similar characteristics.
Logistic regression is a special type of regression that can have a nonmetric (categorical) dependent variable.
Multiple regression has a single metric dependent variable and several metric independent variables.
MANOVA is similar to ANOVA, but it can examine group differences across two or more metric dependent variables at the same time.
Perceptual mapping uses information from other statistical techniques to map customer perceptions of products, brands, companies, and so forth.
•20
Exhibit 11-10 Bivariate Regression of Satisfaction and Food Quality

Descriptive Statistics
X25 – Competitor    Variables             Mean
Samouel’s           X17 – Satisfaction    4.78
                    X1 – Food Quality     5.24
Gino’s              X17 – Satisfaction    5.96
                    X1 – Food Quality     5.81
•21
Exhibit 11-10 Bivariate Regression of Satisfaction and Food Quality (continued)

Model Summary
X25 – Competitor    Model    R        R Square
Samouel’s           1        .513*    .263
Gino’s              1        .331*    .110
*Predictors: (Constant), X1 – Excellent Food Quality
•22
Exhibit 11-11 Other Aspects of Bivariate Regression

X25 – Competitor    Model            Sum of Squares    Mean Square    F         Sig.
Samouel’s           1 Regression     35.001            35.001         34.945    .000*
                      Residual       98.159            1.002
                      Total          133.160
Gino’s              1 Regression     10.310            10.310         12.095    .001*
                      Residual       83.530            .852
                      Total          93.840
*Predictors: (Constant), X1 – Excellent Food Quality
Dependent Variable: X17 – Satisfaction
•23
Exhibit 11-11 Other Aspects of Bivariate Regression (continued)

Coefficients*
                                            Unstandardized           Standardized
                                            Coefficients             Coefficients
X25 – Competitor    Model                   B        Std. Error      Beta            t        Sig.
Samouel’s           1 (Constant)            2.376    .419                            5.671    .000
                      X1 – Food Quality     .459     .078            .513            5.911    .000
Gino’s              1 (Constant)            4.307    .484                            8.897    .000
                      X1 – Food Quality     .284     .082            .331            3.478    .001
*Dependent Variable: X17 – Satisfaction
•24
Calculating the “Explained” and “Unexplained” Variance in Regression
The unexplained variance in regression, referred to as residual variance, is calculated by dividing the residual sum of squares by the total sum of squares. For example, in Exhibit 11-11, divide the residual sum of squares for Samouel’s of 98.159 by 133.160 and you get .737. This tells us that a lot of variance (73.7%) in the dependent variable is not explained by this regression equation.
The explained variance in regression, referred to as r², is calculated by dividing the regression sum of squares by the total sum of squares. For example, in Exhibit 11-11, divide the regression sum of squares for Samouel’s of 35.001 by 133.160 and you get .263.
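The two divisions described on this slide can be checked directly. The sketch below uses the Samouel’s sums of squares from Exhibit 11-11; the variable names are only illustrative.

```python
# Sums of squares for Samouel's, as reported in Exhibit 11-11.
ss_regression = 35.001
ss_residual = 98.159
ss_total = 133.160

explained = ss_regression / ss_total    # r squared
unexplained = ss_residual / ss_total    # residual share of the variance

print(f"explained variance:   {explained:.3f}")
print(f"unexplained variance: {unexplained:.3f}")
```

The two shares add up to 1 because the regression and residual sums of squares together make up the total sum of squares.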
•25
How to calculate the t-value?
The t-value is calculated by dividing the regression coefficient by its standard error. In the Coefficients table of Exhibit 11-11, if you divide the Unstandardized Coefficient for Samouel’s of .459 by the Standard Error of .078, the result is a t-value of 5.8846. Note that the t-value in the table is 5.911. The difference arises because the computer reports rounded values for the Unstandardized Coefficient and the Standard Error, but calculates the t-value from the unrounded values.
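The division can be reproduced with the rounded figures printed in Exhibit 11-11. This is a minimal sketch; the function name `t_value` is hypothetical, and the results differ slightly from the table for the rounding reason given above.

```python
def t_value(coefficient: float, std_error: float) -> float:
    """t-value = regression coefficient divided by its standard error."""
    if std_error <= 0:
        raise ValueError("the standard error must be positive")
    return coefficient / std_error


# Rounded values as printed in the Exhibit 11-11 Coefficients table.
t_samouels = t_value(0.459, 0.078)
t_ginos = t_value(0.284, 0.082)

print(f"Samouel's: {t_samouels:.4f} (table reports 5.911)")
print(f"Gino's:    {t_ginos:.4f} (table reports 3.478)")
```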
•26
How to interpret the regression coefficient?
The regression coefficient of .459 for Samouel’s X1 – Food Quality reported in Exhibit 11-11 is interpreted as follows: “. . . for every unit that X1 increases, X17 will increase by .459 units.” Recall that in this example X1 is the independent (predictor) variable and X17 is the dependent variable.
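One way to see this interpretation is to write the fitted line out as a prediction rule. The sketch below combines the constant (2.376) and the coefficient (.459) reported for Samouel’s in Exhibit 11-11; the function name is hypothetical, and the inputs 5.0 and 6.0 are arbitrary values chosen to show a one-unit change.

```python
INTERCEPT = 2.376  # (Constant) for Samouel's in Exhibit 11-11
SLOPE = 0.459      # unstandardized coefficient for X1 - Food Quality


def predict_satisfaction(food_quality: float) -> float:
    """Predicted X17 (Satisfaction) for a given X1 (Food Quality) rating."""
    return INTERCEPT + SLOPE * food_quality


# A one-unit increase in X1 changes the prediction by exactly the slope.
one_unit_effect = predict_satisfaction(6.0) - predict_satisfaction(5.0)

# At the mean food quality rating for Samouel's (5.24, Exhibit 11-10).
at_mean = predict_satisfaction(5.24)

print(f"effect of one unit of X1: {one_unit_effect:.3f}")
print(f"predicted X17 at X1 = 5.24: {at_mean:.2f}")
```

The prediction at the mean of X1 comes out at about 4.78, which agrees with the mean satisfaction reported for Samouel’s in Exhibit 11-10.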
•27
Exhibit 11-13 Multiple Regression of Return in Future and Food Independent Variables

Descriptive Statistics
X25 – Competitor    Variables                          Mean
Samouel’s           X18 – Return in Future             4.37
                    X1 – Excellent Food Quality        5.24
                    X4 – Excellent Food Taste          5.16
                    X9 – Wide Variety of Menu Items    5.45
Gino’s              X18 – Return in Future             5.55
                    X1 – Excellent Food Quality        5.81
                    X4 – Excellent Food Taste          5.73
                    X9 – Wide Variety of Menu Items    5.56
•28
Exhibit 11-13 Multiple Regression of Return in Future and Food Independent Variables (continued)

Model Summary
X25 – Competitor    Model    R        R Square    Adjusted R Square
Samouel’s           1        .512*    .262        .239
Gino’s              1        .482*    .232        .208
*Predictors: (Constant), X9 – Wide Variety of Menu Items, X1 – Excellent Food Quality, X4 – Excellent Food Taste
Dependent Variable: X18 – Return in Future
•29
Exhibit 11-14 Other Information for Multiple Regression Models

ANOVA
X25 – Competitor    Model            Sum of Squares    Mean Square    F         Sig.
Samouel’s           1 Regression     28.155            9.385          11.382    .000*
                      Residual       79.155            .825
                      Total          107.310
Gino’s              1 Regression     22.019            7.340          9.688     .000*
                      Residual       72.731            .758
                      Total          94.750
*Predictors: (Constant), X9 – Wide Variety of Menu Items, X1 – Excellent Food Quality, X4 – Excellent Food Taste
Dependent Variable: X18 – Return in Future
•30
Exhibit 11-14 Other Information for Multiple Regression Models (continued)

Coefficients*
                                                 Unstandardized              Standardized
                                                 Coefficients                Coefficients
X25 – Competitor    Model                        B             Std. Error    Beta            t        Sig.
Samouel’s           1 (Constant)                 2.206         .443                          4.985    .000
                      X1 – Exc. Food Quality     .260          .116          .324            2.236    .028
                      X4 – Exc. Food Taste       .242          .137          .291            1.770    .080
                      X9 – Wide Variety          -8.191E-02    .123          -.094           -.668    .506
Gino’s              1 (Constant)                 2.877         .507                          5.680    .000
                      X1 – Exc. Food Quality     .272          .119          .316            2.295    .024
                      X4 – Exc. Food Taste       .241          .132          .264            1.823    .071
                      X9 – Wide Variety          -5.275E-02    .125          -.065           -.421    .675
*Dependent Variable: X18 – Return in Future
•31
Exhibit 11-17 Summary Statistics for Employee Regression Model

Model Summary
Model    R       R Square    Adjusted R Square
1        .506    .256        .218

Model            Sum of Squares    Mean Square    F        Sig.
1 Regression     17.041            5.680          6.762    .001
  Residual       49.563            .840
  Total          66.603
*Predictors: (Constant), X12 – Benefits Reasonable, X9 – Pay Reflects Effort, X1 – Paid Fairly
Dependent Variable: X14 – Effort
•32
Exhibit 11-18 Coefficients for Employee Regression Model

Coefficients*
                               Unstandardized         Standardized                      Collinearity
                               Coefficients           Coefficients                      Statistics
Model                          B        Std. Error    Beta            t        Sig.     Tolerance    VIF
1 (Constant)                   3.089    .680                          4.541    .000
  X1 – Paid Fairly             .178     .281          .157            .633     .529     .204         4.894
  X9 – Pay Reflects Effort     .553     .157          .516            3.521    .001     .588         1.701
  X12 – Benefits Reasonable    -.256    .300          -.203           -.855    .396     .224         4.456
*Dependent Variable: X14 – Effort
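The two collinearity columns in Exhibit 11-18 carry the same information: the variance inflation factor (VIF) is the reciprocal of tolerance. The slides do not state this relationship, so the sketch below is offered as background, using the tolerance values printed in the exhibit. Small gaps against the printed VIFs come from the tolerances being rounded to three decimals.

```python
def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor is the reciprocal of tolerance."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must be greater than 0 and at most 1")
    return 1.0 / tolerance


# Tolerance values as printed in Exhibit 11-18 (rounded to three decimals).
tolerances = {
    "Paid Fairly": 0.204,
    "Pay Reflects Effort": 0.588,
    "Benefits Reasonable": 0.224,
}

vifs = {name: vif_from_tolerance(tol) for name, tol in tolerances.items()}
for name, vif in vifs.items():
    print(f"{name}: VIF = {vif:.3f}")
```

Low tolerance (high VIF) for Paid Fairly and Benefits Reasonable is consistent with the .880 correlation between those two predictors shown in Exhibit 11-19.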
•33
Exhibit 11-19 Bivariate Correlations of Effort and Compensation Variables

Pearson Correlations
                              X14 – Effort    X1 – Paid    X9 – Pay Reflects    X12 – Reasonable
                                              Fairly       Effort               Benefits
X14 – Effort                  1.000           .309         .496                 .241
X1 – Paid Fairly              .309            1.000        .639                 .880
X9 – Pay Reflects Effort      .496            .639         1.000                .592
X12 – Reasonable Benefits     .241            .880         .592                 1.000
•34
Exhibit 11-19 Bivariate Correlations of Effort and Compensation Variables (continued)

Statistical Significance of Pearson Correlations (1-tailed)
                              X14 – Effort    X1 – Paid    X9 – Pay Reflects    X12 – Reasonable
                                              Fairly       Effort               Benefits
X14 – Effort                  .               .007         .000                 .028
X1 – Paid Fairly              .007            .            .000                 .000
X9 – Pay Reflects Effort      .000            .000         .                    .000
X12 – Reasonable Benefits     .028            .000         .000                 .
•35
Exhibit 11-20 Stepwise Regression Based on Samouel’s Customer Survey

Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .513    .263        .255                 1.00
2        .597    .356        .343                 .94
*Predictors: (Constant), X1 – Excellent Food Quality, X6 – Friendly Employees
Dependent Variable: X17 – Satisfaction
•36
Exhibit 11-20 Stepwise Regression Based on Samouel’s Customer Survey (continued)

ANOVA
Model            Sum of Squares    Mean Square    F         Sig.
1 Regression     35.001            35.001         34.945    .000
  Residual       98.159            1.002
  Total          133.160
2 Regression     47.421            23.711         26.825    .000
  Residual       85.739
  Total          133.160
*Predictors: (Constant), X1 – Excellent Food Quality, X6 – Friendly Employees
Dependent Variable: X17 – Satisfaction
•37
Exhibit 11-21 Means and Correlations for Selected Variables from Samouel’s Customer Survey

Descriptive Statistics
Variables                          Mean
X17 – Satisfaction                 4.78
X1 – Excellent Food Quality        5.24
X4 – Excellent Food Taste          5.16
X9 – Wide Variety of Menu Items    5.45
X6 – Friendly Employees            2.89
X11 – Courteous Employees          1.96
X12 – Competent Employees          1.62
•38
Exhibit 11-20 Independent Variables in Stepwise Regression Model

Model    Variables Entered              Variables Removed    Method
1        X1 – Excellent Food Quality    .                    Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100)
2        X6 – Friendly Employees        .
*Predictors: (Constant), X1 – Excellent Food Quality, X6 – Friendly Employees Dependent Variable: X17 – Satisfaction
ANOVA
•39
Exhibit 11-23 Coefficients for Stepwise Regression Model

Coefficients*
                                 Unstandardized         Standardized                     Collinearity
                                 Coefficients           Coefficients                     Statistics
Model                            B        Std. Error    Beta            t       Sig.     Tolerance    VIF
1 (Constant)                     2.376    .419                          5.67    .000
  X1 – Excellent Food Quality    .459     .078          .513            5.91    .000     1.00         1.00
2 (Constant)                     1.716    .431                          3.98    .000
  X1 – Excellent Food Quality    .402     .074          .449            5.39    .000     .958         1.044
  X6 – Friendly Employees        .332     .088          .312            3.75    .000     .958         1.044
*Dependent Variable: X17 – Satisfaction
•40
THANK YOU