1 correlation and regression analysis lecture 11


Correlation and Regression Analysis

Lecture 11

Concepts About Relationships

• Presence
• Nature
• Direction
• Strength of Association

Relationship Presence

Relationship presence assesses whether a systematic relationship exists between two or more variables. If we find statistical significance between the variables, we say a relationship is present.

Nature of Relationships

Relationships between variables typically are described as either linear or nonlinear.

Linear relationship = a “straight-line association” between two or more variables.

Nonlinear relationship = often referred to as curvilinear; it is best described by a curve instead of a straight line.
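A minimal Python sketch of why the linear/nonlinear distinction matters: the two made-up data sets below are both perfectly determined by x, one along a straight line (y = 2x + 1) and one along a curve (y = x²). Pearson's r, a linear measure, is 1.0 for the first but 0.0 for the second. The helper `pearson_r` is written out here for illustration, not taken from a library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations (covariation) and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # straight-line association (made-up data)
curved = [v ** 2 for v in x]      # curvilinear, U-shaped association (made-up data)

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(round(r_linear, 3), round(r_curved, 3))  # 1.0 0.0
```

A coefficient near zero therefore means "no linear association"; plotting the data is the usual way to spot a curvilinear pattern that r misses.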

Direction of Relationship

The direction of a relationship can be either positive or negative.

Positive relationship = when one variable increases, e.g., loyalty to employer, then so does another related one, e.g., effort put forth for employer.

Negative relationship = when one variable increases, e.g., satisfaction with job, then a related one decreases, e.g., likelihood of searching for another job.

Strength of Association


When a consistent and systematic relationship is present, the researcher must determine the strength of association. The strength ranges from very strong to slight.

Covariation

Covariation exists when one variable consistently and systematically changes relative to another variable. The correlation coefficient is used to assess this linkage.

Correlation Coefficients: What do they mean?

Correlation coefficients range from –1.0 through 0.0 to +1.0.

Positive Correlation (toward +1.0) = when the value of X increases, the value of Y also increases. When the value of X decreases, the value of Y also decreases.

Zero Correlation (0.0) = the value of Y does not increase or decrease with the value of X.

Negative Correlation (toward –1.0) = when the value of X increases, the value of Y decreases. When the value of X decreases, the value of Y increases.

Exhibit 11-1 Rules of Thumb about Correlation Coefficient Size

Coefficient Range          Strength of Association
+/– .91 to +/– 1.00        Very Strong
+/– .71 to +/– .90         High
+/– .41 to +/– .70         Moderate
+/– .21 to +/– .40         Small
+/– .01 to +/– .20         Slight
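The rules of thumb in Exhibit 11-1 can be expressed as a small lookup function. This is a sketch under stated assumptions: the function name is hypothetical, the absolute value is rounded to two decimals so that values such as .905 do not fall between the printed ranges, and returning "None" below .01 is an assumption because the exhibit stops at .01.

```python
def strength_of_association(r):
    """Map a correlation coefficient to the Exhibit 11-1 rule-of-thumb label.

    The sign gives direction, not strength, so only |r| is used.
    Rounding to two decimals (an assumption) closes gaps such as .90-.91.
    """
    size = round(abs(r), 2)
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size >= 0.91:
        return "Very Strong"
    if size >= 0.71:
        return "High"
    if size >= 0.41:
        return "Moderate"
    if size >= 0.21:
        return "Small"
    if size >= 0.01:
        return "Slight"
    return "None"  # below the exhibit's lowest range (assumed label)

# Coefficients reported elsewhere in this lecture
for r in (-0.585, -0.801, 0.309):
    print(r, strength_of_association(r))
# -0.585 Moderate
# -0.801 High
# 0.309 Small
```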

Pearson Correlation

The Pearson correlation coefficient measures the linear association between two metric variables. It ranges from –1.00 to +1.00, with zero representing no linear association. The larger the absolute value of the coefficient, the stronger the linkage; the smaller the absolute value, the weaker the relationship. The sign indicates only the direction of the relationship.

Coefficient of Determination

The coefficient of determination is the square of the correlation coefficient, or r². It ranges from 0.00 to 1.00 and is the proportion of variation in one variable explained by one or more other variables.
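Squaring a correlation coefficient is a one-line calculation. The sketch below applies it to coefficients reported in this lecture: the Pearson r of –.585 in Exhibit 11-3 and the R values of the bivariate regressions in Exhibit 11-10, where squaring reproduces the reported R Square values. Note that squaring removes the sign, so r² says nothing about direction.

```python
# Correlation coefficients reported in this lecture
correlations = {
    "X4 vs X16 (Exhibit 11-3)": -0.585,
    "Samouel's R (Exhibit 11-10)": 0.513,
    "Gino's R (Exhibit 11-10)": 0.331,
}

# Coefficient of determination = r squared
r_squared = {label: r ** 2 for label, r in correlations.items()}
for label, r2 in r_squared.items():
    print(f"{label}: r^2 = {r2:.3f}")
# X4 vs X16 (Exhibit 11-3): r^2 = 0.342
# Samouel's R (Exhibit 11-10): r^2 = 0.263
# Gino's R (Exhibit 11-10): r^2 = 0.110
```

So work group cooperation accounts for roughly 34% of the variation in intention to search for another job.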

Exhibit 11-3 Bivariate Correlation Between Work Group Cooperation and Intention to Search for Another Job

Descriptive Statistics

Variables                        Mean    Standard Deviation    N
X4 – Work Group Cooperation      3.89    1.345                 63
X16 – Intention to Search        4.27    1.807                 63

Exhibit 11-3 Bivariate Correlation Between Work Group Cooperation and Intention to Search for Another Job (continued)

Correlations

                                                      X4 – Work Group    X16 – Intention
                                                      Cooperation        to Search
X4 – Work Group Cooperation    Pearson Correlation    1.00               -.585*
                               Sig. (2-tailed)        .                  .000
                               N                      63                 63
X16 – Intention to Search      Pearson Correlation    -.585*             1.00
                               Sig. (2-tailed)        .000               .
                               N                      63                 63

* Coefficient is significant at the 0.01 level (2-tailed).
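The Sig. value for a Pearson correlation is conventionally based on the statistic t = r × sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. Assuming that is the test behind Exhibit 11-3 (standard practice, but not stated in the exhibit), the sketch below computes t for r = –.585 and N = 63. The function name is hypothetical.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n = 63                           # N from Exhibit 11-3
t = t_for_correlation(-0.585, n)  # Pearson r from Exhibit 11-3
df = n - 2
print(f"t = {t:.2f} with df = {df}")  # t = -5.63 with df = 61
```

An absolute t of about 5.63 with 61 degrees of freedom is far beyond the two-tailed .01 critical value (roughly 2.66 in a standard t table), which is consistent with the .000 significance shown.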

Exhibit 11-5 Bar Charts for Rankings for Food Quality and Atmosphere

[Figure: two bar charts of frequency by ranking category (Very Important, Somewhat Important, Slightly Important), one for X13 – Food Quality Ranking and one for X14 – Atmosphere Ranking.]

Exhibit 11-4 Correlation of Food Quality and Atmosphere Using Spearman’s rho

                                                                          X13 – Food         X14 – Atmosphere
                                                                          Quality Ranking    Ranking
Spearman’s rho   X13 – Food Quality Ranking   Correlation Coefficient     1.000              -.801*
                                              Sig. (2-tailed)             .                  .000
                                              N                           200                200
                 X14 – Atmosphere Ranking     Correlation Coefficient     -.801*             1.000
                                              Sig. (2-tailed)             .000               .
                                              N                           200                200

* Coefficient is significant at the 0.01 level (2-tailed).
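One common way to compute Spearman's rho is to replace each variable's values with ranks (tied values share the average of the positions they occupy) and then compute Pearson's r on those ranks. Ties matter here because ordinal rankings such as X13 and X14 repeat heavily. The sketch below uses hypothetical rankings for eight customers, not the 200-customer survey; the helper names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' across the run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical rankings (4 = most important) for eight customers
food_quality = [4, 4, 3, 4, 2, 3, 4, 3]
atmosphere = [3, 2, 4, 3, 4, 4, 2, 3]

rho = spearman_rho(food_quality, atmosphere)
print(f"rho = {rho:.3f}")  # rho = -0.814
```

If SciPy is available, `scipy.stats.spearmanr` should return the same coefficient for these data and can serve as a cross-check.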

Exhibit 11-6 Customer Rankings of Restaurant Selection Factors

Statistics

                 X13 – Food         X14 – Atmosphere   X15 – Prices   X16 – Employees
                 Quality Ranking    Ranking            Ranking        Ranking
N      Valid     200                200                200            200
       Missing   0                  0                  0              0
Median           4.00               3.00               2.00           1.00
Minimum          2                  2                  1              1
Maximum          4                  4                  3              4

Exhibit 11-7 Classification of Statistical Techniques

Number of Dependent Variables

One: Dependence Techniques
  Dependent Variable Level of Measurement
    Nonmetric
      Nominal
        • Discriminant Analysis
        • Conjoint
        • Logistic Regression
      Ordinal
        • Spearman’s Correlation
    Metric
      Interval or Ratio
        • Correlation Analysis, Bivariate Regression and Multiple Regression
        • ANOVA and MANOVA
        • Conjoint Analysis

None: Interdependence Techniques
  • Factor Analysis
  • Cluster Analysis
  • Perceptual Mapping

Exhibit 11-8 Definitions of Statistical Techniques

ANOVA (analysis of variance) is used to examine statistical differences between the means of two or more groups. The dependent variable is metric and the independent variable(s) are nonmetric. One-way ANOVA has a single nonmetric independent variable; two-way ANOVA has two nonmetric independent variables, and designs with more are often called n-way or factorial ANOVA.

Bivariate regression has a single metric dependent variable and a single metric independent variable.

Cluster analysis enables researchers to place objects (e.g., customers, brands, products) into groups so that objects within the groups are similar to each other. At the same time, objects in any particular group are different from objects in all other groups.

Correlation examines the association between two metric variables. The strength of the association is measured by the correlation coefficient.

Conjoint analysis enables researchers to determine the preferences individuals have for various products and services, and which product features are valued the most.

Exhibit 11-8 Definitions of Statistical Techniques (continued)

Discriminant analysis enables the researcher to predict group membership using two or more metric independent variables. The group membership variable is a nonmetric dependent variable.

Factor analysis is used to summarize the information from a large number of variables into a much smaller number of variables or factors. This technique is used to combine variables, whereas cluster analysis is used to identify groups with similar characteristics.

Logistic regression is a special type of regression that can have a nonmetric (categorical) dependent variable.

Multiple regression has a single metric dependent variable and several metric independent variables.

MANOVA is similar to ANOVA, but it can examine group differences across two or more metric dependent variables at the same time.

Perceptual mapping uses information from other statistical techniques to map customer perceptions of products, brands, companies, and so forth.

Exhibit 11-10 Bivariate Regression of Satisfaction and Food Quality

X25 – Competitor    Variables               Mean
Samouel’s           X17 – Satisfaction      4.78
                    X1 – Food Quality       5.24
Gino’s              X17 – Satisfaction      5.96
                    X1 – Food Quality       5.81


Exhibit 11-10 Bivariate Regression of Satisfaction and Food Quality (continued)

Model Summary

X25 – Competitor    Model    R        R Square
Samouel’s           1        .513*    .263
Gino’s              1        .331*    .110

*Predictors: (Constant), X1 – Excellent Food Quality

Exhibit 11-11 Other Aspects of Bivariate Regression

X25 – Competitor   Model              Sum of Squares   Mean Square   F        Sig.
Samouel’s          1   Regression     35.001           35.001        34.945   .000*
                       Residual       98.159           1.002
                       Total          133.160
Gino’s             1   Regression     10.310           10.310        12.095   .001*
                       Residual       83.530           .852
                       Total          93.840

*Predictors: (Constant), X1 – Excellent Food Quality
Dependent Variable: X17 – Satisfaction

Exhibit 11-11 Other Aspects of Bivariate Regression (continued)

Coefficients*

                                                   Unstandardized         Standardized
                                                   Coefficients           Coefficients
X25 – Competitor   Model                           B        Std. Error    Beta           t        Sig.
Samouel’s          1   (Constant)                  2.376    .419                         5.671    .000
                       X1 – Excellent Food Quality .459     .078          .513           5.911    .000
Gino’s             1   (Constant)                  4.307    .484                         8.897    .000
                       X1 – Excellent Food Quality .284     .082          .331           3.478    .001

*Dependent Variable: X17 – Satisfaction

Calculating the “Explained” and “Unexplained” Variance in Regression

The unexplained variance in regression, referred to as residual variance, is calculated by dividing the residual sum of squares by the total sum of squares. For example, in Exhibit 11-11, dividing the residual sum of squares for Samouel’s (98.159) by the total sum of squares (133.160) gives .737. This tells us that a large share of the variance (73.7%) in the dependent variable is not explained by this regression equation.

The explained variance in regression, referred to as r², is calculated by dividing the regression sum of squares by the total sum of squares. For example, in Exhibit 11-11, dividing the regression sum of squares for Samouel’s (35.001) by 133.160 gives .263. Because the regression and residual sums of squares add up to the total, the two proportions sum to 1 (.263 + .737 = 1.000).
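Both ratios can be checked with a few lines of Python using the sums of squares printed in Exhibit 11-11 for both restaurants; the dictionary layout is only a convenience for this sketch.

```python
# (regression SS, residual SS, total SS) from Exhibit 11-11
anova = {
    "Samouel's": (35.001, 98.159, 133.160),
    "Gino's": (10.310, 83.530, 93.840),
}

results = {}
for name, (ss_reg, ss_res, ss_tot) in anova.items():
    explained = ss_reg / ss_tot    # r squared
    unexplained = ss_res / ss_tot  # residual share, equal to 1 - r squared
    results[name] = (explained, unexplained)
    print(f"{name}: explained = {explained:.3f}, unexplained = {unexplained:.3f}")
# Samouel's: explained = 0.263, unexplained = 0.737
# Gino's: explained = 0.110, unexplained = 0.890
```

The explained shares match the R Square values reported for the two restaurants in Exhibit 11-10.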

How to calculate the t-value?

The t-value is calculated by dividing the regression coefficient by its standard error. In the Coefficients table of Exhibit 11-11, dividing the unstandardized coefficient for Samouel’s (.459) by its standard error (.078) gives a t-value of 5.8846, whereas the table reports 5.911. The difference arises because the table displays the unstandardized coefficient and the standard error rounded to three decimals, while the computer calculates the t-value from the unrounded values.
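The same division can be run for both coefficients in Exhibit 11-11. The small gaps between the calculated and reported t-values come from the rounded inputs, as explained above.

```python
# Unstandardized coefficient (B), standard error, and reported t from Exhibit 11-11
coefficients = {
    "Samouel's": (0.459, 0.078, 5.911),
    "Gino's": (0.284, 0.082, 3.478),
}

t_calculated = {}
for restaurant, (b, std_error, t_reported) in coefficients.items():
    t_calculated[restaurant] = b / std_error  # t = B / Std. Error
    print(f"{restaurant}: {b} / {std_error} = {t_calculated[restaurant]:.4f} (reported {t_reported})")
# Samouel's: 0.459 / 0.078 = 5.8846 (reported 5.911)
# Gino's: 0.284 / 0.082 = 3.4634 (reported 3.478)
```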

How to interpret the regression coefficient?

The regression coefficient of .459 for Samouel’s X1 – Food Quality reported in Exhibit 11-11 is interpreted as follows: “. . . for every unit that X1 increases, X17 will increase by .459 units” (on average, as predicted by the regression equation). Recall that in this example X1 is the independent (predictor) variable and X17 is the dependent variable.
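The coefficients in Exhibit 11-11 define the prediction equations predicted X17 = 2.376 + .459 × X1 for Samouel’s and 4.307 + .284 × X1 for Gino’s. A least-squares line with a constant always passes through the point of means, so plugging in the mean food-quality ratings from Exhibit 11-10 should return the mean satisfaction scores, which makes a handy check. The function name below is hypothetical.

```python
def predict_satisfaction(x1, constant, slope):
    """Predicted X17 from a bivariate regression: constant + slope * X1."""
    return constant + slope * x1

samouel = (2.376, 0.459)  # (Constant, B) from Exhibit 11-11
gino = (4.307, 0.284)

# At the mean X1 rating, the prediction reproduces the mean satisfaction (Exhibit 11-10)
print(round(predict_satisfaction(5.24, *samouel), 2))  # 4.78
print(round(predict_satisfaction(5.81, *gino), 2))     # 5.96

# One more unit of X1 raises the prediction by exactly the slope
step = predict_satisfaction(6.24, *samouel) - predict_satisfaction(5.24, *samouel)
print(round(step, 3))  # 0.459
```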

Exhibit 11-13 Multiple Regression of Return in Future and Food Independent Variables

X25 – Competitor    Variables                          Mean
Samouel’s           X18 – Return in Future             4.37
                    X1 – Excellent Food Quality        5.24
                    X4 – Excellent Food Taste          5.16
                    X9 – Wide Variety of Menu Items    5.45
Gino’s              X18 – Return in Future             5.55


Exhibit 11-13 Multiple Regression of Return in Future and Food Independent Variables (continued)

Model Summary

X25 – Competitor    Model    R        R Square    Adjusted R Square
Samouel’s           1        .512*    .262        .239
Gino’s              1        .482*    .232        .208

*Predictors: (Constant), X9 – Wide Variety of Menu Items, X1 – Excellent Food Quality, X4 – Excellent Food Taste
Dependent Variable: X18 – Return in Future

Exhibit 11-14 Other Information for Multiple Regression Models

ANOVA

X25 – Competitor   Model              Sum of Squares   Mean Square   F        Sig.
Samouel’s          1   Regression     28.155           9.385         11.382   .000*
                       Total          107.310
Gino’s             1   Regression     22.019           7.340         9.688    .000*
                       Total          94.750

*Predictors: (Constant), X9 – Wide Variety of Menu Items, X1 – Excellent Food Quality, X4 – Excellent Food Taste
Dependent Variable: X18 – Return in Future

Exhibit 11-14 Other Information for Multiple Regression Models (continued)

Coefficients*

                                                   Unstandardized         Standardized
                                                   Coefficients           Coefficients
X25 – Competitor   Model                           B        Std. Error    Beta           t        Sig.
Samouel’s          1   (Constant)                  2.206    .443                         4.985    .000
                       X1 – Exc. Food Quality      .260     .116          .324           2.236    .028
                       X4 – Exc. Food Taste        .242     .137          .291           1.770    .080
                       X9 – Wide Variety           -.082    .123          -.094          -.668    .506
Gino’s             1   (Constant)                  2.877    .507                         5.680    .000
                       X1 – Exc. Food Quality      .272     .119          .316           2.295    .024
                       X4 – Exc. Food Taste        .241     .132          .264           1.823    .071
                       X9 – Wide Variety           -.053    .125          -.065          -.421    .675

*Dependent Variable: X18 – Return in Future

Exhibit 11-17 Summary Statistics for Employee Regression Model

Model Summary

Model    R       R Square    Adjusted R Square
1        .506    .256        .218

Model              Sum of Squares    Mean Square    F        Sig.
1   Regression     17.041            5.680          6.762    .001
    Total          66.603

*Predictors: (Constant), X12 – Benefits Reasonable, X9 – Pay Reflects Effort, X1 – Paid Fairly
Dependent Variable: X14 – Effort
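Adjusted R Square shrinks R Square to account for the number of predictors, using 1 - (1 - R²)(n - 1)/(n - k - 1) for k predictors and n observations. The exhibit does not print n; the sketch assumes n = 63, the employee sample size shown as N in Exhibit 11-3, and with k = 3 predictors it reproduces the reported .218. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Exhibit 11-17: R^2 = .256 with three predictors; n = 63 is an assumption
adj = adjusted_r_squared(0.256, n=63, k=3)
print(f"{adj:.3f}")  # 0.218
```

The penalty grows as k approaches n, which is why adding weak predictors can lower Adjusted R Square even though R Square never decreases.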

Exhibit 11-18 Coefficients for Employee Regression Model

Coefficients*

                                     Unstandardized       Standardized                      Collinearity
                                     Coefficients         Coefficients                      Statistics
Model                                B        Std. Error  Beta            t        Sig.     Tolerance   VIF
1   (Constant)                       3.089    .680                        4.541    .000
    X1 – Paid Fairly                 .178     .281        .157            .633     .529     .204        4.894
    X9 – Pay Reflects Effort         .553     .157        .516            3.521    .001     .588        1.701
    X12 – Benefits Reasonable        -.256    .300        -.203           -.855    .396     .224        4.456

*Dependent Variable: X14 – Effort
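In the collinearity statistics, VIF is the reciprocal of tolerance, and tolerance is 1 minus the R² obtained by regressing one predictor on the other predictors. The sketch recomputes VIF from the tolerances in Exhibit 11-18; the small differences from the printed VIFs (4.902 vs. 4.894, 4.464 vs. 4.456) arise because the tolerances are shown rounded to three decimals. The low tolerances for Paid Fairly and Benefits Reasonable are in line with their .880 correlation in Exhibit 11-19.

```python
# (tolerance, reported VIF) for each predictor in Exhibit 11-18
collinearity = {
    "Paid Fairly": (0.204, 4.894),
    "Pay Reflects Effort": (0.588, 1.701),
    "Benefits Reasonable": (0.224, 4.456),
}

vifs = {}
for predictor, (tolerance, vif_reported) in collinearity.items():
    # VIF = 1 / tolerance, where tolerance = 1 - R^2 of this predictor on the others
    vifs[predictor] = 1 / tolerance
    print(f"{predictor}: VIF = {vifs[predictor]:.3f} (reported {vif_reported})")
# Paid Fairly: VIF = 4.902 (reported 4.894)
# Pay Reflects Effort: VIF = 1.701 (reported 1.701)
# Benefits Reasonable: VIF = 4.464 (reported 4.456)
```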

Exhibit 11-19 Bivariate Correlations of Effort and Compensation Variables

Pearson Correlations

                             X14 – Effort   X1 – Paid Fairly   X9 – Pay Reflects Effort   X12 – Reasonable Benefits
X14 – Effort                 1.000          .309               .496                       .241
X1 – Paid Fairly             .309           1.000              .639                       .880
X9 – Pay Reflects Effort     .496           .639               1.000                      .592
X12 – Reasonable Benefits    .241           .880               .592                       1.000

Exhibit 11-19 Bivariate Correlations of Effort and Compensation Variables (continued)

Statistical Significance of Pearson Correlations (1-tailed)

                             X14 – Effort   X1 – Paid Fairly   X9 – Pay Reflects Effort   X12 – Reasonable Benefits
X14 – Effort                 .              .007               .000                       .028
X1 – Paid Fairly             .007           .                  .000                       .000
X9 – Pay Reflects Effort     .000           .000               .                          .000
X12 – Reasonable Benefits    .028           .000               .000                       .

Exhibit 11-20 Stepwise Regression Based on Samouel’s Customer Survey

Model Summary

Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .513    .263        .255                 1.00
2        .597    .356        .343                 .94

*Predictors: (Constant), X1 – Excellent Food Quality, X6 – Friendly Employees
Dependent Variable: X17 – Satisfaction

Exhibit 11-20 Stepwise Regression Based on Samouel’s Customer Survey (continued)

ANOVA

Model              Sum of Squares    Mean Square    F         Sig.
1   Regression     35.001            35.001         34.945    .000
    Residual       98.159            1.002
    Total          133.160
2   Regression     47.421            23.711         26.825    .000
    Residual       85.739
    Total          133.160
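Because both stepwise models share the same total sum of squares, the improvement from adding X6 – Friendly Employees can be read directly from the ANOVA table: R² for each model is its regression sum of squares divided by the total. A short sketch of that arithmetic:

```python
ss_total = 133.160
ss_regression = {1: 35.001, 2: 47.421}  # model 1: X1 only; model 2: X1 + X6

# R^2 for each model, and the increase contributed by X6
r_squared = {model: ss / ss_total for model, ss in ss_regression.items()}
r_squared_change = r_squared[2] - r_squared[1]

for model, r2 in r_squared.items():
    print(f"Model {model}: R^2 = {r2:.3f}")
print(f"R^2 change from adding X6: {r_squared_change:.3f}")
# Model 1: R^2 = 0.263
# Model 2: R^2 = 0.356
# R^2 change from adding X6: 0.093
```

Both values match the R Square column of the stepwise Model Summary, so X6 adds about 9 percentage points of explained variance in satisfaction.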

Exhibit 11-21 Means and Correlations for Selected Variables from Samouel’s Customer Survey

Variables                       Mean
X17 – Satisfaction              4.78

X6 – Friendly Employees         2.89
X11 – Courteous Employees       1.96
X12 – Competent Employees       1.62


Exhibit 11-20 Independent Variables in Stepwise Regression Model

Model    Variables Entered              Variables Removed    Method
1        X1 – Excellent Food Quality    .                    Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100)
2        X6 – Friendly Employees        .

Exhibit 11-23 Coefficients for Stepwise Regression Model

Coefficients*

                                     Unstandardized       Standardized                     Collinearity
                                     Coefficients         Coefficients                     Statistics
Model                                B        Std. Error  Beta            t       Sig.     Tolerance   VIF
1   (Constant)                       2.376    .419                        5.67    .000
    X1 – Excellent Food Quality      .459     .078        .513            5.91    .000     1.00        1.00
2   (Constant)                       1.716    .431                        3.98    .000
    X1 – Excellent Food Quality      .402     .074        .449            5.39    .000     .958        1.044
    X6 – Friendly Employees          .332     .088        .312            3.75    .000     .958        1.044

*Dependent Variable: X17 – Satisfaction


THANK YOU
