pearson correlation simple linear regression chi-square

58
By Dr. Mojgan Afshari Statistical Techniques to Explore Relationships among Variables

Upload: others

Post on 22-Feb-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pearson Correlation Simple Linear Regression Chi-Square

By

Dr. Mojgan Afshari

Statistical Techniques to Explore Relationships among Variables

Page 2: Pearson Correlation Simple Linear Regression Chi-Square

Agenda

� Pearson Product-Moment Correlation

� Simple Liner Regression

� Chi-Square Test for Independence

n2

Page 3: Pearson Correlation Simple Linear Regression Chi-Square

Pearson Product-Moment Correlation

� Purpose – determine relationshipbetween two metric variables

� Requirement:DV -Interval/RatioIV -Interval/Ratio

n3

Page 4: Pearson Correlation Simple Linear Regression Chi-Square

Assumptions1. Level of measurement( IV and DV should be interval/ ratio)2. Independence of observations3. Normality4. Linearity� The relationship between the two variables should be linear.

This means that when you look at a scatterplot of scoresyou should see a straight line (roughly), not a curve.

5. Homoscedasticity� The variability in scores for variable X should be similar at

all values of variable Y. Check the scatterplot. It shouldshow a fairly even cigar shape along its length.

Page 5: Pearson Correlation Simple Linear Regression Chi-Square

Components of Pearson r analysis

Choose it

Page 6: Pearson Correlation Simple Linear Regression Chi-Square

Note

� Before performing a correlation analysis, it is agood idea to generate a scatterplot. Thisenables you to check for violation of theassumptions of linearity and homoscedasticity.

� Inspection of the scatterplots also gives you abetter idea of the nature of the relationshipbetween your variables.

n6

Page 7: Pearson Correlation Simple Linear Regression Chi-Square

Scatter plot/ Scatter gram

Strong positive

correlation

Strong negative

correlation

Weak positive

correlation

Weak negative

correlationX

Y

X

XX

Y

Y

Y

Page 8: Pearson Correlation Simple Linear Regression Chi-Square

n8

Page 9: Pearson Correlation Simple Linear Regression Chi-Square

� Pearson correlation coefficients (r) can onlytake on values from -1 to + l.

� The sign out the front indicates whether there isa positive correlation (an increase in X will alsoincrease in the value of Y) or a negativecorrelation (as one variable increases, the otherdecreases).

� The size of the absolute value provides anindication of the strength of the relationship.

Page 10: Pearson Correlation Simple Linear Regression Chi-Square

� A perfect correlation of 1 or -1 indicates thatthe value of one variable can be determinedexactly by knowing the value on the othervariable. A scatterplot of this relationshipwould show a straight line.

� A correlation of 0 indicates no relationshipbetween the two variables.

n10

Page 11: Pearson Correlation Simple Linear Regression Chi-Square

1. Descriptive

n11

Page 12: Pearson Correlation Simple Linear Regression Chi-Square

n12

Page 13: Pearson Correlation Simple Linear Regression Chi-Square

2. Inferential

n13

Hypothesis Test

Page 14: Pearson Correlation Simple Linear Regression Chi-Square

Excel

Page 15: Pearson Correlation Simple Linear Regression Chi-Square

n15

2. Set confidence intervalGenerally, confident level is set at .05

Page 16: Pearson Correlation Simple Linear Regression Chi-Square

n16

Page 17: Pearson Correlation Simple Linear Regression Chi-Square

n17

Page 18: Pearson Correlation Simple Linear Regression Chi-Square

n18

It can be concluded that there is not significant relationship between IV and DV at 0.05 level of

significance

It can be concluded that there is significant relationship between IV and DV at 0.05 level of

significance

Page 19: Pearson Correlation Simple Linear Regression Chi-Square
Page 20: Pearson Correlation Simple Linear Regression Chi-Square

1. Data were collected from a randomly selected sampleto determine relationship between average assignmentand test scores in statistics. Distribution for the data ispresented in the table below. Assuming the data arenormally distributed,

1) Plot a scatter diagram to represent the followingpair of scores of the two variables.

2) Calculate an appropriate correlation coefficient,3) Describe the nature of relationship between the two

variables, and4) Test the hypothesis on the relationship

at .01 level of significance.

Exercise

Page 21: Pearson Correlation Simple Linear Regression Chi-Square

2) Explain the following concept. You may usegraphs to illustrate each concept

a) Perfect positive linear correlationb) Perfect negative linear correlationc) Strong positive linear correlationd) Strong negative linear correlatione) Weak positive linear correlationf) Weak negative linear correlation g) No linear correlation

Page 22: Pearson Correlation Simple Linear Regression Chi-Square

3) For a sample data set, the linear correlation coefficient r has a positive value.Which of the following is true about the slope b of the regression line estimated for the same sample data?

a) The value of b will be positiveb) The value of b will be negativec) The value of b can be positive or negative

n22

Page 23: Pearson Correlation Simple Linear Regression Chi-Square

� 3) The data on ages (in years) and prices (in hundred of dollars for eight cars of a specific model) are shown below:Age: 8 3 6 9 2 5 6 3

Prices: 18 94 50 21 145 42 36 99

a) Do you expect the ages and prices of cars to bepositively or negatively related? Explain.b) Calculate the linear correlation coefficient.c) Test at the 5% significance level whether ρ is

negative

Page 24: Pearson Correlation Simple Linear Regression Chi-Square

4) The following table lists the Advertising and Marketing scores for 7 students in a statistics class.� Advertising score: 79 95 81 66 87 94 59� Marketing score: 85 97 78 76 94 84 67

a) Do you expect the Advertising and Marketing scores to bepositively or negatively correlated?

b) Plot a scatter diagram. By looking at the scatter diagram, do you expect the correlation coefficient between these 2 variables to be close to zero, 1, or -1.c) Find the correlation coefficient. Is the value of r consistent with what you expected in parts a and b?d) Using the 1% significance level, test whether the linear correlation coefficient is Positive

Page 25: Pearson Correlation Simple Linear Regression Chi-Square

X= PriceY= Sale

5)

Page 26: Pearson Correlation Simple Linear Regression Chi-Square

n26

Simple Linear Regression

Page 27: Pearson Correlation Simple Linear Regression Chi-Square

IntroductionPurpose� To determine relationship between IV and DV� To predict value of the dependent variable (Y)

based on value of independent variable (X)� To assess how well the dependent variable can

be explained by knowing the value of the independent variable

Requirement� Scales of measurement for variables:DV -interval or ratioIV -interval or ratio

Page 28: Pearson Correlation Simple Linear Regression Chi-Square

Concepts of

Simple Linear Regression

n28

Page 29: Pearson Correlation Simple Linear Regression Chi-Square

Descriptive

n29

Page 30: Pearson Correlation Simple Linear Regression Chi-Square

Bases for Best Fit Line� Use Scatter Plot to identify the line of best fit� The line is also called the least squares

regression line� The purpose of this line is to show the overall

trend or pattern in the data and to allow the reader to make predictions about future trends in the data.

n30

Page 31: Pearson Correlation Simple Linear Regression Chi-Square

Prediction equation

Regression Model involves two statistics andparameters, namely: Intercept and slope ofthe regression line.

Page 32: Pearson Correlation Simple Linear Regression Chi-Square

Inferential

n32

Page 33: Pearson Correlation Simple Linear Regression Chi-Square

�Evaluation of the regression model(prediction equation) and slope of theregression line involve testing ofhypothesis.

�The testing of hypothesis for predictionequation involves the F test while thetesting of slop entails t test

n33

Page 34: Pearson Correlation Simple Linear Regression Chi-Square

n34

Page 35: Pearson Correlation Simple Linear Regression Chi-Square

� The purpose of evaluating the regression model or the prediction equation is to determine whether the model really fits the data.

Page 36: Pearson Correlation Simple Linear Regression Chi-Square

n36

Page 37: Pearson Correlation Simple Linear Regression Chi-Square

The purpose to test hypothesis regarding theregression slope is to determine the significance ofthe relationship between IV and DV.

Page 38: Pearson Correlation Simple Linear Regression Chi-Square

ExampleData were collected from a randomly selectedsample to determine relationship between averageassignment scores and test scores in statistics.Distribution for the data is presented in the tablebelow.1.Calculate b1and b0 and derive the prediction equation2.Test the hypothesis for the regression model at α= .053.What the values of coefficient of determination ( R2) and multiple correlation coefficient ( r). Interpret the two values.4.Test hypothesis for the slope at .05

Page 39: Pearson Correlation Simple Linear Regression Chi-Square

Chi-SquareTest for Independence

Page 40: Pearson Correlation Simple Linear Regression Chi-Square

Chi-square test for independence

� Explore the association between twocategorical variables. Each of these variablescan have two or more categories.

n40

Independent variable(Nominal/ Ordinal)

Dependent variable(Nominal/ Ordinal)

Page 41: Pearson Correlation Simple Linear Regression Chi-Square

Example

� To explore the association between Gender ( Male / Female) and Smoking Behaviour ( Smoker/Non-Smoker).

� IV: Gender ( Male/ Female)DV: Smoking Behaviour ( Smoker / Non Smoker)

n41

Page 42: Pearson Correlation Simple Linear Regression Chi-Square

� Research questions:There are a variety of ways questions can bephrased:

1. Is there an association between gender andsmoking behaviour?

2. Are males more likely to be smokers thanfemales?

3. Is the proportion of males that smoke thesame as the proportion of females?

n42

Page 43: Pearson Correlation Simple Linear Regression Chi-Square

Minimum Expected Cell Frequency Assumption

The lowest expected frequency in any cell should be5 or more. Some authors suggest less stringentcriteria: at least 80 per cent of cells should haveexpected frequencies of 5 or more.

We could not use"# for the following case:

n43

OBSERVED EXPECTEDAGREE 247 255

DISAGREE 6 4

Page 44: Pearson Correlation Simple Linear Regression Chi-Square

What to Expect?H

ypot

hesi

s Te

st

StateH0 and HA

Calculate $2

1

3

Critical Value

2

Decision Conclusion

54

Measures of association

●Phi●Contingency● Cramer’s V

Page 45: Pearson Correlation Simple Linear Regression Chi-Square

Step in Testing Hypothesis

1. State the null and alternative hypothesesHO: DV is independent of IVHA: DV is dependent on IV

2. Calculate the test statistic$2 value� Contingency table –two variables� Calculation based on:

O-Observed frequencyE-Expected frequency

Page 46: Pearson Correlation Simple Linear Regression Chi-Square

� Summary of the table to calculate the chi-square statistics

$2

Page 47: Pearson Correlation Simple Linear Regression Chi-Square

3. Determine critical value•α•df= (R –1) (C –1)

4.Make your decision5.Make conclusion

Page 48: Pearson Correlation Simple Linear Regression Chi-Square

Effect Size� To determine the strength and magnitude of the

association between two variables.Effect size statistics: � Phi coefficient (2 by 2 tables)� Cramer's V (Tables larger than 2 by 2)� Contingency coefficient ( Tables larger than 2 by 2)

n48

Contingency

Page 49: Pearson Correlation Simple Linear Regression Chi-Square

n49

Page 50: Pearson Correlation Simple Linear Regression Chi-Square

Example 1:� A study was conducted to test the association

between Firm size and cloud computing adoption.Data collected from a randomly selected samplefollow.

� 1. Test the hypothesis on the association between thetwo variables at .01 level of significance.

� 2. Calculate and describe an appropriate measure ofassociation between the two variables.

n50

Page 51: Pearson Correlation Simple Linear Regression Chi-Square

1. Hypotheses testinga. State H0 and HA

� H0: Cloud computing adoption is independent of Firm size� HA: Cloud computing adoption is dependent on Firm size

b. Test statisticCalculate expected value for each cell:

Page 52: Pearson Correlation Simple Linear Regression Chi-Square

n52

Group

Page 53: Pearson Correlation Simple Linear Regression Chi-Square
Page 54: Pearson Correlation Simple Linear Regression Chi-Square
Page 55: Pearson Correlation Simple Linear Regression Chi-Square

a. State H0 and HA

Page 56: Pearson Correlation Simple Linear Regression Chi-Square
Page 57: Pearson Correlation Simple Linear Regression Chi-Square
Page 58: Pearson Correlation Simple Linear Regression Chi-Square

n58