chapter 5_ handouts.pdf

Upload: parklong16

Post on 04-Jun-2018

TRANSCRIPT

  • 8/14/2019 CHAPTER 5_ handouts.pdf

    QMT412 Pn. Sanizah's Notes 02/05/2013

    CHAPTER 5

    CORRELATION AND REGRESSION

    Introduction

    Correlation and Regression

    Scatter Plot/Diagram

    Coefficient of Correlation

    Simple Linear Regression

    Learning objectives

    Explain the concept of correlation.

    Calculate Pearson's correlation coefficient and interpret the results.

    Calculate Spearman's rank correlation for qualitative and quantitative data and interpret the results.

    Determine the regression equation for a set of data and interpret the equation.

    Use the regression equation to forecast.

    Introduction

    Correlation: Do you have a relationship?

    (Between two quantitative variables, x and y)

    If you have a relationship:

    1) What is the direction? (+ or -)

    2) What is the strength? (r: -1 to +1)

    Note: Correlation measures LINEAR relationship.

    If you have a significant correlation:

    How well can you predict a subject's y-score if you know their x-score?

    Correlation & Regression

    Regression and correlation are two concepts used to describe the relationship between variables.

    Correlation is a statistical method used to determine if a relationship between variables exists.

    Regression is the statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

    Independent and Dependent Variable

    In this chapter, we want to study the relationship between 2 variables only.

    Independent variable: x

    Dependent variable: y

    For example:

    Expenditure (x) and revenue (y)

    Price (x) and sales (y)

    Number of days absent (x) and CGPA (y)

    Age of a person (x) and his/her blood pressure (y)

    Independent and Dependent Variable

    Independent variable (x)

    Also called the predictor, explanatory, or manipulated variable: the variable in regression that can be controlled or manipulated.

    Dependent variable (y)

    Also called the response variable: the variable that cannot be controlled or manipulated.

    Independent (x) vs. Dependent (y)

    Independent (x)              Dependent (y)
    Intentionally manipulated    Intentionally left alone
    Controlled                   Measured
    Vary at known rate           Vary at unknown rate
    Cause                        Effect

    Example: What affects a student's arrival to class?

    Variables:

    Type of School: FSPPP, Business School, FSKM

    Type of Student: Gender? CGPA?

    Class Time: Morning, Afternoon, Evening

    Mode of Transportation: Motorcycle, Car, UiTM bus

    Scatter Plot (scatter diagram)

    A scatter plot is used to show the relationship between two variables.

    The scatter plot is a visual way to describe the nature of the relationship between the independent variable (x) and the dependent variable (y).

    Interpreting scatter plots:

    Positive linear relationship

    Negative linear relationship

    Nonlinear relationship

    No relationship

    Scatter Plot Examples

    [Figure: scatter plots of y against x. Panel titles: Linear relationships; Nonlinear (Curvilinear) relationships. Row labels: Positive; Negative.]

    [Figure (continued): scatter plots of y against x. Panel titles: Strong relationships; Weak relationships.]

    [Figure (continued): scatter plots of y against x. Panel title: No relationship.]

    Example 1 (pg. 134)

    Draw a scatter diagram for the following data and state the type of relationship between the variables.

    x    1    3    5     7     9     13    17

    y    0    5    11    14    19    22    30
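
    As a rough illustration of Example 1, the sketch below draws a text-mode scatter diagram using only the Python standard library. The function name text_scatter and the grid size are assumptions made for this illustration, not part of the notes. The points climb from the lower left to the upper right, which is the pattern of a positive linear relationship.

```python
# Data from Example 1.
x = [1, 3, 5, 7, 9, 13, 17]
y = [0, 5, 11, 14, 19, 22, 30]


def text_scatter(xs, ys, width=35, height=11):
    """Return a rough text scatter plot as a list of rows (top row = largest y).

    Hypothetical helper for illustration only: each point is scaled onto a
    width x height character grid and marked with '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for xv, yv in zip(xs, ys):
        col = round((xv - x_min) / (x_max - x_min) * (width - 1))
        row = round((yv - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger y is drawn higher
    return ["".join(r) for r in grid]


rows = text_scatter(x, y)
print("\n".join(rows))
```

    A plotting library would normally be used for a real scatter diagram; the text grid only keeps the example free of extra dependencies.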

    Correlation Coefficient

    The correlation coefficient measures the strength and direction of a LINEAR relationship between a pair of random variables.

    The POPULATION correlation coefficient ρ (rho) measures the strength of the association between the variables.

    The sample correlation coefficient, r or r_s, is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

    The correlation coefficient, r or r_s, indicates:

    strength of relationship (strong, weak, or none)

    direction of relationship

    positive (direct): variables move in the same direction

    negative (inverse): variables move in opposite directions

    r ranges in value from -1.0 to +1.0.

    [Figure: strength scale for r with marks at -1.0, -0.8, -0.5, 0.0, +0.5, +0.8 and +1.0. Labels along the scale: -ve perfect (-1.0), very strong, strong, moderate, weak, no relationship (0.0), weak, moderate, strong, very strong, +ve perfect (+1.0). Negative values are on the left and positive values on the right.]

    Do Variables Relate to One Another?

    Is teachers' pay related to performance?

    Is exercise related to illness?

    Is CO2 related to global warming?

    Is TV viewing related to shoe size?

    Is shoe size related to height?

    Is height related to IQ?

    Is the number of cigarettes smoked per day related to lung capacity?

    Positive

    Negative

    Positive

    Zero

    Positive correlation

    Two variables move in the same direction.

    Negative correlation

    Two variables tend to go in the opposite direction.

    Methods for Calculating Correlation Coefficient, r or r_s

    Pearson Product-Moment Correlation Coefficient

    Spearman Rank Correlation Coefficient

    Pearson Coefficient of Correlation

    Both variables must be quantitative and normally distributed.

    Calculation for r:
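
    A commonly used computational form of Pearson's r is given below. That the notes use exactly this form is an assumption; other algebraically equivalent forms exist. Here n is the number of (x, y) pairs.

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]
                \left[n\sum y^2 - \left(\sum y\right)^2\right]}}
```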

    Example 2

    Refer to Example 1. Compute the Pearson coefficient of correlation and interpret the result.
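
    One way to check a hand calculation for Example 2 is the short sketch below. It assumes the raw-sums form of Pearson's r (n Σxy - Σx Σy over the square root of the product of the two bracketed terms); the function name pearson_r is chosen for this illustration only.

```python
from math import sqrt

# Data from Example 1.
x = [1, 3, 5, 7, 9, 13, 17]
y = [0, 5, 11, 14, 19, 22, 30]


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums (computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))  # 0.985
```

    A value this close to +1 indicates a strong positive linear relationship between x and y.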

    The Spearman rank correlation coefficient

    Spearman's rank correlation coefficient is a measure of association between two variables that are at least of ordinal scale (suitable for qualitative data).

    It can also be applied to quantitative data, but the variables must first be ranked; the coefficient is then calculated from these rankings.

    $$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

    where:

    d = difference between two ranks

    n = number of pairs of observations

    NOTE: Be careful with tied observations.

    How to calculate Spearman's rank correlation coefficient?

    1. List each set of scores in a column.

    2. Rank the two sets of scores.

    3. Place the appropriate rank beside each score.

    4. Head a column d and determine the difference in rank for each pair of scores.

    (Note: The sum of the d column should always be 0.)

    5. Square each number in the d column and sum the values (Σd²).

    6. Use the formula to calculate the correlation coefficient.

    Refer to Example 5 (pg. 140)

    Five students A, B, C, D, E are ranked in two subjects, statistics and computer programming, with the following results.

    Calculate the Spearman's rank correlation coefficient.

    Student    Subject                     d    d²
               Statistics    Computer
    A          1             3
    B          2             1
    C          3             4
    D          4             2
    E          5             5

    $$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
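
    The d and d² columns of Example 5 can be filled in and checked with a few lines of code. The sketch below follows the listed steps (difference in ranks, check that the differences sum to 0, square and sum, apply the formula); the function name spearman_from_ranks is an illustrative choice.

```python
# Ranks from Example 5 (students A to E).
statistics_rank = [1, 2, 3, 4, 5]
computer_rank = [3, 1, 4, 2, 5]


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)) from two lists of ranks."""
    n = len(rank_x)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
    if sum(d) != 0:  # check from step 4: the d column should sum to 0
        raise ValueError("rank differences do not sum to 0; check the ranks")
    sum_d2 = sum(v * v for v in d)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


r_s = spearman_from_ranks(statistics_rank, computer_rank)
print(r_s)  # 0.5
```

    Here d = -2, 1, -1, 2, 0 and Σd² = 10, so r_s = 1 - 60/120 = 0.5.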

    Refer to Example 6 (pg. 141)

    x      y     Rank of x, Rx    Rank of y, Ry    d = Rx - Ry    d²
    6.0    80
    6.2    80
    6.5    78
    6.8    75
    7.0    70
    7.2    60
    7.5    60
    7.8    55
    8.0    50
    8.2    48
    8.4    45
    8.7    40
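
    Example 6 contains tied y values (80 twice, 60 twice). The sketch below assumes the common convention that tied observations share the average of the ranks they would otherwise occupy, and ranks from smallest to largest; both choices are assumptions, since the notes only warn to be careful with ties. The helper name average_ranks is hypothetical.

```python
# Data from Example 6.
x = [6.0, 6.2, 6.5, 6.8, 7.0, 7.2, 7.5, 7.8, 8.0, 8.2, 8.4, 8.7]
y = [80, 80, 78, 75, 70, 60, 60, 55, 50, 48, 45, 40]


def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


rank_x = average_ranks(x)
rank_y = average_ranks(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(v * v for v in d)
n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 4))  # 569.0 -0.9895
```

    The result is close to -1, a strong negative association. With ties present, the simple Σd² formula is only an approximation to the rank correlation, which is one reason for the caution about tied observations.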

    The Regression Line

    Regression indicates the degree to which the variation in one variable, X, is related to or can be explained by the variation in another variable, Y.

    Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables.

    This equation is called the line of regression or least squares line.

    The equation of a line may be written as:

    y = a + bx

    where b is the slope of the line and a is the y-intercept.

    Regression line

    Creates a line of best fit running through the data.

    Analyzes the relationship between the two quantitative variables, X and Y.

    y = a + bx, where y is the dependent variable and x is the independent variable.

    a (intercept): if x = 0 is in the range of the data, then a is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then a has no practical interpretation.

    b (slope): the change in the mean of the distribution of the response y produced by a unit change in x.

    The Least Squares Regression Line

    The values of a and b in the regression line y = a + bx can be calculated by using the least squares method (or method of least squares), given by the following formula:
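
    The standard least squares estimates of b and a can be written as below. That the notes use exactly this computational form is an assumption, although it matches the xy and x² columns used in the Example 3 working table. Here n is the number of (x, y) pairs, and the bars denote sample means.

```latex
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2},
\qquad
a = \bar{y} - b\,\bar{x}
```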

    Example 3: Application

    x (Absences)    y (Final Grade)
    8               78
    2               92
    5               90
    12              58
    15              43
    9               74
    6               81

    [Figure: scatter plot of Final Grade (vertical axis, 40 to 95) against Absences (horizontal axis, 0 to 16).]

    Calculate a and b. Write the equation of the line of regression with x = number of absences and y = final grade.

             x     y      xy      x²     y²
    1        8     78     624     64     6084
    2        2     92     184     4      8464
    3        5     90     450     25     8100
    4        12    58     696     144    3364
    5        15    43     645     225    1849
    6        9     74     666     81     5476
    7        6     81     486     36     6561
    Total    57    516    3751    579    39898

    The Line of Regression

    [Figure: Final Grade (vertical axis, 40 to 95) against Absences (horizontal axis, 0 to 16) with the line of regression drawn through the points.]

    The line of regression is: y = -3.924x + 105.667

    Note that the point (x̄, ȳ) = (8.143, 73.714) is on the line.
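
    The values of a and b in Example 3 can be reproduced from the column totals. The sketch below assumes the usual least squares formulas, b = (n Σxy - Σx Σy) / (n Σx² - (Σx)²) and a = ȳ - b x̄; the function name least_squares_line is an illustrative choice.

```python
# Data from Example 3.
absences = [8, 2, 5, 12, 15, 9, 6]
final_grade = [78, 92, 90, 58, 43, 74, 81]


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    sum_x2 = sum(p * p for p in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(absences, final_grade)
print(round(b, 3), round(a, 3))  # -3.924 105.668
```

    The intercept comes out as 105.668 when b is kept unrounded; the notes report 105.667, which is what results from using the rounded values b = -3.924 and x̄ = 8.143.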

    Predicting y Values

    The regression line can be used to predict values of y for values of x falling within the range of the data.

    The regression equation for number of times absent and final grade is:

    y = -3.924x + 105.667

    Use this equation to predict the expected grade for a student with

    (a) 3 absences (b) 12 absences

    (a) y = -3.924(3) + 105.667 = 93.895

    (b) y = -3.924(12) + 105.667 = 58.579
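
    The two predictions can be checked with the small sketch below. It uses the rounded coefficients from the notes and refuses x values outside the observed range of absences (2 to 15 in Example 3), since the line should only be used within the range of the data. The names predict_grade, X_MIN and X_MAX are assumptions made for this illustration.

```python
A = 105.667  # y-intercept reported in the notes
B = -3.924   # slope reported in the notes
X_MIN, X_MAX = 2, 15  # smallest and largest number of absences in Example 3


def predict_grade(absences):
    """Predict the final grade from y = A + B*x, only inside the data range."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError("x is outside the range of the data")
    return A + B * absences


print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```

    Calling predict_grade(20) raises a ValueError, because 20 absences lies beyond the largest x value observed.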

    Coefficient of Determination

    The coefficient of determination, r², measures the strength of the association and is the ratio of explained variation in y to the total variation in y.

    Interpretation: the proportion of the variation in y that is explained by the variation in x.

    Recall Example 3

    The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is r² = (-0.975)² = 0.9506.

    Interpretation: About 95.06% of the variation in final grades can be explained by the number of times a student is absent.

    Note: The other 4.94% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
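
    As a check on the figures above, the sketch below recomputes r and r² for the Example 3 data, assuming the raw-sums form of Pearson's r. The variable names are illustrative.

```python
from math import sqrt

# Data from Example 3.
absences = [8, 2, 5, 12, 15, 9, 6]
final_grade = [78, 92, 90, 58, 43, 74, 81]

n = len(absences)
sum_x, sum_y = sum(absences), sum(final_grade)
sum_xy = sum(p * q for p, q in zip(absences, final_grade))
sum_x2 = sum(p * p for p in absences)
sum_y2 = sum(q * q for q in final_grade)

# Pearson's r from the raw sums, then the coefficient of determination.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))  # -0.9748 0.9502
```

    The sign of r is negative, in line with the negative slope of the regression line. Squaring the unrounded r gives about 0.9502; the 0.9506 quoted in the notes comes from squaring the rounded value 0.975.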