Transcript
  • 8/2/2019 Correlation n Regression

    1/37

    17-1


    CORRELATION ANALYSIS AND REGRESSION ANALYSIS


    Correlation

    Correlation: a measure of association between two numerical variables.

    Example (positive correlation): Typically, in the summer, as the temperature increases, people are thirstier.


    Scatter Diagram

    Scatter diagrams show the relationship between two variables in graphical form.

    The diagram summarizes the nature of the relationship between the two variables: whether it is positive or negative.

    The diagram also indicates the magnitude of the relationship.


    Scatter Diagrams with varied r values

    [Figure: four scatter plots of Y against X, with r = +1 (r^2 = 1), r = -1 (r^2 = 1), r = +0.9 (r^2 = 0.81), and r = 0 (r^2 = 0)]


    Specific Example

    For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

    Temperature (F)    Water Consumption (ounces)
    75                 16
    83                 20
    85                 25
    85                 27
    92                 32
    97                 48
    99                 48


    How would you describe the graph?


    How strong is the linear relationship?


    Correlation Analysis

    Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables.

    Correlation can be used along with regression analysis to determine the nature of the relationship between variables.

    The prominent correlation coefficients are:

    1. The Pearson product moment correlation coefficient


    Measuring the Relationship

    Pearson's Sample Correlation Coefficient, r, measures the direction and the strength of the linear association between two paired numerical variables.


    Direction of Association

    [Figure: two scatter plots, one labeled Positive Correlation and one labeled Negative Correlation]


    Strength of Linear Association

    r value    Interpretation
     1         perfect positive linear relationship
     0         no linear relationship
    -1         perfect negative linear relationship


    Other Strengths of Association

    r value    Interpretation
    0.9        strong association
    0.5        moderate association
    0.25       weak association


    Product Moment Correlation

    The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.

    As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.


    Product Moment Correlation

    From a sample of n observations, X and Y, the product moment correlation, r, can be calculated as:

    $$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

    r varies between -1.0 and +1.0.
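The product moment formula can be sketched in plain Python. This is a minimal illustration rather than anything from the original slides: the function name `pearson_r` is an assumption, and the data are the temperature and water-consumption readings from the specific example earlier in the transcript.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Temperature (F) and water consumption (ounces) for the seven summer days.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]

r_temp_water = pearson_r(temperature, water)
print(round(r_temp_water, 2))  # prints 0.96
```

A value this close to +1 suggests a strong positive linear association for these seven days. The sketch does not guard against a constant sequence, for which the denominator is zero and r is undefined.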


    Ad Spending and Corresponding Sales of Royal Products

    Company    Advertising Exp. (X), in crores    Sales (Y), in thousands
    1          6                                  10
    2          9                                  12
    3          8                                  12
    4          3                                  4
    5          10                                 12
    6          4                                  6
    7          5                                  8
    8          2                                  2
    9          11                                 18
    10         9                                  9
    11         10                                 17
    12         2                                  2


    Product Moment Correlation

    The correlation coefficient may be calculated as follows, with X the advertising expenditure and Y the sales from the table:

    $\bar{Y}$ = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333

    $\bar{X}$ = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583

    $\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})$
    = (10 - 9.33)(6 - 6.58) + (12 - 9.33)(9 - 6.58) + (12 - 9.33)(8 - 6.58) + (4 - 9.33)(3 - 6.58)
    + (12 - 9.33)(10 - 6.58) + (6 - 9.33)(4 - 6.58) + (8 - 9.33)(5 - 6.58) + (2 - 9.33)(2 - 6.58)
    + (18 - 9.33)(11 - 6.58) + (9 - 9.33)(9 - 6.58) + (17 - 9.33)(10 - 6.58) + (2 - 9.33)(2 - 6.58)
    = -0.3886 + 6.4614 + 3.7914 + 19.0814 + 9.1314 + 8.5914 + 2.1014 + 33.5714 + 38.3214 - 0.7986 + 26.2314 + 33.5714
    = 179.6668

    $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
    = (10 - 9.33)^2 + (12 - 9.33)^2 + (12 - 9.33)^2 + (4 - 9.33)^2
    + (12 - 9.33)^2 + (6 - 9.33)^2 + (8 - 9.33)^2 + (2 - 9.33)^2
    + (18 - 9.33)^2 + (9 - 9.33)^2 + (17 - 9.33)^2 + (2 - 9.33)^2
    = 0.4489 + 7.1289 + 7.1289 + 28.4089 + 7.1289 + 11.0889 + 1.7689 + 53.7289 + 75.1689 + 0.1089 + 58.8289 + 53.7289
    = 304.6668

    $\sum_{i=1}^{n} (X_i - \bar{X})^2$
    = (6 - 6.58)^2 + (9 - 6.58)^2 + (8 - 6.58)^2 + (3 - 6.58)^2
    + (10 - 6.58)^2 + (4 - 6.58)^2 + (5 - 6.58)^2 + (2 - 6.58)^2
    + (11 - 6.58)^2 + (9 - 6.58)^2 + (10 - 6.58)^2 + (2 - 6.58)^2
    = 0.3364 + 5.8564 + 2.0164 + 12.8164 + 11.6964 + 6.6564 + 2.4964 + 20.9764 + 19.5364 + 5.8564 + 11.6964 + 20.9764
    = 120.9168

    Thus, $r = \frac{179.6668}{\sqrt{(304.6668)(120.9168)}} = 0.9361$
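The hand calculation for the Royal Products data can be checked with a few lines of Python. This is only a verification sketch: the variable names are assumptions, and it uses unrounded means, so the intermediate sums differ from the slide's figures in the last decimal place.

```python
import math

# Advertising expenditure and sales for the twelve companies in the table.
ad_exp = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
sales = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]

n = len(sales)
mean_ad = sum(ad_exp) / n      # about 6.583
mean_sales = sum(sales) / n    # about 9.333

# Sum of cross-products and the two sums of squared deviations.
cross = sum((s - mean_sales) * (a - mean_ad) for s, a in zip(sales, ad_exp))
ss_sales = sum((s - mean_sales) ** 2 for s in sales)
ss_ad = sum((a - mean_ad) ** 2 for a in ad_exp)

r_ad_sales = cross / math.sqrt(ss_sales * ss_ad)
print(round(cross, 2), round(ss_sales, 2), round(ss_ad, 2), round(r_ad_sales, 4))
# prints 179.67 304.67 120.92 0.9361
```

Because r is symmetric in its two variables, swapping which series is called X and which is called Y leaves the result unchanged.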


    Rank correlation

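    Researchers often face situations where they have to make decisions based on data measured on an ordinal scale. In such cases, Spearman's rank correlation is appropriate for measuring the relationship between variables.

    It can be calculated using the following formula:

    $$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

    where D is the difference between the two ranks given to an item and N is the number of items ranked.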


    The ranking of television models

    Television Models    Existing System    New System
    A                    3                  1
    B                    5                  5
    C                    10                 9
    D                    2                  3
    E                    7                  2
    F                    6                  4
    G                    4                  6
    H                    1                  7
    I                    8                  10
    J                    9                  8


    Calculation of rank correlation coefficient

    Television Models    Existing System (X)    New System (Y)    D = (R1 - R2)    D^2
    A                    3                      1                 2                4
    B                    5                      5                 0                0
    C                    10                     9                 1                1
    D                    2                      3                 -1               1
    E                    7                      2                 5                25
    F                    6                      4                 2                4
    G                    4                      6                 -2               4
    H                    1                      7                 -6               36
    I                    8                      10                -2               4
    J                    9                      8                 1                1


    $r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$

    = 1 - ((6 x 80) / (10(100 - 1)))

    = 1 - (480/990)

    = 1 - 0.48

    = 0.52

    This indicates that there is a moderate positive correlation between the two variables. This means that the two systems give broadly similar rankings.
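The rank correlation calculation for the television models can be reproduced with a short Python sketch. The function name `spearman_rs` is an assumption made for illustration, and the shortcut formula it implements is only exact when neither ranking contains ties, which holds for this table.

```python
def spearman_rs(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    # Sum of squared rank differences, the D^2 column of the table.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Ranks of models A to J under the existing and the new system.
existing = [3, 5, 10, 2, 7, 6, 4, 1, 8, 9]
new = [1, 5, 9, 3, 2, 4, 6, 7, 10, 8]

rs_tv = spearman_rs(existing, new)
print(round(rs_tv, 3))  # prints 0.515
```

Carried to three decimals the coefficient is 0.515, which rounds to the 0.52 shown in the hand calculation.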


    Regression

    Regression: specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables.


    Regression: 3 Main Purposes

    To describe (or model)

    To predict (or estimate)

    To control (or administer)


    Regression Analysis

    Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:

    Determine whether the independent variables explain a significant variation in the dependent variable.

    Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.

    Predict the values of the dependent variable.


    Example

    Plan an outdoor party.

    Estimate number of soft drinks to buy per person, based on how hot the weather is.

    Use Temperature/Water data and regression.
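One way to work this example is to fit a least-squares line to the temperature and water data and read off a prediction. The sketch below is an illustration under stated assumptions: the helper name `fit_line` and the 90 F forecast are hypothetical, and ounces of water stand in for soft drinks.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Temperature (F) and water consumption (ounces) from the seven summer days.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]

a_water, b_water = fit_line(temperature, water)

forecast_temp = 90  # hypothetical forecast, not taken from the slides
predicted_oz = a_water + b_water * forecast_temp
print(round(b_water, 2), round(a_water, 1), round(predicted_oz, 1))
# prints 1.45 -96.8 33.8
```

The negative intercept is a reminder that the fitted line should only be trusted within the observed range of roughly 75 F to 99 F.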


    Real Life Applications

    Estimating Seasonal Sales for Department Stores (Periodic)


    Predicting Student Grades Based on Time Spent Studying


    Practice Problems

    Can the number of points scored in a basketball game be predicted by

    the time a player plays in the game?

    the player's height?


    Types of Regression Models

    Positive Linear Relationship

    Negative Linear Relationship

    Relationship NOT Linear

    No Relationship


    Least squares method

    The equation for the regression line assumed by the least squares method is

    $$Y_i = a + bX_i + e_i, \quad e_i = Y_i - \hat{Y}_i$$

    where
    Y is the dependent variable
    X is the independent variable
    a is the Y-intercept
    b is the slope of the line

    $$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}, \quad a = \bar{Y} - b\bar{X}$$


    Calculations for determining constants a and b

    Man Hours (X)    Productivity in units (Y)    XY         X^2
    3.6              9.3                          33.48      12.96
    4.8              10.2                         48.96      23.04
    2.4              9.7                          23.28      5.76
    7.2              11.5                         82.8       51.84
    6.9              12                           82.8       47.61
    8.4              14.2                         119.28     70.56
    10.7             18.6                         199.02     114.49
    11.2             28.4                         318.08     125.44
    6.1              13.2                         80.52      37.21
    7.9              10.8                         85.32      62.41
    9.5              22.7                         215.65     90.25
    5.4              12.3                         66.42      29.16
    ΣX = 84.1        ΣY = 172.9                   ΣXY = 1355.61    ΣX^2 = 670.73


    b = 1.769

    a = 2.01

    Y = 2.01 + 1.769X
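The constants can be recomputed from the man-hours table using the sum-based least squares formulas. This is a minimal check, with variable names chosen for illustration; it reproduces the column totals and gives a slope and intercept that agree with the values above to about two decimal places.

```python
# Man hours (X) and productivity in units (Y) from the table.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(man_hours)
sum_x = sum(man_hours)
sum_y = sum(productivity)
sum_xy = sum(x * y for x, y in zip(man_hours, productivity))
sum_x2 = sum(x * x for x in man_hours)

# b = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = mean(Y) - b * mean(X)
a = sum_y / n - b * sum_x / n

print(round(sum_x, 1), round(sum_y, 1), round(sum_xy, 2), round(sum_x2, 2))
# prints 84.1 172.9 1355.61 670.73
print(round(b, 3), round(a, 2))
# prints 1.769 2.01
```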


    The Strength of Association: R^2

    R^2 = (Explained Variance) / (Total Variance)

    Total Variance = (Explained Variance) + (Unexplained Variance)

    Explained Variance = (Total Variance) - (Unexplained Variance)
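The decomposition can be illustrated on the man-hours and productivity data used for the least squares fit. This is a sketch under stated assumptions: the variable names are invented for illustration, and it works with sums of squared deviations, which give the same ratio as variances because the common divisor cancels.

```python
# Man hours (X) and productivity in units (Y) from the least squares table.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(man_hours)
mean_x = sum(man_hours) / n
mean_y = sum(productivity) / n

# Least-squares slope and intercept in deviation form.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(man_hours, productivity))
         / sum((x - mean_x) ** 2 for x in man_hours))
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in man_hours]

# Total, explained, and unexplained variation around the mean of Y.
total_var = sum((y - mean_y) ** 2 for y in productivity)
explained_var = sum((f - mean_y) ** 2 for f in fitted)
unexplained_var = sum((y - f) ** 2 for y, f in zip(productivity, fitted))

r_squared = explained_var / total_var
print(round(r_squared, 2))  # prints 0.67
```

On this data the fitted line accounts for roughly two thirds of the variation in productivity, and the explained and unexplained parts add up to the total.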

