correlations, simple regression

Upload: yasheshgaglani

Post on 07-Apr-2018

TRANSCRIPT

  • 8/4/2019 Correlations, Simple Regression

    1/43

    (Not) Relationships among Variables

    Descriptive stats (e.g., mean, median, mode, standard deviation)

    describe a sample of data

    z-test &/or t-test for a single population parameter (e.g., mean)

    infer the true value of a single variable

    ex: mean # of random digits that people can memorize

    Relationships among Variables

    But relationships among more than one variable are the crucial feature of almost all scientific research.

    Examples:

    How does the perception of a stimulus vary with the physical intensity of that stimulus?

    How does the attitude towards the President vary with the socio-economic properties of the survey respondent?

    How does the performance on a mental task vary with age?

    Relationships among Variables

    More Examples:

    How does depression vary with number of traumatic experiences?

    How does undergraduate drinking vary with performance in quantitative courses?

    How does memory performance vary with attention span?

    etc...

    We've already learned a few ways to analyze relationships among 2 variables.

    Relationships among Two Variables: Chi-Square

    Chi-Square test of independence (2-way contingency table)

    compare observed cell frequencies to the cell frequencies you'd expect if the two variables are independent.

    ex: X=geographical region: West coast, Midwest, East coast

    Y=favorite color: red, blue, green

    Note: both variables are categorical

    Relationships among Two Variables: Chi-Square

    Observed frequencies:

                West Coast   Midwest   East Coast   total
    Red                 49        30           18      97
    Blue                52        32           20     104
    Green              130        62          107     299
    total              231       124          145     500

    Expected frequencies: (row total)(column total) / grand total

                West Coast   Midwest   East Coast   total
    Red             44.814     24.06        28.13      97
    Blue             48.05     25.79        30.16     104
    Green           138.14     74.15        86.71     299
    total              231       124          145     500
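
As a worked instance of the expected-frequency rule, here is the Red / West Coast cell, using the row total 97, the column total 231, and the grand total 500 from the slide:

```latex
E_{\text{Red, West Coast}}
  = \frac{(\text{row total})(\text{column total})}{\text{grand total}}
  = \frac{97 \times 231}{500}
  = 44.814
```

Every other cell of the expected table follows from the same product of its own row and column totals.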

    Relationships among Two Variables: Chi-Square

    Using the observed and expected frequencies above:

    $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 17.97$

    if this exceeds the critical value, reject H0 that the 2 variables are independent (unrelated)
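
The chi-square computation can be sketched in a few lines of plain Python. The observed counts are the ones on the slide. The function name `chi_square_independence` and the choice of a 0.05 significance level are assumptions made for illustration (the slides do not state a level); 9.488 is the standard table value for 4 degrees of freedom at that level.

```python
# Observed counts from the slide: rows = Red, Blue, Green;
# columns = West Coast, Midwest, East Coast.
observed = [
    [49, 30, 18],
    [52, 32, 20],
    [130, 62, 107],
]


def chi_square_independence(table):
    """Return (chi-square statistic, expected counts, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total)(column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, expected, df


chi2, expected, df = chi_square_independence(observed)

# Assumed significance level: 0.05. Table value for df = 4 is 9.488.
CRITICAL_VALUE_05_DF4 = 9.488
reject_h0 = chi2 > CRITICAL_VALUE_05_DF4

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

Under that assumed level the statistic exceeds the critical value, so independence of region and favorite color would be rejected.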

    Relationships among Two Variables: z, t tests

    z-test &/or t-test for difference of population means

    compare values of one variable (Y) for 2 different levels/groups of another variable (X)

    ex:

    X=age: young people vs. old people

    Y=# random digits can memorize

    Q: Is the mean # digits the same for the 2 age groups?
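
A minimal sketch of the two-group comparison, assuming Welch's form of the t statistic (the slides do not say which variant is intended). The digit-span scores and the name `welch_t` are hypothetical, made up only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical digit-span scores (not from the slides).
young = [8, 7, 9, 6, 8, 7, 9, 8]
old = [6, 5, 7, 6, 5, 7, 6, 6]


def welch_t(sample_1, sample_2):
    """t statistic for a difference of means, not assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance uses the n - 1 (sample) denominator.
    standard_error = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    return (mean(sample_1) - mean(sample_2)) / standard_error


t = welch_t(young, old)
print(f"difference of means = {mean(young) - mean(old)}, t = {t:.2f}")
```

The statistic would then be compared with a critical value from the t distribution, in the same way as the single-sample tests.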

    Relationships among Two Variables: ANOVA

    ANOVA

    compare values of one variable (Y) for 3+ different levels/groups of another variable (X)

    ex:

    X=age: young people, middle-aged, old people

    Y=# random digits can memorize

    Q: Is the mean # digits the same for all 3 age groups?
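
For three or more groups, the one-way ANOVA F statistic compares between-group and within-group variability. The sketch below assumes the standard one-way layout; the scores and the name `one_way_anova_f` are hypothetical and not taken from the slides.

```python
from statistics import mean

# Hypothetical digit-span scores for three age groups (not from the slides).
groups = {
    "young": [8, 7, 9, 8],
    "middle": [7, 6, 7, 8],
    "old": [5, 6, 6, 7],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variability of each observation around its own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```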

    Relationships among Two Variables: z, t & ANOVA

    NOTE: for z/t tests for differences, and for ANOVA, there are a small number of possible values for one of the variables (X)

    [Figure, z, t panel: # Digits Memorized for young and old]

    [Figure, ANOVA panel: # Digits Memorized for young, middle, and old]

    Relationships among Two Variables: many values of X?

    What about when there are many possible values of BOTH variables? Maybe they're even continuous (rather than discrete)?

    Correlation and Simple Linear Regression will be used to analyze the relationship between two such variables

    [Figure: scatter plot]

    Correlation: Scatter Plots

    [Figure: scatter plot]

    Does it look like there is a relationship?

    Correlation

    measures the direction and strength of a linear relationship between two variables

    that is, it answers in a general way the question: as the values of one variable change, how do the corresponding values of the other variable change?

    Linear Relationship

    linear relationship:

    y = a + bx (straight line)

    [Figures: three example plots, labeled "Linear" and "Not (strictly) Linear"]

    Correlation Coefficient: r

    sign: direction of relationship

    magnitude (number): strength of relationship

    -1 ≤ r ≤ 1

    r=0 is no linear relationship

    r=-1 is perfect negative correlation

    r=1 is perfect positive correlation

    Notes:

    Symmetric Measure (You can exchange X and Y and get the same value)

    Measures linear relationship only
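
These properties of r can be checked on small made-up data. The helper `pearson_r` below is a hypothetical name; it uses the sum-of-cross-products form of the coefficient, and all the numbers are illustrative rather than taken from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]  # hypothetical values

r_pos = pearson_r(xs, [3 + 2 * x for x in xs])   # exact increasing line
r_neg = pearson_r(xs, [3 - 2 * x for x in xs])   # exact decreasing line
r_curve = pearson_r(xs, [x ** 2 for x in xs])    # exact but nonlinear relationship

noisy = [1, 3, 2, 5, 4]
r_xy = pearson_r(xs, noisy)
r_yx = pearson_r(noisy, xs)                      # X and Y exchanged

print(r_pos, r_neg, r_curve, r_xy, r_yx)
```

The parabola case is the one to remember: Y is completely determined by X, yet r is 0 because the relationship is not linear.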

    Correlation Coefficient: r

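    Formula:

    $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$ (sum of products of standardized values)

    alt. formula (ALEKS):

    $r = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{(n-1) s_x s_y}$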
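
The two formulas for r are algebraically equivalent, which a short sketch can confirm numerically. The function names and the data (`hours`, `score`) are hypothetical; `sample_sd` uses the n - 1 denominator, matching s_x and s_y in the formulas.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def r_standardized(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


def r_computational(xs, ys):
    """r as (sum of x_i * y_i - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return numerator / ((n - 1) * sample_sd(xs) * sample_sd(ys))


# Hypothetical data (not from the slides).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r1 = r_standardized(hours, score)
r2 = r_computational(hours, score)
print(f"standardized form: {r1:.4f}, computational form: {r2:.4f}")
```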

    Correlation: Examples

    Population: undergraduates

    Correlation: Examples

    Others?

    Correlation: Interpretation

    Correlation ≠ Causation!

    When 2 variables are correlated, the causality may be:

    X --> Y

    Y --> X

    Z --> X & Y (lurking third variable)

    or a combination of the above

    Examples: ice cream & murder, violence & video games, SAT verbal & math, booze & GPA

    Inferring causation requires consideration of: how the data were gathered (e.g., experiment vs. observation), other relevant knowledge, logic...
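
The lurking-variable case can be illustrated with a small simulation. Everything here is assumed for illustration: a third variable Z drives both X and Y, which have no direct effect on each other, and the data are random draws with a fixed seed rather than real measurements.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # lurking third variable
x = [zi + rng.gauss(0, 1) for zi in z]    # X depends on Z plus its own noise
y = [zi + rng.gauss(0, 1) for zi in z]    # Y depends on Z plus its own noise

# X and Y are correlated even though neither causes the other.
r_xy = pearson_r(x, y)

# Subtracting Z's contribution leaves only the independent noise terms.
r_without_z = pearson_r(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"r(X, Y) = {r_xy:.2f}, after removing Z: {r_without_z:.2f}")
```

With these settings the theoretical correlation between X and Y is 0.5, and it drops to about 0 once Z is accounted for.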

    Simple Linear Regression

    PREDICTING one variable (Y) from another (X)

    No longer symmetric like Correlation

    One variable is used to explain another variable

    X Variable: Independent Variable, Explanatory Variable, Exogenous Variable, Predictor Variable

    Y Variable: Dependent Variable, Response Variable, Endogenous Variable, Criterion Variable

    Simple Linear Regression

    idea: find a line (linear function) that best fits the scattered data points

    this will let us characterize the relationship between X & Y, and predict new values of Y for a given X value.

    Reminder: (Simple) Linear Function Y=a+bX

    We are interested in this to model the relationship between an independent variable X and a dependent variable Y

    [Figure: the line bX+a plotted on X and Y axes, with the intercept at (0,a) and the slope b labeled]

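    Simple Linear Regression

    If we had errorless predictions:

    Y = a + bX (intercept: a, slope: b)

    all data points would fall right on the line

    [Figure: line plotted on X and Y axes, with intercept a and slope b marked as the change in Y per 1 unit of X]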

    [Figure sequence: scatter plots of Y against X, each with a candidate regression line]

    Initial guess at the location of the regression line

    Another guess at the location of the regression line (same slope, different intercept)

    Another guess at the location of the regression line (same intercept, different slope)

    Another guess at the location of the regression line (different intercept and slope, same center)

    We will end up being reasonably confident that the true regression line is somewhere in the indicated region.

    Estimated Regression Line

    [Figure: scatter plot of Y against X with the estimated regression line and the errors/residuals marked]

    Error terms have to be drawn vertically

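    Estimated Regression Line

    Equation of the Regression Line: $\hat{Y} = a + bX$

    $\hat{y}_i$ = "y hat": predicted value of Y for $X_i$

    $e_i = y_i - \hat{y}_i$

    [Figure: scatter plot of Y against X with the estimated regression line; one point is labeled with $x_i$, $y_i$, $\hat{y}_i$, and its error $e_i$]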
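
Predicted values and errors are simple to compute once a line is given. In this sketch the data and the coefficients a = 2.2, b = 0.6 are hypothetical; they happen to be the least-squares values for this particular data set, which is why the errors sum to zero.

```python
def predict(a, b, x):
    """Predicted value y_hat = a + b * x."""
    return a + b * x


# Hypothetical data and a hypothetical fitted line (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_hats = [predict(a, b, x) for x in xs]
# Error (residual) for each point: actual value minus predicted value.
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]

for x, y, y_hat, e in zip(xs, ys, y_hats, residuals):
    print(f"x={x}  y={y}  y_hat={y_hat:.1f}  e={e:+.1f}")
```

Each error is a vertical distance: it compares the observed Y with the predicted Y at the same X.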

    Estimating the Regression Line

    Idea: find the formula for the line that minimizes the sum of squared errors

    error: vertical distance between actual data point and predicted value

    Y=a+bX

    Y=b0+b1X

    b1=slope of regression line

    b0=Y intercept of regression line
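
The least-squares idea can be seen by scoring a few candidate lines on the same data. The data set and the candidate (intercept, slope) pairs below are hypothetical; the third candidate is the least-squares line for this data, and the other two are arbitrary guesses.

```python
def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared vertical distances between the data and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate lines as (intercept, slope) pairs.
candidates = {
    "flat line through the mean": (4.0, 0.0),
    "steeper guess": (1.0, 1.0),
    "least-squares line": (2.2, 0.6),
}

sse = {name: sum_squared_errors(b0, b1, xs, ys)
       for name, (b0, b1) in candidates.items()}
best = min(sse, key=sse.get)

for name, value in sse.items():
    print(f"{name}: SSE = {value:.2f}")
```

No other straight line gives a smaller sum of squared errors on this data than the least-squares line.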

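    b1 (slope):

    $b_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$

    b0 (Y intercept):

    $b_0 = \bar{Y} - b_1 \bar{X}$

    Y=b0+b1X

    ALEKS:

    $b_1 = r \frac{s_y}{s_x}$ (using correlation coefficient)

    $b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{(n-1) s_x^2}$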
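
The three expressions for the slope give the same number, and the intercept follows from the two means. The sketch below checks this on hypothetical data; the function names are made up for illustration, and `sample_sd` uses the n - 1 denominator to match s_x and s_y.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def slope_deviation_form(xs, ys):
    """b1 = sum((X_i - X_bar)(Y_i - Y_bar)) / sum((X_i - X_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


def slope_computational_form(xs, ys):
    """b1 = (sum(x_i * y_i) - n * x_bar * y_bar) / ((n - 1) * s_x^2)."""
    n = len(xs)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean(xs) * mean(ys)
    return numerator / ((n - 1) * sample_sd(xs) ** 2)


def slope_from_r(xs, ys):
    """b1 = r * s_y / s_x, with r the correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    r = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / ((n - 1) * s_x * s_y))
    return r * s_y / s_x


# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b1 = slope_deviation_form(xs, ys)
b0 = mean(ys) - b1 * mean(xs)   # b0 = Y_bar - b1 * X_bar
prediction_at_6 = b0 + b1 * 6   # predicted Y for a new X value

print(f"b1 = {b1:.2f}, b0 = {b0:.2f}, prediction at X=6: {prediction_at_6:.2f}")
```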