Upload: daniela-reynolds

Post on 24-Dec-2015

Page 1: Describing Relationships: Scatter Plots and Correlation ● The world is an indivisible whole (butterfly effect and chaos theory; quantum entanglement, etc.)

Describing Relationships: Scatter Plots and Correlation

● The world is an indivisible whole (butterfly effect and chaos theory; quantum entanglement, etc.)

● Things (and people) exist in relationships

● Here we study relationships between two quantitative variables (e.g., IQ test score and school GPA).

● Graphical description: scatter plots

– Look for direction, form, strength, and outliers

● Numerical measure: correlation coefficient

– Definition

– Direction and strength of linear relationships

Scatter Plots

Convention: Dependent variable on the vertical axis, independent (explanatory) variable on the horizontal axis.

Scatter Plot Example: IQ score and GPA

Scatter Plot Example: Wealth and Health

What to Look For in a Scatter Plot

● Functional form: Nonlinear? Linear?

● Direction: Positive? Negative?

● Strength: How clear is the pattern?

What to Look For in a Scatter Plot: Direction

Outliers

How Strong Is the Relationship?

Two scatter plots of the same data, using different scales. Visual impressions are not very reliable!
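A numerical measure avoids this problem because, unlike the look of a plot, the correlation coefficient does not depend on the scale or units of either variable. The sketch below illustrates this with made-up data (the hours and scores are hypothetical, not taken from the slides) and a small helper, `pearson_r`, written for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized scores over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 57, 70, 68, 80]

# The same data in different units: minutes instead of hours,
# and scores expressed as a fraction of 100.
minutes = [60 * h for h in hours]
fraction = [s / 100 for s in score]

r_original = pearson_r(hours, score)
r_rescaled = pearson_r(minutes, fraction)

# The two values agree up to floating-point rounding.
print(round(r_original, 3), round(r_rescaled, 3))
```

Stretching or compressing an axis changes how steep the point cloud looks, but it leaves r unchanged.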

The Correlation Coefficient

● Numerical measure of direction and strength of a linear relationship

● Linear relationships are particularly important

– Simplest, easiest to understand

– Some nonlinear relationships can be transformed into linear ones by transforming variables (e.g., using squared terms for curvilinear patterns: age and voting)

– A “first-order approximation” of arbitrary relationships
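As a rough illustration of the transformation idea, the sketch below uses an invented inverted-U pattern (y = 10 - x², chosen only for convenience; it is not the age-and-voting data). The correlation between x and y is essentially zero even though y is completely determined by x, while the correlation between the squared term x² and y is perfect.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized scores over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical curvilinear (inverted-U) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [10 - v ** 2 for v in x]

# Transform the explanatory variable by squaring it.
x_squared = [v ** 2 for v in x]

r_raw = pearson_r(x, y)                  # no linear relationship with x itself
r_transformed = pearson_r(x_squared, y)  # perfect negative linear relationship

print(round(r_raw, 3), round(r_transformed, 3))
```

A correlation near zero therefore rules out only a linear pattern, not a relationship of some other form.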

Measuring Linear Correlation with r

● Direction: Does the scatter plot slope upward or downward?

– Positive r indicates a positive relationship, negative r indicates a negative relationship.

● Strength: How strong is the association? How closely does a non-horizontal straight line fit the points of a scatter plot?

– The stronger the relationship, the larger the magnitude of r.

● Formula:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
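The formula can be read as "standardize each variable, multiply the standardized scores pair by pair, and average the products using n - 1". Here is a minimal sketch of that computation, assuming the sample standard deviations s_x and s_y also use n - 1 in the denominator; the five data points are made up for illustration.

```python
import math

def standardize(values):
    """Return (value - mean) / s for each value, with s the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y scores."""
    n = len(xs)
    z_x, z_y = standardize(xs), standardize(ys)
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data, chosen so the arithmetic is easy to check by hand:
# deviations from the means are (-2, -1, 0, 1, 2) and (-2, 0, 1, 0, 1).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))  # 6 / sqrt(10 * 6), about 0.7746
```

Because each variable is standardized first, r has no units and is unchanged if x and y are swapped.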

Strength of linear correlation

Strength and Statistical Significance

● A strong relationship seen in the sample may indicate a strong relationship in the population, or it may be due to chance, with the relationship in the population weak or zero. (We'll test whether the relationship is “significant” in the context of linear regression.)

● “Statistical significance” does not imply the relationship is strong enough to be considered “practically important”. (“Non-zero” is not necessarily “big” in size.)

– Even weak relationships may be labeled statistically significant if the sample size is very large.

– Even very strong relationships may not be labeled statistically significant if the sample size is very small.
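To make the last two points concrete, here is a sketch using one standard test of whether the population correlation is zero: the statistic t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. This is offered as an illustration only; the course's own test comes later with linear regression. The r and n values below are invented.

```python
import math

def t_statistic(r, n):
    """t statistic for testing zero population correlation (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical case 1: a very weak correlation in a very large sample.
t_weak_large = t_statistic(0.05, 10_000)

# Hypothetical case 2: a fairly strong correlation in a very small sample.
t_strong_small = t_statistic(0.60, 5)

# Approximate two-sided 5% critical values of the t distribution:
# about 1.96 for very large df, about 3.18 for df = 3.
print(round(t_weak_large, 2), "exceeds 1.96:", t_weak_large > 1.96)
print(round(t_strong_small, 2), "exceeds 3.18:", t_strong_small > 3.18)
```

With these numbers, r = 0.05 would be labeled significant at the 5% level while r = 0.60 would not, which is why significance should not be read as a measure of strength.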

Properties of the Correlation Coefficient

● r is always between -1 and 1

● r > 0: as one variable changes, the other variable tends to change in the same direction

● r < 0: as one variable changes, the other variable tends to change in the opposite direction

● r = +1: A perfect positive linear relationship: y = a + bx, b > 0

● r = -1: A perfect negative linear relationship: y = a + bx, b < 0

● r = 0: No linear relationship (the scatter plot points are best fit by a horizontal line)

● Limitations:

– Outliers can inflate or deflate correlations

– Correlation can be spurious due to confounding

The Effect of Outliers on the Correlation

This figure graphs the relationship between the lengths of a leg bone and an upper arm bone in six fossil specimens of an extinct beast. Moving a single point changes r from .994 to .64!
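The same kind of effect can be reproduced with a few lines of code. The six points below are invented for illustration and are not the fossil measurements from the figure; the sketch computes r, moves the last point far off the line, and recomputes.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized scores over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: six points lying close to a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
r_before = pearson_r(x, y)

# Move only the last point: (6, 12.1) becomes (6, 2.0).
y_moved = y[:-1] + [2.0]
r_after = pearson_r(x, y_moved)

print(round(r_before, 3), round(r_after, 3))
```

With only six observations, one point carries a lot of weight, so it is worth plotting the data before trusting r.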

The Effect of Outliers on the Correlation

Using Software

● Stata:

– For scatter plot: Graphics --> Two way graph

– For r: “correlate”, “pwcorr” (pairwise)

– Example: “sysuse lifeexp”; relationship between safewater access and life expectancy, country-level data. After loading the data, run each command on its own line:

notes

twoway (scatter lexp safewater)

correlate lexp safewater

pwcorr

pwcorr, sig

● “Correlation and Regression Demo” applet at the book's website (explore the effect of outliers, for example).

Statistical versus Deterministic Relationships

● y = a + bx is a deterministic relationship: knowing the value of x means knowing the value of y (assuming we know the intercept and the slope). The correlation coefficient is 1 (or -1) in this case, provided b ≠ 0.

– e.g., Distance traveled = speed × time. Fixing speed, there's a deterministic relationship between distance and time.

● In social science data, deterministic relationships are rare; most relationships are statistical, e.g., time spent studying and exam grade.

● y = a + bx + ε describes such imperfect linear relationships between two variables: the value of y is not completely determined by x (and the parameters a and b), but is also affected by something else, captured by the error term ε.

● We discuss the fitting of straight lines to imperfect scatter plot data next time.
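The contrast between the two kinds of relationship can be simulated. In the sketch below, every number is an assumption made for illustration: a speed of 60, an intercept of 40, a slope of 2.5, and normally distributed errors with standard deviation 8. The seed is fixed so the run is reproducible.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized scores over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(0)  # fixed seed for reproducibility

# Deterministic: distance = speed * time, with speed fixed at 60.
time_hours = [0.5 * k for k in range(1, 21)]
distance = [60 * t for t in time_hours]

# Statistical: grade = a + b * hours + error (hypothetical a, b, noise level).
a, b, noise_sd = 40.0, 2.5, 8.0
study_hours = list(range(1, 21))
grade = [a + b * h + rng.gauss(0, noise_sd) for h in study_hours]

r_deterministic = pearson_r(time_hours, distance)
r_statistical = pearson_r(study_hours, grade)

print(round(r_deterministic, 3), round(r_statistical, 3))
```

The deterministic pair gives r equal to 1 up to rounding, while the noisy pair gives a positive r below 1; a larger `noise_sd` pulls it further toward zero.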