chapter 4 more on two-variable data


Upload: ross

Post on 19-Jan-2016


Page 1: Chapter 4 More on Two-Variable Data

Chapter 4: More on Two-Variable Data

“Each of us is a statistical impossibility around which hover a million other lives that were never destined to be born”

Loren Eiseley

Page 2: Chapter 4 More on Two-Variable Data

4.1 Some models for scatterplots with non-linear data (pp. 176-197)

Exponential growth: growth or decay function. Form: $y = ab^x$

Power function. Form: $y = ax^b$

Page 3: Chapter 4 More on Two-Variable Data

Logarithms

$\log_b x = y$ if and only if $b^y = x$, where $x > 0$, $b > 0$, $b \neq 1$.

Rules for logarithms

$\log(AB) = \log A + \log B$

$\log(A/B) = \log A - \log B$

$\log A^p = p \log A$

Page 4: Chapter 4 More on Two-Variable Data

In other words:

The log of a product is the sum of the logs.

The log of a quotient is the difference of the logs.

The log of a power is the power times the log.
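These rules are what make the exponential form y = ab^x and the power form y = ax^b usable with a straight-line fit: taking logs gives log y = log a + x log b for the exponential model and log y = log a + b log x for the power model. The sketch below illustrates the idea with the standard library only. The helper names (`fit_line`, `fit_exponential`, `fit_power`) and the data are made up for illustration and are not from the slides.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x."""
    intercept, slope = fit_line(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope  # (a, b)

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing log10(y) on log10(x)."""
    intercept, slope = fit_line([math.log10(x) for x in xs],
                                [math.log10(y) for y in ys])
    return 10 ** intercept, slope  # (a, b)

# Hypothetical data generated exactly from each model form.
xs = [1, 2, 3, 4, 5]
exp_a, exp_b = fit_exponential(xs, [3 * 2 ** x for x in xs])  # y = 3 * 2^x
pow_a, pow_b = fit_power(xs, [3 * x ** 2 for x in xs])        # y = 3 * x^2
```

Because the made-up data follow each model exactly, the recovered parameters match the ones used to generate them up to floating-point rounding. Real data would scatter around the fitted line on the log scale.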

Page 5: Chapter 4 More on Two-Variable Data

4.2 Interpreting Correlation and Regression (pp. 206-214)

Overview:

Correlation and regression need to be interpreted with CAUTION.

Two variables may be strongly associated, but this DOES NOT MEAN that one causes the other.

High correlation does not imply causation!

We need to consider lurking variables and common response.

Page 6: Chapter 4 More on Two-Variable Data

Extrapolation

The use of a regression line or curve to make a prediction outside of the domain of the values of your explanatory variable x that you used to obtain your line or curve.

These predictions cannot be trusted.
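A small numerical illustration of why extrapolated predictions are risky, using made-up data (not from the slides): a least-squares line is fitted to points that actually follow y = x^2 on x = 1 to 5, then used to predict inside and far outside that range. The helper `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: the true relationship is y = x**2, observed only for x in 1..5.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
intercept, slope = fit_line(xs, ys)

def predict(x):
    """Prediction from the fitted line."""
    return intercept + slope * x

inside_error = abs(predict(3) - 3 ** 2)     # x = 3 lies inside the observed range
outside_error = abs(predict(20) - 20 ** 2)  # x = 20 lies far outside it
```

Within the observed range the line is off by only a couple of units, while at x = 20 the error runs into the hundreds, because nothing in the data constrains the shape of the relationship out there.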

Page 7: Chapter 4 More on Two-Variable Data

Lurking Variable

A variable that affects the relationship of the variables in the study but is NOT INCLUDED among the variables studied.

Example: a strong positive association might exist between shirt size and intelligence for teenage boys. A lurking variable is AGE: shirt size and intelligence among teenage boys generally increase with age.

Page 8: Chapter 4 More on Two-Variable Data

If there is a strong association between two variables x and y, any one of the following statements could be true:

x causes y. Association DOES NOT imply causation, but causation could exist.

Both x and y are responding to changes in some unobserved variable or variables. This is called common response.

The effect of x on y is hopelessly mixed up with the effects of other variables on y. This is called confounding. Confounding is always a potential problem in observational studies. It can be somewhat controlled in experiments with a control group and a treatment group.

Page 9: Chapter 4 More on Two-Variable Data

4.3 Relations in Categorical Data (pp. 215-226)

Overview:

We can see relations between two or more categorical variables by setting up tables.

So far, we have studied relationships with a quantitative response variable.

Page 10: Chapter 4 More on Two-Variable Data

Notation

Prob(X) is the probability that X is true.

Prob(X/Y) is the probability that X is true, given that Y is true. Here the slash means "given", not division; this is often written Prob(X | Y).

Page 11: Chapter 4 More on Two-Variable Data

Two-way Table

Describes the relationship between two categorical variables: a row variable and a column variable.

Row totals and column totals give the MARGINAL DISTRIBUTIONS of the two variables separately. They DO NOT give any information about the relationships between the variables.

Can be used in the calculation of probabilities.

Page 12: Chapter 4 More on Two-Variable Data

Example: 200 employees of a company are classified according to the table below, where A, B, and C are mutually exclusive.

          Have A   Have B   Have C   Totals
Female      20       40       60      120
Male        30       10       40       80
Totals      50       50      100      200
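The marginal distributions for this table can be computed directly from the cell counts. The following is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Cell counts from the two-way table (row variable: sex, column variable: property).
counts = {
    "Female": {"A": 20, "B": 40, "C": 60},
    "Male":   {"A": 30, "B": 10, "C": 40},
}

# Row totals and column totals.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in ("A", "B", "C")}
grand_total = sum(row_totals.values())

# Marginal distributions: each total as a proportion of all 200 employees.
row_marginal = {row: total / grand_total for row, total in row_totals.items()}
col_marginal = {col: total / grand_total for col, total in col_totals.items()}
```

The marginals come out to 60% female and 40% male, and 25%, 25%, and 50% for A, B, and C. Many different sets of cell counts would give these same totals, which is why the marginal distributions alone say nothing about how the two variables are related.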

Page 13: Chapter 4 More on Two-Variable Data

Example (cont.):

What is the probability that a randomly chosen person is female? Prob(F) = 120/200 = 60%

What is the probability that a randomly chosen person has property A? Prob(A) = 50/200 = 25%

If a randomly chosen person is female, what is the probability that she has property B? Prob(B/F) = 40/120 = 33.3%

Note: this equals Prob(B and F)/Prob(F) = (40/200)/(120/200).

Page 14: Chapter 4 More on Two-Variable Data

Example (cont.):

If a randomly chosen person has property C, what is the probability that the individual is male? Prob(M/C) = 40/100 = 40%

Note: this equals Prob(C and M)/Prob(C) = (40/200)/(100/200).

If a randomly chosen person has B or C, what is the probability that the person is male? Prob(M/B or C) = 50/150 = 33.3%
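A conditional probability restricts attention to the individuals who satisfy the given condition, so the denominator is the count for the conditioning event, not the grand total. The sketch below checks this on the employee table using exact fractions. The function names `prob` and `cond_prob` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

# Cell counts from the employee table, keyed by (sex, property).
counts = {
    ("F", "A"): 20, ("F", "B"): 40, ("F", "C"): 60,
    ("M", "A"): 30, ("M", "B"): 10, ("M", "C"): 40,
}

def prob(event):
    """Probability that a randomly chosen employee satisfies `event`."""
    total = sum(counts.values())
    favorable = sum(n for (sex, prop), n in counts.items() if event(sex, prop))
    return Fraction(favorable, total)

def cond_prob(event, given):
    """Prob(event / given) = Prob(event and given) / Prob(given)."""
    joint = prob(lambda sex, prop: event(sex, prop) and given(sex, prop))
    return joint / prob(given)

p_b_given_f = cond_prob(lambda s, p: p == "B", lambda s, p: s == "F")
p_m_given_c = cond_prob(lambda s, p: s == "M", lambda s, p: p == "C")
p_m_given_b_or_c = cond_prob(lambda s, p: s == "M", lambda s, p: p in ("B", "C"))
```

With these counts, `p_b_given_f` is 40/120 = 1/3, `p_m_given_c` is 40/100 = 2/5, and `p_m_given_b_or_c` is 50/150 = 1/3.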

Page 15: Chapter 4 More on Two-Variable Data

Simpson's Paradox

The reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.

Lurking variables are categorical.

An extreme form of the fact that observed associations can be misleading when there are lurking variables.

Page 16: Chapter 4 More on Two-Variable Data

Example of Simpson's Paradox

First half of baseball season

            Hits   Times at bat   Batting avg.
Caldwell     60        200           .300
Wilson       29        100           .290

Second half of baseball season

            Hits   Times at bat   Batting avg.
Caldwell     50        200           .250
Wilson        1          5           .200

Batting averages for the entire season:

Caldwell: 110/400 = .275

Wilson: 30/105 = .286

Caldwell had a better average than Wilson in each half; however, Caldwell ends up with a LOWER OVERALL average than Wilson.
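The reversal can be checked directly from the hits and at-bats. The sketch below is one way to do that; the data structure and names are illustrative choices. The key point is that the season average is total hits over total at-bats, which weights each half by its number of at-bats. It is not the simple mean of the two half-season averages.

```python
# (hits, times at bat) for each player in each half of the season.
halves = {
    "first":  {"Caldwell": (60, 200), "Wilson": (29, 100)},
    "second": {"Caldwell": (50, 200), "Wilson": (1, 5)},
}

def average(hits, at_bats):
    """Batting average as hits divided by times at bat."""
    return hits / at_bats

# Compare the two players within each half.
caldwell_better_each_half = all(
    average(*players["Caldwell"]) > average(*players["Wilson"])
    for players in halves.values()
)

# Pool hits and at-bats over the whole season before dividing.
season = {}
for player in ("Caldwell", "Wilson"):
    hits = sum(halves[half][player][0] for half in halves)
    at_bats = sum(halves[half][player][1] for half in halves)
    season[player] = average(hits, at_bats)

wilson_better_overall = season["Wilson"] > season["Caldwell"]
```

Both flags come out true. Wilson's weak second half covers only 5 at-bats, so it barely moves his season average, while Caldwell's weaker second half carries the same weight as his first.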