Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk

Post on 14-Jan-2016

Means and Variances

What happens to means and variances when data is manipulated?

Let’s check by manipulating data from the survey.

Data

Height in inches (HT)
Shoe size (Shoe)
Age (Age)

Additional Columns:

Height with a 1 inch heel (HeightPlus1)
Height in centimeters, approximated as 2.5 times height in inches (2.5TimesHeight)
Sum of height and shoe size (HeightPlusShoe)
Sum of height and age (HeightPlusAge)

Statistics

Variable          N     Mean    StDev
HT              444   66.928    3.938
Shoe            445   9.1056   1.9484
Age             444   20.371    2.912
HeightPlus1     444   67.928    3.938
2.5TimesHeight  444   167.32     9.84
HeightPlusShoe  444   76.035    5.693
HeightPlusAge   444   87.299    4.913

Observation 1

The mean of heights with a 1 inch heel (67.928) is one inch larger than the mean of heights (66.928)

Why?

If the same constant is added to every value, the mean changes by that same constant.

Observation 2

The standard deviation of heights with a 1 inch heel (3.938) equals the standard deviation of heights (3.938)

Why?

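Standard deviation measures spread around the mean. Adding a constant moves every value and the mean by the same amount, so the distances from the mean and the shape of the distribution don’t change.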
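
A quick check of Observations 1 and 2, as a minimal sketch: the five heights below are made-up illustrative values, not the survey data, and the variable names are assumptions for this example.

```python
from statistics import mean, stdev

# Made-up heights in inches (illustrative only, not the survey data).
heights = [62.0, 65.5, 67.0, 70.0, 73.5]

# Add a 1 inch heel to every height.
heights_plus_1 = [h + 1 for h in heights]

# The mean moves up by exactly 1; the standard deviation is unchanged.
print("mean :", mean(heights), "->", mean(heights_plus_1))
print("stdev:", round(stdev(heights), 3), "->", round(stdev(heights_plus_1), 3))
```

Any other list of numbers and any other added constant should show the same pattern.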

Observation 3

The standard deviation of heights in centimeters (9.84) is 2.5 times the standard deviation of heights in inches (3.938)

Why?

By multiplying all data values by a constant, we stretch the spread of the histogram by the same factor, and so change the properties that depend on the spread (like standard deviation). The mean is multiplied by the same constant (66.928 × 2.5 = 167.32).
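
The shift and scale rules can be checked against the Statistics table. The sketch below assumes the general rule for a transformed column a·X + b (the mean becomes a·mean + b, the standard deviation becomes |a|·StDev). The function name `transformed_stats` is hypothetical; the input numbers are the HT row of the table.

```python
def transformed_stats(mean, sd, scale=1.0, shift=0.0):
    """Return (mean, sd) of scale * X + shift, given the mean and sd of X."""
    return scale * mean + shift, abs(scale) * sd

# Mean and StDev of HT as reported in the Statistics table.
ht_mean, ht_sd = 66.928, 3.938

plus1 = transformed_stats(ht_mean, ht_sd, shift=1.0)   # HeightPlus1
in_cm = transformed_stats(ht_mean, ht_sd, scale=2.5)   # 2.5TimesHeight

print("HeightPlus1:    mean %.3f, sd %.3f" % plus1)
print("2.5TimesHeight: mean %.2f, sd %.3f" % in_cm)
```

The predicted standard deviation for 2.5TimesHeight is 2.5 × 3.938 = 9.845, which agrees with the table's 9.84 up to the rounding of the reported values.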

Observation 4

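Mean of HeightPlusShoe = Mean of Height + Mean of Shoe (76.035 ≈ 66.928 + 9.1056 = 76.034; the small gap is likely because Shoe has 445 observations while the other columns have 444)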

Observation 5

Mean of HeightPlusAge = Mean of Height + Mean of Age

Why?

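Since each sum adds one value from each column, the total of all the sums equals the total of the first column plus the total of the second column. Dividing by the number of observations gives the mean of the sums as the sum of the two means.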
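
The additivity of means in Observations 4 and 5 can be written out as a short derivation. The symbols are assumptions for this sketch: x_i and y_i are the two values recorded for person i, and n is the number of people. This is a standard argument and not necessarily the formula shown on the original slide.

```latex
\[
\overline{x+y}
  = \frac{1}{n}\sum_{i=1}^{n}(x_i + y_i)
  = \frac{1}{n}\sum_{i=1}^{n}x_i + \frac{1}{n}\sum_{i=1}^{n}y_i
  = \bar{x} + \bar{y}
\]
```

Nothing in this argument requires the two columns to be independent, which is why the rule holds for both shoe size and age.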

Variances

Variance = σ²

Variances apply to a probability distribution

Variance is a way to capture the degree of spread of a distribution

Variable        Variance
HT              15.50784
Shoe            3.796263
Age             8.479744
HeightPlusShoe  32.41025
HeightPlusAge   24.13757

Dependence

Are shoe sizes and heights dependent?
Are age and height dependent?
Let’s check using scatter plots.

[Scatter plot: Height vs. Shoe Size]

[Scatter plot: Height vs. Age]

Back to variances

Variance of HeightPlusShoe (32.41) is much greater than Var(Height) + Var(Shoe) = 15.51 + 3.80 = 19.30

Variance of HeightPlusAge (24.14) is very close to Var(Height) + Var(Age) = 15.51 + 8.48 = 23.99

Why?

Can you see a difference between the two relationships, Height vs. Shoe Size and Height vs. Age?

Dependence

Height and shoe size are dependent: small values of one tend to come with small values of the other, and large values with large values. Adding them produces extremes (small plus small, large plus large).

This makes the variance of the sum much larger than the sum of the variances.

When two data sets are independent, values are not matched by relative size (large values can be added to small values), so the extremes partly cancel.

The variance of the sum then stays close to the sum of the two variances.
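
The gap between the variance of a sum and the sum of the variances can be quantified with the identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y). This identity is standard but does not appear on the slides, so treat the sketch below as an illustration: it backs out the covariance and correlation implied by the reported standard deviations. The helper names are hypothetical, and the results are approximate because the table values are rounded and Shoe has one more observation than the other columns.

```python
# Standard deviations as reported in the Statistics table.
sd = {"HT": 3.938, "Shoe": 1.9484, "Age": 2.912,
      "HeightPlusShoe": 5.693, "HeightPlusAge": 4.913}
var = {name: s ** 2 for name, s in sd.items()}

def implied_covariance(x, y, total):
    """Solve Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y) for Cov(X, Y)."""
    return (var[total] - var[x] - var[y]) / 2

def implied_correlation(x, y, total):
    """Covariance rescaled by both standard deviations."""
    return implied_covariance(x, y, total) / (sd[x] * sd[y])

r_shoe = implied_correlation("HT", "Shoe", "HeightPlusShoe")
r_age = implied_correlation("HT", "Age", "HeightPlusAge")

print(f"Height and Shoe: implied correlation = {r_shoe:.2f}")
print(f"Height and Age:  implied correlation = {r_age:.2f}")
```

With the reported values this gives a correlation of about 0.85 for height and shoe size and about 0.01 for height and age: a strong positive relationship in the first case and essentially none in the second.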

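Variance of sample mean

Mean = (X1 + X2 + … + Xn)/n

For independent X1, X2, …, Xn:

Variance[(X1 + X2 + … + Xn)/n] = (Variance[X1] + Variance[X2] + … + Variance[Xn])/n²

Dividing by n multiplies the standard deviation by 1/n, so the variance is multiplied by 1/n². If every Xi has the same variance σ², the result is σ²/n.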
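
As a check on the variance of the sample mean, the sketch below enumerates every equally likely sample of size 2 drawn with replacement (so the draws are independent) from a tiny made-up population and computes the variance of the sample mean exactly. The population values and variable names are assumptions chosen for illustration.

```python
from itertools import product
from statistics import mean, pvariance

# Made-up population (illustrative only).
population = [1.0, 2.0, 3.0, 4.0, 5.0]
n = 2  # sample size

# Every equally likely ordered sample of size n drawn WITH replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

var_of_mean = pvariance(sample_means)   # exact variance of the sample mean
sigma2 = pvariance(population)          # population variance

print("variance of the sample mean:", var_of_mean)
print("sigma^2 / n:                ", sigma2 / n)
```

For independent draws the two printed values agree: the variance of the sample mean is the population variance divided by n.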

Dependence?

Would this work for dependent values of X1, X2, …, Xn?

Would the variance produced by this formula be larger or smaller than actual?

Sampling without replacement: would the variance formula hold true? Why?

Dependence

For positively dependent values, adding the individual variances produces a smaller result than the actual variance of the sum, because adding dependent data sets produces extremes, which widens the spread

Sampling without replacement on smaller populations (n < 10) will produce dependence among the sampled values
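
The effect of sampling without replacement can be checked by enumerating every possible sample. In this case the draws are negatively dependent (a large value drawn first cannot be drawn again), so the actual variance of the sample mean comes out smaller than σ²/n, the opposite direction from the height and shoe size example. The population below is made up for illustration, and the comparison with the finite population correction (N − n)/(N − 1) is a standard result added here, not something stated on the slides.

```python
from itertools import permutations
from statistics import mean, pvariance

# Made-up population (illustrative only).
population = [1.0, 2.0, 3.0, 4.0, 5.0]
N, n = len(population), 2

# Every equally likely ordered sample of size n drawn WITHOUT replacement.
sample_means = [mean(sample) for sample in permutations(population, n)]

actual = pvariance(sample_means)                      # exact variance of the mean
sigma2 = pvariance(population)                        # population variance
independent_formula = sigma2 / n                      # assumes independent draws
corrected = independent_formula * (N - n) / (N - 1)   # finite population correction

print("actual variance of the sample mean:", actual)
print("sigma^2 / n (assumes independence):", independent_formula)
print("with finite population correction: ", corrected)
```

When N is much larger than n, the correction factor is close to 1 and the dependence can be ignored.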

The End

Extra Credit (Dr. Pfenning)

Use Minitab Calculator to create column “Birthyear”
Plot Earned vs. Birthyear, note relationship
Create column “EarnedPlusBirthyear”
Find sds of Earned, Birthyear, EarnedPlusBirthyear, square to variances
Compare variances
Explain results