POPULATION AND SAMPLE MEAN Avjinder Singh Kaler and Kristi Mai

Upload: avjinder-avi-kaler

Post on 26-Jan-2017


TRANSCRIPT

Page 1: Population and sample mean

POPULATION AND SAMPLE MEAN Avjinder Singh Kaler and Kristi Mai

Page 2: Population and sample mean

• Estimating a Population Mean

  • $\sigma$ unknown

  • $\sigma$ known

• Estimating the difference between two population means

  • Independent samples

  • Dependent samples

Page 3: Population and sample mean

Main Ideas:

• The sample mean is the best point estimate of the population mean

• We can use a sample mean to construct a confidence interval (C.I.) to estimate the true value of a population mean

• We must learn how to find the sample size necessary to estimate a population mean

Recall:

• $\bar{x} = \frac{\sum x}{n}$ : sample mean

• $\bar{x}$ targets $\mu$ and is a single value used as an estimate (i.e., it is a point estimate for $\mu$)

Notice: There are two situations when estimating a population mean:

1. $\sigma$, the population standard deviation, is known

2. $\sigma$, the population standard deviation, is NOT known

Page 4: Population and sample mean

• Margin of Error (estimating the population mean when $\sigma$ is known)

  • $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

  • Notice: The margin of error changes when what we are estimating changes!

• Constructing a C.I.

  • Requirements:

    The sample must be a SRS (simple random sample)

    The value of $\sigma$ is known

    The population is normal OR $n > 30$

  • C.I.:

    $\bar{x} - E < \mu < \bar{x} + E$

    Same as: $\bar{x} \pm E$ and $(\bar{x} - E, \bar{x} + E)$
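The margin of error and interval above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `z_interval` and the input values are hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` instead of a Z-table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    # z_{alpha/2}: the z-score with an area of alpha/2 in the right tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Hypothetical inputs, chosen only for illustration.
lower, upper, E = z_interval(x_bar=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"E = {E:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With these assumed inputs the standard error is 10/√100 = 1, so the margin of error equals the critical value itself (about 1.96).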

Page 5: Population and sample mean

• Minimum required sample size

  • Sample size needed, if $\sigma$ is known:

    $n = \left[ \frac{z_{\alpha/2} \cdot \sigma}{E} \right]^2$

  • If the result is not a whole number, ALWAYS round up to the next whole number for minimum required sample sizes
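A short sketch of the sample-size rule, including the "always round up" step. The function name `required_sample_size` and the inputs (σ = 15, E = 3) are assumptions made up for this illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw_n = (z_crit * sigma / margin) ** 2
    # Round UP: rounding down would give a margin of error larger than requested.
    return ceil(raw_n)


# Hypothetical inputs: sigma = 15, desired margin of error E = 3, 95% confidence.
n_needed = required_sample_size(sigma=15.0, margin=3.0, confidence=0.95)
print(n_needed)
```

Here the unrounded value is about 96.04, so the minimum sample size is 97, not 96.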

Page 6: Population and sample mean

Some Key Points:

• The sample mean is still the best point estimate of the population mean

• We can use a sample mean to construct a C.I. to estimate the true value of a population mean even when we do not know the population standard deviation

• If the requirements are otherwise met but $\sigma$ is unknown, we must use a t-distribution

Page 7: Population and sample mean

The Student t Distribution:

• If a population has a normal distribution, then the following statistic has a Student t distribution for all samples of size $n$:

  $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

• The above formula is a t-score: a measure of relative standing

• We are estimating the unknown population standard deviation $\sigma$ with the sample standard deviation $s$

• Estimating $\sigma$ with $s$ adds uncertainty, so we compensate with wider intervals and the “fatter tails” displayed in the density curve

• We must use a t-table or t-calculator when using the t-distribution

• We NEED degrees of freedom

Degrees of Freedom ($df$) for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed upon all the data values

Page 8: Population and sample mean
Page 9: Population and sample mean

• Recall:

  • $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ : sample standard deviation

• Margin of Error

  $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ with $df = n - 1$

Notice: The margin of error also changes when the information we have changes

Page 10: Population and sample mean

Constructing a C.I.

• Requirements:

  • The sample must be a SRS

  • The value of $\sigma$ is NOT known

  • The population is normal OR $n > 30$

• C.I.: $\bar{x} - E < \mu < \bar{x} + E$

• Same as: $\bar{x} \pm E$ and $(\bar{x} - E, \bar{x} + E)$

• Notice that the C.I. appears to be the same; however, it will NOT be the same as the previous C.I. for $\mu$ because (with our uncertainty about $\sigma$) the margin of error changed
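To see how the margin of error changes when $\sigma$ is replaced by $s$, the sketch below compares the two margins for the same spread and sample size. The values (standard deviation 0.5, n = 11) are hypothetical; the critical values 1.960 and 2.228 are the usual table entries for 95% confidence ($z_{0.025}$, and $t_{0.025}$ with 10 degrees of freedom).

```python
from math import sqrt


def margin_of_error(critical_value, std_dev, n):
    """E = critical value * standard deviation / sqrt(n)."""
    return critical_value * std_dev / sqrt(n)


n = 11
std_dev = 0.5    # hypothetical value, used as sigma in one case and as s in the other

z_crit = 1.960   # z_{0.025} from a Z-table
t_crit = 2.228   # t_{0.025} with df = n - 1 = 10 from a t-table

E_z = margin_of_error(z_crit, std_dev, n)  # sigma known
E_t = margin_of_error(t_crit, std_dev, n)  # sigma unknown, estimated by s

print(f"E (z) = {E_z:.4f}, E (t) = {E_t:.4f}")
```

The t-based margin is wider, which is the compensation for not knowing $\sigma$; the gap shrinks as n grows.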

Page 11: Population and sample mean

• The Student t distribution is different for different sample sizes

• The t distribution has the same general symmetric bell shape as the normal distribution, but reflects the greater variability that is expected when samples are smaller

• The t distribution has a mean of $t = 0$ just as the standard normal distribution has a mean of $z = 0$

Page 12: Population and sample mean

Is the population normal OR is $n > 30$?

• No: use a nonparametric method or bootstrapping technique

• Yes: is $\sigma$ known or unknown?

  • Known: use the normal distribution (normal calculator)

  • Unknown: use the t distribution (t-calculator)
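The decision procedure on this slide can be written as a small function. This is only a sketch: the function name `choose_method` and the returned labels are made up for illustration.

```python
def choose_method(normal_or_large_sample: bool, sigma_known: bool) -> str:
    """Pick the procedure for inference about one population mean.

    normal_or_large_sample: True if the population is normal OR n > 30.
    sigma_known: True if the population standard deviation is known.
    """
    if not normal_or_large_sample:
        # Neither normality nor a large sample: z and t methods do not apply.
        return "nonparametric method or bootstrapping"
    if sigma_known:
        return "normal (z) distribution"
    return "Student t distribution"


print(choose_method(normal_or_large_sample=True, sigma_known=False))
```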

Page 13: Population and sample mean

$\sigma$ known

Requirements:

• The sample must be a SRS

• The value of $\sigma$ is known

• The population is normal OR $n > 30$

Test Statistic: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$\mu$: population mean (assumed true under $H_0$)

Note: p-values and critical values are from the Z-table

$\sigma$ NOT known

Requirements:

• The sample must be a SRS

• The value of $\sigma$ is NOT known

• The population is normal OR $n > 30$

Test Statistic: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ ; $df = n - 1$

$\mu$: population mean (assumed true under $H_0$)

Note: p-values and critical values are from the t-table

Page 14: Population and sample mean

Listed below are the measured radiation emissions (in W/kg) corresponding to a sample of cell phones. Use a 0.05 level of significance to test the claim that cell phones have a mean radiation level that is less than 1.00 W/kg.

0.38 0.55 1.54 1.55 0.50 0.60 0.92 0.96 1.00 0.86 1.46

The summary statistics are: $\bar{x} = 0.938$ and $s = 0.423$.

Page 15: Population and sample mean

Requirement Check:

1. We assume the sample is a simple random sample.

2. The sample size is n = 11, which is not greater than 30, so we must check a normal quantile plot for normality.

Note: In the normal quantile plot (shown on the original slide), the points are reasonably close to a straight line and there is no other pattern, so we conclude that the data appear to be from a normally distributed population.

Page 16: Population and sample mean

Step 1: The claim that cell phones have a mean radiation level less than 1.00 W/kg is expressed as μ < 1.00 W/kg.

Step 2: The alternative to the original claim is μ ≥ 1.00 W/kg.

Step 3: The hypotheses are written as:

$H_0: \mu = 1.00$ W/kg

$H_1: \mu < 1.00$ W/kg

Step 4: The stated level of significance is $\alpha = 0.05$.

Step 5: Because the claim is about a population mean μ, the statistic most relevant to this test is the sample mean, $\bar{x}$.

Page 17: Population and sample mean

Step 6: Calculate the test statistic and then find the P-value or the critical value using StatCrunch.

$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}} = \frac{0.938 - 1.00}{0.423 / \sqrt{11}} = -0.486$
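As a check on the arithmetic, the test statistic can be recomputed from the eleven listed radiation values using only the Python standard library. This is an illustrative sketch, not the StatCrunch workflow the slides use; the variable names are arbitrary.

```python
from math import sqrt
from statistics import mean, stdev

# Radiation emissions (W/kg) listed in the example.
radiation = [0.38, 0.55, 1.54, 1.55, 0.50, 0.60, 0.92, 0.96, 1.00, 0.86, 1.46]
mu_0 = 1.00                      # mean assumed under H0

n = len(radiation)
x_bar = mean(radiation)          # sample mean
s = stdev(radiation)             # sample standard deviation (n - 1 in the denominator)
t_stat = (x_bar - mu_0) / (s / sqrt(n))

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, df = {n - 1}")
```

Because this version carries unrounded values of the mean and standard deviation, the result can differ in the third decimal place from the slide's −0.486, which uses the rounded summary statistics.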

Page 18: Population and sample mean

Step 7: Critical Value Method: Because the test statistic of t = –0.486 does not fall in the critical region bounded by the critical value of t = –1.812, we fail to reject the null hypothesis.

Page 19: Population and sample mean

Step 7: P-value method:

Using StatCrunch, the P-value computed is 0.3187. Since the P-value is greater than α = 0.05, we fail to reject the null hypothesis.

Step 8:

Because we fail to reject the null hypothesis, we conclude that there is not sufficient evidence to support the claim that cell phones have a mean radiation level that is less than 1.00 W/kg.

Page 20: Population and sample mean

We can use a confidence interval for testing a claim about μ.

For a two-tailed test with a 0.05 significance level, we construct a 95% confidence interval.

For a one-tailed test with a 0.05 significance level, we construct a 90% confidence interval.

Page 21: Population and sample mean

Using the cell phone example, construct a confidence interval that can be used to test the claim that μ < 1.00 W/kg, assuming a 0.05 significance level.

Note that a left-tailed hypothesis test with α = 0.05 corresponds to a 90% confidence interval.

Using StatCrunch, the confidence interval is:

0.707 W/kg < μ < 1.169 W/kg

Because the value of μ = 1.00 W/kg is contained in the interval, we fail to reject the null hypothesis that μ = 1.00 W/kg.

Based on the sample of 11 values, we do not have sufficient evidence to support the claim that the mean radiation level is less than 1.00 W/kg.
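The 90% interval can be reproduced by hand from the summary statistics. The sketch below assumes the critical value 1.812 (the t-table entry for an area of 0.05 in one tail with df = 10, the same value used in the critical value method); variable names are illustrative.

```python
from math import sqrt

x_bar, s, n = 0.938, 0.423, 11   # summary statistics from the example
t_crit = 1.812                   # t_{0.05} with df = 10, taken from a t-table

E = t_crit * s / sqrt(n)         # margin of error
lower, upper = x_bar - E, x_bar + E

# The claimed boundary value 1.00 W/kg lies inside the interval,
# so the data are consistent with mu = 1.00 and H0 is not rejected.
contains_claimed_value = lower < 1.00 < upper

print(f"{lower:.3f} < mu < {upper:.3f}; contains 1.00: {contains_claimed_value}")
```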

Page 22: Population and sample mean

When σ is known, we use a test that involves the standard normal distribution.

In reality, it is very rare to test a claim about an unknown population mean while the population standard deviation is somehow known.

The procedure is essentially the same as a t test, with the following exception: the test statistic is

$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$

The P-value and critical values can be computed using StatCrunch.

Page 23: Population and sample mean

If we repeat the cell phone radiation example, with the assumption that σ = 0.480 W/kg, the test statistic is:

$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}} = \frac{0.938 - 1.00}{0.480 / \sqrt{11}} = -0.43$

The example refers to a left-tailed test, so the P-value is the area to the left of z = –0.43, which is 0.3342.

Since the P-value is greater than α = 0.05, we fail to reject the null hypothesis and reach the same conclusion as before.
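The z statistic and its left-tail P-value can also be computed directly. This sketch uses `statistics.NormalDist` from the Python standard library in place of StatCrunch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n = 0.938, 1.00, 0.480, 11

z = (x_bar - mu_0) / (sigma / sqrt(n))
# Left-tailed test: the P-value is the area to the left of z under the standard normal curve.
p_value = NormalDist().cdf(z)

alpha = 0.05
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```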

Page 24: Population and sample mean

Main Ideas:

• The sample mean is the best point estimate of the population mean

• We can use two independent sample means to construct a confidence interval that can be used to estimate the true value of the underlying difference in the corresponding population means

• We can also test claims about the difference between two population means

Page 25: Population and sample mean

Notation:

Page 26: Population and sample mean

Dependent samples: two samples are dependent if the sample values are paired.

Independent samples: two samples are independent if the sample values from one are not related to or somehow naturally paired/matched with the sample values from the other.

Page 27: Population and sample mean

Requirements:

• Population standard deviations ($\sigma_1$ and $\sigma_2$) are NOT known and NOT assumed equal

• The two samples are independent

• Both samples are SRS

• Both $n_1 > 30$ and $n_2 > 30$ OR both samples come from populations that are normal

Page 28: Population and sample mean

• Margin of Error

  $E = t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $df = \min(n_1 - 1, n_2 - 1)$

• C.I.: $(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$

• Notice that we are often interested in whether or not 0 is included within the limits of the confidence interval constructed, i.e., whether or not $\mu_1 - \mu_2 = 0$ is reasonable

Page 29: Population and sample mean

• Requirements:

  • Requirements and degrees of freedom (df) are the same as in the C.I. before

• Test Statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Page 30: Population and sample mean

Researchers conducted trials to investigate the effects of color on creativity. Subjects with a red background were asked to think of creative uses for a brick; other subjects with a blue background were given the same task. Responses were scored by a panel of judges. Researchers make the claim that “blue enhances performance on a creative task”. Test the claim using a 0.01 significance level.

Page 31: Population and sample mean

Requirement check:

1. The values of the two population standard deviations are unknown and assumed not equal.

2. The subject groups are independent.

3. The samples are simple random samples.

4. Both sample sizes exceed 30.

The requirements are all satisfied.

Page 32: Population and sample mean

The data:

Background color | Sample size | Sample mean | Sample standard deviation

Red background (group 1) | $n_1 = 35$ | $\bar{x}_1 = 3.39$ | $s_1 = 0.97$

Blue background (group 2) | $n_2 = 36$ | $\bar{x}_2 = 3.97$ | $s_2 = 0.63$

Page 33: Population and sample mean

Step 1: The claim that “blue enhances performance on a creative task” can be restated as “people with a blue background (group 2) have a higher mean creativity score than those in the group with a red background (group 1)”. This can be expressed as μ1 < μ2.

Step 2: If the original claim is false, then μ1 ≥ μ2.

Step 3: The hypotheses can be written as:

$H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 < \mu_2$

OR

$H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 < 0$

Page 34: Population and sample mean

Step 4: The significance level is α = 0.01.

Step 5: Because we have two independent samples and we are testing a claim about two population means, we use a t-distribution.

Step 6: Calculate the test statistic.

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(3.39 - 3.97) - 0}{\sqrt{\frac{0.97^2}{35} + \frac{0.63^2}{36}}} = -2.979$
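The two-sample test statistic can be verified with a short function. This is a minimal sketch: the name `two_sample_t` is made up for illustration, and the degrees of freedom use the conservative rule df = min(n1 − 1, n2 − 1) given on the slides.

```python
from math import sqrt


def two_sample_t(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """t statistic for two independent samples, variances not assumed equal."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((x1 - x2) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)   # conservative degrees of freedom
    return t, df


# Group 1 = red background, group 2 = blue background.
t_stat, df = two_sample_t(x1=3.39, s1=0.97, n1=35, x2=3.97, s2=0.63, n2=36)
print(f"t = {t_stat:.3f}, df = {df}")
```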

Page 35: Population and sample mean

Step 6 (continued): Because we are using a t-distribution, the critical value of t = –2.441 is found using StatCrunch. We use 34 degrees of freedom.

Page 36: Population and sample mean

Step 7: Because the test statistic does fall in the critical region, we reject the null hypothesis μ1 = μ2.

P-Value Method: StatCrunch provides a P-value, and the area to the left of the test statistic of t = –2.979 is 0.0021. Since this is less than the significance level of 0.01, we reject the null hypothesis.

Conclusion: There is sufficient evidence to support the claim that the red background group has a lower mean creativity score than the blue background group.

Page 37: Population and sample mean

Using the data from this color creativity example, construct a 98% confidence interval estimate for the difference between the mean creativity score for those with a red background and the mean creativity score for those with a blue background.

Page 38: Population and sample mean

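Using StatCrunch, the 98% confidence interval obtained is:

$-1.05 < \mu_1 - \mu_2 < -0.11$

By hand, with $\bar{x}_1 = 3.39$ and $\bar{x}_2 = 3.97$:

$E = t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 2.441 \sqrt{\frac{0.97^2}{35} + \frac{0.63^2}{36}} = 0.475261$

$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$

$-1.06 < (\mu_1 - \mu_2) < -0.10$

The hand calculation uses the conservative df = 34, so its limits differ slightly from the software result, which most likely uses a larger, approximated df.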
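The hand calculation of the 98% interval can be checked as follows. The critical value 2.441 (df = 34) is taken from the slides; everything else is plain arithmetic, and the variable names are illustrative.

```python
from math import sqrt

x1, s1, n1 = 3.39, 0.97, 35   # red background (group 1)
x2, s2, n2 = 3.97, 0.63, 36   # blue background (group 2)
t_crit = 2.441                # t_{0.01} with df = min(n1 - 1, n2 - 1) = 34

E = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
diff = x1 - x2
lower, upper = diff - E, diff + E

# If 0 is outside the interval, a difference of 0 between the means is not plausible.
zero_in_interval = lower < 0 < upper

print(f"E = {E:.6f}; {lower:.2f} < mu1 - mu2 < {upper:.2f}; contains 0: {zero_in_interval}")
```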

Page 39: Population and sample mean

We are 98% confident that the limits –1.05 and –0.11 actually do contain the difference between the two population means.

Because those limits do not include 0, our interval suggests that there is a significant difference between the two means.

Page 40: Population and sample mean

These methods are rarely used in practice because the underlying assumptions are usually not met.

1. The two population standard deviations are both known

• the test statistic will be a z instead of a t and use the standard normal model.

2. The two population standard deviations are unknown but assumed to be equal

• pool the sample variances

Page 41: Population and sample mean

The test statistic will be:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

P-values and critical values are found using StatCrunch.

Page 42: Population and sample mean

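C.I.: $(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$

$E = z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$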

Page 43: Population and sample mean

The test statistic will be

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

where the pooled sample variance is

$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)}$

with $df = n_1 + n_2 - 2$
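A sketch of the pooled-variance calculation follows. The function name `pooled_t` is hypothetical, and the color-creativity summary statistics are reused purely as sample inputs; the slides analyze that data without assuming equal variances, so the output here is illustrative only.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """t statistic assuming equal (but unknown) population variances."""
    df = n1 + n2 - 2
    # Pooled variance: average of the sample variances, weighted by their df.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = sqrt(sp2 / n1 + sp2 / n2)
    t = ((x1 - x2) - hypothesized_diff) / standard_error
    return t, sp2, df


t_pooled, sp2, df_pooled = pooled_t(3.39, 0.97, 35, 3.97, 0.63, 36)
print(f"s_p^2 = {sp2:.4f}, t = {t_pooled:.3f}, df = {df_pooled}")
```

Note how the pooled version uses df = n1 + n2 − 2 = 69 rather than the conservative 34, which is one reason the equal-variance assumption should be justified before pooling.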

Page 44: Population and sample mean

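C.I.: $(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$

$E = t_{\alpha/2} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$

with $df = n_1 + n_2 - 2$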

Page 45: Population and sample mean
Page 46: Population and sample mean

Independent Samples (Two Additional Methods)

• $\sigma_1$ and $\sigma_2$ known – Z Test / Z Interval

• $\sigma_1 = \sigma_2$ – Pooled Sample Variance

Dependent Samples

• When samples are paired, we use a different methodology

Page 47: Population and sample mean

Main Ideas:

• The sample mean is still the best point estimate of the population mean

• We can use two dependent sample means to construct a confidence interval that can be used to estimate the true value of the underlying difference in the corresponding population means

• We can also test claims about the difference between two population means

• In experimental design, using dependent samples is generally better and more practical than assuming two independent samples

Page 48: Population and sample mean

Notation:

• $d$: the individual difference between two values in a single matched pair

• $n$: the number of pairs of data

• $\mu_d$: the population mean of the differences for all the pairs of data

• $\bar{d}$: the sample mean of the differences for the paired sample data

• $s_d$: the sample standard deviation of the differences for the paired sample data

Page 49: Population and sample mean

Requirements

• The sample data are dependent

• Both samples are SRS

• Either $n > 30$ OR the paired differences come from a population that is normal

Margin of Error

• $E = t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$ with $df = n - 1$

C.I.

• $\bar{d} - E < \mu_d < \bar{d} + E$

Notice that we are often interested in whether or not 0 is included within the limits of the confidence interval constructed, i.e., whether or not $\mu_d = 0$ is reasonable

Page 50: Population and sample mean

Requirements:

• Requirements and degrees of freedom ($df$) are the same as in the C.I. above

Test Statistic:

$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$

Page 51: Population and sample mean

Use the sample data below with a significance level of 0.05 to test the claim that for the population of heights of presidents and their main opponents, the differences have a mean greater than 0 cm (so presidents tend to be taller than their opponents).

Height (cm) of President 189 173 183 180 179

Height (cm) of Main Opponent 170 185 175 180 178

Difference d 19 -12 8 0 1

Page 52: Population and sample mean

Requirement Check:

1. The samples are dependent because the values are paired.

2. The pairs of data are randomly selected.

3. The number of data points is 5, so normality should be checked (and it is assumed the condition is met).

Page 53: Population and sample mean

Step 1: The claim is that μd > 0 cm.

Step 2: If the original claim is not true, we have μd ≤ 0 cm.

Step 3: The hypotheses can be written as:

$H_0: \mu_d = 0$ cm

$H_1: \mu_d > 0$ cm

Page 54: Population and sample mean

Step 4: The significance level is α = 0.05.

Step 5: We use the Student t-distribution.

The summary statistics are: $\bar{d} = 3.2$ and $s_d = 11.4$

Page 55: Population and sample mean

Step 6: Determine the value of the test statistic:

$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{3.2 - 0}{11.4 / \sqrt{5}} = 0.628$

with df = 5 – 1 = 4
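The paired-sample calculation can be reproduced from the height data with the Python standard library. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

president = [189, 173, 183, 180, 179]   # heights in cm
opponent = [170, 185, 175, 180, 178]

# Work with the paired differences d = president - opponent, not the raw samples.
d = [p - o for p, o in zip(president, opponent)]
n = len(d)
d_bar = mean(d)        # sample mean of the differences
s_d = stdev(d)         # sample standard deviation of the differences
mu_d0 = 0              # mean difference assumed under H0

t_stat = (d_bar - mu_d0) / (s_d / sqrt(n))
print(f"d = {d}, d_bar = {d_bar:.1f}, s_d = {s_d:.1f}, t = {t_stat:.3f}, df = {n - 1}")
```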

Page 56: Population and sample mean

Step 6 (continued): Using StatCrunch, the P-value is 0.282.

Using the critical value method: for a right-tailed test with α = 0.05 and df = 4, the critical value is t = 2.132.

Page 57: Population and sample mean

Step 7: Because the P-value exceeds 0.05, or because the test statistic does not fall in the critical region, we fail to reject the null hypothesis.

Conclusion: There is not sufficient evidence to support the claim that for the population of heights of presidents and their main opponent, the differences have a mean greater than 0 cm.

In other words, presidents do not appear to be taller than their opponents.

Page 58: Population and sample mean

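Confidence Interval: Support the conclusions with a 90% confidence interval estimate for μd.

$E = t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}} = 2.132 \cdot \frac{11.4}{\sqrt{5}} = 10.8694$

$\bar{d} - E < \mu_d < \bar{d} + E$

$3.2 - 10.8694 < \mu_d < 3.2 + 10.8694$

$-7.7 < \mu_d < 14.1$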
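The interval arithmetic above can be checked in a few lines. The critical value 2.132 ($t_{0.05}$ with df = 4) is the one used on the slide; the variable names are illustrative.

```python
from math import sqrt

d_bar, s_d, n = 3.2, 11.4, 5   # summary statistics of the paired differences (cm)
t_crit = 2.132                 # t_{0.05} with df = n - 1 = 4

E = t_crit * s_d / sqrt(n)
lower, upper = d_bar - E, d_bar + E

# 0 inside the interval means a mean difference of 0 cm remains plausible.
zero_in_interval = lower < 0 < upper

print(f"E = {E:.4f}; {lower:.1f} < mu_d < {upper:.1f}; contains 0: {zero_in_interval}")
```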

Page 59: Population and sample mean

We have 90% confidence that the limits of –7.7 cm and 14.1 cm contain the true mean of the differences in height (president’s height – opponent’s height).

The interval does contain the value of 0 cm, so it is very possible that the mean of the differences is equal to 0 cm, indicating that there is no significant difference between the heights.

Page 60: Population and sample mean

Complete the following:

• Practice Problems 5

• Practice Problems 6