SUMMARY
• Z-distribution
• Central limit theorem
Sweet demonstration of the sampling distribution of the mean
Sweet data
𝑛=20
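R-code (sampling distribution exact)
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)
mean(data.set)                         # population mean
sd(data.set)*sqrt(19/20)               # population standard deviation (divides by N = 20, not N - 1)
(sd(data.set)*sqrt(19/20))/sqrt(20)    # population standard deviation divided by sqrt(20)
sample_size <- 5
samps <- combn(data.set, sample_size)  # every possible sample of size 5, one per column
xbars <- colMeans(samps)               # mean of each sample
barplot(table(xbars))                  # exact sampling distribution of the mean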
Sampling distribution – exact
$\mu_{\bar{x}} = M = ??$
$M = \mu = 5.05$
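The equality of the two means can be checked numerically. The following is a minimal sketch that reuses the sweet data and the sample size of 5 from the exact R-code; it enumerates every possible sample and compares the mean of all sample means with the population mean.

```r
# Sweet data, as in the exact sampling distribution example
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)
sample_size <- 5

# Means of all possible samples of size 5 (drawn without replacement)
xbars <- colMeans(combn(data.set, sample_size))

length(xbars)    # number of possible samples, choose(20, 5)
mean(xbars)      # mean of the sampling distribution
mean(data.set)   # population mean
```

The last two values should agree, which is what $M = \mu$ states.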
R-code (sampling distribution simulated)
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)
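sample_size <- 3
number_of_samples <- 20
# draw 20 random samples of size 3, with replacement, one per column
samples <- replicate(number_of_samples, sample(data.set, sample_size, replace=TRUE))
out <- colMeans(samples)   # mean of each sample
mean(out)                  # mean of the simulated sample means
sd(out)                    # standard deviation of the simulated sample means
barplot(table(out))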
Sampling distribution – simulated
ESTIMATION
Statistical inference
If we can’t conduct a census, we collect data from a sample of the population.
Goal: draw conclusions about that population.
Demonstration problem
• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).
• What is the probability that the mean weight of all 200 000 apples is between 100 and 124 grams?
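What is the question?
• We would like to know the probability that the population mean is within 12 of the sample mean, i.e. $P(|\mu - \bar{x}| < 12)$.
• But this is the same thing as the probability that the sample mean is within 12 of the population mean, $P(|\bar{x} - \mu| < 12)$.
• But this is the same thing as the probability that the sample mean is within 12 of the mean of the sampling distribution, $P(|\bar{x} - \mu_{\bar{x}}| < 12)$, because $\mu_{\bar{x}} = \mu$.
• So, if I am able to say how many standard deviations away from $\mu_{\bar{x}}$ I am, I can use the Z-table to figure out the probability.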
Slight complication
• There is one caveat, can you see it?
• We don’t know the standard deviation of the sampling distribution (the standard error). We only know that it equals $\sigma/\sqrt{n}$, but $\sigma$ is unknown.
• What we’re going to do is estimate $\sigma$. The best thing we can use is the sample standard deviation $s$, which equals 40.
• $SE = s/\sqrt{n} = 40/\sqrt{36} = 6.67$. This is our best estimate of the standard error.
• Now you finish the example. What is the probability that the population mean lies within 12 of the sample mean if the SE equals 6.67?
• 92.82%
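The arithmetic behind this answer can be sketched in R. The snippet uses only the numbers from the apple example; `pnorm` plays the role of the Z-table, so the result should come out close to the 92.82% read from the table, with small differences due to table rounding.

```r
s <- 40                        # sample standard deviation (grams)
n <- 36                        # sample size
se <- s / sqrt(n)              # estimated standard error
z <- 12 / se                   # a distance of 12 grams expressed in standard errors
prob <- pnorm(z) - pnorm(-z)   # area under the standard normal curve between -z and z

round(c(se = se, z = z, prob = prob), 4)
```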
This is neat!
• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation). What is the probability that the population mean weight of all 200 000 apples is between 100 and 124 grams?
• We started with very little information (we know just the sample statistics), but we can infer that, with a probability of 92.82%, the population mean lies within 12 of our sample mean!
Point vs. interval estimate
• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).
• Goal: estimate the population mean.
1. The population mean is estimated by the sample mean, i.e. we say the population mean equals 112 g. This is called a point estimate (Czech: bodový odhad).
2. However, we can do better. We can estimate that the true population mean lies, with 95% confidence, within the interval $\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$ (interval estimate).
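As an illustration, the 95% interval estimate for the apple example can be computed as follows. This is a minimal sketch; the variable names are chosen here for illustration, and `qnorm(0.975)` supplies the critical value that the slide rounds to 1.96.

```r
x_bar <- 112                   # sample mean (grams)
s <- 40                        # sample standard deviation (grams)
n <- 36                        # sample size

z <- qnorm(0.975)              # critical value for 95% confidence
margin <- z * s / sqrt(n)      # margin of error
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)
```

The point estimate stays at 112 g; the interval adds roughly 13 g on either side.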
Confidence interval
• This type of result is called a confidence interval (Czech: interval spolehlivosti, konfidenční interval).
• The number of standard errors you want to add/subtract depends on the confidence level, e.g. 95% (Czech: hladina spolehlivosti).
$\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$
• $Z$: critical value (Czech: kritická hodnota)
• $Z \times \frac{s}{\sqrt{n}}$: margin of error (Czech: možná odchylka)
Confidence level
• The desired level of confidence is set by the researcher (not determined by data).
• If you want to be 95% confident with your results, you add/subtract 1.96 standard errors (the empirical rule says about 2 standard errors).
• 95% confidence interval (Czech: 95% interval spolehlivosti)

| Confidence level (%) | Z-value |
|---|---|
| 80 | 1.28 |
| 90 | 1.64 |
| 95 | 1.96 |
| 98 | 2.33 |
| 99 | 2.58 |
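The Z-values for these confidence levels do not have to be looked up; they can be derived from the standard normal quantile function. A short sketch, assuming a two-sided interval, so that half of the remaining probability sits in each tail:

```r
conf_level <- c(0.80, 0.90, 0.95, 0.98, 0.99)

# For a two-sided interval, (1 - conf_level) / 2 is left in each tail
z_value <- qnorm(1 - (1 - conf_level) / 2)

data.frame(conf_level, z_value = round(z_value, 2))
```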
Figure panels: 80% (Z = 1.28), 90% (Z = 1.64), 95% (Z = 1.96), 99% (Z = 2.58)
Small sample size confidence intervals
• The blood pressure of 7 patients was measured after they had been given a new drug for 3 months. The increases in blood pressure were 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in the population.
CLT consequence
• A change in blood pressure is a biological process. It’s going to be the sum of thousands or millions of microscopic processes.
• Generally, biological/physical processes can be viewed as being affected by a large number of random subprocesses with individually small effects.
• The sum of all these random components creates a random variable that converges to a normal distribution regardless of the underlying distribution of the processes causing the small effects.
• Thus, the Central Limit Theorem explains the ubiquity of the famous "Normal distribution" in the measurements domain.
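This idea can be illustrated with a small simulation. The sketch below is purely hypothetical: it assumes each measurement is the sum of 500 independent small effects drawn from a uniform distribution on [-1, 1], which is clearly not normal, and looks at the distribution of 2000 such sums. The counts, the seed and the choice of the uniform distribution are assumptions made only for this illustration.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
n_effects <- 500          # hypothetical number of small subprocesses per measurement
n_measurements <- 2000    # hypothetical number of simulated measurements

# Each column holds the small effects behind one measurement,
# drawn from a uniform (non-normal) distribution on [-1, 1]
effects <- matrix(runif(n_effects * n_measurements, min = -1, max = 1),
                  nrow = n_effects)
totals <- colSums(effects)   # one measurement = sum of its small effects

mean(totals)                 # should be near 0
sd(totals)                   # should be near sqrt(n_effects / 3)
hist(totals, breaks = 40)    # histogram should look roughly bell-shaped
```

The variance of a single uniform effect on [-1, 1] is 1/3, which is where the `sqrt(n_effects / 3)` comparison value comes from.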
• We will assume that our population distribution is normal, with mean $\mu$ and standard deviation $\sigma$.
• We don’t know anything about this distribution, but we have a sample. Let’s figure out everything we can about this sample:
• $\bar{x} \approx 2.34$, $s \approx 1.04$
• We’ve been estimating the true population standard deviation $\sigma$ with our sample standard deviation $s$.
• However, we are estimating our standard deviation with $n$ of only 7! This is probably not going to be a very good estimate.
• In general, if $n < 30$, this is considered a bad estimate.
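The sample statistics for the seven blood pressure increases can be computed directly from the data given in the problem. A minimal sketch; the vector name `bp_increase` is chosen here for illustration.

```r
# Blood pressure increases of the 7 patients
bp_increase <- c(1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9)

n <- length(bp_increase)   # sample size
x_bar <- mean(bp_increase) # sample mean
s <- sd(bp_increase)       # sample standard deviation (divides by n - 1)

round(c(n = n, x_bar = x_bar, s = s), 2)
```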
William Sealy Gosset aka Student
• 1876-1937
• an employee of Guinness brewery
• 1908 papers addressed the brewer's concern with small samples
• "The probable error of a mean". Biometrika 6 (1): 1–25. March 1908.
• "Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908.
Student t-distribution
• Instead of assuming that the sampling distribution is normal, we will use a Student t-distribution.
• It gives a better estimate of your confidence interval if you have a small sample size.
• It looks very similar to a normal distribution, but it has fatter tails to reflect the higher frequency of outliers that comes with a small data set.
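One way to see the fatter tails is to compare 95% critical values. The sketch below uses a few arbitrarily chosen degrees of freedom and compares the t critical values from `qt` with the normal critical value from `qnorm`.

```r
df <- c(2, 6, 30, 1000)       # illustrative degrees of freedom
t_crit <- qt(0.975, df)       # 95% two-sided critical values of the t-distribution
z_crit <- qnorm(0.975)        # 95% two-sided critical value of the normal distribution

round(rbind(df, t_crit, z_crit), 3)
```

The t critical values are always larger than the normal one, so the intervals are wider, and they shrink toward the normal value as the degrees of freedom grow.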
df: degrees of freedom (Czech: stupeň volnosti)
Back to our case
• Because the sample size is small, the sampling distribution of the mean won’t be normal. Instead, it will have a Student t-distribution with $df = n - 1 = 6$.
• Construct a 95% confidence interval, please.
for $n < 30$: $\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$
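A possible solution for the blood pressure data is sketched below. It applies the small-sample formula step by step and then cross-checks the result against R's built-in `t.test`; the variable names are chosen here for illustration.

```r
# Blood pressure increases of the 7 patients
bp_increase <- c(1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9)

n <- length(bp_increase)
x_bar <- mean(bp_increase)
s <- sd(bp_increase)

t_crit <- qt(0.975, df = n - 1)   # 95% critical value with n - 1 = 6 degrees of freedom
margin <- t_crit * s / sqrt(n)    # margin of error
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)

# Cross-check with the built-in one-sample t procedure
t.test(bp_increase, conf.level = 0.95)$conf.int
```

Both approaches should give the same interval, since `t.test` uses the same formula for a one-sample confidence interval.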
• Just to summarize, the margin of error depends on
1. the confidence level (95% is common)
2. the sample size
• as the sample size increases, the margin of error decreases
• For a bigger sample we have a smaller interval in which we’re pretty sure the true population mean lies.
3. the variability of the data (i.e. on σ)
• more variability increases the margin of error
• The margin of error does not measure anything other than chance variation.
• It doesn’t measure any bias or errors that happen during the process.
• It does not tell you anything about the correctness of your data!!!
$\text{something} \times \frac{s}{\sqrt{n}}$, where "something" is the critical value ($Z$ or $t_{n-1}$)
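The three dependencies can be tried out with a small helper. The function `margin_of_error` below is a hypothetical name introduced only for this sketch; it assumes the t critical value is used, and the baseline numbers are taken from the apple example.

```r
# Hypothetical helper: margin of error for a mean, using the t critical value
margin_of_error <- function(s, n, conf_level = 0.95) {
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  t_crit * s / sqrt(n)
}

base <- margin_of_error(s = 40, n = 36)                              # baseline
higher_conf <- margin_of_error(s = 40, n = 36, conf_level = 0.99)    # higher confidence level
bigger_n <- margin_of_error(s = 40, n = 144)                         # larger sample
more_var <- margin_of_error(s = 80, n = 36)                          # more variable data

round(c(base = base, higher_conf = higher_conf,
        bigger_n = bigger_n, more_var = more_var), 2)
```

Raising the confidence level or the variability widens the margin, while a larger sample narrows it. None of these numbers says anything about bias in how the data were collected.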