Samples and populations
Estimating with uncertainty
Review - order of operations
$s^2 = \frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} Y_i^2}{n} - \bar{Y}^2\right)$
Review - order of operations
1. Parentheses
2. Exponents and roots
3. Multiply and divide
4. Add and subtract
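To see these rules applied to the shortcut formula for the sample variance, s^2 = n/(n-1) * (sum(Y_i^2)/n - Ybar^2), the calculation can be broken into steps. This is only an illustrative sketch: the function name `variance_shortcut` and the four data values are made up and do not come from the slides.

```python
import statistics

def variance_shortcut(values):
    """Sample variance via s^2 = n/(n-1) * (sum(Y_i^2)/n - Ybar^2)."""
    n = len(values)
    # Exponents first: square each observation, then add the squares.
    sum_of_squares = sum(y ** 2 for y in values)
    # The mean is computed, then squared (exponent) before any subtraction.
    mean = sum(values) / n
    # Inside the large parentheses: divide, square, then subtract.
    inner = sum_of_squares / n - mean ** 2
    # Last step: multiply the bracketed term by n / (n - 1).
    return n / (n - 1) * inner

data = [2.0, 4.0, 4.0, 6.0]  # made-up measurements (assumption)
s2 = variance_shortcut(data)
print(s2)
print(statistics.variance(data))  # definitional formula, for comparison
```

Both printed values should agree up to floating-point rounding, since the shortcut and the definitional formula are algebraically the same.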
Review - types of variables
• Categorical variables
– For example, country of birth
• Numerical variables
– For example, student height
Review - types of variables
• Categorical variables
– Nominal
– Ordinal
• Numerical variables
– Discrete
– Continuous
Review - types of variables
• Categorical variables
– Nominal - no natural order
• Example - country of birth
– Ordinal - can be placed in an order
• Example - educational experience
– Some high school, high school diploma, some college,
college degree, master's degree, PhD
Sampling from a population
• We often sample from a population
• Consider random samples
– Each individual has an equal and independent
probability of being selected
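A random sample of this kind can be sketched with Python's standard `random` module. The ID numbering (1 to 400) and the seed are assumptions made for illustration; they are not part of the slides.

```python
import random

rng = random.Random(1)                 # fixed seed so the sketch is repeatable
population_ids = list(range(1, 401))   # hypothetical IDs for 400 individuals
# Draw 10 distinct individuals; every ID has the same chance of selection.
sample_ids = rng.sample(population_ids, 10)
print(sorted(sample_ids))
```

Sampling by ID from the whole list keeps the choice independent of anything about the individuals themselves, which is the point of a random sample.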
-
Body mass of 400 humans
Random sample of 10 people
Population mean: µ = 70.8 kg
Sample mean: x̄ = 76.7 kg
Another sample…
Population mean: µ = 70.8 kg
Sample mean: x̄ = 69.2 kg
What if we do this many times?
Example: gene length
n = 20,290, µ = 2622.0, σ = 2037.9
Sample histogram
n = 100, Ȳ = 2675.4, s = 1539.2
Ȳ = 2675.4, s = 1539.2
Ȳ = 2588.8, s = 1620.5
Ȳ = 2702.4, s = 1727.1
Ȳ = 2767.2, s = 2044.7
Sampling distribution of the mean
1000 samples
µ = 2622.0
Mean of means: 2626.4
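The idea of repeating the sampling many times can be imitated in a short simulation. The gene length data themselves are not available here, so the sketch below assumes a made-up right-skewed population (exponential, mean near 2600) as a stand-in; the seed and variable names are likewise illustrative.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical right-skewed population standing in for the gene lengths.
population = [rng.expovariate(1 / 2600) for _ in range(20_000)]
mu = statistics.fmean(population)

# Draw 1000 random samples of n = 100 and record each sample mean.
sample_means = [
    statistics.fmean(rng.sample(population, 100)) for _ in range(1000)
]
mean_of_means = statistics.fmean(sample_means)

print(f"population mean      = {mu:.1f}")
print(f"mean of sample means = {mean_of_means:.1f}")
```

Individual sample means scatter around the population mean, but their average should land close to it, which is the pattern the slide reports (2626.4 against µ = 2622.0).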
Sampling distribution of the standard deviation
100 samples: population σ = 2036.9, mean sample s = 1962.6
1000 samples: population σ = 2036.9, mean sample s = 1929.7
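In both sets of samples the average s falls below σ. A simulation tends to show the same thing: the sample standard deviation underestimates σ on average, most visibly for small samples. The sketch below assumes a made-up exponential population and a sample size of n = 10, chosen only to make the effect easy to see; neither comes from the slides.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical right-skewed population (assumption, not the gene data).
population = [rng.expovariate(1 / 2600) for _ in range(20_000)]
sigma = statistics.pstdev(population)

# Sample standard deviation s from each of 2000 random samples of n = 10.
sample_sds = [
    statistics.stdev(rng.sample(population, 10)) for _ in range(2000)
]
mean_s = statistics.fmean(sample_sds)

print(f"population sigma = {sigma:.1f}")
print(f"mean sample s    = {mean_s:.1f}")
```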
Sampling distribution of the mean, n = 10
Sampling distribution of the mean, n = 100
Sampling distribution of the mean, n = 1000
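The effect of sample size on the spread of the sampling distribution can be checked numerically. As before, the population is a made-up exponential stand-in, and the number of repeated samples (500) is an arbitrary choice for this sketch.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical population (assumption); only its spread matters here.
population = [rng.expovariate(1 / 2600) for _ in range(20_000)]

spread = {}
for n in (10, 100, 1000):
    # 500 sample means for each sample size n.
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(500)]
    # Standard deviation of the sample means = spread of the sampling distribution.
    spread[n] = statistics.stdev(means)
    print(f"n = {n:4d}: spread of sample means = {spread[n]:.1f}")
```

The spread should shrink as n grows, so estimates from larger samples are more precise.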
-
[Figure: precise vs. imprecise and biased vs. unbiased estimates; annotation: larger sample size]
Group activity #2
• Form groups of size 2-5
• Get out a blank sheet of paper
• Write everyone’s full name on the paper
-
How many toes do aliens have?
Instructions
• You have measurements from a population of 400 aliens
• Use your random number table to select a sample of ten measurements
• Calculate your sample mean and, if you have a calculator or a large brain, your sample standard deviation
• On your paper, answer the following:
1. What were your sample mean and standard deviation?
2. How did you randomly choose your sample?
-
Distribution of the sample mean
• No matter what the frequency distribution
of the population:
• The sample mean has an approximately
bell-shaped (normal) distribution
• Especially for large n (large samples)
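One rough way to see this is to compare the skewness of a strongly skewed population with the skewness of its sample means. The exponential population, the sample size n = 100, and the helper `skewness` are all assumptions introduced for this sketch.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: near 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(11)
# Hypothetical, strongly right-skewed population (assumption).
population = [rng.expovariate(1 / 2600) for _ in range(20_000)]

# Means of 1000 random samples, each of n = 100.
sample_means = [
    statistics.fmean(rng.sample(population, 100)) for _ in range(1000)
]

print(f"skewness of population   = {skewness(population):.2f}")
print(f"skewness of sample means = {skewness(sample_means):.2f}")
```

The sample means should be far less skewed than the population they were drawn from, in line with the approximately normal shape described above.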
How precise is any one estimated
sample mean?
The standard error of an estimate is the standard deviation of its sampling distribution. The standard error predicts the sampling error of the estimate.
Standard error of the mean
$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$
-
Estimate of the standard error of
the mean
$SE_{\bar{Y}} = \frac{s}{\sqrt{n}}$
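As a worked example, take the first gene length sample reported earlier (n = 100, s = 1539.2). The arithmetic below only plugs those two values into the formula:

```latex
SE_{\bar{Y}} = \frac{s}{\sqrt{n}} = \frac{1539.2}{\sqrt{100}} = \frac{1539.2}{10} = 153.92
```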
Confidence interval
• Confidence interval
– a range of values surrounding the sample
estimate that is likely to contain the population
parameter
• 95% confidence interval
– plausible range for a parameter based on the
data
The 2SE rule-of-thumb
The interval from $\bar{Y} - 2\,SE_{\bar{Y}}$ to $\bar{Y} + 2\,SE_{\bar{Y}}$ provides a rough estimate of the 95% confidence interval for the mean.
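The rule of thumb is easy to compute from summary statistics. In the sketch below the function name `two_se_interval` is hypothetical; the inputs are the first gene length sample from the earlier slides (Ȳ = 2675.4, s = 1539.2, n = 100).

```python
import math

def two_se_interval(mean, s, n):
    """Rough 95% confidence interval for the mean using the 2SE rule."""
    se = s / math.sqrt(n)          # estimated standard error of the mean
    return mean - 2 * se, mean + 2 * se

low, high = two_se_interval(mean=2675.4, s=1539.2, n=100)
print(f"{low:.2f} to {high:.2f}")  # 2367.56 to 2983.24
```

For this particular sample the interval contains the population mean µ = 2622.0 given on the slides.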
Confidence interval
-
Pseudoreplication
The error that occurs when samples are not independent, but they are treated as though they are.
Example: “The Transylvania effect”
A study of 130,000 calls for police assistance in 1980 found that they were more likely than chance to occur during a full moon.
Problem: There may have been 130,000 calls in the data set, but there were only 13 full moons in 1980. These data are not independent.