Chapter 1: Overview and Descriptive Statistics
1.1 - Populations, Samples and Processes
1.2 - Pictorial and Tabular Methods in Descriptive Statistics
1.3 - Measures of Location
1.4 - Measures of Variability
Note that these are textbook chapters, although Lecture Notes may be referenced.
STATISTICS IN A NUTSHELL
What is “random variation” in the distribution of a population?
Examples: Toasting time, Temperature settings, etc. of a population of toasters…
POPULATION 1: Little to no variation (e.g., product manufacturing)
In engineering situations such as this, we try to maintain “quality control”… i.e., “tight tolerance levels,” high precision, low variability.
But what about a population of, say, people?
POPULATION 1: Little to no variation (e.g., clones)
Example: Body Temperature (°F)
Most individual values ≈ population mean value, 98.6 °F.
Very little variation about the mean!
POPULATION 2: Much variation (more common)
Example: Body Temperature (°F)
Examples: Gender, Race, Age, Height, Annual Income,…
Much more variation about the mean!
Example
POPULATION: Women in U.S. who have given birth
“Random Variable” X = Age at first birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
Suppose we know that X follows a “normal distribution” (a.k.a. “bell curve”) in the population. That is, the Population Distribution of X ~ N(μ, σ).
mean μ = ???
standard deviation σ
μ and σ are “population characteristics,” i.e., “parameters” (fixed, unknown).
SAMPLE: {x1, x2, x3, x4, … , x400}
How is this accomplished? Hospital records, etc. (“sampling frame”)
From the sample {x1, x2, x3, x4, … , x400}: sample mean x̄ = (x1 + x2 + … + x400) / 400 = 25.6
x̄ is an example of a “sample characteristic” = “statistic” (numerical info culled from a sample).
This is called a “point estimate” of μ from the one sample.
Can it be improved, and if so, how?
• Choose a bigger sample, which should reduce “variability.” How big???
• Average the sample means of many samples, not just one (introduces “sampling variability”). “Sampling Distribution” ~ ???
Other possible parameters:
• standard deviation
• median
• minimum
• maximum
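The two suggestions above (a bigger sample, and looking at the means of many samples) can be explored with a small simulation. This is only a sketch: the "true" values MU and SIGMA, the sample sizes, and the helper name sample_means are assumptions made for illustration, not values taken from the study.

```python
import random
import statistics

def sample_means(mu, sigma, n, num_samples, rng):
    """Draw num_samples samples of size n from N(mu, sigma); return each sample's mean."""
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)   # fixed seed so the run is repeatable
MU, SIGMA = 25.6, 1.5    # hypothetical population parameters (assumed for the demo)

spread_by_n = {}
for n in (25, 100, 400):
    means = sample_means(MU, SIGMA, n, 1000, rng)
    # The standard deviation of the sample means measures "sampling variability".
    spread_by_n[n] = statistics.stdev(means)
    print(n, round(statistics.fmean(means), 3), round(spread_by_n[n], 3))
```

Under these assumptions the spread of the sample means should shrink as n grows, which is the sense in which a bigger sample reduces variability.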
Without knowing every value in the population, it is not possible to determine the exact value of μ with 100% “certainty.”
HOWEVER…
For concreteness, suppose σ = 1.5.
95% CONFIDENCE INTERVAL FOR µ: (25.453, 25.747)
BASED ON OUR SAMPLE DATA, the true value of μ is between 25.453 and 25.747, with 95% “confidence” (…akin to “probability”).
This is called an “interval estimate” of μ from the sample.
Used in “Statistical Inference” via “Hypothesis Testing”… (Stat 312)
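The slide does not show how the interval endpoints were obtained. One plausible route, assuming the standard known-σ formula x̄ ± z·σ/√n with n = 400, x̄ = 25.6 and σ = 1.5, is sketched below; the function name z_confidence_interval is made up for this illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval for the mean when sigma is treated as known."""
    # Critical value z such that the central `confidence` area lies within +/- z.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(xbar=25.6, sigma=1.5, n=400)
print(f"{low:.3f} to {high:.3f}")
```

With these inputs the margin is about 1.96 × 1.5 / 20 ≈ 0.147, which agrees with the endpoints 25.453 and 25.747 quoted above.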
More generally, take a sample of size n: {x1, x2, x3, x4, … , xn}, with sample mean x̄. Possible formulas:
• Arithmetic Mean: $\bar{x}_A = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
• Geometric Mean: $\bar{x}_G = \sqrt[n]{x_1 x_2 \cdots x_n}$
• Harmonic Mean: $\bar{x}_H = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$
Each of these gives an estimate of μ for a particular sample.
Any general sample estimator for μ is denoted by the symbol $\hat{\mu}$.
Likewise for σ and $\hat{\sigma}$.
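As a quick illustration of the three means, the sketch below evaluates each one on a small made-up sample using Python's standard statistics module. The data values are invented for the example and are not from the study.

```python
import statistics

# Hypothetical ages at first birth (made-up numbers, all positive).
ages = [22.0, 24.5, 25.0, 27.5, 31.0]

arithmetic = statistics.fmean(ages)          # (x1 + ... + xn) / n
geometric = statistics.geometric_mean(ages)  # n-th root of the product x1 * ... * xn
harmonic = statistics.harmonic_mean(ages)    # n / (1/x1 + ... + 1/xn)

print(round(arithmetic, 3), round(geometric, 3), round(harmonic, 3))
```

For positive data the three values are ordered harmonic ≤ geometric ≤ arithmetic, so each gives a somewhat different estimate of μ from the same sample.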
Extending these ideas to other parameters of a population gives rise to the general theory of… “PARAMETER ESTIMATION” (Stat 311)
POPULATION: composed of “units” (people, rocks, toasters,...)
To make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).
What do we want to know about this population?
How is… “Random Variable” X (age, income level, …) … distributed?
Suppose we know that X follows a known “probability distribution” in the population (possibly one with a heavily skewed tail)… but with parameters θ1, θ2, … whose values are unknown. That is, the Population Distribution of X ~ Dist(θ1, θ2,…).
How do we estimate these?
SAMPLE: For a particular θ, want to define a corresponding “parameter estimator” $\hat{\theta}$.
Ideal properties…
• Unbiased estimator of θ
• Minimum Variance among all such unbiased estimators, i.e., “MVUE”
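The two ideal properties can be illustrated by comparing two candidate estimators of the center of a normal population: the sample mean and the sample median. This is a simulation sketch only; the population values MU and SIGMA, the sample size N and the number of repetitions REPS are all assumed for the demo.

```python
import random
import statistics

rng = random.Random(1)                      # fixed seed for repeatability
MU, SIGMA, N, REPS = 25.6, 1.5, 25, 2000    # hypothetical settings

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# "Unbiased": averaged over many samples, the estimator lands on the target.
avg_mean, avg_median = statistics.fmean(means), statistics.fmean(medians)
# "Minimum variance": among unbiased candidates, prefer the smaller spread.
var_mean, var_median = statistics.variance(means), statistics.variance(medians)

print(round(avg_mean, 3), round(avg_median, 3))
print(round(var_mean, 4), round(var_median, 4))
```

In this normal setting both estimators average out close to MU, but the sample mean varies less from sample to sample, which is the kind of comparison the MVUE criterion formalizes.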
“Random Variable”
X = any numerical value that can be assigned to each unit of a population
“Random” refers to the notion that this value is unknown until actually observed (usually as part of an outcome of an experiment to test a specific hypothesis). Contrast this with the idea of a “nonrandom” variable with no empirical error, e.g., X = # cards in a deck = 52.
There are two general types: Quantitative and Qualitative.
Quantitative [measurement]
length
mass
temperature
pulse rate
# puppies
shoe size (10, 10½, 11)
Quantitative variables are either:
CONTINUOUS (can take their values at any point in a continuous interval)
DISCRETE (only take their values in disconnected jumps)
Qualitative [categorical]
ORDINAL, RANKED (ordered labels)
video game levels (1, 2, 3,...)
income level (low, mid, high)
NOMINAL (unordered labels)
zip code
PIN #
color (Red, Green, Blue)
IMPORTANT SPECIAL CASE: Binary (or Dichotomous)
• “Pregnant?” (Yes / No)
• Coin toss (Heads / Tails)
• Treatment (Drug / Placebo)
$X = \begin{cases} 1, & \text{“Success”} \\ 0, & \text{“Failure”} \end{cases}$
POPULATION
Discrete Random Variable
$Y = \begin{cases} 1, & \text{“Success”} \\ 0, & \text{“Failure”} \end{cases}$
Define a new parameter π = P(Success).
Point estimator $\hat{\pi}$ = ?
Suppose we intend to select a random sample of size n from this population of Successes and Failures… in such a way that the “Success or Failure” outcome of any selected individual conveys no information about the “Success or Failure” outcome of any other selected individual. That is, the “Success or Failure” outcomes of any two individuals are independent. (Think of tossing a coin n times.)
Let X = “Number of Successes in the sample” (0, 1, 2, …, n).
Then a natural estimator for π could be the sample proportion of Success, $\hat{\pi} = \dfrac{X}{n}$.
Ex: n = 500 tosses, X = 285 Heads
$\hat{\pi} = \dfrac{285}{500} = 0.57$
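The sample proportion is easy to compute directly. The sketch below reproduces the 285-out-of-500 figure and then simulates 500 independent tosses; the fair-coin probability TRUE_PI and the helper name sample_proportion are assumptions for the illustration, not part of the original example.

```python
import random

def sample_proportion(outcomes):
    """Proportion of Successes in a list of 0/1 outcomes, i.e. X / n."""
    return sum(outcomes) / len(outcomes)

# The example from the slide: X = 285 Heads in n = 500 tosses.
pi_hat_slide = 285 / 500
print(pi_hat_slide)

# Simulated independent tosses of a hypothetical fair coin.
rng = random.Random(42)   # fixed seed for repeatability
TRUE_PI = 0.5             # assumed true P(Success)
tosses = [1 if rng.random() < TRUE_PI else 0 for _ in range(500)]
pi_hat_sim = sample_proportion(tosses)
print(pi_hat_sim)
```

Each simulated run gives a slightly different estimate near TRUE_PI, which is the same sampling variability seen earlier for the sample mean.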