Lec 6, Ch.5, pp90-105: Statistics (Objectives)
Understand the basic principles of statistics through reading these pages, especially:
Know the normal distribution well
Know the special characteristics of the Poisson distribution
Understand the meaning of correlation and dependence
Understand what confidence intervals mean
Learn how to estimate sample sizes for data collection
Understand the concept of hypothesis testing
What we cover in class today
Anything not covered in class should be learned from reading pp. 95-105.
The normal distribution: how to read the standard normal distribution table
Central limit theorem (CLT)
The Poisson distribution: why it is relevant to traffic engineering
Correlation and dependence
Confidence bounds and their implications
Estimating sample sizes
The concept of hypothesis testing
The normal distribution
Mean $\mu = 55$ mph, standard deviation $\sigma = 7$ mph
What is the probability that the next value will be less than 65 mph?
Convert from the sample normal distribution to the standard normal distribution:
$z = (x - \mu)/\sigma = (65 - 55)/7 = 1.43$
Then use the standard normal distribution table, Table 5-1, with z = 1.43.
The most commonly used range is 95% of values within $\mu \pm 1.96\sigma$.
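The table lookup for z = 1.43 can be cross-checked numerically. The sketch below is only an illustration: it assumes a standard deviation of 7 mph (implied by the division by 7 in the slide's arithmetic), and the function name `standard_normal_cdf` is made up for this example.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_speed = 55.0  # mph, from the slide
std_speed = 7.0    # mph, assumed from the slide's division by 7
x = 65.0           # mph, the value of interest

z = (x - mean_speed) / std_speed
p_below = standard_normal_cdf(z)

# Share of values within 1.96 standard deviations of the mean
p_within_196 = standard_normal_cdf(1.96) - standard_normal_cdf(-1.96)

print(f"z = {z:.2f}, P(X < {x:.0f} mph) = {p_below:.3f}")
print(f"P(|Z| < 1.96) = {p_within_196:.3f}")
```

The probability comes out at roughly 0.92, which is what the standard normal table gives for z = 1.43.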
Central limit theorem (CLT)
Definition: The population may have any unknown distribution with a mean $\mu$ and a finite variance $\sigma^2$. Take samples of size n from the population. As n increases, the distribution of sample means approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$.
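[Figure: the distribution of X, $X \sim \text{any}(\mu, \sigma^2)$, approaches the distribution of the sample mean, $\bar{X} \sim N(\mu, \sigma_{\bar{X}}^2)$, with $\mu = \mu_{\bar{X}}$.]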
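The central limit theorem can be checked with a small simulation. This is a sketch under stated assumptions, not part of the lecture: the population is taken to be exponential with mean 1 and variance 1 (a deliberately non-normal choice), and the sample size and number of repetitions are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n, rate=1.0):
    """Mean of n draws from an exponential population (mean 1/rate, variance 1/rate**2)."""
    return statistics.fmean(random.expovariate(rate) for _ in range(n))

n = 30              # sample size (assumed for illustration)
repetitions = 5000  # number of samples drawn

means = [sample_mean(n) for _ in range(repetitions)]
mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)

print(f"mean of sample means:     {mean_of_means:.3f} (population mean is 1.0)")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2 / n is {1.0 / n:.4f})")
```

The mean of the sample means should sit close to the population mean, and their variance close to the population variance divided by n.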
The Poisson distribution (“counting distribution” or “Random arrival”)
$P(X = x) = \frac{m^x e^{-m}}{x!}$
with mean $\mu = m$ and variance $\sigma^2 = m$.
If the mean and variance are not equal, the Poisson distribution does not apply.
The binomial distribution tends to approach the Poisson distribution with parameter m = np. (See Table 4-3)
When time headways are exponentially distributed with mean $1/\lambda$, the number of arrivals in an interval T is Poisson distributed with mean $m = \lambda T$.
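A short sketch of the Poisson probabilities for vehicle arrivals follows. The flow rate of 360 veh/h and the 30-second counting interval are hypothetical values chosen for illustration, and `poisson_pmf` is a name made up here.

```python
import math

def poisson_pmf(x, m):
    """P(X = x) for a Poisson variable with mean m."""
    return m ** x * math.exp(-m) / math.factorial(x)

flow_rate = 360 / 3600    # hypothetical arrival rate, vehicles per second
interval = 30             # hypothetical counting interval T, seconds
m = flow_rate * interval  # expected arrivals per interval

# Probabilities for 0..59 arrivals; the tail beyond that is negligible for m = 3
probs = [poisson_pmf(x, m) for x in range(60)]
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"m = {m:.1f}, P(0 arrivals) = {probs[0]:.4f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

The computed mean and variance both equal m, which is the property to check in field data before applying the Poisson distribution.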
Correlation and dependence
[Figure: dependent variable y plotted against independent variable x, y = f(x).]
Linear regression: $y = a + bx$
Non-linear regression: $y = ax^b$ (example)
Correlation coefficient r (1 = perfect fit)
Coefficient of determination $r^2$ (tells you how much of the variability can be “explained” by the independent variables)
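As an illustration of the linear case, the sketch below fits y = a + bx by ordinary least squares and reports r and r squared. The data points are invented for the example, and `linear_fit` is a hypothetical helper name.

```python
import math

def linear_fit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # intercept
    r = sxy / math.sqrt(sxx * syy) # correlation coefficient
    return a, b, r

# Invented data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = linear_fit(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, r = {r:.4f}, r^2 = {r * r:.4f}")
```

The power form y = ax^b can be handled with the same function by fitting ln y against ln x, since ln y = ln a + b ln x.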
Confidence bounds and intervals
Point estimates: A point estimate is a single-valued estimate of a population parameter made from a sample.
Interval estimates: An interval estimate is a probability statement that a population parameter is between two computed values (bounds).
[Figure: two-sided interval estimate running from $\bar{X} - t_a s/\sqrt{n}$ to $\bar{X} + t_a s/\sqrt{n}$, centered on $\bar{X}$, the point estimate from a sample, with the true population mean $\mu$ marked.]
Confidence interval (cont)
When n gets larger (n >= 30), t can be replaced by z. The probability of any normally distributed random variable being within 1.96 standard deviations of the mean is 0.95, written as:
$P[(\mu - 1.96\sigma) \le y \le (\mu + 1.96\sigma)] = 0.95$
Obviously we do not know $\mu$ and $\sigma$. Hence we restate this in terms of the distribution of sample means:
$P[(\bar{x} - 1.96E) \le \mu \le (\bar{x} + 1.96E)] = 0.95$
where $E = s/\sqrt{n}$.
(Review 1, 2, 3, and 4 on page 100.)
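The 95% interval can be computed directly from a sample mean, sample standard deviation, and sample size. The numbers below (mean 55 mph, s = 7 mph, n = 49) are hypothetical, and the function name is invented for this sketch; it assumes n is large enough for z to stand in for t.

```python
import math

def confidence_interval_95(x_bar, s, n):
    """Return (lower, upper) for x_bar +/- 1.96 * s / sqrt(n); assumes n >= 30."""
    e = s / math.sqrt(n)   # standard error of the mean, E
    half_width = 1.96 * e
    return x_bar - half_width, x_bar + half_width

# Hypothetical speed study: 49 observations, mean 55 mph, s = 7 mph
lower, upper = confidence_interval_95(x_bar=55.0, s=7.0, n=49)
print(f"95% interval: {lower:.2f} to {upper:.2f} mph")
```

Here E = 7/7 = 1 mph, so the interval is 55 plus or minus 1.96 mph.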
Estimating sample sizes
For cases in which the distribution of means can be considered normal, the confidence range for 95% confidence is:
$\pm 1.96\, s/\sqrt{n}$
If this value is called the tolerance (or “precision”) and given the symbol e, then the following equation can be solved for n, the desired sample size:
$e = 1.96\, s/\sqrt{n}$ and $n = 3.84\, s^2/e^2$
By replacing 1.96 with z and 3.84 with $z^2$, we can use this for any level of confidence.
(Review 1 and 2 on page 101.)
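The sample size formula generalizes to any confidence level by looking up z. The sketch below uses `statistics.NormalDist` from the Python standard library for that lookup; the standard deviation of 5 mph and tolerance of 1 mph are hypothetical inputs, and `required_sample_size` is a name made up here.

```python
import math
from statistics import NormalDist

def required_sample_size(s, e, confidence=0.95):
    """Smallest n with z * s / sqrt(n) <= e, i.e. n = z^2 * s^2 / e^2 rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided z value
    return math.ceil((z * s / e) ** 2)

# Hypothetical study: s = 5 mph, desired tolerance e = 1 mph
n_95 = required_sample_size(s=5.0, e=1.0, confidence=0.95)
n_99 = required_sample_size(s=5.0, e=1.0, confidence=0.99)
print(n_95, n_99)
```

Rounding up is a design choice here: a fractional result such as 96.04 means 96 observations fall just short of the tolerance.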
The concept of hypothesis testing
Two distinct choices:
Null hypothesis: H0
Alternative hypothesis: H1
E.g. Inspect 100,000 vehicles, of which 10,000 vehicles are “unsafe.” This is the fact given to us.
H0: The vehicle being tested is “safe.”
H1: The vehicle being tested is “unsafe.”
In this inspection,
15% of the unsafe vehicles are determined to be safe: Type II error (bad error),
and 5% of the safe vehicles are determined to be unsafe: Type I error (economically bad, but safety-wise it is better than a Type II error).
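The percentages in the inspection example translate into vehicle counts as follows. This is plain arithmetic on the figures given in the slide; the variable names are chosen for this sketch.

```python
total_vehicles = 100_000
unsafe_vehicles = 10_000
safe_vehicles = total_vehicles - unsafe_vehicles  # 90,000

# Percentages from the slide, kept as integers to avoid float rounding
type_ii_errors = unsafe_vehicles * 15 // 100  # unsafe vehicles judged safe
type_i_errors = safe_vehicles * 5 // 100      # safe vehicles judged unsafe

print(f"Type I errors:  {type_i_errors}")
print(f"Type II errors: {type_ii_errors}")
```

With these figures, 4,500 safe vehicles are wrongly failed and 1,500 unsafe vehicles are wrongly passed.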
Types of errors
Reality        Decision: Reject H0    Decision: Accept H0
H0 is true     Type I error           Correct
H1 is true     Correct                Type II error
Type I error: reject a correct null hypothesis
Type II error: fail to reject a false null hypothesis
We especially want to minimize Type II error.
Steps of hypothesis testing
State the hypothesis
Select the significance level
Compute sample statistics and estimate parameters
Compute the test statistic
Determine the acceptance and critical regions of the test statistic
Reject or do not reject H0
P(Type I error) = $\alpha$ (level of significance)
P(Type II error) = $\beta$
Dependence between $\alpha$, $\beta$, and sample size n
There is a distinct relationship between the two probability values $\alpha$ and $\beta$ and the sample size n for any hypothesis. The value of any one is found by using the test statistic and set values of the other two.
Given $\alpha$ and n, determine $\beta$. Usually the $\alpha$ and n values are the most crucial, so they are established and the $\beta$ value is not controlled.
Given $\alpha$ and $\beta$, determine n. Set up the test statistic for $\alpha$ and $\beta$ with an H0 value and an H1 value of the parameter and two different n values.
The t (or z) statistic is:
$t \text{ or } z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(Use an example from a stat book)
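One way to see how the two error probabilities and n depend on each other is a one-sided upper z test with known standard deviation. Everything specific in this sketch is an assumption for illustration: the H0 mean of 55 mph, the H1 mean of 58 mph, the standard deviation of 7 mph, and the function names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1

def z_statistic(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def beta_one_sided_upper(mu0, mu1, sigma, n, alpha):
    """P(Type II error) when H0: mu = mu0 is tested against H1: mu = mu1 > mu0."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(z_alpha - shift)

def n_one_sided_upper(mu0, mu1, sigma, alpha, beta):
    """Smallest n that meets both alpha and beta for the same one-sided test."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)

# Hypothetical numbers: H0 mean 55 mph, H1 mean 58 mph, sigma 7 mph
beta = beta_one_sided_upper(55.0, 58.0, 7.0, n=49, alpha=0.05)
n_needed = n_one_sided_upper(55.0, 58.0, 7.0, alpha=0.05, beta=0.10)
print(f"beta = {beta:.3f} with n = 49; n needed for beta = 0.10: {n_needed}")
```

The first call fixes the significance level and n and reads off the Type II probability; the second fixes both error probabilities and solves for n, matching the two cases described above.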
One-sided and two-sided tests
The significance of the hypothesis test is indicated by $\alpha$, the Type I error probability. $\alpha = 0.05$ is most common: there is a 5% level of significance, which means that on average a Type I error (rejecting a true H0) will occur 5 in 100 times that H0 and H1 are tested. In addition, there is a 95% confidence level that the result is correct.
If H1 involves a not-equal relation, no direction is given, so the significance area is equally divided between the two tails of the testing distribution.
If it is known that the parameter can go in only one direction, a one-sided test is performed, so the significance area is in one tail of the distribution.
[Figure: one-sided upper test with 0.05 in the upper tail; two-sided test with 0.025 in each tail.]
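The split of the significance area determines the critical value. The sketch below looks up both critical z values for a 0.05 significance level using `statistics.NormalDist`; the helper `reject_h0` and the sample z value of 1.8 are invented for illustration.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-sided upper test: all of alpha in the upper tail
z_one_sided = std_normal.inv_cdf(1.0 - alpha)
# Two-sided test: alpha / 2 in each tail
z_two_sided = std_normal.inv_cdf(1.0 - alpha / 2.0)

def reject_h0(z, alpha=0.05, two_sided=True):
    """True if the computed z falls in the critical region."""
    if two_sided:
        return abs(z) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    return z > std_normal.inv_cdf(1.0 - alpha)

print(f"one-sided critical z: {z_one_sided:.3f}")
print(f"two-sided critical z: {z_two_sided:.3f}")
print(reject_h0(1.8, two_sided=False), reject_h0(1.8, two_sided=True))
```

A computed z of 1.8 is rejected by the one-sided upper test but not by the two-sided test, because the two-sided critical value is larger.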