1 tr 555 statistics “refresher” lecture 2: distributions and tests binomial, normal, log normal...
TRANSCRIPT
1
TR 555 Statistics “Refresher”Lecture 2: Distributions and Tests
Binomial, Normal, Log Normal distributions Chi Square and K.S. tests for goodness of fit
and independence Poisson and negative exponential Weibull distributions Test Statistics, sample size and Confidence
Intervals Hypothesis testing
WARNING! 175 slides! Print in draft mode, 6 to a page to conserve paper and ink!
WARNING! 175 slides! Print in draft mode, 6 to a page to conserve paper and ink!
2
Another good reference
http://www.itl.nist.gov/div898/handbook/index.htm
3
Another good reference
http://www.ruf.rice.edu/~lane/stat_sim/index.html
4
Bernoulli Trials
1 Only two possible outcomes on each trial (one is arbitrarily labeled success, the other failure)
2 The probability of a success = P(S) = p is the same for each trial
(equivalently, the probability of a failure = P(F) =
1-P(S) = 1- p is the same for each trial
3 The trials are independent
5
Binomial, A Probability Distribution
n = a fixed number of Bernoulli trials p = the probability of success in each trial X = the number of successes in n trials
The random variable X is called a binomial random variable. Its distribution is called a binomial distribution
6
The binomial distribution with n trials and success probability p is denoted by the equation
xnx ppx
nxXPxf
1
or
7
8
9
The binomial distribution with n trials and success probability p has
Mean =
Variance =
Standard deviation =
pnp 12
pnp 12
np
10
11
12
151050
0.4
0.3
0.2
0.1
0.0
C1
n=5
Binomial Distribution with p=.2, n=5
13
151050
0.3
0.2
0.1
0.0
C1
n=10
Binomial Distribution with p=.2, n=10
14
151050
0.2
0.1
0.0
C1
n=30
Binomial Distribution with p=.2, n=30
15
151050
0.4
0.3
0.2
0.1
0.0
C1
n=5
151050
0.3
0.2
0.1
0.0
C1
n=10
151050
0.2
0.1
0.0
C1
n=30
n=5
n=10
n=30
Binomial Distributions with p=.2
16
Transportation Example
The probability of making it safely from city A to city B is.9997 (do we generally know this?)
Traffic per day is 10,000 trips Assuming independence, what is the
probability that there will be more than 3 crashes in a day
What is the expected value of the number of crashes?
17
Transportation Example
Expected value = np = .0003*10000 = 3 P(X>3) = 1- [P (X=0) + P (X=1) + P (X=2) + P
(X=3)] e.g.,P (x=3) = 10000!/(3!*9997!) *.0003^3
* .9997^9997 = .224 don’t just hit 9997! On your calculator! P(X>3) = 1- [.050 + .149 + .224 + .224] =
65%
18
Continuous probabilitydensity functions
19
Continuous probabilitydensity functions
The curve describes probability of getting any range of values, say P(X > 120), P(X<100), P(110 < X < 120)
Area under the curve = probability Area under whole curve = 1 Probability of getting specific number is 0,
e.g. P(X=120) = 0
20
Histogram(Area of rectangle = probability)
55 75 95 115 135
0.00
0.01
0.02
IQ
Densi
ty
IQ
(Intervals of size 20)
21
Decrease interval size...
55 65 75 85 95 105 115 125 135
0.00
0.01
0.02
IQ
Densi
ty
IQ
(Intervals of size 10)
22
Decrease interval size more….
50 60 70 80 90 100 110 120 130 140
0.00
0.01
0.02
0.03
IQ
Den
sity
IQ
(Intervals of size 5)
23
Normal: special kind of continuous p.d.f
40 50 60 70 80 90 100
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Grades
Den
sity
Bell-shaped curve
Mean = 70 SD = 5
Mean = 70 SD = 10
24
Normal distribution
25
26
Characteristics of normal distribution
Symmetric, bell-shaped curve. Shape of curve depends on population mean
and standard deviation . Center of distribution is . Spread is determined by . Most values fall around the mean, but some
values are smaller and some are larger.
27
Probability = Area under curve
Normal integral cannot be solved, so must be numerically integrated - tables
We just need a table of probabilities for every possible normal distribution.
But there are an infinite number of normal distributions (one for each and )!!
Solution is to “standardize.”
28
Standardizing
Take value X and subtract its mean from it, and then divide by its standard deviation . Call the resulting value Z.
That is, Z = (X- )/ Z is called the standard normal. Its mean
is 0 and standard deviation is 1. Then, use probability table for Z.
29
Using Z Table
-4 -3 -2 -1 0 1 2 3 4
0.0
0.1
0.2
0.3
0.4
Z
De
nsi
ty
Standard Normal Curve
P(Z > z)Tail probability
30
Suppose we want to calculate
bXP
We can calculate
zZPbXP ][
b
z
And then use the fact that
We can find from our Z table zZP
),(~ NXwhere
31
Probability below 65?
55 65 75 85
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Grades
Den
sity
P(X < 65)
32
zZPzZP 1
This is the area under the curve to the right of z.
zZP Suppose we wanted to calculate
The using the law of complements, we have
33
Probability above 75?
55 60 65 70 75 80 85
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Grades
Den
sity
Probability student scores higher than 75?
P(X > 75)
34
aZPbZPbZaP
bZaP Now suppose we want to calculate
This is the area under the curve between a and b. We calculate this by first calculating the area to the left of b then subtracting the area to the left of a.
Key Formula!
35
Probability between 65 and 70?
55 60 65 70 75 80 85
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Grades
Den
sity
P(65 < X < 70)
36
Transportation Example
Average speeds are thought to be normally distributed
Sample speeds are taken, with X = 74.3 and sigma = 6.9
What is the speed likely to be exceeded only 5% of the time?
Z95 = 1.64 (one tail) = (x-74.3)/6.9 x = 85.6 What % are obeying the 75mph speed limit within a
5MPH grace?
37
Assessing Normality
the normal distribution requires that the mean is approximately equal to the median, bell shaped, and has the possibility of negative values
Histograms Box plots Normal probability plots Chi Square or KS test of goodness of fit
38
Transforms:Log Normal
If data are not normal, log of data may be
If so, …
39
40
Example of Lognormal transform
41
Example of Lognormal transform
42
Chi Square Test
AKA cross-classification Non-parametric test Use for nominal scale data (or convert
your data to nominal scale/categories) Test for normality (or in general, goodness of fit) Test for independence(can also use Cramer’s coefficient for
independence or Kendall’s tau for ratio, interval or ordinal data)
if used it is important to recognize that it formally applies only to discrete data, the bin intervals chosen influence the outcome, and exact methods (Mehta) provide more reliable results particularly for small sample size
43
Chi Square Test
Tests for goodness of fit Assumptions
– The sample is a random sample.– The measurement scale is at least nominal– Each cell contains at least 5 observations– N observations– Break data into c categories– H0 observations follow some f(x)
44
Chi Square Test
Expected number of observations in any cell
The test statistic
Reject (not from the distribution of interest) if chi square exceeds table value at 1-α (c-1-w degrees of freedom, where w is the number of parameters to be estimated)
45
Chi Square Test
Tests independence of 2 variables Assumptions
– N observations– R categories for one variable– C categories for the other variable– At least 5 observations in each cell
Prepare an r x c contingency table H0 the two variables are independent
46
Chi Square Test
Expected number of observations in any cell
The test statistic
Reject (not independent) if chi square exceeds table value at 1-α distribution with (r - 1)(c - 1) degrees of freedom
47
48
Transportation Example
Number of crashes during a year
49
Transportation Example
50
Transportation Example
Adapted from Ang and Tang, 1975
51
K.S. Test for goodness of fit
Kolmogorov-Smirnov Non-parametric test Use for ratio, interval or ordinal scale data Compare experimental or observed data to a
theoretical distribution (CDF) Need to compile a CDF of your data (called
an EDF where E means empirical) OK for small samples
52
53
54
Poisson Distribution
the Poisson distribution requires that the mean be approximately equal to the variance
Discrete events, whole numbers with small values
Positive values e.g., number of crashes or
vehicles during a given time
55
Transportation Example #1
On average, 3 crashes per day are experienced on a particular road segment
What is the probability that there will be more than 3 crashes in a day
56
P(X>3) = 1- [P (X=0) + P (X=1) + P (X=2) + P (X=3)]
e.g.,P (x=3) = = .224
P(X>3) = 1- [.050 + .149 + .224 + .224] = 65% (recognize this number???)
57
Transportation Example #2
58
59
Negative Binomial Distribution
An “over-dispersed” Poisson Mean > variance Also used for crashes, other count data, especially
when combinations of poisson distributed data Recall binomial:
Negative binomial:
60
(Negative) Exponential Distribution
Good for inter-arrival time (e.g., time between arrivals or crashes, gaps)
Assumes Poisson counts P(no occurrence in time t) =
61
Transportation Example
In our turn bay design example, what is the probability that no car will arrive in 1 minute? (19%)
How many 7 second gaps are expected in one minute??? 82% chance that any 7 sec. Period has no car … 60/7*82%=7/minute
62
Weibull Distribution
Very flexible empirical model
63
Sampling Distributions
64
Sampling Distributions
Some Definitions Some Common Sense Things An Example A Simulation Sampling Distributions Central Limit Theorem
65
Definitions
• Parameter: A number describing a population
• Statistic: A number describing a sample
• Random Sample: every unit in the population has an equal probability of being included in the sample
• Sampling Distribution: the probability distribution of a statistic
66
Common Sense Thing #1
A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters
67
All sample statistics have some error in estimating population parameters
Common Sense Thing #2
68
Common Sense Thing #3
If repeated samples are taken from a population and the same statistic (e.g. mean) is calculated from each sample, the statistics will vary, that is, they will have a distribution
69
Common Sense Thing #4
A larger sample provides more information than a smaller sample so a statistic from a large sample should have less error than a statistic from a small sample
70
Distribution of when sampling from a normal distribution
has a normal distribution with mean = and standard deviation =
X x
nx
X
71
Central Limit Theorem
If the sample size (n) is large enough, has a normal distribution with
mean =
and
standard deviation =
regardless of the population distribution
X
x
nx
72
30n
73
Does have a normal distribution?X
Is the population normal?
is normal Is ?
may or may not beconsidered normal
X
is considered to benormal
X
30n
X
(We need more info)
Yes
Yes
No
No
74
Situation
Different samples produce different results. Value of a statistic, like mean or proportion,
depends on the particular sample obtained. But some values may be more likely than
others. The probability distribution of a statistic
(“sampling distribution”) indicates the likelihood of getting certain values.
75
Transportation Example
Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.
Take random samples of n = 4. Then, sample means are normally distributed
with mean 45 MPH and standard error 3 MPH [from 6/sqrt(4) = 6/2].
76
Using empirical rule...
68% of samples of n=4 will have an average speed between 42 and 48 MPH.
95% of samples of n=4 will have an average speed between 39 and 51 MPH.
99% of samples of n=4 will have an average speed between 36 and 54 MPH.
77
What happens if we take larger samples?
Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.
Take random samples of n = 36 . Then, sample means are normally distributed
with mean 45 MPH and standard error 1 MPH [from 6/sqrt(36) = 6/6].
78
Again, using empirical rule...
68% of samples of n=36 will have an average speed between 44 and 46 MPH.
95% of samples of n=36 will have an average speed between 43 and 47 MPH.
99% of samples of n=36 will have an average speed between 42 and 48 MPH.
So … the larger the sample, the less the sample averages vary.
79
Sampling Distributions for Proportions
Thought questions Basic rules ESP example Taste test example
80
Rule for Sample Proportions
81
Proportion “heads” in 50 tosses
Bell curve for possible proportions Curve centered at true proportion (0.50) SD of curve = Square root of [p(1-p)/n] SD = sqrt [0.5(1-0.5)/50] = 0.07 By empirical rule, 68% chance that a
proportion will be between 0.43 and 0.57
82
ESP example
Five cards are randomly shuffled A card is picked by the researcher Participant guesses which card This is repeated n = 80 times
83
Many people participate
Researcher tests hundreds of people Each person does n = 80 trials The proportion correct is calculated for each
person
84
Who has ESP?
What sample proportions go beyond luck? What proportions are within the normal
guessing range?
85
Possible results of ESP experiment
1 in 5 chance of correct guess If guessing, true p = 0.20 Typical guesser gets p = 0.20 SD of test = Sqrt [0.2(1-0.2)/80] = 0.035
86
Description of possible proportions
Bell curve Centered at 0.2 SD = 0.035 99% within 0.095 and 0.305 (+/- 3SD) If hundreds of tests, may find several (does it
mean they have ESP?)
87
Transportation Example
88
Concepts of Confidence Intervals
89
Confidence Interval
A range of reasonable guesses at a population value, a mean for instance
Confidence level = chance that range of guesses captures the the population value
Most common confidence level is 95%
90
General Format of a Confidence Interval
estimate +/- margin of error
91
Transportation Example: Accuracy of a mean
A sample of n=36 has mean speed = 75.3. The SD = 8 . How well does this sample mean estimate
the population mean ?
92
Standard Error of Mean
SEM = SD of sample / square root of n SEM = 8 / square root ( 36) = 8 / 6 = 1.33 Margin of error of mean = 2 x SEM Margin of Error = 2.66 , about 2.7
93
Interpretation
95% chance the sample mean is within 2.7 MPH of the population mean. (q. what is implication on enforcement of type I error? Type II?)
A 95% confidence interval for the population mean
sample mean +/- margin of error 75.3 +/-2.7 ; 72.6 to 78.0
94
For Large Population
Could the mean speed be 72 MPH ? Maybe, but our interval doesn't include
72. It's likely that population mean is above
72.
95
C.I. for mean speed at another location
n=49 sample mean=70.3 MPH, SD = 8 SEM = 8 / square root(49) = 1.1 margin of error=2 x 1.1 = 2.2 Interval is 70.3 +/- 2.2 68.1 to 72.5
96
Do locations 1 and 2 differ in mean speed?
C.I. for location 1 is 72.6 to 78.0 C.I. for location 2 is 68.1 to 72.5 No overlap between intervals Looks safe to say that population means
differ
97
Thought Question
Study compares speed reduction due to enforcement vs. education
95% confidence intervals for mean speed reduction• Cop on side of road : 13.4 to 18.0• Speed monitor only : 6.4 to 11.2
98
Part A
Do you think this means that 95% of locations with cop present will lower speed between 13.4 and 18.0 MPH?
Answer : No. The interval is a range of guesses at the population mean.
This interval doesn't describe individual variation.
99
Part B
Can we conclude that there's a difference between mean speed reduction of the two programs ?
This is a reasonable conclusion. The two confidence intervals don't overlap.
It seems the population means are different.
100
Direct look at the difference
For cop present, mean speed reduction = 15.8 MPH
For sign only, mean speed reduction = 8.8 MPH
Difference = 7 MPH more reduction by enforcement method
101
Confidence Interval for Difference
95% confidence interval for difference in mean speed reduction is 3.5 to 10.5 MPH.• Don't worry about the calculations.
This interval is entirely above 0. This rules out "no difference" ; 0
difference would mean no difference.
102
Confidence Interval for a Mean
when you have a “small” sample...
103
As long as you have a “large” sample….
A confidence interval for a population mean is:
n
sZx
where the average, standard deviation, and n depend on the sample, and Z depends on the confidence level.
104
Transportation Example
Random sample of 59 similar locations produces an average crash rate of 273.2. Sample standard deviation was 94.40.
09.2420.27359
4.9496.120.273
We can be 95% confident that the average crash rate was between 249.11 and 297.29
105
What happens if you can only take a “small” sample?
Random sample of 15 similar location crash rates had an average of 6.4 with standard deviation of 1.
What is the average crash rate at all similar locations?
106
If you have a “small” sample...
Replace the Z value with a t value to get:
n
stx
where “t” comes from Student’s t distribution, and depends on the sample size through the degrees of freedom “n-1”
Can also use the tau test for very small samples
107
Student’s t distribution versus Normal Z distribution
-5 0 5
0.0
0.1
0.2
0.3
0.4
Value
dens
ity
T-distribution and Standard Normal Z distribution
T with 5 d.f.
Z distribution
108
T distribution
Very similar to standard normal distribution, except:
t depends on the degrees of freedom “n-1” more likely to get extreme t values than
extreme Z values
109
110
111
Let’s compare t and Z values
Confidencelevel
t value with5 d.f
Z value
90% 2.015 1.65
95% 2.571 1.96
99% 4.032 2.58
For small samples, T value is larger than Z value. So,T interval is made to be longer than Z interval.
112
OK, enough theorizing!Let’s get back to our example!
Sample of 15 locations crash rate of 6.4 with standard deviation of 1.
55.04.615
1145.24.6
n
stx
Need t with n-1 = 15-1 = 14 d.f. For 95% confidence, t14 = 2.145
We can be 95% confident that average crash rate is between 5.85 and 6.95.
113
What happens as sample gets larger?
-5 0 5
0.0
0.1
0.2
0.3
0.4
Value
dens
ity
T-distribution and Standard Normal Z distribution
Z distribution
T with 60 d.f.
114
What happens to CI as sample gets larger?
n
sZx
n
stx
For large samples:
Z and t values become almost identical, so CIs are almost identical.
115
One not-so-small problem!
It is only OK to use the t interval for small samples if your original measurements are normally distributed.
We’ll learn how to check for normality in a minute.
116
Strategy for deciding how to analyze
If you have a large sample of, say, 60 or more measurements, then don’t worry about normality, and use the z-interval.
If you have a small sample and your data are normally distributed, then use the t-interval.
If you have a small sample and your data are not normally distributed, then do not use the t-interval, and stay tuned for what to do.
117
Hypothesis tests
Test should begin with a set of specific, testable hypotheses that can be tested using data:
– Not a meaningful hypothesis – Was safety improved by improvements to roadway
– Meaningful hypothesis – Were speeds reduced when traffic calming was introduced.
Usually to demonstrate evidence that there is a difference in measurable quantities
Hypothesis testing is a decision-making tool.
118
Hypothesis Step 1
Provide one working hypothesis – the null hypothesis – and an alternative
The null or nil hypothesis convention is generally that nothing happened
– Example speeds were not reduced after traffic calming – Null HypothesisSpeed were reduced after traffic calming – Alternative Hypothesis
When stating the hypothesis, the analyst must think of the impact of the potential error.
119
Step 2, select appropriate statistical test
The analyst may wish to test– Changes in the mean of events– Changes in the variation of events– Changes in the distribution of events
120
Step 3, Formulate decision rules and set levels for the probability of error
Accept Reject
Area where we incorrectly reject Type I error, referred to as significance level)
Area where we incorrectly acceptType II error, referred to as
121
Type I and II errors
lies in lies in acceptance interval rejection interval
Accept the No error Type II errorClaim
Reject the Type I error No errorclaim
122
Levels of and
Often is not considered in the development of the test.
There is a trade-off between and Over emphasis is placed on the level of significance
of the test. The level of should be appropriate for decision
being made.– Small values for decisions where errors cannot be tolerated
and errors are less likely– Larger values where type I errors can be more easily
tolerated
123
Step 4 Check statistical assumption
Draw new samples to check answerCheck the following assumption
– Are data continuous or discrete– Plot data– Inspect to make sure that data meets assumptions
For example, the normal distribution assumes that mean = median
– Inspect results for reasonableness
124
Step 5 Make decision
Typical misconceptions– Alpha is the most important error– Hypothesis tests are unconditional
It does not provide evidence that the working hypothesis is true
– Hypothesis test conclusion are correctAssume
– 300 independent tests– 100 rejection of work hypothesis = 0.05 and = 0.10– Thus 0.05 x 100 = 5 Type I errors– And 0.1 x 200 = 20 Type II errors– 25 time out of 300 the test results were wrong
125
Transportation Example
The crash rates below, in 100 million vehicle miles, were calculated for 50, 20 mile long segment of interstate highway during 2002
.34 .22 .40 .25 .31 .34 .26 .55 .43 .34
.31 .43 .28 .33 .23 .40 .39 .38 .21 .43
.20 .20 .36 .48 .36 .30 .27 .42 .27 .28
.43 .45 .38 .54 .39 .55 .25 .35 .39 .43
.26 .17 .30 .40 .16 .32 .34 .46 .37 .33
0.1 - 0.2
0.2 - 0.3
0.3 - 0.4
0.4 - 0.5
0.5-0.6
Crash Rates
0
5
10
15
20
25
Frequency of Crash Rates by Sections
x = 0.35 s2= 0.0090 s = 0.095
126
Example continued
Crash rates are collected from non-interstate system highways built to slightly lower design standards. Similarly and average crash rate is calculated and it is greater (0.53). Also assume that both means have the same standard deviation (0.095). The question is do we arrive at the same accident rate with both facilities. Our hypothesis is that both have the same means f = nf Can we accept or reject our hypothesis?
127
Is this part of the crash rate distribution for interstate highways
Example continued
ccept Reject
Area where we incorrectly reject Type I error
Area where we incorrectly acceptType II error
128
Example continued
Lets set the probability of a Type I error at 5%– set (upper boundary - 0.35)/ 0.095 =1.645 (one tail
Z (cum) for 95%)– Upper boundary = 0.51– Therefore, we reject the hypothesis– What’s the probability of a Type II error?– (0.51 – 0.53)/0.095 = -0.21– 41.7%– There is a 41.7% chance of what?
129
The P value … example: Grade inflation
H0: μ = 2.7HA: μ > 2.7
Random sampleof students
X
n = 36
s = 0.6
and
Data
Decision RuleSet significance level α = 0.05. If p-value 0.05, reject null hypothesis.
130
Example: Grade inflation?Population of
5 million college students
Is the average GPA 2.7?
Sample of 100 college students
How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?
131
The p-value illustrated
How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?
132
Determining the p-value
H0: μ = average population GPA = 2.7HA: μ = average population GPA > 2.7
If 100 students have average GPA of 2.9 with standard deviation of 0.6, the P-value is:
0004.0]33.3[)]100/6.0/()7.29.2([)9.2(
ZPZPXP
133
Making the decision
The p-value is “small.” It is unlikely that we would get a sample as large as 2.9 if the average GPA of the population was 2.7.
Reject H0. There is sufficient evidence to conclude that the average GPA is greater than 2.7.
134
Terminology
H0: μ = 2.7 versus HA: μ > 2.7 is called a “right-tailed” or a “one-sided” hypothesis test, since the p-value is in the right tail.
Z = 3.33 is called the “test statistic”. If we think our p-value small if it is less than
0.05, then the probability that we make a Type I error is 0.05. This is called the “significance level” of the test. We say, α=0.05, where α is “alpha”.
135
Alternative Decision Rule
“Reject if p-value 0.05” is equivalent to “reject if the sample average, X-bar, is larger than 2.865”
X-bar > 2.865 is called “rejection region.”
136
Minimize chance of Type I error...
… by making significance level small. Common values are = 0.01, 0.05, or 0.10. “How small” depends on seriousness of Type
I error. Decision is not a statistical one but a
practical one (set alpha small for safety analysis, larger for traffic congestion, say)
137
Type II Error and Power
“Power” of a test is the probability of rejecting null when alternative is true.
“Power” = 1 - P(Type II error) To minimize the P(Type II error), we
equivalently want to maximize power. But power depends on the value under the
alternative hypothesis ...
138
Type II Error and Power
(Alternative is true)
139
Factors affecting power...
Difference between value under the null and the actual value
P(Type I error) = Standard deviation Sample size
140
Strategy for designing a good hypothesis test
Use pilot study to estimate std. deviation. Specify . Typically 0.01 to 0.10. Decide what a meaningful difference would
be between the mean in the null and the actual mean.
Decide power. Typically 0.80 to 0.99. Simple to use software to determine sample
size …
141
How to determine sample size
Depends on experiment
Basically, use the formulas and let sample size be the factor you want to determine
Vary the confidence interval, alpha and beta
http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
142
If sample is too small ...
… the power can be too low to identify even large meaningful differences between the null and alternative values. – Determine sample size in advance of conducting
study.– Don’t believe the “fail-to-reject-results” of a study
based on a small sample.
143
If sample is really large ...
… the power can be extremely high for identifying even meaningless differences between the null and alternative values.– In addition to performing hypothesis tests, use a
confidence interval to estimate the actual population value.
– If a study reports a “reject result,” ask how much different?
144
The moral of the storyas researcher
Always determine how many measurements you need to take in order to have high enough power to achieve your study goals.
If you don’t know how to determine sample size, ask a statistical consultant to help you.
145
Important “Boohoo!” Point
Neither decision entails proving the null hypothesis or the alternative hypothesis.
We merely state there is enough evidence to behave one way or the other.
This is also always true in statistics! No matter what decision we make, there is always a chance we made an error.
Boohoo!
146
Comparing the Means of Two Dependent Populations
The Paired T-test ….
147
Assumptions: 2-Sample T-Test
Data in each group follow a normal distribution.
For pooled test, the variances for each group are equal.
The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).
148
What happens if samples aren’t independent?
That is, they are“dependent” or “correlated”?
149
Do signals with all-red clearance phases have lower numbers of crashes than those without?
All-Red No All-Red 60 32 32 44 80 22 50 40
Sample Average: 55.5 34.5
Real question is whether intersections with similar traffic volumes have different numbers of crashes. Better then to compare the difference in crashes in “pairs” of intersections with and without all-red clearance phases.
150
Now, a Paired Study
CrashesTraffic No all-red All-red DifferenceLow 22 20 2.0Medium 29 28 1.0Med-high 35 32 3.0High 80 78 2.0Averages 41.5 39.5 2.0St. Dev 26.1 26.1 0.816
P-value = How likely is it that a paired sample would have a difference as large as 2 if the true difference were 0? (Ho = no diff.) - Problem reduces to a One-Sample T-test on differences!!!!
151
The Paired-T Test Statistic
If:• there are n pairs• and the differences are normally distributed
nd
sd
μd
sdifferenceoferrorstandard
differenceedhypothesizdifferencesamplet
d
Then:The test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value:
152
The Paired-T Confidence Interval
If:• there are n pairs• and the differences are normally distributed
nd
std
Then:The confidence interval, with t following t-distribution with n-1 d.f. estimates the actual population difference:
153
Data analyzed as 2-Sample T
Two sample T for No all-red vs All-red
N Mean StDev SE MeanNo 4 41.5 26.2 13All 4 39.5 26.1 13
95% CI for mu No - mu All: ( -43, 47)T-Test mu No = mu All (vs not =): T = 0.11
P = 0.92 DF = 6Both use Pooled StDev = 26.2
P = 0.92. Do not reject null. Insufficient evidence to conclude that there is a real difference.
154
Data analyzed as Paired T
Paired T for No all-red vs All-red
N Mean StDev SE MeanNo 4 41.5 26.2 13.1All 4 39.5 26.1 13.1Difference 4 2.000 0.816 0.408
95% CI for mean difference: (0.701, 3.299)T-Test of mean difference = 0 (vs not = 0): T-Value = 4.90 P-Value = 0.016
P = 0.016. Reject null. Sufficient evidence to conclude that there IS a difference.
155
What happened?
P-value from two-sample t-test is just plain wrong. (Assumptions not met.)
We removed or “blocked out” the extra variability in the data due to differences in traffic, thereby focusing directly on the differences in crashes.
The paired t-test is more “powerful” because the paired design reduces the variability in the data.
156
Ways Pairing Can Occur
When subjects in one group are “matched” with a similar subject in the second group.
When subjects serve as their own control by receiving both of two different treatments.
When, in “before and after” studies, the same subjects are measured twice.
157
If variances of the measurements of the two groups are not equal...
Estimate the standard error of the difference as:
2
2
1
1ns
ns 22
Then the sampling distribution is an approximate t distribution with a complicated formula for d.f.
158
If variances of the measurements of the two groups are equal...
Estimate the standard error of the difference using the common pooled variance:
21
n1
n1s2
p
Then the sampling distribution is a t distribution with n1+n2-2 degrees of freedom.
where 2nn1)s(n1)s(n
s21
2211p
222
Assume variances are equal only if neither sample standard deviation is more than twice that of the other sample standard deviation.
159
Assumptions for correct P-values
Data in each group follow a normal distribution.
If use pooled t-test, the variances for each group are equal.
The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).
160
Interpreting a confidence interval for the difference in two means…
If the confidenceinterval contains… then, we conclude …
zero the two means maynot differ
only positivenumbers
first mean is largerthan second mean
only negativenumbers
first mean is smallerthan second mean
161
Difference in variance
Use F distribution test Compute F = s1^2/s2^2 Largest sample variance on
top Look up in F table with n1
and n2 DOF Reject that the variance is
the same if f>F If used to test if a model is
the same (same coefficients) during 2 periods, it is called the Chow test
162
163
164
165
Experiments and pitfalls
Types of safety experiments– Before and after tests– Cross sectional tests
Control sampleModified sampleSimilar or the same condition must take place for both
samples
166
Regression to the mean
This problem plagues before and after tests– Before and after tests require less data and
therefore are more popular
Because safety improvements are driven abnormally high crash rates, crash rates are likely to go down whether or not an improvement is made.
167
San Francisco intersection crash data
No. ofIntersectionsWith Given
No. ofAccidents in
1974 - 76
Accidents/IntersectionIn 1974 - 76
Accident/Year/
Intersectionin 1974-76
Accidents/Year in
1976 – 76For Group(rounded)
Accidentsin 1977 for
Group
Accidents/Intersection
In 1977
% Change
256 0 0 0 64 0.25 LargeIncrease
218 1 0.33 72 120 0.55 67%173 2 0.67 116 121 0.7 Small
Increase121 3 1 121 126 1.04 Small
Increase97 4 1.33 129 105 1.08 -19%70 5 1.67 117 93 1.33 -20%54 6 2 108 84 1.56 -22%32 7 2.33 75 72 2.25 -3%29 8 2.67 77 47 1.62 -39%
168
Spill over and migration impacts
Improvements, particularly those that are expect to modify behavior should be expected to spillover impacts at other locations. For example, suppose that red light running cameras were installed at several locations throughout a city. Base on the video evidence, this jurisdiction has the ability to ticket violators and, therefore, less red light running is suspected throughout the system – leading to spill over impacts. A second data base is available for a similar control set of intersections that are believed not to be impacted by spill over. The result are listed below:
Crashes at Treated Sites
Crashes at Comparison
Sites (no spill over)
Before 100 140After 64 112
169
Spill over impact
Assuming that spill over has occurred at the treated sites, the reduction in accidents that would have occurred after naturally is 100 x 112/140 = 80 had the intersection remained untreated. therefore, the reduction is really from 80 crashes to 64 crashes (before vs after) or 20% rather than 100 crashes to 64 crashes or 36%.
170
Spurious correlations
During the 1980s and early 1990s the Japanese economy was growing at a much greater rate than the U.S. economy. A professor on loan to the federal reserved wrote a paper on the Japanese economy and correlated the growth in the rate of Japanese economy and their investment in transportation infrastructure and found a strong correlation. At the same time, the U.S. invests a much lower percentage of GDP in infrastructure and our GNP was growing at a much lower rate. The resulting conclusion was that if we wanted to grow the economy we would invest like the Japanese in public infrastructure. The Association of Road and Transportation Builders of America (ARTBA) loved his findings and at the 1992 annual meeting of the TRB the economist from Bates college professor won an award. In 1992 the Japanese economy when in the tank, the U.S. economy started its longest economic boom.
171
Spurious Correlation Cont.
What is going on here?
What is the nature of the relationship between transportation investment and economic growth?