TR 555 Statistics “Refresher” Lecture 2: Distributions and Tests

Binomial, Normal, Log Normal distributions; Chi Square and K.S. tests for goodness of fit and independence; Poisson and negative exponential distributions; Weibull distributions; test statistics, sample size and confidence intervals; hypothesis testing

WARNING! 175 slides! Print in draft mode, 6 to a page to conserve paper and ink!


Other good references

http://www.itl.nist.gov/div898/handbook/index.htm

http://www.ruf.rice.edu/~lane/stat_sim/index.html

Bernoulli Trials

1 Only two possible outcomes on each trial (one is arbitrarily labeled success, the other failure)

2 The probability of a success, P(S) = p, is the same for each trial (equivalently, the probability of a failure, P(F) = 1 - P(S) = 1 - p, is the same for each trial)

3 The trials are independent

Binomial, A Probability Distribution

n = a fixed number of Bernoulli trials
p = the probability of success in each trial
X = the number of successes in n trials

The random variable X is called a binomial random variable. Its distribution is called a binomial distribution.

The binomial distribution with n trials and success probability p is given by

$$f(x) = P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n$$

The binomial distribution with n trials and success probability p has

Mean = $np$

Variance = $np(1 - p)$

Standard deviation = $\sqrt{np(1 - p)}$
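As a quick check on these formulas, the Python sketch below (standard library only; the function names are illustrative, not from the lecture) evaluates the probability function for p = 0.2 and n = 5, the first case plotted next, along with its mean, variance and standard deviation.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """f(x) = P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_moments(n: int, p: float) -> tuple:
    """Mean np, variance np(1 - p) and standard deviation sqrt(np(1 - p))."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)


n, p = 5, 0.2
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean, variance, sd = binomial_moments(n, p)

print([round(f, 4) for f in pmf])   # probabilities for x = 0..5
print(mean, round(variance, 4), round(sd, 4))
```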

[Figure: Binomial Distribution with p=.2, n=5]

[Figure: Binomial Distribution with p=.2, n=10]

[Figure: Binomial Distribution with p=.2, n=30]

[Figure: Binomial Distributions with p=.2, n=5, n=10 and n=30 compared]

Transportation Example

The probability of making it safely from city A to city B is .9997 (do we generally know this?)

Traffic per day is 10,000 trips. Assuming independence, what is the probability that there will be more than 3 crashes in a day?

What is the expected value of the number of crashes?

Expected value = np = .0003*10000 = 3

P(X>3) = 1- [P(X=0) + P(X=1) + P(X=2) + P(X=3)]

e.g., P(X=3) = 10000!/(3!*9997!) * .0003^3 * .9997^9997 = .224 (don't just hit 9997! on your calculator!)

P(X>3) = 1- [.050 + .149 + .224 + .224] = 1 - .647 = .353, or about 35% (the bracketed sum, about 65%, is P(X≤3))
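The same calculation in Python, as a sketch: `math.comb` works with exact integers, so 10000!/(3!*9997!) is never formed as a huge factorial. The four terms come out at about .050, .149, .224 and .224, and P(X>3) at about .353.

```python
from math import comb

n, p = 10_000, 0.0003   # daily trips and per-trip crash probability (from the slide)


def binomial_pmf(x, n, p):
    # math.comb uses exact integers, so 10000!/(3! * 9997!) is never formed directly
    return comb(n, x) * p**x * (1 - p) ** (n - x)


expected_crashes = n * p                               # np = 3
terms = [binomial_pmf(x, n, p) for x in range(4)]      # P(X=0) .. P(X=3)
p_at_most_3 = sum(terms)
p_more_than_3 = 1 - p_at_most_3                        # complement rule

print(round(expected_crashes, 3), [round(t, 3) for t in terms])
print(round(p_at_most_3, 3), round(p_more_than_3, 3))
```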

Continuous probability density functions

The curve describes the probability of getting any range of values, say P(X > 120), P(X < 100), P(110 < X < 120).

Area under the curve = probability. Area under the whole curve = 1. The probability of getting a specific number is 0, e.g., P(X = 120) = 0.

Histogram (Area of rectangle = probability)

[Figure: Density vs. IQ, intervals of size 20]

Decrease interval size...

[Figure: Density vs. IQ, intervals of size 10]

Decrease interval size more….

[Figure: Density vs. IQ, smaller intervals]

Normal: special kind of continuous p.d.f.

[Figure: Bell-shaped curves, density vs. grades; Mean = 70, SD = 5 and Mean = 70, SD = 10]

Normal distribution

Characteristics of normal distribution

Symmetric, bell-shaped curve. Shape of curve depends on population mean μ and standard deviation σ. Center of distribution is μ. Spread is determined by σ. Most values fall around the mean, but some values are smaller and some are larger.

Probability = Area under curve

The normal integral has no closed-form solution, so it must be numerically integrated; in practice we use tables.

We just need a table of probabilities for every possible normal distribution.

But there are an infinite number of normal distributions (one for each μ and σ)!!

Solution is to “standardize.”

Standardizing

Take value X, subtract its mean μ from it, and then divide by its standard deviation σ. Call the resulting value Z.

That is, $Z = (X - \mu)/\sigma$. Z is called the standard normal. Its mean is 0 and its standard deviation is 1. Then, use the probability table for Z.

Using Z Table

[Figure: Standard Normal Curve, density vs. Z from -4 to 4, with the tail probability P(Z > z) marked]

Suppose we want to calculate $P(X \le b)$, where $X \sim N(\mu, \sigma)$.

We can calculate

$$z = \frac{b - \mu}{\sigma}$$

And then use the fact that

$$P(X \le b) = P(Z \le z)$$

We can find $P(Z \le z)$ from our Z table.

Probability below 65?

[Figure: Density vs. grades (55 to 85), with the area P(X < 65) marked]

Suppose we wanted to calculate $P(Z > z)$.

This is the area under the curve to the right of z.

Then, using the law of complements, we have

$$P(Z > z) = 1 - P(Z \le z)$$

Probability above 75?

Probability student scores higher than 75?

[Figure: Density vs. grades (55 to 85), with the area P(X > 75) marked]

Now suppose we want to calculate $P(a < Z < b)$.

This is the area under the curve between a and b. We calculate this by first calculating the area to the left of b, then subtracting the area to the left of a.

$$P(a < Z < b) = P(Z < b) - P(Z < a)$$

Key Formula!

Probability between 65 and 70?

[Figure: Density vs. grades (55 to 85), with the area P(65 < X < 70) marked]
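The three grade questions can be checked numerically. This sketch assumes grades follow N(70, 5), the narrower of the two bell curves shown earlier (the slides do not restate the parameters for these figures), and uses Python's `statistics.NormalDist` in place of a printed Z table. It standardizes first, then applies the complement rule and the key formula; the results are about 0.159, 0.159 and 0.341.

```python
from statistics import NormalDist

MU, SIGMA = 70, 5        # ASSUMED grade distribution N(70, 5)
Z = NormalDist()         # standard normal: mean 0, SD 1 (plays the role of the Z table)


def standardize(x, mu=MU, sigma=SIGMA):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# P(X < 65): standardize, then look up P(Z < z)
p_below_65 = Z.cdf(standardize(65))
# P(X > 75): law of complements, 1 - P(Z <= z)
p_above_75 = 1 - Z.cdf(standardize(75))
# P(65 < X < 70): key formula, area left of b minus area left of a
p_65_to_70 = Z.cdf(standardize(70)) - Z.cdf(standardize(65))

# NormalDist can also skip the standardizing step entirely
direct = NormalDist(MU, SIGMA).cdf(65)
print(round(p_below_65, 4), round(p_above_75, 4), round(p_65_to_70, 4), round(direct, 4))
```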

Average speeds are thought to be normally distributed.

Sample speeds are taken, with $\bar{X}$ = 74.3 and sigma = 6.9.

What is the speed likely to be exceeded only 5% of the time?

Z95 = 1.64 (one tail) = (x - 74.3)/6.9, so x = 74.3 + 1.64*6.9 = 85.6

What % are obeying the 75 MPH speed limit within a 5 MPH grace?
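A sketch of both calculations with `statistics.NormalDist`, treating 74.3 and 6.9 as the mean and standard deviation of individual speeds (the slide's setup). `inv_cdf(0.95)` uses Z = 1.645 rather than the rounded 1.64, giving about 85.65 MPH; for the second question it gives the share of speeds at or below 80 MPH, about 0.796 (roughly 80%), under the normal assumption.

```python
from statistics import NormalDist

speeds = NormalDist(mu=74.3, sigma=6.9)   # speed distribution from the slide

# Speed exceeded only 5% of the time = 95th percentile (uses z = 1.645, not 1.64)
speed_95th = speeds.inv_cdf(0.95)

# Share at or below the 75 MPH limit plus a 5 MPH grace, i.e. 80 MPH
share_within_grace = speeds.cdf(75 + 5)

print(round(speed_95th, 2), round(share_within_grace, 3))
```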

Assessing Normality

Modeling data as normal requires that the mean is approximately equal to the median, that the data are bell shaped, and that negative values are possible.

• Histograms
• Box plots
• Normal probability plots
• Chi Square or KS test of goodness of fit

Transforms: Log Normal

If data are not normal, the log of the data may be.

If so, …

Example of Lognormal transform
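The slides' example figures are not available here, so the following is only an illustration under stated assumptions: it draws hypothetical right-skewed data from a lognormal distribution (log-mean 3, log-SD 0.8, fixed seed) and applies the mean-versus-median check from the Assessing Normality slide before and after taking logs. For the raw data the mean sits well above the median; after the log transform the two nearly coincide.

```python
import math
import random
import statistics

random.seed(555)   # fixed seed so the illustration is repeatable

# HYPOTHETICAL right-skewed data: lognormal with log-mean 3.0 and log-SD 0.8
data = [random.lognormvariate(3.0, 0.8) for _ in range(2000)]
logs = [math.log(x) for x in data]


def mean_median_gap(values):
    """Relative gap between mean and median; close to 0 for symmetric data."""
    median = statistics.median(values)
    return (statistics.fmean(values) - median) / abs(median)


raw_gap = mean_median_gap(data)
log_gap = mean_median_gap(logs)
print(f"raw data: mean/median gap = {raw_gap:.1%}")
print(f"log data: mean/median gap = {log_gap:.1%}")
```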

Chi Square Test

• AKA cross-classification
• Non-parametric test
• Use for nominal scale data (or convert your data to nominal scale/categories)
• Test for normality (or, in general, goodness of fit)
• Test for independence (can also use Cramer’s coefficient for independence, or Kendall’s tau for ratio, interval or ordinal data)

If used, it is important to recognize that it formally applies only to discrete data, that the bin intervals chosen influence the outcome, and that exact methods (Mehta) provide more reliable results, particularly for small sample sizes.

Chi Square Test

Tests for goodness of fit. Assumptions:
– The sample is a random sample.
– The measurement scale is at least nominal.
– Each cell contains at least 5 observations.
– N observations.
– Break data into c categories.
– H0: observations follow some f(x).

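Chi Square Test

Expected number of observations in any cell: $E_j = N p_j$, where $p_j$ is the probability of cell j under the hypothesized f(x)

The test statistic: $\chi^2 = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$, where $O_j$ is the observed number of observations in cell j

Reject (not from the distribution of interest) if chi square exceeds the table value at 1-α with c-1-w degrees of freedom, where w is the number of parameters estimated from the data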
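A minimal Python sketch of the goodness-of-fit statistic, assuming the expected counts come from the hypothesized f(x) as above. The four cell counts are hypothetical; with every p_j = 0.25 and no estimated parameters (w = 0), the statistic is 12.32 on 3 degrees of freedom, which exceeds the 95% table value of 7.815, so H0 would be rejected.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def gof_dof(c, w):
    """c cells minus 1, minus w parameters estimated from the data."""
    return c - 1 - w


observed = [18, 22, 20, 40]                     # HYPOTHETICAL counts in c = 4 cells
n_total = sum(observed)
cell_probs = [0.25, 0.25, 0.25, 0.25]           # p_j under the hypothesized f(x)
expected = [n_total * p for p in cell_probs]    # E_j = N * p_j

stat = chi_square_gof(observed, expected)
dof = gof_dof(c=len(observed), w=0)
CRITICAL_95_3DF = 7.815                         # chi-square table value, 3 d.f.
print(round(stat, 2), dof, stat > CRITICAL_95_3DF)
```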

Chi Square Test

Tests independence of 2 variables. Assumptions:
– N observations
– R categories for one variable
– C categories for the other variable
– At least 5 observations in each cell

Prepare an r x c contingency table. H0: the two variables are independent.

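Chi Square Test

Expected number of observations in any cell: $E_{ij} = \frac{R_i C_j}{N}$, where $R_i$ and $C_j$ are the row i and column j totals

The test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

Reject (not independent) if chi square exceeds the table value at 1-α of the chi square distribution with (r - 1)(c - 1) degrees of freedom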
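The independence version can be sketched the same way; only the expected counts (row total × column total / N) and the degrees of freedom change. The 2 x 2 table is hypothetical. Its statistic is about 0.79 on 1 degree of freedom, below the 95% table value of 3.841, so independence would not be rejected.

```python
def chi_square_independence(table):
    """Return (statistic, d.f.) for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total   # E_ij = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)                # (r - 1)(c - 1)
    return stat, dof


# HYPOTHETICAL 2 x 2 table (rows: categories of variable 1, columns: variable 2)
stat, dof = chi_square_independence([[10, 20], [30, 40]])
CRITICAL_95_1DF = 3.841                                          # chi-square table value, 1 d.f.
print(round(stat, 3), dof, stat > CRITICAL_95_1DF)
```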

Number of crashes during a year

Adapted from Ang and Tang, 1975

K.S. Test for goodness of fit

• Kolmogorov-Smirnov
• Non-parametric test
• Use for ratio, interval or ordinal scale data
• Compare experimental or observed data to a theoretical distribution (CDF)
• Need to compile a CDF of your data (called an EDF, where E means empirical)
• OK for small samples
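The K.S. statistic is the largest vertical distance D between the EDF and the theoretical CDF. Below is a standard-library sketch; the uniform toy data and the speed sample are hypothetical, and the normal parameters (70, 6) are assumed rather than estimated. D is then compared with a K.S. table value for the sample size; the standard table assumes the CDF's parameters were fixed in advance, not estimated from the same data.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """D = largest vertical gap between the EDF of `data` and a theoretical CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The EDF steps from i/n up to (i + 1)/n at x, so check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Toy check: three points against the uniform(0, 1) CDF
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))

# HYPOTHETICAL spot speeds (MPH) against an ASSUMED N(70, 6) CDF
speeds = [61, 64, 66, 67, 69, 70, 72, 75, 78, 83]
d_speeds = ks_statistic(speeds, NormalDist(70, 6).cdf)
print(round(d_uniform, 3), round(d_speeds, 3))
```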

Poisson Distribution

The Poisson distribution requires that the mean be approximately equal to the variance.

Discrete events: whole numbers with small values.

Non-negative values, e.g., number of crashes or vehicles during a given time.

Transportation Example #1

On average, 3 crashes per day are experienced on a particular road segment.

What is the probability that there will be more than 3 crashes in a day?

P(X>3) = 1- [P(X=0) + P(X=1) + P(X=2) + P(X=3)]

e.g., $P(X = 3) = \frac{3^3 e^{-3}}{3!} = .224$

P(X>3) = 1- [.050 + .149 + .224 + .224] = 1 - .647 = .353, about 35% (recognize this number???)
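The Poisson version in Python, as a sketch: the probability function is λ^x e^(-λ)/x! with λ = 3 crashes per day. It returns P(X=3) ≈ .224 and P(X>3) ≈ .353, matching the binomial example, because a binomial with large n and small p is closely approximated by a Poisson with λ = np.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 3                                              # mean crashes per day
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))
p_more_than_3 = 1 - p_at_most_3
print(round(poisson_pmf(3, lam), 3), round(p_at_most_3, 3), round(p_more_than_3, 3))
```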

Transportation Example #2

Negative Binomial Distribution

An “over-dispersed” Poisson: variance > mean. Also used for crashes and other count data, especially when the data are a combination of Poisson distributed data.

Recall binomial: $f(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$

Negative binomial:

(Negative) Exponential Distribution

Good for inter-arrival time (e.g., time between arrivals or crashes, gaps)

Assumes Poisson counts: P(no occurrence in time t) = $e^{-\lambda t}$, where λ is the mean rate of occurrence


In our turn bay design example, what is the probability that no car will arrive in 1 minute? (19%)

How many 7 second gaps are expected in one minute? There is an 82% chance that any 7 second period has no car, so 60/7 * 82% ≈ 7 per minute.
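A sketch of the two calculations. The turn bay example's arrival rate is not given in these slides; the code assumes a hypothetical 100 vehicles/hour, which happens to reproduce the 19% and 82% figures. It gives about 0.189, 0.823 and 7.06 gaps per minute.

```python
from math import exp

# ASSUMPTION: 100 vehicles/hour; chosen because it reproduces the slide's 19% and 82%
RATE_PER_SEC = 100 / 3600


def p_no_arrival(t_sec, rate=RATE_PER_SEC):
    """P(no occurrence in time t) = exp(-rate * t) for Poisson arrivals."""
    return exp(-rate * t_sec)


p_none_1_min = p_no_arrival(60)
p_none_7_sec = p_no_arrival(7)
# Slide's approximation: 60/7 seven-second periods per minute, each empty with prob. p
gaps_per_min = 60 / 7 * p_none_7_sec
print(round(p_none_1_min, 3), round(p_none_7_sec, 3), round(gaps_per_min, 2))
```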

Weibull Distribution

Very flexible empirical model
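The slide gives no formula; one common two-parameter form is F(x) = 1 - exp(-(x/scale)^shape) for x ≥ 0. The sketch below (function name illustrative) checks that shape = 1 reduces it to the negative exponential above, and uses Python's `random.weibullvariate(alpha, beta)`, whose alpha is the scale and beta the shape, to draw a small sample. Varying the shape is what makes the Weibull such a flexible empirical model.

```python
import math
import random


def weibull_cdf(x, scale, shape):
    """F(x) = 1 - exp(-(x / scale) ** shape) for x >= 0 (one common parameterization)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


# shape = 1 reduces to the negative exponential with mean = scale
exp_check = weibull_cdf(2.0, 4.0, 1.0) - (1 - math.exp(-2.0 / 4.0))

# Changing the shape changes the curve; at x = scale every curve passes 1 - e**-1
for shape in (0.5, 1.0, 2.0, 3.5):
    print(shape, [round(weibull_cdf(x, 5.0, shape), 3) for x in (2.5, 5.0, 7.5)])

random.seed(7)
# random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
sample = [random.weibullvariate(5.0, 2.0) for _ in range(5)]
```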

Sampling Distributions

• Some Definitions
• Some Common Sense Things
• An Example
• A Simulation
• Sampling Distributions
• Central Limit Theorem

Definitions

• Parameter: A number describing a population

• Statistic: A number describing a sample

• Random Sample: every unit in the population has an equal probability of being included in the sample

• Sampling Distribution: the probability distribution of a statistic

Common Sense Thing #1

A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.

Common Sense Thing #2

All sample statistics have some error in estimating population parameters.

Common Sense Thing #3

If repeated samples are taken from a population and the same statistic (e.g., the mean) is calculated from each sample, the statistics will vary; that is, they will have a distribution.

Common Sense Thing #4

A larger sample provides more information than a smaller sample, so a statistic from a large sample should have less error than a statistic from a small sample.

Distribution of $\bar{X}$ when sampling from a normal distribution

$\bar{X}$ has a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

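Central Limit Theorem

If the sample size (n) is large enough, $\bar{X}$ has an approximately normal distribution with

mean $\mu_{\bar{x}} = \mu$

and

standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

regardless of the population distribution.

Rule of thumb: n ≥ 30 is usually large enough.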
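A small simulation makes the theorem concrete. The population here is a hypothetical exponential distribution of headways with mean and SD of 10 s, which is clearly not normal. The sketch draws 5,000 samples of size n and compares the mean and SD of the sample means with μ and σ/√n; for n = 30 the SD of the means comes out close to 10/√30 ≈ 1.83. A histogram of the means would show the skew fading as n grows.

```python
import random
import statistics

random.seed(30)
MU = SIGMA = 10.0   # HYPOTHETICAL exponential headways: mean 10 s, SD 10 s (skewed)


def sample_means(n, reps=5000):
    """Means of `reps` independent random samples of size n from the population."""
    return [statistics.fmean(random.expovariate(1 / MU) for _ in range(n))
            for _ in range(reps)]


results = {}
for n in (4, 30):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:2d}: mean of means={results[n][0]:.2f}, "
          f"SD of means={results[n][1]:.2f}, sigma/sqrt(n)={SIGMA / n ** 0.5:.2f}")
```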

Does $\bar{X}$ have a normal distribution?

Is the population normal?
– Yes: $\bar{X}$ is normal.
– No: Is n ≥ 30?
  – Yes: $\bar{X}$ is considered to be normal.
  – No: $\bar{X}$ may or may not be considered normal (we need more info).

Situation

Different samples produce different results. The value of a statistic, like a mean or proportion, depends on the particular sample obtained. But some values may be more likely than others. The probability distribution of a statistic (its “sampling distribution”) indicates the likelihood of getting certain values.


Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.

Take random samples of n = 4. Then, sample means are normally distributed with mean 45 MPH and standard error 3 MPH [from 6/sqrt(4) = 6/2].

Using empirical rule...

68% of samples of n=4 will have an average speed between 42 and 48 MPH.

What happens if we take larger samples?

Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.

Take random samples of n = 36. Then, sample means are normally distributed with mean 45 MPH and standard error 1 MPH [from 6/sqrt(36) = 6/6].

Again, using empirical rule...

68% of samples of n=36 will have an average speed between 44 and 46 MPH.

So … the larger the sample, the less the sample averages vary.

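Sampling Distributions for Proportions

• Thought questions
• Basic rules
• ESP example
• Taste test example

Rule for Sample Proportions

For a large enough sample, the sample proportion is approximately normal, centered at the true proportion p, with standard deviation $\sqrt{p(1-p)/n}$.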

Proportion “heads” in 50 tosses

• Bell curve for possible proportions
• Curve centered at true proportion (0.50)
• SD of curve = Square root of [p(1-p)/n]
• SD = sqrt [0.5(1-0.5)/50] = 0.07
• By empirical rule, 68% chance that a proportion will be between 0.43 and 0.57

ESP example

• Five cards are randomly shuffled
• A card is picked by the researcher
• Participant guesses which card
• This is repeated n = 80 times

Many people participate

• Researcher tests hundreds of people
• Each person does n = 80 trials
• The proportion correct is calculated for each person

Who has ESP?

• What sample proportions go beyond luck?
• What proportions are within the normal guessing range?

Possible results of ESP experiment

• 1 in 5 chance of correct guess
• If guessing, true p = 0.20
• Typical guesser gets p = 0.20
• SD of test = Sqrt [0.2(1-0.2)/80] = 0.045

Description of possible proportions

• Bell curve
• Centered at 0.2
• SD = 0.045
• About 99.7% within 0.066 and 0.334 (+/- 3 SD)
• If hundreds of tests, may find several (does it mean they have ESP?)
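Both standard deviations can be checked with a few lines of Python (a sketch; `proportion_sd` is an illustrative name). For the ESP case, sqrt[0.2(0.8)/80] = sqrt(0.002) ≈ 0.045, so the ±3 SD guessing range runs from about 0.066 to 0.334.

```python
from math import sqrt


def proportion_sd(p, n):
    """SD of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


coin_sd = proportion_sd(0.5, 50)     # 50 fair-coin tosses
esp_sd = proportion_sd(0.2, 80)      # guessing 1 of 5 cards, 80 trials

coin_68 = (0.5 - coin_sd, 0.5 + coin_sd)        # +/- 1 SD, about 68%
esp_3sd = (0.2 - 3 * esp_sd, 0.2 + 3 * esp_sd)  # +/- 3 SD guessing range
print(round(coin_sd, 3), [round(v, 2) for v in coin_68])
print(round(esp_sd, 3), [round(v, 3) for v in esp_3sd])
```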

Concepts of Confidence Intervals

Confidence Interval

A range of reasonable guesses at a population value, a mean for instance

Confidence level = chance that the range of guesses captures the population value

Most common confidence level is 95%

General Format of a Confidence Interval

estimate +/- margin of error

Transportation Example: Accuracy of a mean

A sample of n = 36 has mean speed = 75.3 MPH. The SD = 8. How well does this sample mean estimate the population mean?

Standard Error of Mean

SEM = SD of sample / square root of n
SEM = 8 / square root(36) = 8 / 6 = 1.33
Margin of error of mean = 2 x SEM
Margin of Error = 2.66, about 2.7

Interpretation

95% chance the sample mean is within 2.7 MPH of the population mean. (Q: what is the implication for enforcement of a Type I error? Type II?)

A 95% confidence interval for the population mean:

sample mean +/- margin of error = 75.3 +/- 2.7, or 72.6 to 78.0

For Large Population

Could the mean speed be 72 MPH? Maybe, but our interval doesn't include 72. It's likely that the population mean is above 72.

C.I. for mean speed at another location

n = 49, sample mean = 70.3 MPH, SD = 8
SEM = 8 / square root(49) = 1.1
Margin of error = 2 x 1.1 = 2.2
Interval is 70.3 +/- 2.2, or 68.1 to 72.5

Do locations 1 and 2 differ in mean speed?

C.I. for location 1 is 72.6 to 78.0
C.I. for location 2 is 68.1 to 72.5
No overlap between intervals
Looks safe to say that population means differ
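A sketch of the slide's rule of thumb (margin = 2 × SEM) applied to both locations, without rounding the SEM first. Location 2 then comes out at about 68.0 to 72.6, so the gap between the two intervals is very small (72.59 vs. 72.63); comparing intervals is a rough screen, and the confidence interval for the difference discussed below is the more direct check.

```python
from math import sqrt


def approx_95_ci(mean, sd, n):
    """Slide's rule of thumb: mean +/- 2 * SEM, with SEM = SD / sqrt(n)."""
    margin = 2 * sd / sqrt(n)
    return mean - margin, mean + margin


loc1 = approx_95_ci(75.3, 8, 36)
loc2 = approx_95_ci(70.3, 8, 49)
overlap = loc1[0] <= loc2[1] and loc2[0] <= loc1[1]
print([round(v, 2) for v in loc1], [round(v, 2) for v in loc2], overlap)
```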

Thought Question

Study compares speed reduction due to enforcement vs. education.

95% confidence intervals for mean speed reduction:
• Cop on side of road: 13.4 to 18.0
• Speed monitor only: 6.4 to 11.2

Part A

Do you think this means that 95% of locations with cop present will lower speed between 13.4 and 18.0 MPH?

Answer: No. The interval is a range of guesses at the population mean.

This interval doesn't describe individual variation.

Part B

Can we conclude that there's a difference between mean speed reduction of the two programs?

This is a reasonable conclusion. The two confidence intervals don't overlap.

It seems the population means are different.

Direct look at the difference

For cop present, mean speed reduction = 15.8 MPH

For speed monitor only, mean speed reduction = 8.8 MPH

Difference = 7 MPH more reduction by enforcement method

Confidence Interval for Difference

95% confidence interval for difference in mean speed reduction is 3.5 to 10.5 MPH.
• Don't worry about the calculations.

This interval is entirely above 0. This rules out "no difference", since a difference of 0 would mean no difference.

Confidence Interval for a Mean

when you have a “small” sample...

As long as you have a “large” sample….

A confidence interval for a population mean is:

$$\bar{x} \pm Z \frac{s}{\sqrt{n}}$$

where the average $\bar{x}$, standard deviation s, and n depend on the sample, and Z depends on the confidence level.


Random sample of 59 similar locations produces an average crash rate of 273.2. Sample standard deviation was 94.40.

$$273.20 \pm 1.96 \left(\frac{94.40}{\sqrt{59}}\right) = 273.20 \pm 24.09$$

We can be 95% confident that the average crash rate was between 249.11 and 297.29
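The same interval as a Python sketch; `z_interval` is an illustrative helper, and `NormalDist().inv_cdf(0.975)` supplies Z ≈ 1.96 for 95% confidence. It reproduces 249.11 to 297.29.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean +/- Z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = z_interval(273.2, 94.40, 59)
print(round(low, 2), round(high, 2))
```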

What happens if you can only take a “small” sample?

Random sample of 15 similar location crash rates had an average of 6.4 with standard deviation of 1.

What is the average crash rate at all similar locations?

If you have a “small” sample...

Replace the Z value with a t value to get:

$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$

where “t” comes from Student’s t distribution, and depends on the sample size through the degrees of freedom “n-1”

Can also use the tau test for very small samples

Student’s t distribution versus Normal Z distribution

[Figure: T-distribution and Standard Normal Z distribution, density vs. value from -5 to 5; T with 5 d.f. vs. Z distribution]

T distribution

Very similar to standard normal distribution, except:
• t depends on the degrees of freedom “n-1”
• more likely to get extreme t values than extreme Z values

Let’s compare t and Z values

Confidence level    t value with 5 d.f.    Z value
90%                 2.015                  1.65
95%                 2.571                  1.96
99%                 4.032                  2.58

For small samples, the t value is larger than the Z value. So, the t interval is longer than the Z interval.

OK, enough theorizing! Let’s get back to our example!

Sample of 15 locations: crash rate of 6.4 with standard deviation of 1.

Need t with n-1 = 15-1 = 14 d.f. For 95% confidence, t14 = 2.145

$$\bar{x} \pm t \frac{s}{\sqrt{n}} = 6.4 \pm 2.145 \left(\frac{1}{\sqrt{15}}\right) = 6.4 \pm 0.55$$

We can be 95% confident that average crash rate is between 5.85 and 6.95.
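In Python the only extra ingredient is the t critical value. The standard library has no t quantile function, so this sketch hard-codes the table value 2.145 for 14 d.f. (`scipy.stats.t.ppf(0.975, 14)` returns the same number if SciPy is available). The interval comes out at about 5.85 to 6.95.

```python
from math import sqrt

# Two-sided 95% critical value for 14 d.f. from a t table; the standard library has no
# t quantile function (scipy.stats.t.ppf(0.975, 14) returns it if SciPy is installed)
T_CRIT_95_14DF = 2.145


def t_interval(mean, sd, n, t_crit):
    """Small-sample CI for a mean: mean +/- t * s / sqrt(n), t with n - 1 d.f."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = t_interval(6.4, 1.0, 15, T_CRIT_95_14DF)
print(round(low, 2), round(high, 2))
```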

What happens as sample gets larger?

[Figure: T-distribution and Standard Normal Z distribution, density vs. value from -5 to 5; T with 60 d.f. vs. Z distribution]

What happens to CI as sample gets larger?

$$\bar{x} \pm Z \frac{s}{\sqrt{n}} \qquad \bar{x} \pm t \frac{s}{\sqrt{n}}$$

For large samples:

Z and t values become almost identical, so CIs are almost identical.

One not-so-small problem!

It is only OK to use the t interval for small samples if your original measurements are normally distributed.

We’ll learn how to check for normality in a minute.

Strategy for deciding how to analyze

If you have a large sample of, say, 60 or more measurements, then don’t worry about normality, and use the z-interval.

If you have a small sample and your data are normally distributed, then use the t-interval.

If you have a small sample and your data are not normally distributed, then do not use the t-interval, and stay tuned for what to do.

Hypothesis tests

A test should begin with a set of specific, testable hypotheses that can be tested using data:

– Not a meaningful hypothesis: Was safety improved by improvements to the roadway?

– Meaningful hypothesis: Were speeds reduced when traffic calming was introduced?

Usually the aim is to demonstrate evidence that there is a difference in measurable quantities.

Hypothesis testing is a decision-making tool.

Hypothesis Step 1

Provide one working hypothesis (the null hypothesis) and an alternative.

The null or nil hypothesis convention is generally that nothing happened.

– Example: speeds were not reduced after traffic calming (null hypothesis); speeds were reduced after traffic calming (alternative hypothesis)

When stating the hypothesis, the analyst must think of the impact of the potential error.

Step 2, select appropriate statistical test

The analyst may wish to test:
– Changes in the mean of events
– Changes in the variation of events
– Changes in the distribution of events

Step 3, Formulate decision rules and set levels for the probability of error

[Figure: Accept and Reject regions]

Area where we incorrectly reject: Type I error, α (referred to as the significance level)

Area where we incorrectly accept: Type II error, referred to as β

Type I and II errors

                    True value lies in       True value lies in
                    acceptance interval      rejection interval
Accept the claim    No error                 Type II error
Reject the claim    Type I error             No error

Levels of α and β

Often β is not considered in the development of the test.

There is a trade-off between α and β. Overemphasis is placed on the level of significance of the test. The level of α should be appropriate for the decision being made.
– Small values for decisions where errors cannot be tolerated and errors are less likely
– Larger values where Type I errors can be more easily tolerated

Step 4 Check statistical assumptions

Draw new samples to check the answer. Check the following assumptions:
– Are data continuous or discrete?
– Plot data
– Inspect to make sure that data meet assumptions; for example, the normal distribution assumes that mean = median
– Inspect results for reasonableness

Step 5 Make decision

Typical misconceptions:
– Alpha is the most important error
– Hypothesis tests are unconditional (a test does not provide evidence that the working hypothesis is true)
– Hypothesis test conclusions are correct. Assume:
  – 300 independent tests
  – 100 rejections of the working hypothesis
  – α = 0.05 and β = 0.10
  – Thus 0.05 x 100 = 5 Type I errors
  – And 0.1 x 200 = 20 Type II errors
  – 25 times out of 300 the test results were wrong


The crash rates below, per 100 million vehicle miles, were calculated for fifty 20-mile-long segments of interstate highway during 2002.

.34 .22 .40 .25 .31 .34 .26 .55 .43 .34

.31 .43 .28 .33 .23 .40 .39 .38 .21 .43

.20 .20 .36 .48 .36 .30 .27 .42 .27 .28

.43 .45 .38 .54 .39 .55 .25 .35 .39 .43

.26 .17 .30 .40 .16 .32 .34 .46 .37 .33

[Figure: Frequency of Crash Rates by Sections, histogram of crash rates in bins 0.1 - 0.2, 0.2 - 0.3, 0.3 - 0.4, 0.4 - 0.5, 0.5 - 0.6]

$\bar{x}$ = 0.35, $s^2$ = 0.0090, s = 0.095
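The summary statistics can be reproduced from the 50 rates listed above with Python's `statistics` module (a sketch; the sample variance uses n - 1). The mean is about 0.345, which the slide rounds to 0.35, with s^2 ≈ 0.0090 and s ≈ 0.095.

```python
import statistics

# The 50 interstate segment crash rates listed above (per 100 million vehicle miles)
rates = [
    .34, .22, .40, .25, .31, .34, .26, .55, .43, .34,
    .31, .43, .28, .33, .23, .40, .39, .38, .21, .43,
    .20, .20, .36, .48, .36, .30, .27, .42, .27, .28,
    .43, .45, .38, .54, .39, .55, .25, .35, .39, .43,
    .26, .17, .30, .40, .16, .32, .34, .46, .37, .33,
]
mean = statistics.fmean(rates)
var = statistics.variance(rates)    # sample variance, n - 1 in the denominator
sd = statistics.stdev(rates)
print(len(rates), round(mean, 4), round(var, 4), round(sd, 3))
```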

Example continued

Crash rates are collected from non-interstate highways built to slightly lower design standards. Similarly, an average crash rate is calculated, and it is greater (0.53). Also assume that both have the same standard deviation (0.095). The question is: do we arrive at the same crash rate with both facilities? Our hypothesis is that both have the same mean, $\mu_f = \mu_{nf}$. Can we accept or reject our hypothesis?

Example continued

Is this part of the crash rate distribution for interstate highways?

[Figure: Accept and Reject regions]

Area where we incorrectly reject: Type I error

Area where we incorrectly accept: Type II error

Example continued

Let's set the probability of a Type I error at 5%:
– Set (upper boundary - 0.35)/0.095 = 1.645 (one-tail cumulative Z for 95%)
– Upper boundary = 0.51
– 0.53 > 0.51; therefore, we reject the hypothesis
– What's the probability of a Type II error?
– (0.51 - 0.53)/0.095 = -0.21, which gives 41.7%
– There is a 41.7% chance of what?
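A sketch of the same calculation with `statistics.NormalDist`. It keeps the unrounded boundary (about 0.506) alongside the slide's rounded 0.51: with the rounded boundary the Type II probability is about 0.417, as on the slide, and with the unrounded one it is about 0.401. Both treat 0.095 as the SD of an individual segment's rate, as the example does.

```python
from statistics import NormalDist

mu_interstate, sd = 0.35, 0.095   # interstate mean and SD from the example
mu_other = 0.53                   # non-interstate mean
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha)          # 1.645, one tail
upper = mu_interstate + z_crit * sd               # about 0.506; slide rounds to 0.51
reject = mu_other > upper

# Type II error: P(falling below the boundary | true mean is 0.53)
beta_exact = NormalDist(mu_other, sd).cdf(upper)
beta_slide = NormalDist(mu_other, sd).cdf(0.51)   # using the rounded boundary
print(round(upper, 3), reject, round(beta_exact, 3), round(beta_slide, 3))
```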

The P value … example: Grade inflation

H0: μ = 2.7
HA: μ > 2.7

Data: random sample of students, n = 36, s = 0.6, and sample mean $\bar{X}$

Decision Rule: Set significance level α = 0.05. If p-value ≤ 0.05, reject null hypothesis.

Example: Grade inflation?

Population of 5 million college students. Is the average GPA 2.7?

Sample of 100 college students

How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?

The p-value illustrated

Determining the p-value

H0: μ = average population GPA = 2.7
HA: μ = average population GPA > 2.7

If 100 students have average GPA of 2.9 with standard deviation of 0.6, the P-value is:

$$P(\bar{X} \ge 2.9) = P\left(Z \ge \frac{2.9 - 2.7}{0.6/\sqrt{100}}\right) = P(Z \ge 3.33) = 0.0004$$

Making the decision

The p-value is “small.” It is unlikely that we would get a sample as large as 2.9 if the average GPA of the population was 2.7.

Reject H0. There is sufficient evidence to conclude that the average GPA is greater than 2.7.

Terminology

H0: μ = 2.7 versus HA: μ > 2.7 is called a “right-tailed” or a “one-sided” hypothesis test, since the p-value is in the right tail.

Z = 3.33 is called the “test statistic”. If we consider our p-value small when it is less than 0.05, then the probability that we make a Type I error is 0.05. This is called the “significance level” of the test. We say α = 0.05, where α is “alpha”.

Alternative Decision Rule

“Reject if p-value ≤ 0.05” is equivalent to “reject if the sample average, X-bar, is larger than 2.865” (this cutoff uses n = 36 and s = 0.6 from the setup: 2.7 + 1.645 × 0.6/√36 ≈ 2.865; with n = 100 it would be about 2.80).

X-bar > 2.865 is called the “rejection region.”
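A sketch of the p-value and the rejection-region cutoff in Python (helper names are illustrative). Because one slide uses n = 36 and the others n = 100, it computes the cutoff for both: about 2.865 for n = 36 and 2.799 for n = 100. For n = 100 and X-bar = 2.9 it gives Z ≈ 3.33 and a p-value of about 0.0004.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
mu0, xbar, s = 2.7, 2.9, 0.6


def one_sided_z_test(xbar, mu0, s, n):
    """Right-tailed test of H0: mu = mu0 vs HA: mu > mu0; returns (z, p-value)."""
    z = (xbar - mu0) / (s / sqrt(n))
    return z, 1 - Z.cdf(z)


def rejection_cutoff(mu0, s, n, alpha=0.05):
    """Reject H0 when the sample mean exceeds this value."""
    return mu0 + Z.inv_cdf(1 - alpha) * s / sqrt(n)


z100, p100 = one_sided_z_test(xbar, mu0, s, 100)
cut36, cut100 = rejection_cutoff(mu0, s, 36), rejection_cutoff(mu0, s, 100)
print(round(z100, 2), round(p100, 4), round(cut36, 3), round(cut100, 3))
```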

Minimize chance of Type I error...

… by making the significance level α small. Common values are α = 0.01, 0.05, or 0.10. “How small” depends on the seriousness of a Type I error. The decision is not a statistical one but a practical one (set alpha small for safety analysis, larger for traffic congestion, say).

Type II Error and Power

“Power” of a test is the probability of rejecting the null when the alternative is true.

“Power” = 1 - P(Type II error). To minimize P(Type II error), we equivalently want to maximize power. But power depends on the value under the alternative hypothesis ...

Type II Error and Power

(Alternative is true)

Factors affecting power...

• Difference between value under the null and the actual value
• P(Type I error) = α
• Standard deviation
• Sample size

Strategy for designing a good hypothesis test

• Use pilot study to estimate std. deviation.
• Specify α. Typically 0.01 to 0.10.
• Decide what a meaningful difference would be between the mean in the null and the actual mean.
• Decide power. Typically 0.80 to 0.99.
• Simple to use software to determine sample size …

How to determine sample size

Depends on experiment

Basically, use the formulas and let sample size be the factor you want to determine

Vary the confidence interval, alpha and beta

http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
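One way to let sample size be the factor you solve for, sketched for a one-sided z test with known σ (an assumption; n = ((z_α + z_β)σ/δ)², rounded up, is the standard normal-theory approximation, and the helper names are illustrative). With σ = 0.6 and a meaningful difference δ = 0.2, borrowed from the GPA example, α = 0.05 and power 0.80 call for n = 56, while n = 36 gives power of only about 0.64.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def power_one_sided(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed z test when the true mean exceeds mu0 by delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_alpha - delta * sqrt(n) / sigma)


def sample_size_one_sided(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n with the requested power: n = ((z_alpha + z_beta) * sigma / delta)^2."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative numbers borrowed from the GPA example: sigma = 0.6, difference 0.2
for n in (16, 36, 100):
    print(n, round(power_one_sided(0.2, 0.6, n), 3))
n_needed = sample_size_one_sided(0.2, 0.6)
print("n needed:", n_needed)
```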

If sample is too small ...

… the power can be too low to identify even large meaningful differences between the null and alternative values.
– Determine sample size in advance of conducting study.
– Don’t believe the “fail-to-reject” results of a study based on a small sample.

If sample is really large ...

… the power can be extremely high for identifying even meaningless differences between the null and alternative values.
– In addition to performing hypothesis tests, use a confidence interval to estimate the actual population value.
– If a study reports a “reject” result, ask how much different?

The moral of the storyas researcher

Always determine how many measurements you need to take in order to have high enough power to achieve your study goals.

If you don’t know how to determine sample size, ask a statistical consultant to help you.

Important “Boohoo!” Point

Neither decision entails proving the null hypothesis or the alternative hypothesis.

We merely state there is enough evidence to behave one way or the other.

This is also always true in statistics! No matter what decision we make, there is always a chance we made an error.

Boohoo!

Comparing the Means of Two Dependent Populations

The Paired T-test ….

Assumptions: 2-Sample T-Test

Data in each group follow a normal distribution.

For the pooled test, the variances for each group are equal.

The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).

What happens if samples aren’t independent?

That is, they are “dependent” or “correlated”?

Do signals with all-red clearance phases have lower numbers of crashes than those without?

All-Red    No All-Red
60         32
32         44
80         22
50         40

Sample Average: 55.5 (All-Red), 34.5 (No All-Red)

The real question is whether intersections with similar traffic volumes have different numbers of crashes. It is better, then, to compare the difference in crashes in “pairs” of intersections with and without all-red clearance phases.

Now, a Paired Study

Crashes:

Traffic     No all-red    All-red    Difference
Low         22            20         2.0
Medium      29            28         1.0
Med-high    35            32         3.0
High        80            78         2.0
Averages    41.5          39.5       2.0
St. Dev     26.2          26.1       0.816

P-value = How likely is it that a paired sample would have a difference as large as 2 if the true difference were 0? (H0 = no difference.) The problem reduces to a one-sample T-test on the differences!

The Paired-T Test Statistic

If:
• there are n pairs
• and the differences are normally distributed

Then: the test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value:

$$t = \frac{\text{sample difference} - \text{hypothesized difference}}{\text{standard error of difference}} = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

The Paired-T Confidence Interval

If:
• there are n pairs
• and the differences are normally distributed

Then: the confidence interval, with t following a t-distribution with n-1 d.f., estimates the actual population difference:

$$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$$

Data analyzed as 2-Sample T

Two sample T for No all-red vs All-red

       N    Mean    StDev    SE Mean
No     4    41.5    26.2     13
All    4    39.5    26.1     13

95% CI for mu No - mu All: (-43, 47)
T-Test mu No = mu All (vs not =): T = 0.11  P = 0.92  DF = 6
Both use Pooled StDev = 26.2

P = 0.92. Do not reject null. Insufficient evidence to conclude that there is a real difference.

Data analyzed as Paired T

Paired T for No all-red vs All-red

              N    Mean     StDev    SE Mean
No            4    41.5     26.2     13.1
All           4    39.5     26.1     13.1
Difference    4    2.000    0.816    0.408

95% CI for mean difference: (0.701, 3.299)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.90  P-Value = 0.016

P = 0.016. Reject null. Sufficient evidence to conclude that there IS a difference.
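The paired analysis reduces to a one-sample t calculation on the four differences, which this Python sketch reproduces. The critical value 3.182 for 3 d.f. (two-sided 95%) is hard-coded from a t table, since the standard library has no t quantile function; the results match the output above: t ≈ 4.90 and a 95% CI of about (0.701, 3.299).

```python
from math import sqrt
import statistics

no_all_red = [22, 29, 35, 80]    # crashes by traffic level: low, medium, med-high, high
all_red = [20, 28, 32, 78]
diffs = [a - b for a, b in zip(no_all_red, all_red)]

n = len(diffs)
d_bar = statistics.fmean(diffs)
s_d = statistics.stdev(diffs)
se = s_d / sqrt(n)               # standard error of the mean difference
t_stat = (d_bar - 0) / se        # hypothesized difference = 0

T_CRIT_95_3DF = 3.182            # two-sided 95% critical t for n - 1 = 3 d.f. (table value)
ci = (d_bar - T_CRIT_95_3DF * se, d_bar + T_CRIT_95_3DF * se)
print(round(d_bar, 3), round(s_d, 3), round(se, 3), round(t_stat, 2),
      [round(v, 3) for v in ci], abs(t_stat) > T_CRIT_95_3DF)
```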

What happened?

P-value from two-sample t-test is just plain wrong. (Assumptions not met.)

We removed or “blocked out” the extra variability in the data due to differences in traffic, thereby focusing directly on the differences in crashes.

The paired t-test is more “powerful” because the paired design reduces the variability in the data.

Ways Pairing Can Occur

When subjects in one group are “matched” with a similar subject in the second group.

When subjects serve as their own control by receiving both of two different treatments.

When, in “before and after” studies, the same subjects are measured twice.

If variances of the measurements of the two groups are not equal...

Estimate the standard error of the difference as:

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Then the sampling distribution is an approximate t distribution with a complicated formula for d.f.

If variances of the measurements of the two groups are equal...

Estimate the standard error of the difference using the common pooled variance:

$$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad \text{where} \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Then the sampling distribution is a t distribution with n1+n2-2 degrees of freedom.

Assume variances are equal only if neither sample standard deviation is more than twice that of the other sample standard deviation.
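Both standard errors, sketched in Python on the all-red crash data treated (incorrectly) as two independent groups. The "complicated formula for d.f." in the unequal-variance case is usually the Welch-Satterthwaite approximation, which is what `welch_se_df` implements; the pooled t of about 0.11 matches the 2-Sample T output above, and with equal group sizes and near-equal SDs the Welch d.f. is almost exactly 6.

```python
from math import sqrt
import statistics


def pooled_se(s1, n1, s2, n2):
    """SE of the difference using the pooled variance (n1 + n2 - 2 d.f.)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(sp2) * sqrt(1 / n1 + 1 / n2)


def welch_se_df(s1, n1, s2, n2):
    """Unequal-variance SE and the Welch-Satterthwaite approximate d.f."""
    a, b = s1**2 / n1, s2**2 / n2
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return sqrt(a + b), df


# All-red data from the paired example, analyzed as if the groups were independent
no_all_red, all_red = [22, 29, 35, 80], [20, 28, 32, 78]
s1, s2 = statistics.stdev(no_all_red), statistics.stdev(all_red)
n1, n2 = len(no_all_red), len(all_red)
diff = statistics.fmean(no_all_red) - statistics.fmean(all_red)

t_pooled = diff / pooled_se(s1, n1, s2, n2)
se_w, df_w = welch_se_df(s1, n1, s2, n2)
twice_rule_ok = max(s1, s2) <= 2 * min(s1, s2)   # slide's rule for assuming equal variances
print(round(t_pooled, 3), round(diff / se_w, 3), round(df_w, 2), twice_rule_ok)
```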

Assumptions for correct P-values

Data in each group follow a normal distribution.

If use pooled t-test, the variances for each group are equal.

The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).

Interpreting a confidence interval for the difference in two means…

If the confidence interval contains…    then, we conclude…
zero                                     the two means may not differ
only positive numbers                    the first mean is larger than the second mean
only negative numbers                    the first mean is smaller than the second mean

Difference in variance

Use an F distribution test:
• Compute F = s1^2/s2^2, with the larger sample variance on top
• Look up the F table with n1-1 and n2-1 DOF
• Reject that the variances are the same if F exceeds the table value
• If used to test whether a model is the same (same coefficients) during 2 periods, it is called the Chow test
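A sketch of the variance-ratio computation, again on the all-red crash data (the helper name is illustrative). The ratio is about 1.005 with 3 and 3 d.f., far below the F table value, so equal variances are not rejected, which is consistent with using the pooled test.

```python
import statistics


def variance_ratio(sample1, sample2):
    """F = larger sample variance / smaller one, with (numerator, denominator) d.f."""
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


no_all_red = [22, 29, 35, 80]
all_red = [20, 28, 32, 78]
f_ratio, df_num, df_den = variance_ratio(no_all_red, all_red)
print(round(f_ratio, 3), df_num, df_den)
```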

Experiments and pitfalls

Types of safety experiments
– Before and after tests
– Cross sectional tests

Control sample and modified sample: similar or the same conditions must take place for both samples

Regression to the mean

This problem plagues before and after tests.
– Before and after tests require less data and therefore are more popular.

Because safety improvements are driven by abnormally high crash rates, crash rates are likely to go down whether or not an improvement is made.
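A small simulation shows the effect with no improvement at all. It assumes 1,000 hypothetical identical intersections, each with a true mean of 1 crash per year, uses a simple Poisson generator (Knuth's method, since Python's `random` module has no Poisson sampler), selects the sites with 6 or more crashes in a 3-year "before" period, and checks their next-year count. The selected sites average at least 2 crashes/year before but only about 1 after, the same pattern as the San Francisco data below.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(1977)
TRUE_RATE = 1.0   # HYPOTHETICAL: every site truly averages 1 crash/year; nothing is changed

before = [poisson_draw(3 * TRUE_RATE, rng) for _ in range(1000)]   # 3-year "before" count
after = [poisson_draw(TRUE_RATE, rng) for _ in range(1000)]        # following single year

# Select "high crash" sites the way a safety program might: 6+ crashes in 3 years
selected = [i for i, b in enumerate(before) if b >= 6]
rate_before = statistics.fmean(before[i] / 3 for i in selected)    # crashes/year
rate_after = statistics.fmean(after[i] for i in selected)
print(len(selected), round(rate_before, 2), round(rate_after, 2))
```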

San Francisco intersection crash data

Columns: (1) No. of intersections with the given no. of accidents in 1974-76; (2) Accidents/intersection in 1974-76; (3) Accidents/year/intersection in 1974-76; (4) Accidents/year in 1974-76 for group (rounded); (5) Accidents in 1977 for group; (6) Accidents/intersection in 1977; (7) % change

(1)    (2)    (3)     (4)    (5)    (6)     (7)
256    0      0       0      64     0.25    Large increase
218    1      0.33    72     120    0.55    67%
173    2      0.67    116    121    0.7     Small increase
121    3      1       121    126    1.04    Small increase
97     4      1.33    129    105    1.08    -19%
70     5      1.67    117    93     1.33    -20%
54     6      2       108    84     1.56    -22%
32     7      2.33    75     72     2.25    -3%
29     8      2.67    77     47     1.62    -39%

Spill over and migration impacts

Improvements, particularly those that are expected to modify behavior, should be expected to have spill over impacts at other locations. For example, suppose that red light running cameras were installed at several locations throughout a city. Based on the video evidence, this jurisdiction has the ability to ticket violators and, therefore, less red light running is expected throughout the system, leading to spill over impacts. A second database is available for a similar control set of intersections that are believed not to be impacted by spill over. The results are listed below:

          Crashes at Treated Sites    Crashes at Comparison Sites (no spill over)
Before    100                         140
After     64                          112

Spill over impact

Assuming that spill over has occurred at the treated sites, the comparison sites give the natural trend: had the intersections remained untreated, the crashes expected after would be 100 x 112/140 = 80. Therefore, the reduction is really from 80 crashes to 64 crashes (before vs. after), or 20%, rather than from 100 crashes to 64 crashes, or 36%.
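The comparison-group adjustment is simple arithmetic; a few lines of Python make each step explicit (variable names are illustrative).

```python
treated_before, treated_after = 100, 64          # crashes at treated sites
comparison_before, comparison_after = 140, 112   # comparison sites (no spill over)

# The comparison sites supply the trend the treated sites would have followed anyway
expected_after = treated_before * comparison_after / comparison_before
adjusted_reduction = (expected_after - treated_after) / expected_after
naive_reduction = (treated_before - treated_after) / treated_before
print(expected_after, f"{adjusted_reduction:.0%}", f"{naive_reduction:.0%}")
```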

Spurious correlations

During the 1980s and early 1990s the Japanese economy was growing at a much greater rate than the U.S. economy. A professor on loan to the Federal Reserve wrote a paper on the Japanese economy, correlated the growth rate of the Japanese economy with its investment in transportation infrastructure, and found a strong correlation. At the same time, the U.S. invested a much lower percentage of GDP in infrastructure, and our GNP was growing at a much lower rate. The resulting conclusion was that if we wanted to grow the economy we should invest like the Japanese in public infrastructure. The American Road and Transportation Builders Association (ARTBA) loved his findings, and at the 1992 annual meeting of the TRB the economist, a Bates College professor, won an award. In 1992 the Japanese economy went into the tank, and the U.S. economy started its longest economic boom.

Spurious Correlation Cont.

What is going on here?

What is the nature of the relationship between transportation investment and economic growth?
