1 advances in statistics or, what you might find if you picked up a current issue of a biological...

1

Advances in Statistics

Or, what you might find if you picked up a current issue of a Biological Journal

2

Advances in Statistics

• Extensions to the ANOVA• Computer-intensive methods• Maximum likelihood

3

Extensions to ANOVA

• One-way ANOVA– This works for a single explanatory variable

– Simplest possible design

• Two-way ANOVA– Two categorical explanatory variables

– Factorial design

4

ANOVA Tables

Source of variation

Sum of squares

df Mean Squares

F ratio P

Treatment

k-1

Error N-k

Total N-1€

SSerror = si2(ni −1)∑€

SSgroup = ni(Y i −Y )2∑

€

SSgroup + SSerror €

MSerror =SSerror

dferror€

MSgroup =SSgroup

dfgroup

€

F =MSgroup

MSerror

*

5

Two-factor ANOVA TableSource of variation

Sum of Squares

df Mean Square

F ratio P

Treatment 1

SS1 k1 - 1 SS1

k1 - 1

MS1

MSE

Treatment 2

SS2 k2 - 1 SS2

k2 - 1

MS2

MSE

Treatment 1 * Treatment 2

SS1*2 (k1 - 1)*(k2 - 1)

SS1*2

(k1 - 1)*(k2 - 1)

MS1*2

MSE

Error SSerror XXX SSerror

XXX

Total SStotal N-1

6

Two-factor ANOVA TableSource of variation

Sum of Squares

df Mean Square

F ratio P

Treatment 1

SS1 k1 - 1 SS1

k1 - 1

MS1

MSE

Treatment 2

SS2 k2 - 1 SS2

k2 - 1

MS2

MSE

Treatment 1 * Treatment 2

SS1*2 (k1 - 1)*(k2 - 1)

SS1*2

(k1 - 1)*(k2 - 1)

MS1*2

MSE

Error SSerror XXX SSerror

XXX

Total SStotal N-1

Two categorical explanatory variables

7

General Linear Models

• Used to analyze variation in Y when there is more than one explanatory variable

• Explanatory variables can be categorical or numerical

8


• First step: formulate a model statement

• Example:

€

Y = μ + TREATMENT

9


• First step: formulate a model statement

• Example:

€

Y = μ + TREATMENT

Overallmean

Treatment effect

10


• Second step: Make an ANOVA table

• Example: Source of variation

Sum of squares

df Mean Squares

F ratio P

Treatment

k-1

Error N-k

Total N-1

€



€

SSgroup + SSerror

€

MSerror =SSerror

dferror€

MSgroup =SSgroup

dfgroup

€

F =MSgroup

MSerror*

11


• Second step: Make an ANOVA table

• Example: Source of variation

Sum of squares

df Mean Squares

F ratio P

Treatment

k-1

Error N-k

Total N-1

€



€

SSgroup + SSerror

€

MSerror =SSerror

dferror€

MSgroup =SSgroup

dfgroup

€

F =MSgroup

MSerror*

This is the same as a one-way ANOVA!

12


• If there is only one explanatory variable, these are exactly equivalent to things we’ve already done– One categorical variable: ANOVA– One numerical variable: regression

• Great for more complicated situations

13

Example 1: Experiment with blocking

• Fish experiment: sensitivity of goldfish to light

• Fish are randomly selected from the population

• Four different light treatments are applied to each fish

14

Randomized Block Design

Blocks (fish)

Treatments(light wavelengths)

15

Randomized Block Design

16

Step 1: Make a model statement

€

Y = μ + BLOCK + TREATMENT

17

Step 2: Make an ANOVA table

18

Another Example: Mole Rats

• Are there lazy mole rats?• Two variables:

– Worker type: categorical•“frequent workers” and “infrequent workers”

– Body mass (ln-transformed): numerical

20


€

Y = μ + CASTE + LNMASS + CASTE * LNMASS

21


22


23


€

Y = μ + CASTE + LNMASS

24


25


Also called ANCOVA-

Analysis of Covariance

26


• Can handle any number of predictor variables

• Each can be categorical or numerical

• Tables have the same basic structure

• Same assumptions as ANOVA

27


• Don’t run out of degrees of freedom!

• Sometimes, the F-statistics will have DIFFERENT denominators - see book for an example

28

Computer-intensive methods

• Hypothesis testing:– Simulation– Randomization

• Confidence intervals– Bootstrap

29

Simulation

• Simulates the sampling process on a computer many times: generates the null distribution from estimates done on the simulated data

• Computer assumes the null hypothesis is true

30

Example: Social spider sex ratios

Social spiders live in groups

31

Example: Social spider sex ratios

• Groups are mostly females• Hypothesis: Groups have just enough males to allow reproduction

• Test: Whether distribution of number of males is as predicted by chance

• Problem: Groups are of many different sizes

• Binomial distribution therefore doesn’t apply

32

Simulation:

• For each group, the number of spiders is known. The overall proportion of males, pm, is known.

• For each group, the computer draws the real number of spiders, and each has pm probability of being male.

• This is done for all groups, and the variance in proportion of males is calculated.

• This is repeated a large number of times.

33

0.5 1. 1.5 2. 2.5 3. 3.5

200

400

600

800

1000

Variance in proportion of males

(Pseudo-values)

Frequency

Actual Observed Value (0.44)

The observed value (0.44), or something more extreme,is observed in only 4.9% of the simulations. Therefore P = 0.049.

34

Randomization

• Used for hypothesis testing• Mixes the real data randomly• Variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. This is done for all individuals.

• The estimate is made on the randomized data.

• The whole process is repeated numerous times. The distribution of the randomized estimates is the null distribution.

35

Without replacement

• Randomization is done without replacement.

• In other words, all data points are used exactly once in each randomized data set.

36

Randomization can be done for any test of association between

two variables

37

Example: Sage crickets

Sage cricket males sometimesoffer their hind-wings to females to eat during mating.

Do females who eat hind-wingswait longer to re-mate?

38

Table 12.3A Waiting time to remating in sage cricket females afterinitial mating with either a wingless or winged male (presented inln(days))

Male wingless Male winged

0 1.4

0.7 1.6

0.7 1.9

1.4 2.3

1.6 2.6

1.8 2.8

1.9 2.8

1.9 2.8

1.9 3.1

2.2 3.8

2.1 3.9

2.1 4.5

4.7

39

ln(Time to remating): First mate had no wings

ln(Time to remating): First mate had intact wings

Problems:Unequal variance, non-normal distributions

40

Male wingless

Male winged

0 1.4

0.7 1.6

0.7 1.9

1.4 2.3

1.6 2.6

1.8 2.8

1.9 2.8

1.9 2.8

1.9 3.1

2.2 3.8

2.1 3.9

2.1 4.5

4.7

Real data: Randomized data:

€

Y 1 −Y 2 = −1.41

Male wingless

Male winged

0.7 2.8

2.3 1.9

1.9 2.1

1.8 1.6

3.8 0

1.4 1.4

1.9 2.2

3.9 2.1

4.7 1.6

2.6 4.5

1.9 2.8

2.8 0.7

3.1

€

Y 1 −Y 2 = 0.41

41

Note that each data point was only used

once

42

1000 randomizations

P < 0.001

43

Randomization: Other questions

Q: Is this periodic?

(yes)

44

Bootstrap

• Method for estimation (and confidence intervals)

• Often used for hypothesis testing too

• "Picking yourself up by your own bootstraps"

45

Bootstrap

• For each group, randomly pick with replacement an equal number of data points, from the data of that group

• With this bootstrap dataset, calculate the estimate -- bootstrap replicate estimate

46

Male wingless

Male winged

0 1.4

0.7 1.6

0.7 1.9

1.4 2.3

1.6 2.6

1.8 2.8

1.9 2.8

1.9 2.8

1.9 3.1

2.2 3.8

2.1 3.9

2.1 4.5

4.7

Real data: Bootstrap data:

€

Y 1 −Y 2 = −1.41

Male wingless

Male winged

0.7 1.4

0.7 1.4

1.4 2.8

1.4 2.8

1.8 2.8

1.8 3.1

1.8 3.1

1.9 3.9

1.9 4.5

2.1 4.7

2.1 4.7

2.1 4.7

4.7

€

Y 1 −Y 2 = −1.78

48

Bootstraps are often used in evolutionary

trees

49

Likelihood

€

L hypothesis A | data( ) = P data | hypothesis A[ ]

Likelihood considers many possible hypotheses, not just one

50

Law of likelihood

A particular data set supports one hypothesis better than another if the likelihood of that hypothesis is higher than the likelihood of the other hypothesis.

Therefore we try to find the hypothesis with the maximum likelihood.

51

All estimates we have learned so far are

also maximum likelihood estimates.

52

"Simple" example

• Using likelihood to estimate a proportion

• Data: 3 out of 8 individuals are male.

• Question: What is the maximum likelihood estimate of the proportion of males?

53

Likelihood

€

L p = x( ) = P 3 males out of 8 | p = x[ ]

where x is a hypothesized value of the proportion of males.

e.g., L(p=0.5) is the likelihood of the hypothesis that the proportion of males is 0.5.

54

For this example only...

The probability of getting 3 males out of 8 independent trials is given by the binomial distribution.

€

L p = x( ) = Pr data | p = x[ ]

= Pr 3out of 8 | p = x[ ]

=8

3

⎛

⎝ ⎜

⎞

⎠ ⎟x

3 1− x( )8−3

55

How to find maximum likelihood hypothesis

1. Calculus

or

2. Computer calculations

56

By calculus...

• Maximum value of L(p=x) is found when x = 3/8.

• Note that this is the same value we would have gotten by methods we already learned.

57

By computer calculation...

0.2 0.4 0.6 0.8 1

0.001

0.002

0.003

0.004

0.005

x

L(p=x)

x = 3/8

Input likelihood formula to computer, plot the value of L for each value of x, and find the largest L.

58

Finding genes for corn yield:

Corn Chromosome 5

59

Hypothesis testing by likelihood

• Compares the likelihood of maximum likelihood estimate to a null hypothesis

Log-likelihood ratio =

€

lnLikelihood[Maximum likelihood hypothesis]

Likelihood[Null hypothesis]

⎡

⎣ ⎢

⎤

⎦ ⎥

60

Test statistic

€

χ 2 = 2 log likelihood ratio

With df equal to the number of variables fixed to make null hypothesis

61

Example:3 males out of 8 individuals

• H0: 50% are male

• Maximum likelihood estimate

€

ˆ p =3

8

€

L p = 3/8[ ] =8

3

⎛

⎝ ⎜

⎞

⎠ ⎟ 3/8( )

31− 3/8( )

5= 0.2816

62

Likelihood of null hypothesis

€

L p = 0.5[ ] =8

3

⎛

⎝ ⎜

⎞

⎠ ⎟ 0.5( )

31− 0.5( )

5= 0.21875

63

Log likelihood ratio

€

lnL p = 3/8[ ]L p = 0.5[ ]

⎡

⎣ ⎢

⎤

⎦ ⎥= ln

0.2816

0.21875

⎡ ⎣ ⎢

⎤ ⎦ ⎥= 0.2526

χ 2 = 2 0.2526( ) = 0.5051

We fixed one variable in the null hypothesis (p),So the test has df = 1.

€

χ0.05,12 = 3.84, so we do not reject H0.

1 advances in statistics or, what you might find if you picked up a current issue of a biological...

Documents

fish slide

numerical slide

covariance slide

anova oneway anova

anova table example

mean treatment effect

biological journal slide

mse treatment