Hypothesis Testing

Rianne de Heide

CWI & Leiden University

April 6, 2018

Me

PhD student, Machine Learning Group

Peter Grünwald

Study coordinator, M.Sc. Statistical Science

Jacqueline Meulman

Projects

- Bayesian inference under model misspecification (SafeBayes)

- Foundations of Bayesianism

- Hypothesis testing (Frequentist, Bayesian, new methods)

Peter Tom Wouter Allard

Frequentist vs. Bayesian probability

Different views on probability:

- Frequentists: frequency in the long run

- Bayesians: degree of belief

Hypothesis Testing

Example:

- Null Hypothesis $H_0$: $X^n \sim N(0, \sigma)$

- Alternative Hypothesis $H_1$: $X^n \sim N(\theta, \sigma)$, $\theta \neq 0$

Different types of hypothesis tests

- p-value based null hypothesis significance tests (frequentist)

- Bayes factor hypothesis tests (about the probability of $H_0$ given the data). Limitations: see De Heide and Grünwald (2018)

- Test martingales and S-tests (current work)

Optional stopping

Optional stopping: ‘peeking at the results so far to decide whether or not to gather more data’

→ reproducibility crisis in life and behavioural sciences (far too many false positives)

Frequentist Optional Stopping and p-values (1)

- Data $X^n = X_1, X_2, \ldots, X_n$; $X_i \in \mathcal{X}$

- Hypothesis test $T_n : \mathcal{X}^n \to \{0, 1\}$

- Significance level $\alpha$

Frequentist optional stopping

A sequence of hypothesis tests $T_n : \mathcal{X}^n \to \{0, 1\}$ with significance level $\alpha$ is said to be robust under frequentist optional stopping if

$$P_{H_0}(\exists n : T_n(X^n) = 1) \le \alpha.$$

Frequentist Optional Stopping and p-values (2)

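Example:

Data $X^n$ i.i.d. $\sim N(\theta, 1)$

$H_0$: $\theta = 0$

$H_1$: $\theta \neq 0$

Test statistic: $Z_n = \bar{X} \sqrt{n}$

p-value: $p = 2(1 - \Phi(|Z_n|))$

Stopping rule: continue until $|Z_n| > k$, then stop (with $\alpha = 0.05$, $k = 1.96$).

LIL (law of the iterated logarithm) for $X_1, X_2, \ldots$ i.i.d. standard normal:

$$\limsup_{n \to \infty} \frac{\sum_{i=1}^{n} X_i}{\sqrt{2 n \log \log n}} = 1 \quad \text{a.s.}$$

This gives us: $Z_n > \lambda \sqrt{2 \log \log n}$ i.o. w.p. 1, for $\lambda < 1$.

Since $\lambda \sqrt{2 \log \log n}$ eventually exceeds any fixed $k$, this stopping rule rejects a true $H_0$ with probability 1.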
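The effect of this stopping rule can be checked by simulation. The sketch below is an illustration and not part of the original slides: it draws data under H0 (θ = 0), peeks at Z_n after every observation, and compares the resulting false positive rate with a single test at a fixed n. The function names, the horizon of 500 observations, the number of runs and the seeds are all assumptions chosen for the example.

```python
import math
import random


def z_statistic(total: float, n: int) -> float:
    """Z_n = sqrt(n) * sample mean = (sum of the observations) / sqrt(n)."""
    return total / math.sqrt(n)


def rejects_with_peeking(max_n: int, k: float, rng: random.Random) -> bool:
    """Sample under H0 and stop (reject) as soon as |Z_n| > k, up to max_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(z_statistic(total, n)) > k:
            return True
    return False


def rejects_at_fixed_n(n: int, k: float, rng: random.Random) -> bool:
    """Sample exactly n observations under H0 and test once at the end."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(z_statistic(total, n)) > k


def false_positive_rate(test, runs: int, seed: int) -> float:
    """Fraction of simulated experiments (all under H0) in which `test` rejects."""
    rng = random.Random(seed)
    return sum(test(rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    K, MAX_N, RUNS = 1.96, 500, 2000  # illustrative settings, not from the slides
    fixed = false_positive_rate(lambda r: rejects_at_fixed_n(MAX_N, K, r), RUNS, seed=2)
    peeking = false_positive_rate(lambda r: rejects_with_peeking(MAX_N, K, r), RUNS, seed=1)
    print(f"single test at n = {MAX_N}: {fixed:.3f}")
    print(f"peeking after every observation: {peeking:.3f}")
```

The fixed-n rate should sit near the nominal 0.05, while the peeking rate comes out several times higher and keeps growing as the horizon is extended, in line with the LIL argument.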

Frequentist Optional Stopping and Bayes Factors

Can Bayes factor hypothesis tests handle frequentist optional stopping? (Bayesian statisticians suggest they do...)

- Not always (see Sanborn and Hills (2014); De Heide and Grünwald (2018))

- Yes, if $H_0$ is simple

- Yes, with composite $H_0$ and a certain group invariance structure (see Hendriksen, De Heide and Grünwald (2018)). Example: Bayesian t-test

Test Martingales (1)

The Test Martingale (Vovk et al., 2011):

A non-negative martingale $\{M_n\}_{n \in \mathbb{N}}$ with initial value $M_0 = 1$.
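A standard example, added here for illustration and not taken from the slide: for a simple null hypothesis with density $p$ and i.i.d. data, the likelihood ratio against any fixed alternative density $q$ is a test martingale under the null. The symbols $p$ and $q$ are assumptions of this sketch, and $q$ is assumed to put no mass where $p$ is zero.

```latex
M_n = \prod_{i=1}^{n} \frac{q(X_i)}{p(X_i)}, \qquad M_0 = 1,
\qquad
\mathbb{E}_{p}\left[ M_n \mid X_1, \dots, X_{n-1} \right]
  = M_{n-1} \int \frac{q(x)}{p(x)} \, p(x) \, dx
  = M_{n-1}.
```

Each factor is non-negative, so $M_n \ge 0$, and the conditional expectation shows that the process neither grows nor shrinks on average when the null is true.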

Test Martingales (2)

Example:

$H_0$: the coin is fair, i.e. the outcomes $X_1, X_2, \ldots$ are i.i.d. Bernoulli(0.5)

Start with capital $K_0 = 1$

At each round $i$:

- We invest a fraction $q_i$ of our capital on the outcome $\{X_i = 0\}$ and $1 - q_i$ on $\{X_i = 1\}$

- The outcome is revealed

- We get a pay-off: twice our stakes on the correct outcome, nothing on the other

After $N$ rounds: $K_N = 2^N \prod_{t=1}^{N} q(X_t \mid X^{t-1})$, where $q(X_t \mid X^{t-1})$ is the fraction staked in round $t$ on the outcome that occurred.
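The betting game can be written out in a few lines. This is a minimal sketch under assumptions, not code from the talk: a strategy is modelled as a function from the past outcomes to the fraction staked on {X_i = 0}, and the two example strategies (`always_half` and `laplace`) are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

# A strategy maps the outcomes seen so far to q_i, the fraction of capital
# staked on {X_i = 0}; the remaining 1 - q_i is staked on {X_i = 1}.
Strategy = Callable[[Sequence[int]], float]


def capital_path(outcomes: Sequence[int], strategy: Strategy) -> List[float]:
    """Return [K_0, K_1, ..., K_N] for the coin-betting game with K_0 = 1."""
    capital = [1.0]
    for i, x in enumerate(outcomes):
        q = strategy(outcomes[:i])
        staked_on_outcome = q if x == 0 else 1.0 - q
        # Pay-off: twice the stake on the correct outcome, nothing on the other.
        capital.append(capital[-1] * 2.0 * staked_on_outcome)
    return capital


def always_half(past: Sequence[int]) -> float:
    """Hypothetical strategy: split the capital evenly, so it never changes."""
    return 0.5


def laplace(past: Sequence[int]) -> float:
    """Hypothetical strategy: smoothed frequency of zeros observed so far."""
    zeros = sum(1 for x in past if x == 0)
    return (zeros + 1) / (len(past) + 2)


if __name__ == "__main__":
    data = [0, 0, 0, 1, 0, 0, 0, 0]  # made-up outcomes, mostly zeros
    print(capital_path(data, always_half)[-1])
    print(capital_path(data, laplace)[-1])
```

Splitting the stake evenly leaves the capital at 1 whatever the data, while a strategy that adapts to the observed frequencies gains capital when the outcomes are lopsided and loses it when they are balanced.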

Test Martingales (3)

- If $H_0$ is true, the bet is fair, i.e. $E[K_N] \le K_0$

- If $K_N$ is large, it is an indication that $H_0$ is not true.

- Test martingales can handle frequentist optional stopping.
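The optional stopping claim rests on Ville's inequality: for a test martingale, the probability under H0 that the capital ever reaches 1/α is at most α. The simulation below is a hedged illustration and not material from the slides: it plays the coin-betting game on a fair coin with a constant stake q on {X_i = 0}, rejects as soon as the capital reaches 1/α, and estimates how often that happens. The function names and all parameter values are assumptions.

```python
import random


def ever_reaches(threshold: float, max_rounds: int, q: float, rng: random.Random) -> bool:
    """Bet on a fair coin (H0 true) with constant stake q on {X_i = 0};
    report whether the capital reaches `threshold` within max_rounds."""
    capital = 1.0
    for _ in range(max_rounds):
        x = rng.randint(0, 1)
        capital *= 2.0 * (q if x == 0 else 1.0 - q)
        if capital >= threshold:
            return True  # stop and reject H0 at this point
    return False


def rejection_rate(alpha: float, max_rounds: int, q: float, runs: int, seed: int) -> float:
    """Estimated probability, under H0, of ever rejecting at level alpha."""
    rng = random.Random(seed)
    hits = sum(ever_reaches(1.0 / alpha, max_rounds, q, rng) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    # Illustrative settings: alpha = 0.05, so reject once capital >= 20.
    print(rejection_rate(alpha=0.05, max_rounds=500, q=0.6, runs=5000, seed=0))
```

Unlike the p-value example, checking after every round does not push the estimated rate above α here, and extending `max_rounds` does not change that.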

S-test statistics

Test martingales can handle frequentist optional stopping and have additional advantages over p-values (interpretability)

Problem: how to construct test martingales for composite H0.

Current work: S-value

References

P. Grünwald, R. de Heide, and W. Koolen. Safe tests. Work in progress, 2018.

R. de Heide and P. Grünwald. Why optional stopping is a problem for Bayesians. arXiv preprint arXiv:1708.08278, 2018.

A. Hendriksen, R. de Heide, and P. Grünwald. Folklore results on optional stopping of Bayesian methods made precise. Work in progress, 2018.

A. Sanborn and T. Hills. The frequentist implications of optional stopping on Bayesian hypothesis tests. Psychonomic Bulletin & Review, 21(2):283–300, 2014.

V. Vovk, N. Vereshchagin, A. Shen, and G. Shafer. Test martingales, Bayes factors and p-values. Statistical Science, 26(1):84–101, 2011.