Median: non-parametric tests · personal.vu.nl/r.heijungs/qm/1718/bs/slides/slides/... · wilcoxon...

Post on 25-Apr-2020


MEDIAN: NON-PARAMETRIC TESTS

Business Statistics

Hypotheses on the median

The sign test

The Wilcoxon signed ranks test

Old exam question

Further study

CONTENTS

▪ The median is a central value that may be more suitable for strongly asymmetric distributions
  ▪ and for distributions with fat tails
▪ Can we test a population median?
  ▪ e.g., $H_0: M = 400$
▪ Note:
  ▪ for a more or less symmetric distribution, $M \approx \mu$, so a $t$-test of the mean is appropriate (if $n \ge 15$)
  ▪ although perhaps more sensitive to large positive or negative outliers in the sample

HYPOTHESES ON THE MEDIAN

$M$ is here the population median. Think of it as a Greek letter ...

▪ What is the median of a sample?
  ▪ it is the middle value of the sorted sample, i.e. roughly $x_{n/2}$
▪ So, if $H_0: M = 400$ were true, approximately half of the data in the sample would be lower, and half would be higher
▪ Therefore, if we count the number of data points that are lower and compare it to the number of observations, we can develop a test statistic
▪ Two varieties of such non-parametric tests today:
  ▪ sign test
  ▪ Wilcoxon signed rank test

HYPOTHESES ON THE MEDIAN

The sign test

▪ involves simply counting the number of positive or negative signs in a sequence of $n$ signs
▪ is based on the binomial distribution
▪ can be applied without requirements on the population distribution

THE SIGN TEST

Computational steps:

▪ for each data point $x_i$ compute the difference with the median ($M$) of the null hypothesis ($H_0$): $d_i = x_i - M$
▪ omit zero differences ($d_i = 0$); effective sample size is $n'$
▪ assign $+1$ to positive differences ($d_i > 0$) and $-1$ to negative differences ($d_i < 0$)
▪ test statistic $X$ is the number of $+1$ values (= number of observations above $M$)
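The computational steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sign_test_statistic` is hypothetical, and the data are the 13 battery-life values used in the example in these slides.

```python
def sign_test_statistic(data, m0):
    """Return (x, n_eff): the number of positive differences and the
    effective sample size after dropping zero differences."""
    diffs = [value - m0 for value in data]          # d_i = x_i - M
    nonzero = [d for d in diffs if d != 0]          # omit d_i = 0
    signs = [1 if d > 0 else -1 for d in nonzero]   # +1 / -1
    x = sum(1 for s in signs if s == 1)             # count the +1 values
    return x, len(nonzero)


# Battery life until failure (hours), as in the example in these slides
battery_life = [342, 426, 317, 545, 264, 451, 1049, 631, 512, 266, 492, 562, 298]
x, n_eff = sign_test_statistic(battery_life, 400)
print(x, n_eff)  # 8 positive signs out of 13 non-zero differences
```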

THE SIGN TEST

Example:

Context: battery life until failure (in hours)

▪ $H_0: M = 400$; $H_1: M \ne 400$
▪ use $\alpha = 0.05$
▪ sample of $n = 13$ observations ($x_1, \dots, x_{13}$)
▪ reject for large and for small numbers of positive signs

THE SIGN TEST

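Example ($H_0: M = 400$):
▪ data: $x_i$ ($i = 1, \dots, 13$)
▪ difference with $M$: $d_i = x_i - 400$
▪ no cases where $d_i = 0$, so $n' = n$
▪ $s_i = \begin{cases} 1 & \text{if } d_i > 0 \\ -1 & \text{if } d_i < 0 \end{cases}$
▪ $s_i^{+} = \begin{cases} 1 & \text{if } d_i > 0 \\ 0 & \text{if } d_i < 0 \end{cases}$
▪ $x = \sum_{i=1}^{n'} s_i^{+} = 8$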

THE SIGN TEST

x_i     x_i-400   s_i   s_i(+)
342       -58     -1
426        26      1      1
317       -83     -1
545       145      1      1
264      -136     -1
451        51      1      1
1049      649      1      1
631       231      1      1
512       112      1      1
266      -134     -1
492        92      1      1
562       162      1      1
298      -102     -1

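Example (continued):
▪ $x = 8$
▪ under $H_0$: $X \sim \text{bin}(13, 0.5)$
▪ $P_{\text{bin}(13,0.5)}(X \ge 8) = 0.291$
  ▪ why $\ge 8$?
  ▪ if we rejected for 8, we would also reject for 9
▪ $p$-value: $2 \times 0.291 \approx 0.581$ (computed from the unrounded 0.2905)
  ▪ why $2 \times$?
  ▪ because the alternative hypothesis is two-sided
▪ $0.581 > \alpha = 0.05$, so there is no reason to reject $H_0$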
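The tail probability in this example can be checked directly from the binomial formula. The sketch below uses only the Python standard library; the helper names are hypothetical. Doubling the smaller of the two tails is one common convention for a two-sided $p$-value, and here it coincides with the $2 \times$ rule on the slide.

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


def sign_test_p_value(x, n):
    """Two-sided p-value: twice the smaller tail, capped at 1."""
    upper = binom_upper_tail(x, n)          # P(X >= x)
    lower = 1 - binom_upper_tail(x + 1, n)  # P(X <= x)
    return min(1.0, 2 * min(upper, lower))


print(round(binom_upper_tail(8, 13), 3))   # 0.291
print(round(sign_test_p_value(8, 13), 3))  # 0.581
```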

THE SIGN TEST

Suppose we have more observations ($n = 130$) and find $x = 80$. Can you look up $P_{\text{bin}(130,0.5)}(X \ge 80)$?

EXERCISE 1
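Printed binomial tables rarely go up to $n = 130$, so one possible way to approach the exercise is to compute the tail exactly by software, or to approximate the binomial by a normal distribution with mean $n/2$ and standard deviation $\sqrt{n}/2$. The sketch below does both. It is not taken from the slides: the function names are hypothetical, and the continuity correction is an assumption.

```python
from math import comb, erf, sqrt


def exact_upper_tail(k, n):
    """Exact P(X >= k) for X ~ bin(n, 0.5), using integer arithmetic."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n


def normal_upper_tail(k, n, continuity=True):
    """Normal approximation to P(X >= k) for X ~ bin(n, 0.5)."""
    mean = n / 2
    sd = sqrt(n) / 2                        # sqrt(n * 0.5 * 0.5)
    cutoff = k - 0.5 if continuity else k   # continuity correction (assumed)
    z = (cutoff - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))     # P(Z >= z)


print(exact_upper_tail(80, 130))
print(normal_upper_tail(80, 130))
```

Both values come out well below 0.025, so with a two-sided $\alpha = 0.05$ this sample would lead to rejecting $H_0$.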

In the sign test, we replace the numerical values by signs (+ or −)

Advantage:
▪ we don’t need any assumption on normality, symmetry, etc.
▪ that’s why we say it’s non-parametric: we don’t have to assume a certain distribution with parameters

Disadvantage:
▪ we discard much information, so that the test is not very sensitive (has low “power”; see later)

Are there other non-parametric tests that are more powerful?
▪ is there a compromise between value and sign that still needs some assumptions, but not too many assumptions?

Yes, replacing data by their rank

THE SIGN TEST

Wilcoxon signed rank test
▪ involves comparing the sum of ranks of the values larger than the test value with the sum of ranks of the values smaller than the test value

Computational steps:
▪ for each data point $x_i$ compute the difference with the median ($M$) of the null hypothesis, $d_i = x_i - M$, and its absolute value $|d_i|$
▪ omit zero differences ($d_i = 0$); effective sample size is $n'$
▪ assign ranks ($1, \dots, n'$) to the $|d_i|$, smallest first
▪ reassign $+$ and $-$ to the ranks, according to the sign of $d_i$
▪ test statistic ($W$) is the sum of the positive ranks
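The steps above can be sketched as follows. The function name is hypothetical, and giving tied absolute differences their average rank is an assumption (a common convention; the slides do not discuss ties). The data are the battery-life values from the sign test example.

```python
def wilcoxon_signed_rank(data, m0):
    """Return (w_plus, n_eff): sum of the positive ranks and effective sample size."""
    diffs = [value - m0 for value in data if value != m0]  # drop zero differences
    # Indices of the differences, ordered by absolute value (smallest first)
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1       # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Sum the ranks that belong to positive differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, len(diffs)


battery_life = [342, 426, 317, 545, 264, 451, 1049, 631, 512, 266, 492, 562, 298]
print(wilcoxon_signed_rank(battery_life, 400))  # (61.0, 13)
```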

THE WILCOXON SIGNED RANK TEST

Example ($H_0: M = 400$):
▪ data: $x_i$ ($i = 1, \dots, 13$)
▪ difference with $M$: $d_i = x_i - 400$
▪ no cases where $d_i = 0$, so $n' = n$
▪ $w = \sum_{i=1}^{n'} r_i^{+} = 61$
▪ under $H_0$: $W \sim {?}$ (use table)
▪ $P_{H_0}(W \ge 61) = {?}$

THE WILCOXON SIGNED RANK TEST

x_i     x_i-400   |x_i-400|   r_i   r_i(+)
342       -58        58        -3
426        26        26         1      1
317       -83        83        -4
545       145       145        10     10
264      -136       136        -9
451        51        51         2      2
1049      649       649        13     13
631       231       231        12     12
512       112       112         7      7
266      -134       134        -8
492        92        92         5      5
562       162       162        11     11
298      -102       102        -6
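For a small sample the distribution of $W$ under $H_0$ can also be enumerated rather than looked up. The sketch below assumes a symmetric population and no ties, so that under $H_0$ each rank $1, \dots, n'$ is positive with probability 1/2, independently of the others. The function names are hypothetical.

```python
def wilcoxon_null_counts(n):
    """counts[w] = number of subsets of the ranks {1, ..., n} that sum to w."""
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    counts[0] = 1                              # the empty subset
    for rank in range(1, n + 1):
        # Go downwards so that each rank is used at most once
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def wilcoxon_upper_tail(w, n):
    """Exact P(W >= w) under H0; w must be an integer (no ties)."""
    counts = wilcoxon_null_counts(n)
    return sum(counts[w:]) / 2**n              # all 2^n sign patterns equally likely


p_upper = wilcoxon_upper_tail(61, 13)
print(p_upper, 2 * p_upper)                    # one-sided and two-sided p-value
```

For $w = 61$ and $n' = 13$ the one-sided tail is well above 0.025, which agrees with not rejecting $H_0$ at a two-sided $\alpha = 0.05$.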

Testing the median using the Wilcoxon $W$ statistic
▪ small samples: using a table of critical values
  ▪ included in tables at exam
▪ large samples: using a normal approximation of $W$
  ▪ valid when $n \ge 20$
▪ The test is only valid for symmetrically distributed populations
  ▪ if not, use sign test

THE WILCOXON SIGNED RANK TEST

Small samples: critical values of Wilcoxon statistic
▪ two-sided, $\alpha = 0.05$, $n = 13$: $w_{\text{lower}} = 17$ and $w_{\text{upper}} = 74$
▪ $R_{\text{crit}} = [0, 17] \cup [74, 91]$
▪ $w_{\text{calc}} = 61$, so do not reject $H_0$ at $\alpha = 0.05$

THE WILCOXON SIGNED RANK TEST

Lower and Upper Critical Values W of Wilcoxon Signed-Ranks Test

one-tail:   α = 0.05    α = 0.025   α = 0.01    α = 0.005
two-tail:   α = 0.10    α = 0.05    α = 0.02    α = 0.01
n           (lower , upper)
5           0 , 15      --- , ---   --- , ---   --- , ---
6           2 , 19      0 , 21      --- , ---   --- , ---
7           3 , 25      2 , 26      0 , 28      --- , ---
8           5 , 31      3 , 33      1 , 35      0 , 36
9           8 , 37      5 , 40      3 , 42      1 , 44
10          10 , 45     8 , 47      5 , 50      3 , 52
11          13 , 53     10 , 56     7 , 59      5 , 61
12          17 , 61     13 , 65     10 , 68     7 , 71
13          21 , 70     17 , 74     12 , 79     10 , 81

Table is available at the exam (and on the course website)

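Large samples: under $H_0$, it can be shown that
▪ $E(W) = \frac{n(n+1)}{4}$
▪ $\operatorname{var}(W) = \frac{n(n+1)(2n+1)}{24}$

Further, for $n \ge 20$, approximately:

$\dfrac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \sim N(0,1)$

▪ so you can compute $z_{\text{calc}} = \dfrac{w_{\text{calc}} - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$
▪ and compare it to $z_{\text{crit}}$ (e.g., $\pm 1.96$)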

THE WILCOXON SIGNED RANK TEST

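Example, continued:
▪ $w = \sum_{i=1}^{n'} r_i^{+} = 61$
▪ under $H_0$: $W \sim N(E(W), \operatorname{var}(W))$
▪ so, under $H_0$: $\dfrac{W - E(W)}{\sqrt{\operatorname{var}(W)}} \sim N(0,1)$
▪ $P_N(W \ge 61) = P\left(\dfrac{W - E(W)}{\sqrt{\operatorname{var}(W)}} \ge \dfrac{61 - 45.5}{14.31}\right) = P(Z \ge 1.08) = 0.1401$
▪ $p$-value: $2 \times 0.1401 = 0.2802$
▪ there is no reason to reject $H_0$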

THE WILCOXON SIGNED RANK TEST

In fact, not a good idea because 𝑛 = 13 ≱ 20. We do it just to show how it works ...
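The normal approximation in this example can be reproduced with the standard library alone. This is a minimal sketch with a hypothetical function name; it applies no continuity or tie correction, matching the calculation on the slide.

```python
from math import erf, sqrt


def wilcoxon_normal_approx(w, n):
    """Return (z, two-sided p-value) from the normal approximation of W."""
    mean = n * (n + 1) / 4                         # E(W)
    variance = n * (n + 1) * (2 * n + 1) / 24      # var(W)
    z = (w - mean) / sqrt(variance)
    upper_tail = 0.5 * (1 - erf(abs(z) / sqrt(2))) # P(Z >= |z|)
    return z, 2 * upper_tail


z, p_value = wilcoxon_normal_approx(61, 13)
print(round(z, 2), p_value)  # z is 1.08; the p-value is about 0.28
```

The $p$-value printed here is slightly below the 0.2802 on the slide, because the slide rounds $z$ to 1.08 before looking it up in the table.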

23 March 2015, Q1l-m

OLD EXAM QUESTION

Doane & Seward 5/E 16.1-16.3

Tutorial exercises week 3

Wilcoxon signed rank test, sign test

FURTHER STUDY
