approximate random distribution of coefficients of correlation for two random variates = 0.03 under...


Upload: mandy-crissey

Post on 14-Jan-2016


TRANSCRIPT

Page 1: Approximate random distribution of coefficients of correlation for two random variates  = 0.03 Under a normal approximation we can use Z-transformed score

Lecture 2: Randomization techniques

[Figure: Approximate random distribution of coefficients of correlation for two random variates; x-axis: Correlation (-0.5 to 0.5), y-axis: Frequency; γ = 0.03]

Under a normal approximation we can use Z-transformed scores for statistical inference.

$$Z = \frac{\mathrm{Obs} - \mathrm{Exp}}{\mathrm{StdDev}_{\mathrm{Exp}}}$$

Z is standard normally distributed.

The standard normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: four density plots of f(x) against X; three of them are labelled n = 10, n = 20 and n = 50. The interval from -σ to +σ covers 0.68 of the distribution, the interval from -2σ to +2σ covers 0.95.]

The Fisherian significance levels

P(μ - σ < X < μ + σ) = 68%
P(μ - 1.65σ < X < μ + 1.65σ) = 90%
P(μ - 1.96σ < X < μ + 1.96σ) = 95%
P(μ - 2.58σ < X < μ + 2.58σ) = 99%
P(μ - 3.29σ < X < μ + 3.29σ) = 99.9%
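The coverage probabilities behind the Fisherian significance levels can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the helper name `coverage` is hypothetical and not part of the lecture.

```python
import math

def coverage(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    # For a normal variate this probability equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

# Multiples of sigma listed on the slide
for k in (1.0, 1.65, 1.96, 2.58, 3.29):
    print(f"+/- {k:4.2f} sigma covers {coverage(k):.4f}")
```

The printed values should round to the 68%, 90%, 95%, 99% and 99.9% levels quoted on the slide.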

Page 2

Country                   sq.km    DeltaT
Albania                   28748    17
Andorra                   468      15
Austria                   83871    20
Azores                    2200     7
Balearic Islands          5014     15
Belarus                   207650   23
Belgium                   30528    15
Bosnia and Herzegovina    51197    20
Bulgaria                  110971   21
Canary Islands            7270     5
Channel Is.               300      10
Corsica                   8680     13
Crete                     8259     13
Croatia                   56594    21
Cyclades Is.              2500     12
Cyprus                    9250     19
Czech Republic            78866    19
Denmark                   43093    16
Dodecanese Is.            2663     14
Estonia                   45227    21
Faroe Is.                 1399     7
Finland                   338145   23
France                    543965   15
Franz Josef Land          16134    27
Germany                   357021   19
Gibraltar                 6.5      10
Greece                    131992   17
Hungary                   93054    22
Iceland                   103000   12
Ireland                   70273    10
Italy                     301401   16
Kaliningrad Region        15000    19
Latvia                    64626    20
Liechtenstein             160      14
Lithuania                 65318    22
Luxembourg                2588     16
Macedonia                 25339    23
Madeira (Funchal)         789      5
Malta                     316      14
Moldova                   33709    23
Monaco                    1.95     12
…                         …        …

Average temperature difference in European countries/islands

Permutation test probability

Bootstrap probability

Probability level

Parameters and standard errors

Consider the coefficient of correlation. The statistical significance of r > 0 (H1) is tested against the null hypothesis H0 of r = 0. Most statistics programs do this using Fisher's Z-transformation:

$$Z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$
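To make the transformation concrete, here is a minimal sketch of a test of H0: r = 0 based on Fisher's Z. The function name `fisher_z_test` is hypothetical, the sample size n = 20 is an assumption for illustration, and the standard error 1/sqrt(n - 3) is the usual large-sample approximation rather than something stated on the slide.

```python
import math

def fisher_z_test(r, n):
    """Two-sided test of H0: rho = 0 using Fisher's Z-transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))      # identical to math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)                # approximate standard error of z
    score = z / se                             # approximately standard normal under H0
    p = math.erfc(abs(score) / math.sqrt(2))   # two-sided normal tail probability
    return z, score, p

# r = 0.457 is the observed correlation from the slides; n = 20 is assumed.
z, score, p = fisher_z_test(r=0.457, n=20)
print(f"z = {z:.4f}, score = {score:.3f}, two-sided p = {p:.4f}")
```

The normal approximation is only trustworthy when the sampling distribution of r is close to symmetric, which is the situation the reshuffling methods are meant to avoid depending on.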

Reshuffling

Page 3

Permutation testing

Random number  ln area   ln Delta T  r         Sim r
0.247012838    11.33704  2.833213    0.457176  0.14894
0.303300878    12.65321  2.70805               0.014534
0.725633833    9.917045  2.995732              0.157997
0.258217857    0.667829  1.94591               0.031033
0.632451857    7.243513  2.70805               -0.14119
0.254528292    7.696213  3.135494              0.268839
0.980671601    13.01692  2.70805               0.117112
0.522396276    10.62825  2.995732              0.137361
0.683545674    11.08702  3.044522              0.21447
0.773648713    7.887209  1.609438              0.159525
0.359562515    10.3264   2.302585              -0.05251
0.128137778    12.68838  2.564949              -0.23382
0.573061911    11.7905   2.564949              0.072888
0.025421522    12.78555  3.044522              -0.04616
0.087309492    11.42796  2.484907              0.222467
0.20159921     9.132379  2.944439              -0.14329
0.438208554    12.40519  2.944439              -0.0572
0.575893524    13.13427  2.772589              0.449186
0.931176694    10.1401   2.639057              0.167553
0.0309793      10.67112  3.044522              0.234201
0.032472788    10.63432  1.94591
0.352239001    9.019059  3.135494

Quantity   Value       Spreadsheet formula
Average r  0.08609641  =ŚREDNIA(H2:H21)           (Polish Excel name for AVERAGE)
StdDev r   0.16530152  =ODCH.STANDARDOWE(H2:H21)  (STDEV)
t          10.0393331  =(H2-J2)/J4*20^0.5
P(t)       4.9403E-09  =ROZKŁAD.T(J7,19,2)        (TDIST)
Z          2.24486312  =(G2-H2)/J4
P(Z)       0.03687629  =ROZKŁAD.T(J12,19,2)       (TDIST)

We reorder one of the variables at random (at least 1000 times).

We calculate the mean, the standard deviation, and the upper and lower confidence limits of the simulated correlations. This gives us an estimate of how probable the observed correlation is.
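The reshuffling procedure can be sketched in a few lines of NumPy. This is an illustrative sketch, not the spreadsheet used in the lecture: the function name `permutation_test` and the generated data `x` and `y` are hypothetical stand-ins for ln area and ln Delta T.

```python
import numpy as np

def permutation_test(x, y, n_perm=1000, seed=0):
    """Observed correlation and correlations after reshuffling y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    r_sim = np.empty(n_perm)
    for k in range(n_perm):
        # Reordering one variable at random destroys any association with x.
        r_sim[k] = np.corrcoef(x, rng.permutation(y))[0, 1]
    return r_obs, r_sim

# Hypothetical data with a built-in positive association
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=30)
y = 0.5 * x + data_rng.normal(size=30)

r_obs, r_sim = permutation_test(x, y)
mean_sim = r_sim.mean()
sd_sim = r_sim.std(ddof=1)
z_score = (r_obs - mean_sim) / sd_sim   # usable only if r_sim looks roughly normal
print(f"observed r = {r_obs:.3f}, mean = {mean_sim:.3f}, sd = {sd_sim:.3f}, Z = {z_score:.2f}")
```

The mean of the simulated correlations should be close to zero, and their standard deviation plays the role of the "StdDev r" cell in the spreadsheet.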

Page 4

[Figure: The distribution of randomized correlation coefficients; x-axis: Correlation (-0.5 to 0.5), y-axis: Frequency. The observed value is marked.]

The distribution is not symmetric. We can't use Z-transformed values (the normal approximation). We can't use a t-test.

Lower two-sided 1% confidence limit

Upper two-sided 1% confidence limit

We have to use the upper and lower probability levels. We get them directly from the random distribution.

Probability level for r = 0.457: P = 0.0006
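Reading the limits and the probability level directly from the random distribution amounts to taking empirical quantiles and tail fractions. The sketch below assumes a hypothetical helper `empirical_limits` and a made-up, deliberately skewed set of null correlations; only the observed value 0.457 comes from the slides.

```python
import numpy as np

def empirical_limits(null_values, observed, alpha=0.01):
    """Two-sided limits and upper-tail probability taken from a null sample."""
    null_values = np.asarray(null_values, dtype=float)
    # Two-sided limits: cut alpha/2 from each tail of the null distribution.
    lower, upper = np.quantile(null_values, [alpha / 2, 1 - alpha / 2])
    # Fraction of null values at least as large as the observed value.
    p_upper = np.mean(null_values >= observed)
    return lower, upper, p_upper

# Hypothetical skewed null distribution of correlation coefficients
rng = np.random.default_rng(0)
null_r = rng.beta(2, 5, size=10000) - 0.3

lower, upper, p_upper = empirical_limits(null_r, observed=0.457)
print(f"1% limits: [{lower:.3f}, {upper:.3f}], P(r >= 0.457) = {p_upper:.4f}")
```

No symmetry assumption enters here, which is why this route stays valid when the Z and t approximations do not.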

Page 5

Jackknifing

Time Blood pressure

1 115

2 117

3 124

4 121

5 122

6 119

7 120

8 126

9 117

10 122

11 121

12 127

13 129

14 122

15 122

16 129

17 121

18 111

19 113

20 114

Each of the 20 jackknife samples contains 19 values: one observation is left out in turn.

Blood pressure (all 20 values)

Mean 121
Stddev 4.97
CV 0.04

Jackknife sample  Mean  Stddev  CV     Pseudovalue  Squared difference
1                 121   4.923   0.041  0.03         0.000
2                 121   5.039   0.042  0.05         0.003
3                 121   5.049   0.042  0.05         0.003
4                 121   5.108   0.042  0.06         0.004
5                 121   5.087   0.042  0.06         0.004
6                 121   5.093   0.042  0.06         0.004
7                 121   5.103   0.042  0.06         0.004
8                 121   4.954   0.041  0.04         0.002
9                 121   5.052   0.042  0.05         0.003
10                121   5.097   0.042  0.06         0.004
11                121   5.102   0.042  0.06         0.004
12                121   4.870   0.040  0.03         0.001
13                120   4.721   0.039  0.00         0.000
14                121   5.094   0.042  0.06         0.004
15                121   5.101   0.042  0.06         0.004
16                120   4.712   0.039  0.00         0.000
17                121   5.105   0.042  0.06         0.004
18                121   4.544   0.037  -0.03        0.001
19                121   4.742   0.039  0.00         0.000
20                121   4.876   0.040  0.03         0.001

Mean of the pseudovalues: 0.04
Sum of the squared differences: 0.046
Standard error: 0.01

Pseudovalues

$$p_i = X + (n-1)(X - X_i)$$

where X is the statistic for the full sample, X_i the statistic with observation i left out, and n the number of observations.

Standard error

$$SE = \sqrt{\frac{\sum_{i=1}^{n} (p_i - \bar{p})^2}{n(n-1)}}$$

The jackknifed standard error of the coefficient of variation
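A minimal sketch of the jackknife for the coefficient of variation, applied to the 20 blood pressure values listed above. The function names `cv` and `jackknife_se` are hypothetical. The sketch uses the common pseudovalue definition n*X - (n-1)*X_i and the sample standard deviation (ddof=1), both of which are assumptions; a pseudovalue definition with the opposite sign of the correction term gives the same standard error. Because the slide values are rounded, the output will not match the slide digit for digit.

```python
import numpy as np

def cv(values):
    """Coefficient of variation: sample standard deviation over mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

def jackknife_se(values, statistic):
    """Jackknife estimate and standard error of a statistic."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    full = statistic(values)
    # Recompute the statistic n times, each time leaving one observation out.
    leave_one_out = np.array([statistic(np.delete(values, i)) for i in range(n)])
    pseudo = n * full - (n - 1) * leave_one_out
    se = np.sqrt(np.sum((pseudo - pseudo.mean()) ** 2) / (n * (n - 1)))
    return pseudo.mean(), se

blood_pressure = [115, 117, 124, 121, 122, 119, 120, 126, 117, 122,
                  121, 127, 129, 122, 122, 129, 121, 111, 113, 114]

cv_full = cv(blood_pressure)
cv_jack, cv_se = jackknife_se(blood_pressure, cv)
print(f"CV = {cv_full:.4f}, jackknife estimate = {cv_jack:.4f}, SE = {cv_se:.4f}")
```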

Page 6

Population

Sample

1000 bootstrap samples

1000 bootstrap parameter estimates

Bootstrap distribution

Distribution parameters as estimates of population distribution

Bootstrapping

Take the original values and calculate the parameter you need

Take 1000 random samples of different sizes

Calculate 1000 parameters from the bootstrap samples

Compare the observed value with the distribution of these parameters and calculate the confidence limits for the observed value

Page 7

Time Blood pressure

1 115

2 117

3 124

4 121

5 122

6 119

7 120

8 126

9 117

10 122

11 121

12 127

13 129

14 122

15 122

16 129

17 121

18 111

19 113

20 114

Mean 121

Stddev 4.97

CV 0.041

Each bootstrap sample contains a random selection of the 20 blood pressure values.

Bootstrap sample  Mean  Stddev  CV
1                 120   4.140   0.034
2                 120   4.617   0.038
3                 120   5.164   0.043
4                 122   5.066   0.042
5                 121   5.593   0.046
6                 122   5.288   0.043
7                 121   5.397   0.045
8                 121   5.166   0.043
9                 120   5.952   0.049
10                122   5.528   0.045
11                121   5.585   0.046
12                121   4.683   0.039
13                122   3.874   0.032
14                120   5.045   0.042
15                120   4.229   0.035
16                120   5.028   0.042
17                121   5.653   0.047
18                122   4.291   0.035
19                120   4.657   0.039
20                119   5.690   0.048

Mean of the bootstrap CV values: 0.042
Standard deviation of the bootstrap CV values: 0.005

We use at least 1000 random samples and calculate the CV for each sample. The standard deviation of these CV values is an estimate of the standard error of the original CV.

The standard error of a statistic is the standard deviation of its sampling distribution, which the bootstrap distribution approximates.
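The bootstrap estimate of the standard error of the CV can be sketched as follows. Note the assumption: this sketch resamples with replacement at the original sample size, which is the usual bootstrap, whereas the slides draw samples of different sizes. The function name `bootstrap_cv` is hypothetical.

```python
import numpy as np

def bootstrap_cv(values, n_boot=1000, seed=0):
    """CV of n_boot bootstrap samples drawn with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    cvs = np.empty(n_boot)
    for k in range(n_boot):
        sample = rng.choice(values, size=n, replace=True)
        cvs[k] = sample.std(ddof=1) / sample.mean()
    return cvs

blood_pressure = [115, 117, 124, 121, 122, 119, 120, 126, 117, 122,
                  121, 127, 129, 122, 122, 129, 121, 111, 113, 114]

boot_cvs = bootstrap_cv(blood_pressure)
boot_se = boot_cvs.std(ddof=1)                        # bootstrap standard error of the CV
ci_low, ci_high = np.quantile(boot_cvs, [0.025, 0.975])  # percentile 95% limits
print(f"SE = {boot_se:.4f}, 95% limits = [{ci_low:.4f}, {ci_high:.4f}]")
```

The percentile limits are read straight from the bootstrap distribution, so they need no normality assumption.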

Page 8

[Figure: Bootstrap distribution; x-axis: CV, y-axis: Frequency]

$$b_i = \frac{x_i - \bar{x}}{s}\sqrt{n_i}$$

Sample    N   Mean  Stddev  CV     Studentized value
Original      121   4.97    0.041
1         11  120   4.140   0.034  -0.007
2         13  120   4.617   0.038  -0.003
3         17  120   5.164   0.043  0.001
4         13  122   5.066   0.042  0.000
5         11  121   5.593   0.046  0.005
6         12  122   5.288   0.043  0.002
7         15  121   5.397   0.045  0.003
8         11  121   5.166   0.043  0.001
9         11  120   5.952   0.049  0.008
10        13  122   5.528   0.045  0.004
11        11  121   5.585   0.046  0.005

Mean of the bootstrap CV values: 0.042
Standard deviation of the bootstrap CV values: 0.005

[Figure: distribution of the Studentized CV values; x-axis: Studentized CV (-0.007 to 0.003), y-axis: Frequency]

The CV values are based on samples of different sizes. The scores therefore do not carry equal weight.

We have to use weighted averages.
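A weighted average of the bootstrap CV values can be sketched with the 11 samples tabulated on this slide. The weights are assumed here to be proportional to the sample sizes N; the slides do not state which weights were used.

```python
import numpy as np

# CV and sample size N of the 11 bootstrap samples shown on the slide
cv_values = np.array([0.034, 0.038, 0.043, 0.042, 0.046, 0.043,
                      0.045, 0.043, 0.049, 0.045, 0.046])
sizes = np.array([11, 13, 17, 13, 11, 12, 15, 11, 11, 13, 11])

# Weighted mean: sum(n_i * x_i) / sum(n_i)
weighted_mean = np.average(cv_values, weights=sizes)
# Weighted variance around the weighted mean, using the same weights
weighted_var = np.average((cv_values - weighted_mean) ** 2, weights=sizes)
weighted_sd = np.sqrt(weighted_var)

print(f"weighted mean = {weighted_mean:.4f}, weighted sd = {weighted_sd:.4f}")
print(f"unweighted mean = {cv_values.mean():.4f}")
```

With sample sizes this similar the weighted and unweighted means nearly coincide; the difference grows when the sizes vary more.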

Page 9

Monte Carlo simulation.

Page 10

Null models

Darwin finch

Photo:Guardian Unlimited

Do the beak lengths of Darwin finches, as a measure of resource usage, differ more or less than expected just by chance?

[Figure: six panels, each with an axis from 0 to 15]

The classical method to answer this question is to compare the observed variance in beak length differences with the variances obtained from random draws of beak lengths inside the observed range (smallest and largest beak size being fixed).

This is a null model approach

We test whether this null model approach is reliable

Page 11

We have randomly assigned beak length of 20 species measured in mm

Original data  Sorted data  Difference
50             1
20             2            1
15             9            7
23             12           3
41             15           3
21             17           2
38             18           1
18             19           1
17             20           1
32             20           0
24             21           1
19             23           2
20             24           1
12             28           4
28             32           4
49             37           5
2              38           1
9              41           3
37             49           8
1              50           1

Variance 4.81

First randomization

Randomized data  Adjusting precision  Sorting  Difference
1                1                    1
25.6752138       26                   1        0
48.1121149       48                   7        5
42.1150435       42                   14       6
22.9872128       23                   23       8
32.1307563       32                   23       0
38.3337144       38                   25       1
48.6675789       49                   26       0
23.4858483       23                   29       3
24.7944844       25                   31       1
35.5989343       36                   32       0
14.0908314       14                   36       3
31.2566943       31                   38       2
29.4842605       29                   42       3
43.5479253       44                   43       0
1.21857593       1                    44       0
7.12456902       7                    47       3
42.6867606       43                   48       1
47.0118747       47                   49       0
50               50                   50       1

Variance 5.39

Second randomization

Randomized data  Adjusting precision  Sorting  Difference
1                1                    1
11.2255116       11                   7        6
9.78997347       10                   7        0
8.95736252       9                    8        0
7.83346076       8                    9        1
23.0002153       23                   10       0
28.576216        29                   11       1
49.8830873       50                   11       0
39.7427775       40                   18       6
19.0851063       19                   19       1
47.943224        48                   23       3
47.5814688       48                   23       0
35.7635977       36                   26       2
7.40672814       7                    29       2
22.6873765       23                   36       7
25.9080369       26                   40       3
10.8669441       11                   48       7
7.26532825       7                    48       0
17.5330692       18                   50       1
50               50                   50       0

Variance 6.43

1000 randomizations

Null model distribution

Variance  Number
1         1
2         1
3         4
4         15
5         89
6         165
7         256
8         199
9         131
10        86
11        49
12        4
Total     1000

[Figure: Null model distribution; x-axis: Variance (1 to 12), y-axis: Number (0 to 300); legend: Observed variance, Randomized variances]

P(H0) = 21/1000 = 0.021

The null distribution gives us the H0 probability directly.
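The null model for the beak lengths can be sketched as below, using the 20 original values from the slide. The function names are hypothetical. As assumptions, random lengths are drawn uniformly between the fixed smallest and largest value and rounded to whole millimetres, and the variance of the differences is the sample variance (ddof=1). The spreadsheet on the slide appears to round slightly differently, so individual variances will not match it exactly.

```python
import numpy as np

def difference_variance(lengths):
    """Variance of the differences between neighbouring sorted beak lengths."""
    return np.var(np.diff(np.sort(lengths)), ddof=1)

def null_variances(lengths, n_rand=1000, seed=0):
    """Variances for random beak lengths with the two extremes held fixed."""
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths, dtype=float)
    smallest, largest = lengths.min(), lengths.max()
    out = np.empty(n_rand)
    for k in range(n_rand):
        # Draw the inner values at random and round them to whole mm.
        inner = np.round(rng.uniform(smallest, largest, size=len(lengths) - 2))
        out[k] = difference_variance(np.concatenate(([smallest], inner, [largest])))
    return out

beaks = [50, 20, 15, 23, 41, 21, 38, 18, 17, 32,
         24, 19, 20, 12, 28, 49, 2, 9, 37, 1]

observed_var = difference_variance(beaks)
null_var = null_variances(beaks)
# Fraction of randomizations with a variance no larger than the observed one
p_lower = np.mean(null_var <= observed_var)
print(f"observed variance = {observed_var:.2f}, P(H0) = {p_lower:.3f}")
```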

Page 12

Meningitis in Europe Distribution of forests in Europe

Is the probability of meningitis infection correlated with the distribution of forests in Europe?

We use a grid approach.

We use the coefficient of correlation between the entries of both grids: R = 0.06; P(R=0) > 0.1.

The distance between the sites might be of importance.

Meningitis casesGrid SitesSites 46 33 44 82 18 73

87 83 45 63 54 1665 79 11 24 53 2079 62 67 40 6 89

3 63 8 4 5 54 6 2 8 0 237 3 5 5 5 879 10 7 2 0 448 7 9 3 5 29

60 61 66 13 40 9047 96 92 40 76 3617 36 43 98 10 3123 17 275 149 36 4134 19 943 46 40 6627 95 48 603 6 3

Forest density grid (column headers in the source: Sites, Sites)

83  73  675  193  84  50
72  29  441  479  44  74
59  59  5    8    39  33
77  37  10   8    66  30
55  58  2    1    14  1
72  46  7    4    79  16
2   45  9    0    38  100
61  65  643  876  56  67
95  25  772  480  97  35
81  52  948  722  92  9
95  95  23   9    69  34
34  53  7    18   86  44
14  29  30   1    30  68
47  52  11   24   92  81
75  17  632  641  66  93

Page 13

Meningitis in Europe Distribution of forests in Europe

Meningitis cases: same grid as on page 12.

Forest density grid after reshuffling (column headers in the source: Longitude, Latitude)

643  56  67   876  61  65
772  97  35   480  95  25
948  92  9    722  81  52
11   92  81   24   47  52
5    39  33   8    59  59
30   30  68   1    14  29
7    86  44   18   34  53
441  44  74   479  72  29
9    38  100  0    2   45
10   66  30   8    77  37
632  66  93   641  75  17
675  84  50   193  83  73
23   69  34   9    95  95
2    14  1    1    55  58
7    79  16   4    72  46

We reshuffle only whole rows and columns to get the null model distribution.

[Figure: null model distribution of the correlation coefficients; x-axis: Correlation (-0.5 to 0.5), y-axis: Number (0 to 250)]

P (H0) = 26/1000 = 0.026
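Reshuffling whole rows and columns of one grid can be sketched as follows. The grids below are small made-up examples, not the meningitis and forest data, and the function name `grid_null_test` is hypothetical. The two-sided probability used here is an assumption; the slide does not say whether its P(H0) is one-sided or two-sided.

```python
import numpy as np

def grid_null_test(grid_a, grid_b, n_rand=1000, seed=0):
    """Correlate two grids; null model permutes rows and columns of grid_b."""
    rng = np.random.default_rng(seed)
    a = np.asarray(grid_a, dtype=float)
    b = np.asarray(grid_b, dtype=float)
    r_obs = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    r_null = np.empty(n_rand)
    for k in range(n_rand):
        rows = rng.permutation(b.shape[0])
        cols = rng.permutation(b.shape[1])
        # Entries move only together with their whole row and column.
        shuffled = b[rows][:, cols]
        r_null[k] = np.corrcoef(a.ravel(), shuffled.ravel())[0, 1]
    p_two_sided = np.mean(np.abs(r_null) >= abs(r_obs))
    return r_obs, r_null, p_two_sided

# Hypothetical 6 x 5 grids standing in for case counts and forest density
data_rng = np.random.default_rng(1)
cases = data_rng.poisson(20, size=(6, 5))
forest = data_rng.uniform(0, 100, size=(6, 5))

r_obs, r_null, p_two_sided = grid_null_test(cases, forest)
print(f"R = {r_obs:.3f}, P(H0) = {p_two_sided:.3f}")
```

Moving whole rows and columns keeps cells that share a row or column together, which a free reshuffle of individual cells would not.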

Page 14

Mantel test

Sequence

Carabus coriaceus    A T T T G C A T G C A
Carabus auronitens   A G T A A C A G G G A
Carabus cancellatus  A C G T G C A T C C T
Carabus auratus      A T A T G C T T G G T

                     C. coriaceus  C. auronitens  C. cancellatus  C. auratus
Carabus coriaceus    0             5              4               4
Carabus auronitens   5             0              8               7
Carabus cancellatus  4             8              0               5
Carabus auratus      4             7              5               0

Prey

                     Collembola  Diptera  Arachnida
Carabus coriaceus    50          20       30
Carabus auronitens   60          10       40
Carabus cancellatus  50          25       25
Carabus auratus      30          60       10

                     C. coriaceus  C. auronitens  C. cancellatus  C. auratus
Carabus coriaceus    0             0.95           0.94            -0.11
Carabus auronitens   0.95          0              0.81            -0.68
Carabus cancellatus  0.94          0.81           0               -0.11
Carabus auratus      -0.11         -0.68          -0.11           0

Coefficient of correlation between matrix entries

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} Z(1)_{ij}\, Z(2)_{ij}$$

For convenience we use Z-transformed data

The Mantel test is a test for the correlation between two distance matrices. It tests whether distances are correlated.

Page 15

Reshuffling of values among matrix entries.
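A Mantel test on the Carabus example can be sketched as below. The sequences and prey percentages are taken from the previous slide; the function name `mantel_test` is hypothetical. As an assumption, the sketch uses the common variant that permutes the species order of one matrix (rows and columns together) instead of reshuffling single entries, and it adds 1 to numerator and denominator of the probability so that it can never be zero. With only four species there are just 24 distinct permutations, so the probability is coarse.

```python
import numpy as np

def mantel_test(m1, m2, n_perm=1000, seed=0):
    """Correlation between two square matrices with a permutation test."""
    rng = np.random.default_rng(seed)
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    upper = np.triu_indices_from(m1, k=1)   # each species pair once, no diagonal
    r_obs = np.corrcoef(m1[upper], m2[upper])[0, 1]
    r_null = np.empty(n_perm)
    for k in range(n_perm):
        order = rng.permutation(m1.shape[0])
        permuted = m2[np.ix_(order, order)]  # same reordering for rows and columns
        r_null[k] = np.corrcoef(m1[upper], permuted[upper])[0, 1]
    p = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p

sequences = ["ATTTGCATGCA",   # Carabus coriaceus
             "AGTAACAGGGA",   # Carabus auronitens
             "ACGTGCATCCT",   # Carabus cancellatus
             "ATATGCTTGGT"]   # Carabus auratus
prey = np.array([[50, 20, 30],
                 [60, 10, 40],
                 [50, 25, 25],
                 [30, 60, 10]], dtype=float)

# Number of differing positions between each pair of sequences
genetic = np.array([[sum(a != b for a, b in zip(s, t)) for t in sequences]
                    for s in sequences])
# Correlation between the prey spectra of each pair of species
diet = np.corrcoef(prey)

r_mantel, p_mantel = mantel_test(genetic, diet)
print(f"Mantel r = {r_mantel:.3f}, P = {p_mantel:.3f}")
```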