
NOTEBOOK contains a miscellaneous collection of items all designed to aid your study of biology. Whether dealing with creatures or concepts, evolution or exams, NOTEBOOK will help, inform and remind you of things that you should find useful.

Biological statistics part 3

So far in this series we have looked at types of variables, distributions, measures of central tendency and variability, and the chi-squared test. This part looks at two remaining statistical techniques you need for A-level: the t-test and correlation.

COMPARING SMALL SAMPLES

The t-test was devised to overcome problems of dealing with small sample sizes and it is widely used in a variety of situations. It compares two sets of samples by referring to the means and the spread of the sample scores around the mean (that is, the standard deviation or variance). If you can calculate the means and variances of the samples then you can use the t-test to compare them, and to determine whether they are significantly different, that is, whether they come from the same or different populations. Generally, we can use this test to compare two sets of sample data, usually those from experimental and control groups. It is more accurate if the sample size is 30 or over. The value of t can be calculated from the formula:

$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

In this formula (for samples A and B), $\bar{x}$ is the mean, $s^2$ is the variance (the sum of the squares of the differences from the mean divided by the number of observations minus one), and $n$ is the number in the sample. Having obtained t you need to refer to a table to determine its meaning.

Example: looking at lymphocyte counts

A parasitologist was testing her hypothesis that the number of lymphocytes present in the blood of patients infected by a blood parasite is greater than that found in uninfected individuals. To make the calculations in this example easier to follow, only the counts from the blood of five individuals in each group are considered. The lymphocyte counts obtained from equal areas of blood on prepared films are shown in Table 1.

Group A (infected patients): 165, 170, 151, 164, 160

Group B (uninfected individuals): 150, 155, 152, 146, 152

Table 1

(Although counts are used, the t-test can be used since means and variances of the data can be calculated.) Are these two groups statistically different?

To find out, we first calculate the means and standard deviations for the two groups (see BIOLOGICAL SCIENCES REVIEW, Vol. 7, No. 4, pp. 38-41):

           Mean    Standard deviation
Group A    162     7.11
Group B    151     3.32
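The means and standard deviations above can be checked in a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the original article. It suggests that the quoted standard deviations come from dividing the sum of squared differences by n - 1 (the sample standard deviation) rather than by n.

```python
import statistics

# Lymphocyte counts from Table 1
group_a = [165, 170, 151, 164, 160]  # infected patients
group_b = [150, 155, 152, 146, 152]  # uninfected individuals

for name, counts in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(counts)
    sd_sample = statistics.stdev(counts)       # divides by n - 1
    sd_population = statistics.pstdev(counts)  # divides by n
    print(f"Group {name}: mean={mean}, "
          f"SD (n-1)={sd_sample:.2f}, SD (n)={sd_population:.2f}")
```

Only the n - 1 version reproduces the values 7.11 and 3.32 given in the article.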

Next we substitute these values in the formula:

$$\bar{x}_A - \bar{x}_B = 162 - 151 = 11$$

(It does not matter if this value is negative, since it is only the difference between the means which is important.)

$$\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = \sqrt{\frac{7.11^2}{5} + \frac{3.32^2}{5}} = \sqrt{12.314} = 3.51$$

t is equal to 11 divided by 3.51 = 3.13.
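The same calculation can be sketched in Python, working directly from the raw counts. The function name `t_statistic` is hypothetical, and the code assumes the variance is the sample variance (n - 1 denominator), which is what the quoted standard deviations imply. Because it does not round the intermediate values, it gives a t of about 3.14 rather than the 3.13 obtained by hand from the rounded 3.51.

```python
import math
import statistics


def t_statistic(sample_a, sample_b):
    """t for two independent samples: the difference between the means
    divided by sqrt(s_A^2/n_A + s_B^2/n_B), using sample variances."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    std_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    # The sign is ignored: only the size of the difference matters here.
    return abs(mean_diff) / std_error


group_a = [165, 170, 151, 164, 160]  # infected patients (Table 1)
group_b = [150, 155, 152, 146, 152]  # uninfected individuals (Table 1)

t = t_statistic(group_a, group_b)
print(f"t = {t:.2f}")
```

The small difference from the hand calculation does not change where t falls in the table.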

The t table giving p values for different degrees of freedom is shown in Table 2. It concentrates on the levels of p that deal with significant differences (that is, p < 0.05). The number of degrees of freedom is (the number in sample A minus 1) plus (the number in sample B minus 1), that is:

$$(n_A - 1) + (n_B - 1)$$

or alternatively

$$(n_A + n_B) - 2$$

As with the chi-squared test, the table is entered on the left at the appropriate number of degrees of freedom. In this case it is (5 - 1) + (5 - 1) = 8. Moving across the table we find where 3.13 lies: between the values for p = 0.05 (2.31) and p = 0.01 (3.36), so p < 0.05. The difference between the two sets of data is therefore significant and we can be fairly confident that the difference is not due to chance but to some other factor. Looking back at the mean figures (Group A, 162 and Group B, 151) we can conclude that the infected patients actually have a lymphocyte count that differs from that of the uninfected people, so the parasitologist's hypothesis is supported by these data, although many more counts would normally be done.
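The table lookup described above can be sketched as a short Python routine. The names `T_TABLE`, `degrees_of_freedom` and `smallest_p_reached` are hypothetical, and only a few rows of Table 2 are copied in to keep the example short; it is an illustration of the procedure, not a replacement for the full table.

```python
# Critical values of t copied from a few rows of Table 2.
# Columns correspond to p = 0.10, 0.05, 0.01 and 0.001.
P_LEVELS = (0.10, 0.05, 0.01, 0.001)
T_TABLE = {
    5: (2.02, 2.57, 4.03, 6.87),
    8: (1.86, 2.31, 3.36, 5.04),
    10: (1.81, 2.23, 3.17, 4.59),
    20: (1.72, 2.09, 2.85, 3.85),
}


def degrees_of_freedom(n_a, n_b):
    """(n_A - 1) + (n_B - 1), as in the text."""
    return (n_a - 1) + (n_b - 1)


def smallest_p_reached(t, df):
    """Return the smallest tabulated p whose critical value t reaches,
    or None if t is below even the p = 0.10 value."""
    reached = None
    for p, critical in zip(P_LEVELS, T_TABLE[df]):
        if abs(t) >= critical:
            reached = p
    return reached


df = degrees_of_freedom(5, 5)
print(df, smallest_p_reached(3.13, df))
```

For the lymphocyte example this reports 8 degrees of freedom and 0.05, meaning t exceeds the p = 0.05 value but not the p = 0.01 value, so p < 0.05.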

Degrees of      p values (p decreases from left to right)
freedom (df)    0.10     0.05     0.01     0.001

1               6.31     12.71    63.66    636.60
2               2.92     4.30     9.92     31.60
3               2.35     3.18     5.84     12.92
4               2.13     2.78     4.60     8.61
5               2.02     2.57     4.03     6.87

6               1.94     2.45     3.71     5.96
7               1.89     2.36     3.50     5.41
8               1.86     2.31     3.36     5.04
9               1.83     2.26     3.25     4.78
10              1.81     2.23     3.17     4.59

12              1.78     2.18     3.05     4.32
14              1.76     2.15     2.98     4.14
16              1.75     2.12     2.92     4.02
18              1.73     2.10     2.88     3.92
20              1.72     2.09     2.85     3.85

22              1.72     2.08     2.82     3.79
24              1.71     2.06     2.80     3.74
26              1.71     2.06     2.78     3.71
28              1.70     2.05     2.76     3.67
30              1.70     2.04     2.75     3.65

40              1.68     2.02     2.70     3.55
60              1.67     2.00     2.66     3.46
120             1.66     1.98     2.62     3.37
∞               1.64     1.96     2.58     3.29

Table 2 Table of t distribution

p is greater than 0.05 (p > 0.05): not significant (NS)
p is less than 0.05 (p < 0.05): significant (fairly confident)
p is less than 0.01 (p < 0.01): highly significant (very confident)
p is less than 0.001 (p < 0.001): very highly significant (almost certain)

MARCH 1997 15


Self-test question 1

(a) If df = 6 and t = 2.94, what would be the value of p?
(b) If df = 30 and t = 4.57, what would be the value of p?
(c) What do these values of p mean?

Many statistics books and courses include use of what is known as the null hypothesis. In my opinion this only causes confusion and it is better to forget about it. However, if it is required for your course, details can be found in Garvin (1986) pages 91 and 93 (see Further Reading).

POSSIBLY RELATED VARIABLES

Often in biology questions arise concerning the relationship between two or more interdependent variables, where a change in one characteristic is matched by a change in another one. Examples are length and width (longer seeds are wider), or size and mass (larger shells are heavier). There are two questions that need to be answered regarding such an interdependent relationship between variables:

• How close is the relationship (correlation)?
• What is the form of the relationship (regression)?

We can answer these questions by plotting the data regarding the two variables as a scattergraph, in which each axis is occupied by one of the two variables, and each pair of values acts like the coordinates of a point on the grid and is represented by a dot. This results in dots spread over the graph grid. 'X's or circles surrounding the dots are only used when you intend to join the points up with a line. The points are not joined up by a line since each example is quite distinct and separate. Examination of the distribution of the dots can tell us if there is any relationship between the variables, how close it is and what form it takes.

[Figure 1: three scattergraphs showing positive correlation, no correlation and negative correlation]

The closer the dots come to lying on a straight line, the closer the relationship between the variables; the more scattered the dots, the less close is the relationship. Such a relationship is known as correlation. If the dots slope up to the right, then it is a positive correlation; if they slope down to the right, then it is a negative correlation. If the dots do not seem to have any pattern then there is no correlation (see Figure 1).

If the pattern of dots indicates a possible relationship between the variables then we can draw a line through the dots which most closely fits all the points. Such a line is known as the regression line or the line of best fit. There are a number of ways of drawing this line of best fit, but the simplest are:

(1) Place a transparent ruler on the scattergraph and move it around until you think that the edge of the ruler passes through the middle of the dots. Then draw a straight line through them.
(2) Draw two parallel lines which enclose most, if not all, of the dots. Then draw a straight line equidistant between your two parallel lines.

Self-test question 2

Using the scattergraph given in Figure 2, use methods 1 and 2 to obtain lines of best fit. Are they the same?

[Figure 2]

These two methods are straightforward and quick. However, if you want a more accurate regression line you can improve its accuracy as follows:

(3) Calculate the means of each of the two variables, $\bar{X}$ and $\bar{Y}$. Using these values as coordinates, plot the point ($\bar{X}$, $\bar{Y}$) with a distinct cross. This cross should lie on your line of best fit if you have drawn it accurately. You can adjust the line slightly if it is not, rotating it about this point.

There are more complex methods of obtaining the regression line (see Further Reading). Once you have obtained a regression line, you can, if you wish, develop an equation linking the two variables.

It is possible to determine if a particular correlation is significant or not, but the equation is quite complicated and you would normally use a computer programme to compute this for you. Details of the formula and table can be found in statistics texts.

Self-test question 3

A student wanted to find out if there was any relationship between the length and breadth of privet leaves. He measured 20 leaves carefully, making sure that he recorded the length and breadth for each leaf. His results were as follows:

Leaf    Length (mm)    Breadth (mm)        Leaf    Length (mm)    Breadth (mm)
1       53             24                  11      41             19
2       31             16                  12      40             21
3       42             19                  13      38             16
4       46             26                  14      55             24
5       33             15                  15      35             18
6       39             21                  16      35             19
7       27             16                  17      27             15
8       33             16                  18      22             13
9       51             22                  19      21             16
10      32             20                  20      49             22

(a) Plot these data as a scattergraph.
(b) Is the correlation positive, negative, or is there no apparent correlation?
(c) Draw, if possible, a line of best fit through the points.

Answers to these questions can be found on p. 36.

Hopefully, after reading parts 1-3 of 'Biological statistics' and having answered the questions, statistics will be much more meaningful to you than when you started. You should be confident when tackling examination questions involving elements of statistics, and when analysing your data from project work.

FURTHER READING

Garvin, W. (1996) 'Biological statistics - part 2', Biological Sciences Review, Vol. 9, No. 3, pp. 7-9.
Garvin, W. (1995) 'Biological statistics - part 1', Biological Sciences Review, Vol. 7, No. 4, pp. 38-41.
Garvin, J.W. (1986) Skills in Advanced Biology, Vol. 1: Dealing with Data, Stanley Thornes.
Rowntree, D. (1981) Statistics Without Tears, Penguin Books.

Wilbert Garvin

Wilbert Garvin is a Lecturer in Education (Biosciences) at the Queen's University Belfast. He is Director of the Northern Ireland Centre for School Biosciences and author of the Skills in Advanced Biology series of books. His main area of research is biotechnology education.
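As a supplement to method 3 and to the remark that a computer would normally be used for the more complex calculations, the sketch below applies two standard techniques that the article itself does not present: least-squares regression and Pearson's correlation coefficient r. It uses the privet leaf data from Self-test question 3; the variable names are illustrative. The least-squares line always passes through the point of means, which is the property method 3 relies on.

```python
import statistics

# Privet leaf measurements from Self-test question 3 (mm), leaves 1-20
length = [53, 31, 42, 46, 33, 39, 27, 33, 51, 32,
          41, 40, 38, 55, 35, 35, 27, 22, 21, 49]
breadth = [24, 16, 19, 26, 15, 21, 16, 16, 22, 20,
           19, 21, 16, 24, 18, 19, 15, 13, 16, 22]

# Point of means, as plotted with a cross in method 3
mean_x = statistics.mean(length)
mean_y = statistics.mean(breadth)

# Sums of squares and products of the deviations from the means
sxx = sum((x - mean_x) ** 2 for x in length)
syy = sum((y - mean_y) ** 2 for y in breadth)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(length, breadth))

# Least-squares line: breadth = slope * length + intercept
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson's r: +1 is a perfect positive correlation, -1 a perfect negative one
r = sxy / (sxx * syy) ** 0.5

print(f"point of means: ({mean_x}, {mean_y})")
print(f"line of best fit: breadth = {slope:.2f} * length + {intercept:.2f}")
print(f"correlation coefficient r = {r:.2f}")
```

The sign of r corresponds to the direction of slope of the dots on the scattergraph, and a value near zero corresponds to no apparent pattern.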

16 BIOLOGICAL SCIENCES REVIEW