BIOSTATISTICS (II)

Upload: mazz4

Post on 27-Jun-2015


TRANSCRIPT

Page 1: Biostatistics ii

BIOSTATISTICS (II)

Page 2: Biostatistics ii

SYLLABUS REQUIREMENTS: Students should know how to work out the t-test and the Chi-squared test and how to interpret them (excluding the expectation of working out standard deviation or other long calculations).

SUPPLEMENTARY NOTE ON STATISTICS
The following are conditions for using various statistical tests.

t-Test (Independent samples)
1. Interval level data.
2. Independent samples.
3. Populations should be approximately normally distributed.
4. Populations should have approximately the same standard deviation.
5. Samples contain fewer than 30 values each.
Degrees of freedom = df = (total number of values in both samples) – 2

t-Test (Matched samples)
1. Matched paired samples.
2. Interval level data.
3. Population of differences should be normally distributed.
4. Samples contain fewer than 30 values.
Degrees of freedom = df = (number of pairs of values) – 1

Chi-Squared Test
1. Nominal level data.
2. The expected frequency should not fall below 5 in more than 20% of the cells.
Degrees of freedom = df = (number of columns) – 1

Page 3: Biostatistics ii

Two statistical tests in syllabus:

1. Chi-Squared (χ²) Test
2. t-test:
   Independent samples
   Matched samples

Statistics is the art and science of making sense out of data

Page 4: Biostatistics ii

Let us discover:

When? Why? How?

To use these statistical tests

Page 5: Biostatistics ii

When?

Applied to results obtained from an experiment.

E.g. students investigate the effect of UV light on seed germination:

Not irradiated: 8/10 germinated
Irradiated: 3/10 germinated

Page 6: Biostatistics ii

Why are statistical tests applied?

To test whether a result occurred by chance or not.

E.g. a student counted the number of visits made by butterflies in 1 h:

6 visits, 20 visits, 100 visits

Can the student conclude that butterflies prefer blue flowers?

Page 7: Biostatistics ii

Result could have occurred by chance

Page 8: Biostatistics ii

Which test?

Chi-Squared (χ²) Test

t-test

Page 9: Biostatistics ii

Type of data:

Chi-Squared (χ²) Test: Categorical / Nominal level

t-test: Interval level

Page 10: Biostatistics ii

Categorical / Nominal Level of Data: from the Latin nomen, meaning 'name'; data is grouped under a 'label'. Examples:

Woodlice in humid and in dry areas

Blood grouping in people

Choice chamber

Page 11: Biostatistics ii

Examples of Categorical Level of Data:

Eye colour          Blue   Brown   Green   Other
Number of people    25     100     55      20

Tree type           Oak    Pine    Olive
Number of insects   30     48      8

Page 12: Biostatistics ii

Interval Level of Data: accurate measurements of a variable; the data is continuous and has units of measurement, e.g. length, weight, temperature.

Page 13: Biostatistics ii

t-test: assesses whether the means of two groups are statistically different from each other, e.g. heart beats per minute:

At rest   After exercise
70        120
68        106
73        134
70        100
67        116

Mean: 69.6   Mean: 115.2

Page 14: Biostatistics ii

Two formulae for t-test:

Independent samples

Matched samples

Comparing size of leaves on two positions on the tree.

Comparing amount of sugars in two types of apple.

Page 15: Biostatistics ii

t-test (independent samples): readings are taken on two different organisms or situations, e.g.

Mean height of boys and girls

Mean length of plant stems grown in the light and in darkness

Page 16: Biostatistics ii

t-test (matched samples):
1. One person's pre-test and post-test score, e.g. taking the time to recognise a picture upside down and in normal orientation.
2. One person in a group matched to another person in another group, e.g. husband and wife, identical twins.

Page 17: Biostatistics ii

t-test (matched samples):

A student wanted to find out if the area of moss (cm²) growing on the north- and south-facing sides of trees in a local wood differs.

Area of moss (cm²)
Tree         A    B    C    D    E    F    G    H    I    J    K    L
North side   44   44   46   47   48   50   51   52   52   57   62   67
South side   36   39   39   43   49   49   51   54   58   60   61   72

Page 18: Biostatistics ii

e.g. Independent samples:
A student wanted to find out whether the height of 16-year-old males and females differs. She recorded the data in the table below.

Height (m)
Males     1.70   1.65   1.66   1.85   1.78   1.83
Females   1.66   1.58   1.71   1.66   1.59   1.69

Average height (m)
Males     1.75
Females   1.65

Page 19: Biostatistics ii
Page 20: Biostatistics ii

What is a hypothesis?

A suggested explanation for an observation.

Page 21: Biostatistics ii

Hypothesis Testing is a method:

for deciding if an observed effect or result occurs by chance alone

Page 22: Biostatistics ii

Are there more woodlice in the humid area by chance, or by their characteristic?

Humid: 9
Dry: 1

Page 23: Biostatistics ii

The Scientific Method

Page 24: Biostatistics ii

To decide if the results of an experiment occur by chance or not, the researcher declares:

A null hypothesis (NH): the hypothesis actually tested.

An alternative hypothesis (AH): the other hypothesis, assumed true if the NH is false.

Page 25: Biostatistics ii

The NH states that there will be NO DIFFERENCE between the groups as a result of the treatment.

The AH indicates there WILL be a difference between the groups.

Page 26: Biostatistics ii

How to State the NH & AH:

NH: There is no significant difference between the mean height of girls and boys.

AH: There is a significant difference between the mean height of girls and boys.

Page 27: Biostatistics ii

Write a suitable NH:
A researcher wanted to find out whether light intensity has an effect on the rate of photosynthesis in Elodea.

There is no significant difference in the rate of photosynthesis when the light intensity is varied.

OR

Light intensity has no effect on the rate of photosynthesis.

Page 28: Biostatistics ii

Write a suitable NH:
A researcher wanted to find out whether alcohol has an effect on memory. He did this by finding out the number of words remembered after drinking water and then again after drinking alcohol.

There is no significant difference between the number of words remembered after drinking water and after drinking alcohol.

OR

Alcohol has no effect on memory.

Page 29: Biostatistics ii

To ACCEPT or REJECT the NH:

When NH is ACCEPTED: i.e. there is no difference between the groups.

When NH is REJECTED: i.e. there is a difference between the groups; the treatment made a difference.

Page 30: Biostatistics ii

A statistical test provides a means of making decisions under uncertainty.

Page 31: Biostatistics ii

Whether the NH is accepted or rejected depends on whether the value calculated by a statistical test on the results of the experiment is greater or less than the critical value for a preset level of probability.

Page 32: Biostatistics ii

Probability

Probability is the scientific way of stating the degree of confidence we have in predicting something.

Suppose a bag contains brown and green marbles and we extract 10:

Thus, we can say that the bag has more brown than green – but we cannot be certain

Page 33: Biostatistics ii

Suppose we extract another 10 marbles and get:

We are now more confident, but how confident would we have to be to satisfy ourselves that there are more brown than green marbles?

The answer is 95%, that is, a 5% chance of being wrong.

Page 34: Biostatistics ii

The chosen percentage probability is called the level of significance or the critical probability.

Page 35: Biostatistics ii

By convention, the critical probability for rejecting the NH is 5% (i.e. P = 0.05)

Page 36: Biostatistics ii

You are given the critical values from a table and must choose the appropriate one

                          Level of significance (P)
Degrees of freedom (df)   0.05   0.025   0.01    0.005   0.001
1                         3.84   5.02    6.63    7.88    10.83
2                         5.99   7.38    9.21    10.60   13.81
3                         7.81   9.35    11.34   12.84   16.27

Part of a table showing the critical values of the χ² test.
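Choosing the appropriate critical value is a two-key lookup: degrees of freedom first, then level of significance. A minimal Python sketch of that lookup, using only the values in the table above; the names `CHI2_CRITICAL`, `critical_value` and `is_significant` are illustrative and not part of the syllabus:

```python
# Critical values of chi-squared copied from the table above,
# keyed by degrees of freedom, then by level of significance (P).
CHI2_CRITICAL = {
    1: {0.05: 3.84, 0.025: 5.02, 0.01: 6.63, 0.005: 7.88, 0.001: 10.83},
    2: {0.05: 5.99, 0.025: 7.38, 0.01: 9.21, 0.005: 10.60, 0.001: 13.81},
    3: {0.05: 7.81, 0.025: 9.35, 0.01: 11.34, 0.005: 12.84, 0.001: 16.27},
}


def critical_value(df, p=0.05):
    """Return the tabulated critical value for the given df and P."""
    return CHI2_CRITICAL[df][p]


def is_significant(calculated, df, p=0.05):
    """True when the calculated value exceeds the critical value (reject NH)."""
    return calculated > critical_value(df, p)


print(critical_value(1))        # 3.84
print(critical_value(3, 0.01))  # 11.34
```

The default `p=0.05` mirrors the convention stated earlier that the critical probability for rejecting the NH is 5%.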

Page 37: Biostatistics ii

Degrees of freedom (df) are related to the size of the samples studied; the formula depends on the test being used.

Chi-squared test (χ²)

df = (number of columns) – 1

Colour of flower       Red   Purple   Yellow
Number of bee visits   75    51       20

df = 3 – 1 = 2

Page 38: Biostatistics ii

Degrees of freedom (df)

t-test (independent samples)

df = (total number of values in both samples) – 2

Radicle lengths / cm
Treatment A   Treatment B
4.1           8.1
8.4           9.0
9.2           8.1
6.0           7.8
6.4           5.3
5.3           7.7
4.1           9.8

Mean A = 6.21   Mean B = 7.97

df = (7 + 7) – 2
df = 14 – 2 = 12

Page 39: Biostatistics ii

Degrees of freedom (df)

t-test (matched samples)

df = (number of pairs of values) – 1

Specimen                      A    B    C    D    E    F    G    H
Rate of heart beat at 5 °C    28   30   30   31   32   33   34   36
Rate of heart beat at 10 °C   39   40   39   45   46   37   47   39

df = 8 – 1
df = 7

Page 40: Biostatistics ii

OVERVIEW
1. How to present a statistical test
2. χ² (Chi-squared) test
3. t-test

Page 41: Biostatistics ii

The order of writing up a statistical test:

1. NH (Null Hypothesis)
2. AH (Alternative Hypothesis)
3. Name of test, including any assumptions about the populations
4. Level of significance (used to indicate the chance that we are wrong in rejecting the NH)
5. Calculations
6. Conclusion (accept or reject NH)

Page 42: Biostatistics ii
Page 43: Biostatistics ii

If the calculated value is less than the critical value, the result is said to be not significant.

Page 44: Biostatistics ii

If the calculated value is less than the critical value, the result is due to chance.

E.g. calculated value = 0.23, critical value = 2.69

Page 45: Biostatistics ii

If the calculated value is larger than the critical value, the result is said to be statistically significant.

Page 46: Biostatistics ii

If the calculated value is larger than the critical value, the result is not due to chance.

E.g. calculated value = 15.87, critical value = 2.69

Page 47: Biostatistics ii

OVERVIEW

1. How to present a statistical test

2. χ² (Chi-squared) test

3. t-test

Page 48: Biostatistics ii

Chi-Squared (χ²) Test

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = observed frequencies / values
E = expected frequencies / values
Σ = the 'sum of'

Page 49: Biostatistics ii

Chi-Squared (χ²) Test

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = observed values
E = expected values

Checklist: use the χ² test when the following conditions are satisfied:

1. Categorical level data (categorical = nominal)

2. The expected frequency should not fall below 5 in more than 20% of the cells.

Number of degrees of freedom (df) = (number of columns) – 1
[Note that columns refer to classes of data]

Page 50: Biostatistics ii

The expected frequency should not fall below 5 in more than 20% of the cells.

Expected: 10, 3, 28, 1, 45

5 cells = 100%
1 cell = 20%

Page 51: Biostatistics ii

The expected frequency should not fall below 5 in more than 20% of the cells.

Expected: 10, 3, 28, 1, 45

2 cells = 40%

The expected frequency is below 5 in 40% of the cells.
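The 20% rule can be checked mechanically by counting the cells whose expected frequency is below 5. A minimal Python sketch, using the five expected values from the slide; the function names are illustrative assumptions:

```python
def fraction_below_five(expected):
    """Fraction of cells whose expected frequency is below 5."""
    small = sum(1 for e in expected if e < 5)
    return small / len(expected)


def chi2_condition_met(expected, limit=0.20):
    """True when no more than `limit` (20%) of cells fall below 5."""
    return fraction_below_five(expected) <= limit


expected = [10, 3, 28, 1, 45]
print(fraction_below_five(expected))  # 0.4
print(chi2_condition_met(expected))   # False
```

Here two of the five cells (3 and 1) are below 5, so 40% of the cells break the limit and the condition for using the χ² test is not satisfied.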

Page 52: Biostatistics ii

Example 1: Comparing categories of a single sample

As part of an investigation into the foraging habits of bees (Bombus monticola), the numbers of visits made to two types of plant, Vaccinium vitis-idaea and Erica tetralix, were recorded in the table below; these numbers are called the observed frequencies (O).

Type of plant                                Vaccinium vitis-idaea   Erica tetralix
Number of visits (observed frequencies, O)   75                      51

Page 53: Biostatistics ii

Null hypothesis: There is no significant difference in the number of visits to each type of plant.

Alternative hypothesis: There is a significant difference in the number of visits to each type of plant.

Page 54: Biostatistics ii

How to calculate the expected values

If the NH is true: expected number of visits to each type of plant = 50% of total.

Total number of visits: 75 + 51 = 126
No. of visits to V. vitis-idaea: 50% of 126 = 63
No. of visits to E. tetralix: 50% of 126 = 63

Page 55: Biostatistics ii

                        Observed        Expected        Difference
                        frequency (O)   frequency (E)   (O − E)          (O − E)²/E
Vaccinium vitis-idaea   75              63              75 − 63 = 12     (75 − 63)²/63 = 2.29
Erica tetralix          51              63              51 − 63 = −12    (51 − 63)²/63 = 2.29

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.29 + 2.29 = 4.58$$

df = (number of classes of data) – 1
df = 2 – 1 = 1

Page 56: Biostatistics ii

Critical value (χ²crit) corresponding to 1 df and a 5% level of significance is 3.84.

Calculated value is 4.58.

The CALCULATED VALUE is greater than the critical value, χ²crit.

Reject the NH and accept the AH.

Page 57: Biostatistics ii

Conclusion: there is a difference in the number of visits to the two species of plant; the result is not by chance.

Type of plant      Vaccinium vitis-idaea   Erica tetralix
Number of visits   75                      51
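The steps of Example 1 (expected values, χ², degrees of freedom, decision) can be sketched in a few lines of Python. This is only an illustration under the example's own assumption that the NH predicts an equal split of visits; the function name `chi_squared` is hypothetical:

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes of data."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [75, 51]                       # V. vitis-idaea, E. tetralix
total = sum(observed)                     # 126
expected = [total / len(observed)] * len(observed)  # 63 visits to each plant

chi2 = chi_squared(observed, expected)
df = len(observed) - 1                    # number of classes - 1
CRITICAL = 3.84                           # 1 df, 5% level of significance

print(round(chi2, 2), df)                 # 4.57 1
print("reject NH" if chi2 > CRITICAL else "accept NH")  # reject NH
```

The unrounded sum is 4.57, whereas adding the two rounded terms (2.29 + 2.29) gives 4.58; either way the calculated value exceeds 3.84.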

Page 58: Biostatistics ii

Example 2: Comparing the Data Obtained from a Genetics Experiment with the Outcome Predicted using Mendelian Ratios

Important: apply chi-squared to test outcomes of a genetic cross.

Page 59: Biostatistics ii

One tall and one dwarf pure-breeding pea plant were crossed to produce F1 generation plants. Two of these F1 generation plants were crossed to produce F2 generation plants. 300 seeds of these F2 generation plants were grown on, of which 292 survived, comprising 215 tall and 77 dwarf plants.

According to Mendelian laws, the ratio of tall to dwarf plants should be 3:1.

Use the Chi-squared test (χ²) with a 5% level of significance to determine if the data is consistent with Mendelian laws, i.e. whether the Mendelian ratio fits the data.

Page 60: Biostatistics ii

NH: Ratio of tall to dwarf plants is 3:1 (data is consistent with Mendelian laws).

This can be also stated as:

NH: There is no significant difference between the data obtained and the Mendelian ratio.

Page 61: Biostatistics ii

AH: Ratio of tall to dwarf plants is not 3:1 (data is not consistent with Mendelian laws).

Expected frequency of tall plants: $\frac{3}{4} \times 292 = 219$

Expected frequency of dwarf plants: $\frac{1}{4} \times 292 = 73$

Page 62: Biostatistics ii

Observed frequency   Expected frequency   Difference
O                    E                    O − E              (O − E)²/E
215                  219                  215 – 219 = −4     (−4)²/219 = 0.07
77                   73                   77 – 73 = 4        4²/73 = 0.22

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.07 + 0.22 = 0.29$$

df = 2 – 1 = 1
χ²crit = 3.84 at 5% level of significance

NH is accepted as χ² = 0.29 is less than χ²crit = 3.84, i.e. data is consistent with Mendelian laws.

Page 63: Biostatistics ii

EXAMPLES: χ² test

1. Homozygous recessive pea plants were crossed with heterozygous round peas. 150 offspring were obtained, of which 81 were round and 69 wrinkled. Is this significantly different from the Mendelian ratio of 1:1? [χ²crit = 3.84 (1 df, P = 0.05)]

NH: Data is consistent with Mendelian ratio.
AH: Data is not consistent with Mendelian ratio.

Expected frequency of each phenotype (1:1 ratio): $\frac{150}{2} = 75$

Page 64: Biostatistics ii

Phenotype of   Number of offspring
offspring      Observed   Expected   (O − E)²/E
Round          81         75         0.48
Wrinkled       69         75         0.48

χ² = 0.48 + 0.48 = 0.96

[χ²crit = 3.84 (1 df, P = 0.05)]

Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, then the null hypothesis is (rejected / accepted).

Page 65: Biostatistics ii

2. The gene for coat colour in dogs has an allele for dark coat colour dominant over the allele for albino colour, whilst the gene for hair length has an allele for short hair dominant over the allele for long hair.
The ratio of phenotypes in the offspring is 9:3:3:1, assuming the genes are unlinked.
Use the χ² test to determine whether the data below is consistent with this ratio.

Phenotype             Dark / short   Dark / long   Albino / short   Albino / long
Number of offspring   187            56            61               20

[χ²crit = 7.81 (3 df, P = 0.05)]

Page 66: Biostatistics ii

NH: Data is consistent with Mendelian ratio.

This can be also stated as:

NH: There is no significant difference between the data obtained and the Mendelian ratio.

AH: Data is not consistent with Mendelian ratio.

Page 67: Biostatistics ii

Phenotype             Dark / short   Dark / long   Albino / short   Albino / long   Total
Number of offspring   187            56            61               20              324
Expected ratio        9              3             3                1               16

Phenotype of     Number of offspring
offspring        Observed   Expected   How to get the expected
Dark / short     187        182.25     9/16 × 324
Dark / long      56         60.75      3/16 × 324
Albino / short   61         60.75      3/16 × 324
Albino / long    20         20.25      1/16 × 324

Page 68: Biostatistics ii

Phenotype of     Number of offspring
offspring        Observed   Expected   (O − E)²/E
Dark / short     187        182.25     0.124
Dark / long      56         60.75      0.371
Albino / short   61         60.75      0.001
Albino / long    20         20.25      0.003

χ² = 0.124 + 0.371 + 0.001 + 0.003 = 0.499

[χ²crit = 7.81 (3 df, P = 0.05)]

Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, then the null hypothesis is (rejected / accepted).
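When the NH is a Mendelian ratio, each expected frequency is that class's share of the ratio multiplied by the total number of offspring. A short Python sketch of that step for the 9:3:3:1 cross; `expected_from_ratio` and `chi_squared` are illustrative names, and the critical value 7.81 is read from the 3 df, P = 0.05 entry of the table of critical values shown earlier:

```python
def expected_from_ratio(observed, ratio):
    """Split the observed total according to the predicted ratio."""
    total = sum(observed)
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes of data."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Dark/short, Dark/long, Albino/short, Albino/long
observed = [187, 56, 61, 20]
ratio = [9, 3, 3, 1]

expected = expected_from_ratio(observed, ratio)
chi2 = chi_squared(observed, expected)
df = len(observed) - 1

print(expected)        # [182.25, 60.75, 60.75, 20.25]
print(round(chi2, 3))  # 0.499
print(df)              # 3
```

Because 0.499 is far below 7.81, the data gives no reason to reject the 9:3:3:1 ratio.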

Page 69: Biostatistics ii

3. An investigation to determine whether woodlice prefer dark conditions to light was carried out in a choice chamber. Half of the choice chamber was covered in black paper and the other half left in light.

Ten woodlice were introduced into the choice chamber. The number of woodlice in each side was counted after thirty minutes. The experiment was repeated five times and the results are shown below.

Page 70: Biostatistics ii

Use the χ² test to determine whether light affects the distribution of woodlice.

Dark    7   9   10   8   6
Light   3   1   0    2   4

[χ²crit = 3.84 (1 df, P = 0.05)]

NH: There is no significant difference between the number of woodlice in the dark and in the light. [Woodlice distribution is not affected by light.]
AH: There is a significant difference…………..

Page 71: Biostatistics ii

Dark    7   9   10   8   6   Total = 40
Light   3   1   0    2   4   Total = 10

        Observed frequency   Expected frequency   Difference
        O                    E                    O − E        (O − E)²/E
Dark    40                   25                   15           9
Light   10                   25                   −15          9

χ² = 9 + 9 = 18

[χ²crit = 3.84 (1 df, P = 0.05)]

Since the calculated value for χ² is (less than / greater than) the critical value at the 5% level, then the null hypothesis is (rejected / accepted).
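In this example the five repeats are first added together, and the χ² test is applied to the two totals. A minimal Python sketch of that pooling step, assuming (as the worked table does) that the NH predicts an equal split between dark and light:

```python
dark = [7, 9, 10, 8, 6]    # woodlice in the dark side, one count per repeat
light = [3, 1, 0, 2, 4]    # woodlice in the light side, one count per repeat

# Pool the repeats into one observed frequency per class.
observed = [sum(dark), sum(light)]            # [40, 10]

# Under the NH the 50 woodlice are split equally between the two sides.
expected = [sum(observed) / 2] * 2            # [25.0, 25.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(observed, expected)  # [40, 10] [25.0, 25.0]
print(chi2, df)            # 18.0 1
```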

Page 72: Biostatistics ii

OVERVIEW
1. How to present a statistical test
2. χ² (Chi-squared) test
3. t-test

Page 73: Biostatistics ii

Formula for the t-test (independent samples)

$$t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{S_T^2}{n_T} + \dfrac{S_C^2}{n_C}}}$$

$\bar{x}_T$ is the mean of a set of data T
$\bar{x}_C$ is the mean of a set of data C
$S_T$ is the standard deviation for a set of data T
$S_C$ is the standard deviation for a set of data C
$n_T$ and $n_C$ are the number of samples in sets of data T and C respectively

Page 74: Biostatistics ii

t-Test (Independent Samples)

Checklist: use the t-test when the following conditions are satisfied:

1. Interval level data
2. Independent samples
3. Populations should be approximately normally distributed

Page 75: Biostatistics ii

t-Test (Independent Samples)

Checklist: use the t-test when the following conditions are satisfied:

4. Populations should have approximately the same standard deviation

Page 76: Biostatistics ii

t-Test (Independent Samples)

Checklist: use the t-test when the following conditions are satisfied:

5. Samples contain fewer than 30 values each

Number of insects on   Number of insects on
Austrian pine          horse chestnut
14                     0
4                      11
1                      1
11                     7
13                     4
30                     1
41                     1
7                      7
3                      2
7                      2

10 values in each sample

Page 77: Biostatistics ii

t-Test (Independent Samples)

Number of degrees of freedom (df) = (total number of values in both samples) – 2

Number of insects /cm² on   Number of insects /cm² on
Austrian pine               horse chestnut
14                          0
4                           11
1                           1
11                          7
13                          4
30                          1
41                          1
7                           7
3                           2
7                           2

df = 20 – 2 = 18
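The independent-samples calculation divides the difference between the two means by a term built from each sample's squared standard deviation and size. The Python sketch below illustrates this with the insect counts, under two labelled assumptions: the ten pairs of counts are read from the flattened table so that the column means come out as the 13.1 and 3.6 quoted in the slides, and the standard deviations (which the slides say are given rather than worked out) are computed here as sample standard deviations. The function name `t_independent` is hypothetical.

```python
import math
import statistics


def t_independent(sample_t, sample_c):
    """t = (mean_T - mean_C) / sqrt(S_T^2 / n_T + S_C^2 / n_C)."""
    mean_t = statistics.mean(sample_t)
    mean_c = statistics.mean(sample_c)
    # Assumption: S^2 is the sample variance (divisor n - 1).
    var_t = statistics.variance(sample_t)
    var_c = statistics.variance(sample_c)
    standard_error = math.sqrt(var_t / len(sample_t) + var_c / len(sample_c))
    return (mean_t - mean_c) / standard_error


# Assumed reading of the insect table (insects per cm^2).
austrian_pine = [14, 4, 1, 11, 13, 30, 41, 7, 3, 7]
horse_chestnut = [0, 11, 1, 7, 4, 1, 1, 7, 2, 2]

t = t_independent(austrian_pine, horse_chestnut)
df = len(austrian_pine) + len(horse_chestnut) - 2

print(statistics.mean(austrian_pine), statistics.mean(horse_chestnut))  # 13.1 3.6
print(round(t, 2), df)
```

The calculated t would then be compared with the critical value for 18 degrees of freedom taken from a t-table, which is not reproduced in these slides.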

Page 78: Biostatistics ii

Formula for t-Test (Matched Samples)

$$t = \frac{\bar{d}\,\sqrt{n - 1}}{s}$$

Where:

$\bar{d}$ is the mean of the differences
n is the number of samples
s is the standard deviation

Page 79: Biostatistics ii

t-Test (Matched Samples)

Checklist Use the t-test (matched samples) when the following conditions are satisfied

1 Matched (paired) samples

2 Interval level data

3 Population of differences should be approximately normally distributed

4 Samples contain less than 30 values each

Number of degrees of freedom (df) = (Number of pairs of values) – 1
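For matched samples the test works on the differences within each pair, not on the two columns separately. The Python sketch below uses the heart beat rates at 5 °C and 10 °C from the earlier slide on degrees of freedom. It assumes s is the sample standard deviation of the differences (divisor n − 1), for which t = d̄ / (s / √n); if s is instead computed with divisor n, the same value is obtained with √(n − 1) in place of √n. The function name `t_matched` is illustrative.

```python
import math
import statistics


def t_matched(first, second):
    """Return (t, df) for paired readings taken on the same specimens."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)                  # number of pairs of values
    mean_d = statistics.mean(differences)
    # Assumption: s is the sample standard deviation (divisor n - 1).
    s = statistics.stdev(differences)
    t = mean_d / (s / math.sqrt(n))
    return t, n - 1


rate_5c = [28, 30, 30, 31, 32, 33, 34, 36]    # specimens A to H at 5 °C
rate_10c = [39, 40, 39, 45, 46, 37, 47, 39]   # same specimens at 10 °C

t, df = t_matched(rate_5c, rate_10c)
print(round(t, 2), df)
```

Pairing matters here: each specimen acts as its own control, so the variation between specimens does not inflate the standard deviation used in the test.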

Page 80: Biostatistics ii

What does the t-test do?

Tests if the mean values of two groups are statistically different.

Do insects really prefer the Austrian pine, or is the result occurring by chance?

Number of insects /cm² on   Number of insects /cm² on
Austrian pine               horse chestnut
14                          0
4                           11
1                           1
11                          7
13                          4
30                          1
41                          1
7                           7
3                           2
7                           2

Mean: 13.1                  Mean: 3.6

Page 81: Biostatistics ii

Austrian pine mean: 13.1
Horse chestnut mean: 3.6

Page 82: Biostatistics ii

What information is needed to work out the t-test?

Page 83: Biostatistics ii

Mean
Standard deviation squared
Number of samples in set

Number in quadrat on   Number in quadrat on
Austrian pine          horse chestnut
14                     0
4                      11
1                      1
11                     7
13                     4
30                     1
41                     1
7                      7
3                      2
7                      2

n = 10                 n = 10
Mean = 131/10 = 13.1   Mean = 36/10 = 3.6

Standard deviation is given. The SYLLABUS states that you are not required to work it out.

Page 84: Biostatistics ii

These three cases help you understand why the standard deviation is important to consider:

the difference between the means is the same in all three BUT

Page 85: Biostatistics ii

spread of values is different.

In which case is it most probable that the mean values of two groups are statistically different?

Page 86: Biostatistics ii

consider area of OVERLAP

Means are statistically different where overlap is least.

Page 87: Biostatistics ii

difference between group means

variability of groups

Page 88: Biostatistics ii

NH: No reaction occurs.

Page 89: Biostatistics ii

NH: REJECTED.