statistics for ess


Upload: michael-smith

Post on 14-Jun-2015


DESCRIPTION

IB Environmental Systems and Societies

TRANSCRIPT

Page 1: Statistics for ess

Statistical analysis for Environmental Systems and Societies

Statistical analysis

1 State that error bars are a graphical representation of the variability of data.

2 Calculate the mean and standard deviation (SD) of a set of values.

3 State that the term standard deviation is used to summarize the spread of values around the mean, and that 68% of the values fall within one standard deviation of the mean.

4 Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples.

5 Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables.

6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

Statistical analysis – Keywords

Arithmetic mean Uncertainty Correlation Error bars

Relationship Significance Spread Standard deviation

t-test Value Variability Variable


Describing variation mathematically:

Living things can vary so that even two peas in a pod show a variety of sizes and shapes. This raises a number of questions. How can we describe the range of variation? Which pea size is the most common? Can we sort the peas into groups to decide if they came from the same or different pods? Biologists ask these types of questions not only about living organisms but also about sets of data from experiments.

The Arithmetic Mean:

A group of ten students were tested for shoe size. The results are listed here:

Group A 5 6 8 7 8 6 7 7 9 7

The arithmetic mean is the total divided by the number of results, so:

Total = 70

Number of results = 10

Mean = 70 / 10 = 7
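The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names are assumptions and not part of the original worksheet.

```python
# Shoe sizes for Group A, as listed above.
group_a = [5, 6, 8, 7, 8, 6, 7, 7, 9, 7]

total = sum(group_a)   # add up all the results
count = len(group_a)   # number of results
mean = total / count   # arithmetic mean = total / number of results

print(total, count, mean)  # 70 10 7.0
```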

Group A 5 6 8 7 8 6 7 7 9 7

Group B 7 7 7 7 7 7 7 7 7 7

Group C 5 6 6 6 7 7 7 7 8 9

Group D 5 5 5 5 8 8 8 8 9 9

Most people if asked to summarise each set of data above would probably come up with the idea of using the mean. If asked what other information would be useful, can you think of anything?

You may have suggested that a measure of the ‘spread’ of data would be useful. A very simple way to do this is to simply record the range. Can you complete the information below to describe each set of data:

Group   Mean   ‘Spread’ of Data

A       7      Data ranges from 5 to 9

B       7

C       7

D       7

Can you see any problems in only using this to describe a group of data?
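As a rough illustration of recording the range, here is a minimal Python sketch using Group A (as first listed) and Group D. The helper name `data_range` is an assumption made for this example.

```python
def data_range(values):
    """Return the smallest and largest value in a list of results."""
    return min(values), max(values)

group_a = [5, 6, 8, 7, 8, 6, 7, 7, 9, 7]
group_d = [5, 5, 5, 5, 8, 8, 8, 8, 9, 9]

print(data_range(group_a))  # (5, 9)
print(data_range(group_d))  # (5, 9)
```

The range only reports the two extreme values, so it says nothing about how the results in between are distributed.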


If you were a shoe manufacturer you would find the mean useful, as you would know that it would be a good idea to make plenty of shoes at size 7. However, you would not know how wide the variation is around the mean. All of the distributions on the next page have means of 7, but they clearly need very different outputs from the shoe factory.

Standard Deviation

Continuous data shows a smooth transition of values across a spectrum. So, weight, height and numbers of plants in a particular area are all good examples. To describe the spread of results in continuous data, biologists use a statistic called the standard deviation.

The standard deviation of a set of data is calculated from the deviation of each measurement from the mean.

Which group is very tightly packed around 7 – a shoe manufacturer's dream? _____________

This group should have the smallest standard deviation.

If data is clustered around the mean you would expect to have lots of small deviations away from the mean. If the data is more spread out you would expect the deviations to be bigger.


Calculating Standard Deviation (What are the steps involved and what does it mean?)

The heights in the groups of students checked for shoe size were recorded. The data shows a typical distribution and the mean can easily be calculated.

Heights 157 160 161 164 171 172 175 176 177 182

Work out the total and the mean for this set of data:

The heights of individual members of the group are different from the mean. These differences can be calculated.

Height 157 160 161 164 171 172 175 176 177 182

Difference from the mean

Since some of the individuals fall below the average some of the differences will be negative. To convert all these values into positive numbers they are squared.

Height 157 160 161 164 171 172 175 176 177 182

Squared differences

The figures above give a measure of the deviation of the individuals from the mean. The standard deviation is based on the mean of these squared deviations, so you will need to find the mean of these values:

Total = 642.5 (explain what numbers were used to calculate this)

Mean = 64.25 (explain how this number was obtained)

Since this is the mean of the squares of the original deviations, we take the square root of this mean and call it the standard deviation.


Standard deviation = √64.25 = 8.02
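The steps above can be followed one at a time in Python. This is a sketch of the worksheet's method, which divides by the number of values (some textbooks divide by the number of values minus one instead); the variable names are assumptions for illustration.

```python
import math

heights = [157, 160, 161, 164, 171, 172, 175, 176, 177, 182]

# Step 1: total and mean
mean = sum(heights) / len(heights)

# Step 2: difference of each height from the mean (some are negative)
differences = [h - mean for h in heights]

# Step 3: square each difference so that every value is positive
squared_differences = [d ** 2 for d in differences]

# Step 4: mean of the squared differences
total_squared = sum(squared_differences)
mean_squared = total_squared / len(heights)

# Step 5: square root of that mean is the standard deviation
standard_deviation = math.sqrt(mean_squared)

print(total_squared, mean_squared, round(standard_deviation, 2))  # 642.5 64.25 8.02
```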

The standard deviation is a useful way to describe the variability in a set of continuous data. The larger the standard deviation the larger the spread of data is around the mean.

Questions

The table below shows the heights of two groups of IB Biology students.

Group A heights / cm

180 176 160 169 172 178 182 177 175

Group B heights / cm

180 177 163 166 175 177 180 179 173 169

a. Calculate the mean for each set of students.

b. Calculate the standard deviation for each set of students.

Group A heights / cm 180 176 160 169 172 178 182 177 175

Difference from mean

Square of differences

Group B heights / cm 180 177 163 166 175 177 180 179 173 169

Difference from mean

Square of differences

Total of squared differences for group A:

Total of squared differences for group B:

Standard deviation for group A = √ _____ = _____

Standard deviation for group B = √ _____ = _____



Using Excel to Calculate Average and Standard Deviation

You can use Excel to calculate the mean and standard deviation of a set of data for you. This is especially helpful when you quickly want to produce useful data to put onto a graph.

The command to calculate the average (in the English-language version of Excel) is:

=AVERAGE(*)

where * represents the dataset of interest.

The command to calculate the standard deviation (in the English-language version of Excel) is:

=STDEVP(*)

where * represents the dataset of interest.
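For readers working outside Excel, a comparable calculation can be sketched with Python's standard `statistics` module. `pstdev` divides by the number of values, as =STDEVP does and as the worked example does; `stdev` divides by one less than the number of values, as Excel's =STDEV does. The heights used are those from the worked example; the variable names are assumptions.

```python
import statistics

heights = [157, 160, 161, 164, 171, 172, 175, 176, 177, 182]

average = statistics.mean(heights)          # comparable to =AVERAGE(*)
sd_population = statistics.pstdev(heights)  # comparable to =STDEVP(*), divides by n
sd_sample = statistics.stdev(heights)       # comparable to =STDEV(*), divides by n - 1

print(average, round(sd_population, 2), round(sd_sample, 2))
```

The sample version is always slightly larger than the population version for the same data, so it is worth stating which one was used when reporting a standard deviation.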

One way to take into account the variability in the results, and hence their level of precision, is to draw error bars.

A simple way to construct an error bar is to use the maximum deviation of a single data point away from the mean.

When drawing a graph an error bar is drawn above and below the mean that shows the maximum deviation away from the mean.

Error bars can be constructed for each mean value:

If the error bars overlap then it cannot be concluded that the values are truly different. We state that the values are not significantly different.


If the error bars do not overlap then a conclusion that they are significantly different is justified.
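The maximum-deviation error bar and the overlap check can be sketched in Python as follows. The function names `max_deviation_bar` and `bars_overlap` are assumptions made for this example, and the data are the two groups of heights from the questions above, exactly as listed there.

```python
def max_deviation_bar(values):
    """Return (mean, lower, upper) for an error bar drawn above and below
    the mean by the largest deviation of any single data point."""
    mean = sum(values) / len(values)
    max_deviation = max(abs(v - mean) for v in values)
    return mean, mean - max_deviation, mean + max_deviation

def bars_overlap(bar_1, bar_2):
    """Return True if two (mean, lower, upper) error bars overlap."""
    _, lower_1, upper_1 = bar_1
    _, lower_2, upper_2 = bar_2
    return lower_1 <= upper_2 and lower_2 <= upper_1

group_a = [180, 176, 160, 169, 172, 178, 182, 177, 175]
group_b = [180, 177, 163, 166, 175, 177, 180, 179, 173, 169]

bar_a = max_deviation_bar(group_a)
bar_b = max_deviation_bar(group_b)

if bars_overlap(bar_a, bar_b):
    print("Error bars overlap: not significantly different")
else:
    print("Error bars do not overlap: significantly different")
```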

Standard deviation error bars are a more sophisticated indicator of the precision of a set of measurements. Standard deviation error bars are usually drawn for 1 standard deviation above and below the mean. Excel can do this for you.
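As a sketch of where such an error bar would start and end, the limits can be computed in Python. The helper name `sd_error_bar` is an assumption, the data are the heights from the worked example, and the population form of the standard deviation is used to match that example.

```python
import statistics

def sd_error_bar(values):
    """Return (lower, mean, upper) for a bar of 1 SD either side of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # divides by n, as in the worked example
    return mean - sd, mean, mean + sd

heights = [157, 160, 161, 164, 171, 172, 175, 176, 177, 182]

lower, mean, upper = sd_error_bar(heights)
print(round(lower, 2), mean, round(upper, 2))  # 161.48 169.5 177.52
```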

If standard deviation is calculated for a set of data you will need a minimum of five repeats (and preferably seven or more).

Student's t-test is a statistical test. One of the most common applications of statistics is to compare two sets of data, for example the heights of males and females in a class. These heights can be represented as a frequency histogram using the same x axis for both sets of data.

If almost all the male students were taller than the female students then the two histograms would show very little overlap, as shown below in graph (a). From looking at this graph we would be confident in saying that the male students are taller than the female students.

Fig 1: Comparing two sets of data. The triangle indicates the mean value for each set of data.

As the overlap increases it becomes less certain that there is a difference. If the data looked like that shown in graph (b) above where there is almost complete overlap, then we would be confident in saying that there is no difference in the height of male and female students.

It may appear from the graphs above that the difference between the mean values should be a sufficient measure of overlap, i.e. as the means become closer the overlap increases. However, the overlap between the two sets of data also depends on how closely the data are clustered around the two means.


Look at the two graphs below:

You should notice that the difference between the means is the same. However, the data used to plot graph (b) is more variable, so there is more overlap and less certainty that there is a difference between the data.

The t-test is a technique that takes into account the means as well as the amount of overlap between two sets of data and indicates how certain we can be that there is a significant difference.

The t-Test

What does the t-Test tell us?

It provides a way of measuring the overlap between two sets of data.

Notation

X̄1 is the mean value for data set 1

Vertical lines indicate that the positive difference between the means should be taken, irrespective of which is bigger

S is the symbol for standard deviation

n is the number of measurements collected

Note: you will not be expected to remember this formula.


If two sets of data have widely separated means and small variances (the data is clustered around the mean) they will have little overlap and a big value of t, so they can be shown to be significantly different.

On the other hand, if two sets of data have means that are close together and large variances (the data is spread from the mean) they will have a large overlap and a small value of t, so they cannot be shown to be significantly different.

A large value of t indicates little overlap and a significant difference.

A small value of t indicates a lot of overlap and no significant difference.
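To make the link between means, spread and t concrete, here is a hedged Python sketch. It assumes one common form of the statistic, the positive difference between the two means divided by the square root of (S1²/n1 + S2²/n2), with S taken as the sample standard deviation. That form is an assumption consistent with the notation given earlier and may differ from the exact formula intended. The function name `t_statistic` is also an assumption, and the data are the two groups of heights from the questions, as listed.

```python
import math
import statistics

def t_statistic(sample_1, sample_2):
    """Positive difference between the means divided by the combined
    standard error (assumed form: sqrt(S1^2/n1 + S2^2/n2))."""
    mean_1 = statistics.mean(sample_1)
    mean_2 = statistics.mean(sample_2)
    variance_1 = statistics.variance(sample_1)  # sample variance, divides by n - 1
    variance_2 = statistics.variance(sample_2)
    standard_error = math.sqrt(variance_1 / len(sample_1) + variance_2 / len(sample_2))
    return abs(mean_1 - mean_2) / standard_error

group_a = [180, 176, 160, 169, 172, 178, 182, 177, 175]
group_b = [180, 177, 163, 166, 175, 177, 180, 179, 173, 169]

t = t_statistic(group_a, group_b)
print(round(t, 2))
```

Because the two means are close together and both groups are fairly spread out, the value of t comes out small, which matches the rule above: close means and large variances give a lot of overlap.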

To judge whether the value of t is big or small you have to consult a table known as ‘A Table of Critical Values’. The value that should be looked at in the table depends on something known as ‘The Degrees of Freedom’. An example of a part of a ‘Table of Critical Values’ is shown below:


Degrees of Freedom   Significance levels
                     p = 0.05   p = 0.01

15                   2.13       2.94
16                   2.12       2.92
17                   2.11       2.90
18                   2.10       2.88
19                   2.09       2.86
20                   2.09       2.85
21                   2.08       2.83
22                   2.07       2.82
23                   2.07       2.81
24                   2.06       2.80
25                   2.06       2.80
30                   2.04       2.75
40                   2.00       2.70
60                   2.00       2.66

To work out the degrees of freedom:

Degrees of freedom = (number of values in sample 1 – 1) + (number of values in sample 2 – 1)

So if there were 21 individuals in each sample then the degrees of freedom would equal:

Degrees of freedom = (21 – 1) + (21 – 1)

Degrees of freedom = 40

Imagine carrying out a t-test to compare two sets of data with 21 values in each set, and that the calculated value of t was 3.42.

Looking at the table, the critical value for t at the 0.05 level (the level scientists usually look at) with 40 degrees of freedom is 2.00. This means that, if there were really no difference between the two sets of data, the probability of getting a value of t of 2.00 or larger by chance would be 0.05 (5%). Since 3.42 is larger than 2.00, the probability of getting this value by chance is less than 0.05, so it is very unlikely that the difference between the two sets of data arose by chance. Therefore the two sets of data are significantly different. In fact 3.42 is also bigger than the critical value at the 0.01 level (2.70), which means that the probability of getting a value of t this large by chance is less than 0.01 (1%).

In investigations that will be analysed using statistical tests scientists usually make a null hypothesis. The null hypothesis usually states that there is no significant difference between two samples.

If a value of t is greater than or equal to the critical value then the null hypothesis can be rejected and it can be stated that there is a significant difference.
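The decision rule can be sketched in Python using the critical values copied from the table extract above. The names `CRITICAL_VALUES`, `degrees_of_freedom` and `is_significant` are assumptions made for this illustration, and the lookup only covers the degrees of freedom that appear in that extract.

```python
# Critical values from the table extract: df -> (p = 0.05, p = 0.01)
CRITICAL_VALUES = {
    15: (2.13, 2.94), 16: (2.12, 2.92), 17: (2.11, 2.90), 18: (2.10, 2.88),
    19: (2.09, 2.86), 20: (2.09, 2.85), 21: (2.08, 2.83), 22: (2.07, 2.82),
    23: (2.07, 2.81), 24: (2.06, 2.80), 25: (2.06, 2.80), 30: (2.04, 2.75),
    40: (2.00, 2.70), 60: (2.00, 2.66),
}
LEVEL_COLUMN = {0.05: 0, 0.01: 1}

def degrees_of_freedom(n_1, n_2):
    """Degrees of freedom for comparing two samples of sizes n_1 and n_2."""
    return (n_1 - 1) + (n_2 - 1)

def is_significant(t, df, level=0.05):
    """Return True if t is greater than or equal to the critical value,
    meaning the null hypothesis can be rejected at the chosen level."""
    if df not in CRITICAL_VALUES or level not in LEVEL_COLUMN:
        raise ValueError("value not in the table extract")
    return t >= CRITICAL_VALUES[df][LEVEL_COLUMN[level]]

df = degrees_of_freedom(21, 21)
print(df)                              # 40
print(is_significant(3.42, df))        # True
print(is_significant(3.42, df, 0.01))  # True
```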
