bio-brainstorm.wikispaces.com€¦ · web viewstatistics and ib biology syllabus a word of ......


Upload: buihanh

Post on 03-Jun-2018



Click4Biology: Statistical Analysis

Topic 1: Statistical analysis

About software and calculators

Excel 2003 Toolkit package

Excel 2007 Toolkit package

Graphic display calculator

Statistics and IB Biology syllabus a word of caution!

Data source for calculations.

1.1.1 Error bars and the representation of variability in data.

1.1.2 Calculation of mean and the standard deviation of the sample data.

1.1.3 Standard deviation and the spread of data.

1.1.4 Comparison of two sets of data using their means and standard deviation.

1.1.5 Comparison of two sets of sample data using the t-test.

a. Comparison of two samples.
b. Animation of two sample comparisons.
c. T Test in Excel 2007.
d. Drawing conclusions with statistical tests

1.1.6 Correlation, causation and the calculation of correlation coefficients

 

 

 


 

Statistics and the IB Biology syllabus:

General Notes:

EXCEL 2003

The statistical analysis of data will be illustrated with a worked example using Excel 2003 & 2007. Links to web pages with tutorials on the use of Microsoft Excel have been included. Full use of the statistical functions of Excel requires the Data Analysis package. This can be added to your Tools menu using the 'Add-Ins' link. Note that this is not a download: it is already in your Excel package waiting to be added in. Installation instructions Excel 2003. Those of you with Office 2007 will find Excel looking a little different, but the ToolPack2007 can also be installed.

GRAPHIC CALCULATOR (TI 84 PLUS, TI 83 PLUS)

Alternatively, students can use their graphic calculators to generate the statistics covered in this topic. Graphic calculator routines are based on the TI 83 Plus and the TI 84 Plus. The great advantage of the graphic calculator is that all IB Diploma students will have one for their mathematics course. The calculator can easily be used in the laboratory and also in the field for immediate feedback on your experiments. Try to build these routines into your practical work.

As a new addition to the syllabus this section of the site should be treated with some caution. I have tried as far as possible to adhere closely to the syllabus statements. However, there are areas and questions that remain unanswered about the interpretation of the syllabus statements. These will of course become clearer as teachers attend workshops and through the discussions on the OCC forum. Until then please check your syllabus assessment statements and cross-check with your subject advisor.

Examples:

1.1.1 Error bars

Students can use maximum and minimum values in a data range as error bars.

Students can use the mean +/- 1 standard deviation as error bars.

With a continuous variable as the independent variable it is possible to plot error bars around a mean value. This will improve the drawing of a line of best fit.

With a discontinuous variable on the x-axis the error bars will assist in determining what conclusions might be drawn from the plotted data.

1.1.2 Calculating standard deviation:

In biology experiments we are normally working with a sample of the population. Therefore we calculate the sample standard deviation. The Microsoft Excel support pages suggest that the correct standard deviation function to use is STDEV. One of the alternatives is STDEVP, but this is for the population, not the sample. The 'Teacher notes' state: 'Students should specify the standard deviation (s), not the population standard deviation'.

1.1.5 Confusing syllabus statement aside, I refer here to aim 7 (pg 45):

'Students could be shown how to calculate such values using a spreadsheet program or the graphic display calculator.'

The vast majority of schools and students will of course be using Excel 2003 or perhaps Excel 2007. In 'out of the box' mode Excel does not calculate t. Excel goes one step further than that and calculates the probability (P) directly. To my mind this is an excellent feature and allows the student to concentrate on the concept rather than the detail. An immediate decision can be made about the adoption or rejection of the null hypothesis.

However, if you want your students to calculate the actual t values then you will have to use an 'add-in'. This is not a download but a feature that will need to be enabled. Once this has been done it is quite straightforward to calculate t. In this way you can still have the students compare their t value to critical values of t to determine the level of significance.

Graphic display calculator: Since all diploma students will need a calculator for their course, this is by far the most convenient way to perform the calculations required by the course. Since all my own students currently use the TI-83/84 Plus, the instructions are based on that calculator.

 

 

An investigation of shell length variation in a mollusc species.

The following scenario has been used to generate the data used in the demonstration of the statistical test of the course.

A marine gastropod (Thersites bipartita) has been sampled from two different locations:

 

Sample A: Shells found in full marine conditions

Sample B: Shells found in brackish water conditions.

In each case the sample size was ten shells and the length of the shell was measured as shown in the illustration. The data obtained from the two locations will be used to illustrate the statistical calculations required by the syllabus.

For the purposes of this demonstration the design and method of data collection are assumed to be appropriate.

 

 


Analysis of gastropod data:

The data were obtained by measuring the height of the shells using a ruler. Measurements are in mm with an uncertainty of +/- 1 mm.

Significant digits in the data and the uncertainty in the data must be consistent. This applies to all measuring devices, for example, digital meters, stopwatches, and so on. The number of significant digits should reflect the precision of the measurement.

There should be no variation in the precision of raw data. For example, the same number of decimal places should be used. For data derived from processing raw data (for example, means), the level of precision should be consistent with that of the raw data

 

1.1.1 Error bars and the representation of variability in data.

Biological systems are subject to a genetic program and environmental variation. Consequently when we collect a set of data for a given variable it shows variation. When displaying data in graphical formats we can show the variation using error bars.

Error bars can be used to show either the range of the data or the standard deviation.

Mean with the full data range:

The data can be represented on a graph that might show the mean and the full range of data.

Marine population: Mean = 30.7, Range = 23-43

Brackish population: Mean = 38.2, Range = 32-51

 


Plotting the mean and the range allows for a quick comparison of the two sets of sample data.
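For anyone working outside Excel, the mean and the full range used for this kind of error bar can be sketched in a few lines of Python. The shell lengths below are made-up illustrative values, not the data from this investigation, and the function name `mean_and_range` is my own.

```python
from statistics import mean

def mean_and_range(values):
    """Return (mean, lowest value, highest value) for one sample."""
    return mean(values), min(values), max(values)

# Hypothetical shell lengths in mm (illustrative only, not the original data)
sample = [24, 27, 29, 30, 31, 33, 36, 38]

m, low, high = mean_and_range(sample)

# Lengths of the error bar below and above the mean when plotting the full range
lower_bar = m - low
upper_bar = high - m

print(f"mean = {m:.1f} mm, range = {low}-{high} mm")
```

Note that range error bars need not be symmetrical: the lower and upper bars are calculated separately.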

 

1.1.2 Calculation of mean and the standard deviation of the sample data

Data collected from an experiment falls into three categories


Mean:

The arithmetic mean or average is a measure of the central tendency (middle value) of the data. Caution should be used as the distribution may be skewed and the mean may in fact not be the middle value. The EXCEL formula is given above.

Be careful that the data type you have (check table) allows you to calculate the mean. It may be that the median or the mode are more appropriate.

Standard Deviation (s):

A measure of the spread of data around the mean. It can be used either as a measure of variation within a data set or of the precision of a repeated measurement.

The standard deviation of the sample = s

The standard deviation calculated is for the sample not the total population which could of course have a smaller or larger standard deviation (see the note below).
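Written out, the sample standard deviation is usually defined as below, where x_i are the individual measurements, x-bar is the sample mean and n is the sample size. This is the standard textbook formula rather than something quoted from the syllabus. Dividing by n - 1 rather than n is what distinguishes the sample value from the population value.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```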

The image shows the calculation using the Excel spreadsheet.

The standard deviation calculated is a measure of the spread of the data values around the mean.

Population 1. Mean = 31.4, Standard deviation (s) = 5.7

Population 2. Mean = 41.6, Standard deviation (s) = 4.3

 


STDEV or STDEVA in Excel

'Microsoft recommends that you use STDEV instead of STDEVA unless you are sure that you want TRUE, FALSE, and the text strings to be interpreted as the STDEVA function interprets them. Most of the data that you want to calculate a population standard deviation for is completely numeric; in those cases, STDEV is appropriate.'

STDEV is the appropriate formula for the calculation of the sample standard deviation rather than the population standard deviation which is STDEVP.
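The same sample versus population distinction exists outside Excel. As a rough sketch, Python's standard `statistics` module has `stdev` (divides by n - 1, comparable to STDEV) and `pstdev` (divides by n, comparable to STDEVP). The measurements below are invented purely for illustration.

```python
from statistics import stdev, pstdev

# Hypothetical measurements in mm (illustrative only)
lengths = [2, 4, 4, 4, 5, 5, 7, 9]

s_sample = stdev(lengths)       # divides by n - 1, comparable to Excel's STDEV
s_population = pstdev(lengths)  # divides by n, comparable to Excel's STDEVP

print(round(s_sample, 2), round(s_population, 2))
```

For the same data the sample value is always slightly larger than the population value, and the gap shrinks as the sample size grows.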

Microsoft support Excel STDEV

Graphing the mean and the standard deviation.

One way to represent our data is to draw a graph that includes error bars of the standard deviation. The diagram below was drawn by hand but it is possible to plot the SD as error bars in Excel

 

Here each sample has the mean +/- 1 standard deviation.

There is no overlap between the mean +/- 1 standard deviation ranges for shell length in these two populations.
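The 'no overlap' statement can be checked directly from the means and standard deviations quoted above. The following is only a sketch, and the helper names `one_sd_interval` and `intervals_overlap` are my own.

```python
def one_sd_interval(mean, sd):
    """Return the interval (mean - 1 s, mean + 1 s)."""
    return mean - sd, mean + sd

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Means and standard deviations quoted above for the two populations
pop1 = one_sd_interval(31.4, 5.7)   # roughly 25.7 to 37.1 mm
pop2 = one_sd_interval(41.6, 4.3)   # roughly 37.3 to 45.9 mm

overlap = intervals_overlap(pop1, pop2)
print("Error bars overlap:", overlap)
```

The gap between the two bars is small (about 0.2 mm), which is one reason a formal test such as the t-test is still worth carrying out.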

The question being considered is:

Is there a significant difference between the two samples from different locations?

or

Are the differences in the two samples just due to chance selection?


Comparison of graphs:

[Figure: the range graph and the standard deviation graph for the two samples]

Note that the standard deviation graph has removed the extremes of variation.

The standard deviation graph compares 68% of the population and begins to show that they look different.

The range graph with its extreme values misleads us to think the data may be similar.

 

 

1.1.3 Standard deviation and the spread of values around the mean.

1. Standard deviation is a measure of how spread out the data values are from the mean.

2. It is assumed that there is a normal distribution of values around the mean and that the data is not skewed to either end.

3. About 68% of all the data values in a sample can be found between the mean +/- 1 standard deviation.

4. About 95% of all the data values in a sample can be found between the mean + 2s and the mean - 2s.
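The 68% and 95% figures follow from the normal distribution assumed in point 2. As a quick check, and only as a sketch, they can be reproduced with the `NormalDist` class in Python's standard `statistics` module (Python 3.8 or later). The function name `fraction_within` is my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1s = fraction_within(1)
within_2s = fraction_within(2)

print(f"{within_1s:.1%} within 1s, {within_2s:.1%} within 2s")
```

These are properties of an ideal normal distribution. A small sample such as ten shells will only follow them approximately.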

 


1.1.4 Comparing means and standard deviation of two or more samples.

 

A sample with a small standard deviation suggests narrow variation, whereas a sample with a larger standard deviation suggests wider variation.

Example: The two mollusc samples above provide

Population 1. Mean = 31.4, Standard deviation (s) = 5.7

Population 2. Mean = 41.6, Standard deviation (s) = 4.3

The second population has a greater mean shell length but slightly narrower variation. Why this is the case would require further observation and experiment on environmental and genetic factors.

 

1.1.5 Comparison of two samples/ t-Test


Comparison for two samples:

In the introduction to this topic we considered the sampling of the same species of mollusc from two different locations. We have already calculated the means and the standard deviations for these samples.

Population 1. Mean = 31.4, Standard deviation (s) = 5.7

Population 2. Mean = 41.6, Standard deviation (s) = 4.3

The question that we might now ask is:

Null hypothesis: Is there no significant difference between the two samples, except as caused by chance selection of data?

OR

Alternative hypothesis: Is there a significant difference between the height of shells in sample A and sample B?

Animation of the question 'Is there a significant difference between these two populations?'

 

 

Statistical test of difference using the t-Test.

The method described here is for Excel 2007, which calculates the P value directly. Alternatively, calculate t itself in Excel 2003 or Excel 2007.


 

T-Test Calculation : Excel 2007 (calculating P)

 

o In Excel 2007 the TTEST function to calculate P is accessed by following the routine provided to the left.

o Note that this directly calculates P and not t STAT.

o After step 5 a dialog box opens (see below).

 

 

 

 

 

 

 

 

Enter the settings as provided:

In Excel 2003 the t test is performed using the formula: =TTEST(range1, range2, tails, type).

For the examples you'll use in biology, tails is always 2, and type can be:

1, Paired

2, Two samples equal variance

3, Two samples unequal variance


The cell with the t test P can be formatted as a percentage (Format menu > cell > number tab > percentage).

This automatically multiplies the value by 100 and adds the % sign. This can make P values easier to read and understand. It's also a good idea to plot the means as a bar chart with error bars to show the difference graphically.
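The same percentage formatting is available outside Excel. A minimal Python sketch, using the P value quoted in the worked conclusion later on this page:

```python
# P value as quoted in the worked conclusion on this page
p = 0.003

# The % format code multiplies by 100 and appends the % sign,
# much like Excel's percentage cell format
as_percent = f"{p:.1%}"

print(as_percent)
```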

 

 

 

 

 

 

 

Background to the T-test

A common form of data analysis is to compare two sets of data to see if they are the same or different. For example, are the mollusc shells from the two locations significantly different? If the means of the two sets are very different, then it is easy to decide, but often the means are quite close and it is difficult to judge whether the two sets are the same or are significantly different. To compare two sets of data use the t test, which tells you the probability (P) of getting a difference at least as large as the one observed if the two sets are basically the same. The assumption that the two sets are the same is called the null hypothesis.


P varies from 0 (not likely) to 1 (certain). The higher the probability, the more likely it is that the two sets are the same, and that any differences are just due to random chance. The lower the probability, the more likely it is that the two sets are significantly different, and that any differences are real. Where do you draw the line between these two conclusions?

In biology the critical probability is usually taken as 0.05 (or 5%). This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results.

If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).

If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).

For the t test to work, the number of repeats should be as large as possible, and certainly > 5.
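For those who want to see what lies behind the t value, here is a rough Python sketch of the equal-variance two-sample calculation (the case Excel calls type 2). It uses the means and standard deviations quoted earlier for the two mollusc samples and the stated sample size of ten. The raw shell lengths are not reproduced here, so the result need not match the P value from the Excel worked example. The function name `pooled_t` is my own, and the critical value 2.101 is the standard two-tailed 5% value of t for 18 degrees of freedom.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between the two means
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Summary statistics quoted above; n = 10 shells per sample
t, df = pooled_t(41.6, 4.3, 10, 31.4, 5.7, 10)

T_CRIT = 2.101  # two-tailed 5% critical value of t for 18 degrees of freedom
significant = abs(t) > T_CRIT

print(f"t = {t:.2f}, df = {df}, significant at 5%: {significant}")
```

Comparing the calculated t with the critical value leads to the same kind of decision as comparing P with 5%.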

 

Drawing conclusions:

1. State the null hypothesis and the alternative hypothesis based on your research question.

Null Hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.'

Alternative Hypothesis: 'There is a significant difference between the height of shells in sample A and sample B.'

2. Set the critical P level at P= 0.05 (5%)

3. Write the decision rule for rejecting the null hypothesis.

If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).

If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).

4. Write a summary statement based on the decision.

The null hypothesis is rejected since the calculated P = 0.003 is less than the critical P = 0.05 (two-tailed test).

5. Write a statement of results in standard English.

There is a significant difference between the height of shells in sample A and sample B.
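The decision rule in steps 2 to 4 is simple enough to write as a short function. This is only a sketch: the function name `decide` is my own, and the wording of the two outcomes follows the rule given above.

```python
def decide(p, critical_p=0.05):
    """Apply the decision rule: compare the calculated P with the critical P."""
    if p < critical_p:
        return "reject the null hypothesis"
    return "accept the null hypothesis"

# Calculated P from the worked example above
decision = decide(0.003)
print(decision)
```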

 

1.1.6 Correlation, causation and the calculation of correlation coefficients


When analysing an experiment you are very often looking for an association between variables. This can be a correlation to see if two variables vary together, or a relation to see how one variable affects another.

There are two measures of correlation: the Pearson correlation coefficient (r), and Spearman's rank-order correlation coefficient (r_s). These both vary from +1 (perfect positive correlation) through 0 (no correlation) to –1 (perfect negative correlation). If your data are continuous and normally distributed use Pearson, otherwise use Spearman.

In Excel r is calculated using the formula: =CORREL(X range, Y range).

To calculate r_s, first make two new columns showing the ranks (or order) of the X and Y data (either by hand or using Excel's =RANK command), and then calculate the Pearson correlation on the rank data.
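The rank-then-correlate recipe can be sketched in Python as follows. The data are invented for illustration, the function names are my own, and the simple ranking used here assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's coefficient: the Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(f"r = {r:.3f}, rs = {rs:.3f}")
```

Because Spearman's coefficient only looks at the order of the values, it reaches +1 for any relationship that always increases, while Pearson's r reaches +1 only for a perfect straight line.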

It is usual to draw a scatter graph of the data whenever a correlation is being investigated.

In the illustrated example the size of breeding pairs of penguins was measured to see if there was correlation between the sizes of the two sexes. The scatter graph and both correlation coefficients clearly indicate a strong positive correlation. In other words large females do pair with large males. Of course this doesn't say why, but it shows there is a correlation to investigate further.

 

 

 

 

 


If you know that one variable causes the changes in the other variable, then you can use linear regression to investigate the relation. This fits a straight line to the data, and gives the values of the slope and intercept of that line (m and c in the equation y = mx + c).

The simplest way to do this in Excel is to plot a scatter graph of the data and use the trendline feature of the graph.

Right-click on a data point on the graph, select Add Trendline, and choose Linear.

Click on the Options tab, and select Display equation on chart. You can also choose to set the intercept to be zero (or some other value). The full equation with the slope and intercept values is now shown on the chart.
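A linear trend line is normally a least-squares fit, and the slope and intercept can also be calculated directly. The sketch below assumes ordinary least squares with a free intercept. The data are invented and the function name `linear_fit` is my own.

```python
def linear_fit(xs, ys):
    """Least-squares slope m and intercept c for the line y = mx + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data lying exactly on the line y = 2x + 1
m, c = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {m:.2f}x + {c:.2f}")
```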

 

 

Causation

It is important to realize that if the statistical analysis of data indicates a correlation between the independent and dependent variable this does not prove any causation. Only further investigation will reveal the causal effect between the two variables.

Correlation does not imply causation. Here are some unusual examples of correlation without causation!

Ice cream sales and the number of shark attacks on swimmers are correlated.

Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).

The number of cavities in elementary school children and vocabulary size have a strong positive correlation.

Clearly there is no real interaction between the factors involved, simply a coincidence in the data.

Once a correlation between two factors has been established from experimental data it would be necessary to advance the research to determine what the causal relationship might be.

 
