Statistical Analysis
Topic 1
Statistics
1.1.1 State that error bars are a graphical representation of the variability of data.
1.1.2 Calculate the mean and standard deviation of a set of values.
1.1.3 State that the term standard deviation is used to summarize the spread of values around the mean, and that 68% of values fall within one standard deviation of the mean.
1.1.4 Explain how the standard deviation is useful for comparing the means and spread of data between two or more samples.
1.1.5 Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables.
1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.
What is data?
Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions
Encarta dictionary
Two types of data
Qualitative
Quantitative
Statistics in Science
Data can be collected about a population (surveys)
Data can be collected about a process/mechanism (experimentation)
Qualitative Data
Information that relates to characteristics or description (observable qualities)
Information is often grouped by descriptive category
Examples: species of plant, type of insect, shades of color, rank of flavor in taste testing
Remember: qualitative data can be “scored” and evaluated numerically
Qualitative data, manipulated numerically
Survey results, teens and need for environmental action
Data presented in proportion or % form:
Quantitative data
Quantitative – measured using a naturally occurring numerical scale
Examples: chemical concentration, temperature, length, weight, etc.
Quantification
Measurements are often displayed graphically.
Quantitation = Measurement
In biology, data must be measured carefully using laboratory equipment (e.g. timers, metre sticks, pH meters, balances, pipettes). The limits of the equipment add some uncertainty to the data collected: every instrument has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm?
For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement.
Finding the level of uncertainty
As a “rule-of-thumb”, if not specified, use ± 1/2 of the smallest measurement unit (e.g. a metric ruler is marked in 1 mm divisions, so the limit of uncertainty of the ruler is ± 0.5 mm).
If the room temperature is read as 25 ºC with a thermometer marked at 1-degree intervals, what is the range of possible temperatures for the room? Answer: 25 ± 0.5 ºC
If you read 15 ºC, the true temperature may be anywhere between 14.5 and 15.5 ºC.
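The half-smallest-division rule of thumb can be sketched as a small Python function. The function name `reading_with_uncertainty` is hypothetical, and the rule only applies when the instrument's manufacturer does not state its own uncertainty.

```python
from typing import Tuple

def reading_with_uncertainty(reading: float, smallest_division: float) -> Tuple[float, float]:
    """Return (low, high) bounds using the rule of thumb: ± half the smallest division."""
    half = smallest_division / 2
    return reading - half, reading + half

# Thermometer marked every 1 ºC, read at 25 ºC
low, high = reading_with_uncertainty(25, 1)
print(f"25 ± 0.5 ºC means the true value lies between {low} and {high} ºC")
```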
Definition of Statistics
Branch of mathematics which allows us to characterize large populations of data by randomly sampling small portions of data from the whole.
Samples come from habitats, communities, biological populations, or experimental investigations, and enable us to draw conclusions about the larger population.
Statistics measure the differences and relationships between sets of data
Nothing is 100% certain in science!
Randomization
Valid conclusions about populations can only be reached when samples are drawn randomly.
Each member of the population must have an equal and independent chance of being sampled.
How might you ensure that populations are randomly sampled?
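One common answer is to number every member of the population and let a random number generator choose which ones to sample. The Python sketch below assumes a hypothetical field divided into 50 numbered quadrats; `random.Random.sample` draws without replacement, so every quadrat has the same chance of being included.

```python
import random
from typing import List, Optional

def draw_random_sample(population: List[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Pick n distinct members; each member is equally likely to be included."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for checking
    return rng.sample(population, n)

# Hypothetical population: 50 numbered quadrats in a field
quadrats = [f"Q{i}" for i in range(1, 51)]
chosen = draw_random_sample(quadrats, 10, seed=42)
print(sorted(chosen))
```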
Sample Size
The larger the sample drawn from a population, the more representative it is likely to be of that population.
Replication refers to repeatedly measuring a treatment in an experiment to account for variation.
Factor: amount of water per day
Treatments: 0.1 L, 0.5 L, 1.0 L
Number of replicates: 3 per treatment (replicates 1, 2 and 3 for each of the three treatments)
Mean
The arithmetic average of the data points
A measure of the central tendency of the data
Find the mean of the given data³:
Answer: 12999.4
Country                 Number of reported HIV cases
Argentina               27517
Bahamas                 4548
Canada                  19468
Dominican Republic      7167
Ecuador                 6297
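The mean in the answer above can be checked with a few lines of Python, using the case counts from the table: add the values and divide by how many there are.

```python
# Reported HIV cases, taken from the table above
hiv_cases = {
    "Argentina": 27517,
    "Bahamas": 4548,
    "Canada": 19468,
    "Dominican Republic": 7167,
    "Ecuador": 6297,
}

# Mean = sum of values / number of values
mean = sum(hiv_cases.values()) / len(hiv_cases)
print(f"Mean = {mean:.1f}")
```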
Range
A measure of the spread of data
The difference between the largest and the smallest observed values
Find the range of the given data:
Answer: 22969
If one data point were unusually large or unusually small, it would have a great effect on the range. Such points are called outliers.
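A short Python sketch of the range calculation on the HIV data, followed by a hypothetical outlier (100000 cases, not real data) to show how strongly a single extreme value changes the range:

```python
hiv_cases = [27517, 4548, 19468, 7167, 6297]  # values from the HIV table

# Range = largest value - smallest value
data_range = max(hiv_cases) - min(hiv_cases)
print(f"Range = {data_range}")

# Hypothetical outlier, added only to show its effect on the range
with_outlier = hiv_cases + [100000]
print(f"Range with outlier = {max(with_outlier) - min(with_outlier)}")
```

One extreme value roughly quadruples the range here, which is why the range is a fragile measure of spread.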
Looking at Data
How accurate is the data? (How close are the data to the “real” values?) A systematic departure from the real value is known as BIAS.
How precise is the data? (All test systems have some uncertainty, due to the limits of measurement.) Estimating the limits of the experimental uncertainty is essential.
Precision is checked by replication: repeated measurements show how closely the values cluster around the mean.
Comparing Averages
Now plot means together on a graph to visualize the relationship between the two groups.
The size of our error bars depends on how spread out the data is around the mean
Drawing error bars
The simplest way to draw an error bar is to use the mean as the central point and to place the endpoints of the error bar at the distance from the mean to the measurement farthest from it, on either side of the mean.
[Figure: error bar centred on the average value, with its length set by the calculated distance to the value farthest from the average]
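The farthest-value method can be sketched in Python as follows. The function name `simple_error_bar` is hypothetical; the example data are the sunlight bean-plant heights from the comparison table in this section. The bar is drawn symmetrically, so both ends sit the same distance from the mean.

```python
from typing import List, Tuple

def simple_error_bar(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, lower end, upper end) using the farthest-value method."""
    mean = sum(values) / len(values)
    # Distance from the mean to the value farthest from it
    distance = max(abs(v - mean) for v in values)
    return mean, mean - distance, mean + distance

# Sunlight bean-plant heights (cm) from the comparison table
sunlight = [124, 120, 153, 98, 123, 142, 156, 128, 139, 117]
print(simple_error_bar(sunlight))
```

Here the farthest value is 98 cm, 32 cm below the mean of 130 cm, so the bar runs from 98 to 162 cm.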
What do error bars suggest?
If the bars show extensive overlap, it is likely that there is not a significant difference between those values
Error bars
Graphical representation of the variability of data
Can be used to show either the range of data or the standard deviation on a graph
Standard deviation
A measure of how the individual observations of a data set are dispersed or spread out around the mean.
Determined by a mathematical formula which is programmed into your calculator.
In a normal distribution, about 68% of all values lie within ±1 standard deviation of the mean. This rises to about 95% for ±2 standard deviations from the mean.
How is Standard Deviation calculated?
It is calculated with a formula that is programmed into calculators and spreadsheet software.
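A sketch of the formula, assuming the sample ("n - 1") form: this is the form used by the Excel STDEV method described under "How to calculate SD", and it reproduces the bean-plant answers given in this section (17.68 cm and 47.02 cm).

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

Here $x_i$ are the individual values, $\bar{x}$ is their mean and $n$ is the number of values. For the sunlight bean data, the squared deviations sum to 2812, so $s = \sqrt{2812 / 9} \approx 17.68$ cm.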
How to calculate SD
TI-86 http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-86.html
TI-83 and 84 http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-83.html
In Microsoft Excel, type the following formula into the cell where you want the standard deviation result, using the "unbiased", or "n-1", method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30). In Excel 2010 and later, =STDEV.S(A1:A30) gives the same result.
Comparing the means and standard deviations of two or more samples

Height of bean plants in centimetres (±0.1 cm)
In sunlight     In shade
124             131
120             60
153             160
98              212
123             117
142             65
156             155
128             160
139             145
117             95
Total 1300      Total 1300
Mean (both groups): 1300/10 = 130.0 cm
Answers
SD for sunlight data: 17.68 cm SD for shade data: 47.02 cm
The wide variation in the shade data makes us question the experimental design.
Means alone are not sufficient in determining whether two groups differ statistically from one another.
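The answers above can be reproduced with Python's standard `statistics` module. `statistics.stdev` uses the sample ("n - 1") formula, the same method as Excel's STDEV, which is why it matches the values given.

```python
import statistics

sunlight = [124, 120, 153, 98, 123, 142, 156, 128, 139, 117]  # heights in cm
shade = [131, 60, 160, 212, 117, 65, 155, 160, 145, 95]       # heights in cm

for label, heights in (("sunlight", sunlight), ("shade", shade)):
    # Same mean, very different spread
    print(f"{label}: mean = {statistics.mean(heights):.1f} cm, "
          f"SD = {statistics.stdev(heights):.2f} cm")
```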
A typical normal distribution curve
According to this curve:
One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for approximately 68 percent of the data in this group.
Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.
Three Standard Deviations?
Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data.
[Figure: normal distribution curve, horizontal axis marked at -3 SD, -2 SD, ±1 SD, +2 SD and +3 SD]
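The 68/95/99.7 percentages come from the normal distribution itself: the fraction of values within ±k standard deviations of the mean is erf(k/√2). A minimal Python check using `math.erf`:

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {fraction_within(k):.1%}")
```

This only holds for data that are (approximately) normally distributed; skewed data will not follow these percentages.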
Significant difference between two data sets using the t-test
The t-test compares two sets of data to see whether chance alone could account for the difference between them.
Scientists like to be at least 95% certain of their findings before drawing conclusions
Mean, SD, and sample size are used to calculate the value of t
Degrees of freedom = sum of sample sizes of each of the two groups minus 2
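A sketch of how the mean, SD and sample size combine into t. This assumes Student's two-sample t-test with pooled (equal) variances; the slides do not name the variant, but this is the one whose degrees of freedom are the two sample sizes minus 2. The function name `pooled_t` is hypothetical.

```python
import math
from typing import Tuple

def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and degrees of freedom."""
    df = n1 + n2 - 2  # sum of both sample sizes minus 2
    # Pool the two sample variances, weighting each by its own n - 1
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Bean-plant summary values (sunlight vs shade)
t, df = pooled_t(130.0, 17.68, 10, 130.0, 47.02, 10)
print(f"t = {t:.2f}, df = {df}")
```

Because both bean groups have the same mean, t is 0: the t-test compares means only, so the large difference in spread between the two groups does not show up in t.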
T-test calculation
For all data values: http://www.graphpad.com/quickcalcs/ttest1.cfm
For means: http://www.dimensionresearch.com/resources/calculators/ttest.html
Worked example
Compare two groups of barnacles living on a rocky shore. Measure the width of their shells to see if a significant size difference is found depending on how close they live to the water. One group lives between 0 and 10 metres from the water level. The second group lives between 10 and 20 metres above the water level.
The width of the shells was measured in millimetres, with 15 shells measured from each group. The group closer to the water has the larger mean shell width, suggesting that barnacles living closer to the water have larger shells. If the value of t is 2.25, is that a significant difference?
Steps to determining a significant difference when given a value of t
1. Determine the degrees of freedom (total number in both sets minus 2). Ex. 15 + 15 - 2 = 28
2. Use the given value of t. Ex. 2.25
3. Use the table of t values to determine the probability (p) that the difference is due to chance. Ex. For 28 degrees of freedom, the critical value at p = 0.05 (two-tailed) is 2.048. Because 2.25 > 2.048, p < 0.05 (5%).
4. State the confidence level. Ex. The difference is significant at the 95% confidence level: barnacles living nearer the water have a significantly larger shell than those living 10 metres or more away from the water.
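Instead of reading a printed t table, the probability can be computed directly from Student's t distribution. The sketch below integrates the t density numerically with Simpson's rule, using only the standard library (an assumption made to avoid depending on a statistics package); it returns a two-tailed p. The function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), from Simpson's rule on the density between 0 and |t|."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * t_pdf(a + i * h, df)
    area_0_to_t = total * h / 3
    # Each tail beyond |t| holds 0.5 minus the area between 0 and |t|
    return 2 * (0.5 - area_0_to_t)

df = 15 + 15 - 2
p = two_tailed_p(2.25, df)
print(f"df = {df}, p = {p:.4f}")
print("significant at p < 0.05" if p < 0.05 else "not significant at p < 0.05")
```

For the barnacle example this gives a p between 0.02 and 0.05, agreeing with the table-based conclusion that the difference is significant at the 95% level.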
T table
One-tailed t-test: use if your hypothesis is that one mean is specifically larger (or specifically smaller) than the other.
Two-tailed t-test: use if your hypothesis is only that the two means are not equal (not specifying larger or smaller).
Web-based t-test
http://graphpad.com/quickcalcs/ttest1.cfm
Correlation does not mean causation
Controlled experiments provide a test that can show cause
Observations without an experiment can only show a correlation
Correlation test
Correlation is signified by the value of r:
+1 (completely positive correlation)
0 (no correlation)
-1 (completely negative correlation)
http://www.socscistatistics.com/tests/pearson/
Note that r describes linear relationships.
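Pearson's r can be computed directly from paired values. In this sketch the x and y values are hypothetical, chosen to show r = +1 and r = -1 for straight lines, and r = 0 for a perfect U-shaped (non-linear) relationship, which is why r only describes linear relationships.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired values x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v for v in x]))   # straight line sloping up
print(pearson_r(x, [-3 * v for v in x]))  # straight line sloping down
print(pearson_r(x, [v * v for v in x]))   # perfect curve, but not linear
```

Even a value of r close to +1 or -1 shows only correlation; it says nothing about which variable, if either, causes the other.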
Correlation or causation?
1. Cars with low gas mileage (few miles per gallon of fuel) cause global warming.
2. Drinking red wine protects against heart disease.
3. Tanning beds can cause skin cancer.
4. UV rays increase the risk of cataracts.
5. Vitamin C cures the common cold.
Resources
¹ http://www.globalissues.org/TradeRelated/Facts.asp#src1
² http://www.globalissues.org/TradeRelated/Consumption.asp
³ http://www.who.int/globalatlas/includeFiles/generalIncludeFiles/listInstances.asp
Stephen Taylor, Bandung International School