Data Analysis 101

Upload: maurice-price

Post on 28-Dec-2015

Page 1: Data Analysis 101. Overview Basic Statistics Basic Statistics Reporting variability and error Reporting variability and error Summarizing Data Summarizing

Data Analysis 101

Overview

Basic Statistics

Reporting variability and error

Summarizing Data

Comparison to Standards

Key questions to ask of data

Summarizing QA/QC Standards

Dissolved Oxygen data (mg/L)

Site  23/4/2012  16/5/2012  15/6/2012  14/7/2012  15/8/2012  15/9/2012
1     10.2       9.8        9.2        10.0       9.3        10.3
2     9.6        8.2        10.3       10.1       9.9        9.8
3     8.3        7.1        8.2        6.3        6.3        5.6
4     7.5        7.2        7.3        6.0        6.0        2.1
5     8.0        7.2        6.1        6.3        3.4        6.2
6     6.1        7.6        6.9        8.6        8.4        8.9

Site  14/4/2013  16/5/2013  14/6/2013  16/7/2013  15/8/2013  21/9/2013
1     9.2        10.2       8.3        7.8        10.3       10.1
2     9.5        6.9        9.3        10.3       10.6       9.2
3     8.2        6.3        8.2        8.3        7.2        6.2
4     7.2        7.3        6.1        7.0        2.3        6.5
5     6.3        8.0        8.6        3.0        6.7        7.6
6     9.2        10.3       7.6        9.3        7.3        10.3

Average (Arithmetic Mean)

Site Average

1 9.6

2 9.5

3 7.2

4 6.0

5 6.5

6 8.4

Very high and low numbers can distort results

Is the Site 4 value of 6.0 mg/L representative of the data set?

Site  23/4/2012  16/5/2012  15/6/2012  14/7/2012  15/8/2012  15/9/2012
4     7.5        7.2        7.3        6.0        6.0        2.1

Site  14/4/2013  16/5/2013  14/6/2013  16/7/2013  15/8/2013  21/9/2013
4     7.2        7.3        6.1        7.0        2.3        6.5

Median

Central value in a set of values, ranked from lowest to highest.

2, 5, 6, 6, 10, 11, 12, 13, 120

Median = 10

Average = 20.6
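The comparison on this slide can be reproduced with a short sketch. It uses only Python's standard `statistics` module and the nine example values shown above; the variable names are illustrative.

```python
from statistics import mean, median

# Example values from the slide; 120 is the single very high number.
values = [2, 5, 6, 6, 10, 11, 12, 13, 120]

central = median(values)  # middle value once the list is ranked
average = mean(values)    # pulled upward by the value of 120

print(f"Median  = {central}")
print(f"Average = {average:.1f}")
```

The median stays near the bulk of the values, while the average is roughly double it because of one extreme result.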

Site  Average  Median
1     9.6      9.9
2     9.5      9.7
3     7.2      7.2
4     6.0      6.8
5     6.5      6.5
6     8.4      8.5

Site  23/4/2012  16/5/2012  15/6/2012  14/7/2012  15/8/2012  15/9/2012
4     7.5        7.2        7.3        6.0        6.0        2.1

Site  14/4/2013  16/5/2013  14/6/2013  16/7/2013  15/8/2013  21/9/2013
4     7.2        7.3        6.1        7.0        2.3        6.5

Site 4 median value is more representative of the data set than the average
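As a rough check of the Site 4 row, the sketch below recomputes both statistics from the twelve Site 4 readings in the data table. The list name is illustrative, and the values are typed in 2012-then-2013 order.

```python
from statistics import mean, median

# Site 4 dissolved oxygen (mg/L): six 2012 readings, then six 2013 readings.
site4_do = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

site4_mean = mean(site4_do)
site4_median = median(site4_do)

print(f"Site 4 average = {site4_mean:.2f} mg/L")
print(f"Site 4 median  = {site4_median:.2f} mg/L")
```

Ten of the twelve readings are 6.0 mg/L or higher, so the two low readings (2.1 and 2.3) are what pull the average below the median.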

Range Maximum & Minimum

Range is the difference between the maximum and minimum values in a data set.

The larger the range, the greater the variability

Maximum and minimum values are also important:

- DO standards are expressed as the minimum concentration needed to support fish
- Bacteria standards are expressed as maximum levels that pose an acceptable risk to public health

DO (mg/L)

Site Min Max Range

1 7.8 10.3 2.5

2 6.9 10.6 3.7

3 5.6 8.3 2.7

4 2.1 7.5 5.4

5 3.0 8.6 5.6

6 6.1 10.3 4.2

Sites 4 and 5 have the greatest range
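A minimal sketch of the min/max/range calculation for the two widest-ranging sites, using the readings from the DO data table. The dictionary and helper names are illustrative, not part of the original material.

```python
# DO readings (mg/L) for Sites 4 and 5, 2012 then 2013.
do_by_site = {
    4: [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5],
    5: [8.0, 7.2, 6.1, 6.3, 3.4, 6.2, 6.3, 8.0, 8.6, 3.0, 6.7, 7.6],
}


def summarize_range(values):
    """Return (minimum, maximum, range) with the range rounded to 0.1 mg/L."""
    lowest, highest = min(values), max(values)
    return lowest, highest, round(highest - lowest, 1)


for site, readings in do_by_site.items():
    lowest, highest, spread = summarize_range(readings)
    print(f"Site {site}: min {lowest}, max {highest}, range {spread}")
```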

Quartiles and Interquartile Range

Quartiles: 3 values below which lie 25%, 50% & 75% of the values in a set of numbers, respectively

Median = 50th percentile (second quartile)

Half of your data values occur between the 25th and 75th percentiles

Difference between the 25th and 75th percentiles is the interquartile (IQ) range

DO (mg/L) - Site 3

As sampled  Ranked (highest to lowest)
8.3         8.3
7.1         8.3
8.2         8.2
6.3         8.2
6.3         8.2
5.6         7.2
8.2         7.1
6.3         6.3
8.2         6.3
8.3         6.3
7.2         6.2
6.2         5.6

Marked on the ranked column: 75th percentile, 50th percentile (median), 25th percentile

Quartiles and Interquartile Range

Site  25th  Median  75th   IQ Range
1     9.20  9.90    10.20  1.00
2     9.28  9.70    10.15  0.88
3     6.30  7.15    8.20   1.90
4     6.00  6.75    7.23   1.23
5     6.18  6.50    7.70   1.53
6     7.53  8.50    9.23   1.70

Which sample site has the greatest variability in data?

Which has the least?
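The Site 3 row of the table can be approximated with the sketch below. It assumes the table was produced with a spreadsheet-style "inclusive" quartile method (an assumption; the slides do not say), which is what `statistics.quantiles(..., method="inclusive")` implements. Other quartile conventions can give slightly different values.

```python
from statistics import quantiles

# Site 3 dissolved oxygen (mg/L), in the order sampled.
site3_do = [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2]

# n=4 returns the three cut points: 25th, 50th and 75th percentiles.
# The function ranks the data itself, so no pre-sorting is needed.
q25, q50, q75 = quantiles(site3_do, n=4, method="inclusive")
iq_range = q75 - q25

print(f"25th = {q25:.2f}, median = {q50:.2f}, 75th = {q75:.2f}")
print(f"IQ range = {iq_range:.2f}")
```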

Geometric Mean

Like median, the geometric mean reduces the influence of very high and very low numbers in data set.

GeoMean = $\sqrt[2]{2 \times 8} = 4$

GeoMean = $\sqrt[3]{2 \times 4 \times 8} = 4$

Use when data covers several orders of magnitude (Guideline: largest value must be at least 3x smallest)

Spreadsheets: replace “0” values with “1”
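The two worked examples can be checked with a small sketch. The helper `geomean` is a hypothetical name that implements the n-th root of the product directly; `statistics.geometric_mean` from the standard library is shown alongside for comparison.

```python
from math import isclose, prod
from statistics import geometric_mean


def geomean(values):
    """n-th root of the product of n positive values."""
    return prod(values) ** (1 / len(values))


two_values = geomean([2, 8])       # square root of 16
three_values = geomean([2, 4, 8])  # cube root of 64

# Floating-point roots are approximate, so compare with a tolerance.
print(isclose(two_values, 4.0), isclose(three_values, 4.0))
print(isclose(geometric_mean([2, 4, 8]), 4.0))
```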

E. coli (MPN)

Site  23/4/2012  16/5/2012  15/6/2012  14/7/2012  15/8/2012
1     2          280        100        38         21
2     15         1420       21         39         74
3     100        2250       12         34         50
4     80         1000       100        57         146
5     30         260        100        100        630
6     10         1460       7          43         30

Site  14/4/2013  16/5/2013  14/6/2013  16/7/2013  15/8/2013
1     170        340        6          20         162
2     119        490        12         120        50
3     273        190        17         20         60
4     202        630        63         160        12
5     76         770        163        310        468
6     19         40         150        4          16

E. coli summary

Site  Geomean  Average
1     47       114
2     75       236
3     74       301
4     126      245
5     192      291
6     32       178

In every case, geomean is lower than average

Especially true for Site 6, where the geomean is roughly one-sixth of the average

Sites 3, 4 & 6 – single high result skews up average

Site 3 had highest average; Site 5 had highest geomean

Different analysis = different result!
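To illustrate how the choice of statistic changes the ranking, the sketch below recomputes both summaries from the E. coli data table. The dictionary layout and variable names are illustrative; each list holds the five 2012 values followed by the five 2013 values.

```python
from statistics import geometric_mean, mean

# E. coli (MPN) by site: 2012 values, then 2013 values.
ecoli_mpn = {
    1: [2, 280, 100, 38, 21, 170, 340, 6, 20, 162],
    2: [15, 1420, 21, 39, 74, 119, 490, 12, 120, 50],
    3: [100, 2250, 12, 34, 50, 273, 190, 17, 20, 60],
    4: [80, 1000, 100, 57, 146, 202, 630, 63, 160, 12],
    5: [30, 260, 100, 100, 630, 76, 770, 163, 310, 468],
    6: [10, 1460, 7, 43, 30, 19, 40, 150, 4, 16],
}

# Map each site to (geometric mean, arithmetic mean).
summary = {site: (geometric_mean(v), mean(v)) for site, v in ecoli_mpn.items()}

highest_geomean = max(summary, key=lambda site: summary[site][0])
highest_average = max(summary, key=lambda site: summary[site][1])

for site, (geo, avg) in summary.items():
    print(f"Site {site}: geomean {geo:.0f}, average {avg:.0f}")
print(f"Highest geomean: Site {highest_geomean}; highest average: Site {highest_average}")
```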


Suggested Statistical Summaries

Tend to be useful for comparisons between sites, or between months, seasons, or years for the same site

Presents a “representative” or “typical” value and information on how the data is spread

Indicator: Summary

Temperature (water or air): Seasonal average; Seasonal median; Maximum; Range; Quartiles

Dissolved Oxygen (mg/L): Seasonal median; Minimum; Quartiles

Dissolved Oxygen (% saturation): Seasonal average*; Seasonal median; Quartiles

Water clarity: Seasonal average; Seasonal median; Maximum and Minimum; Range; Quartiles

Bacteria (E. coli): Geometric mean; Quartiles

Turbidity: Median; Quartiles

Nutrients (e.g. NO3/PO4): Median; Quartiles

Specific Conductivity or Salinity: Median; Quartiles

pH: Median or average*; Quartiles; Minimum

Statistical Summaries

Factors to bear in mind:

Temp and DO – use seasonal medians and quartiles, since these parameters vary naturally with seasons

In general, use median instead of average

You should have at least 5 data points to calculate averages, geometric means, medians and quartiles.

A good table has….

Readable, logical data placement

Clear column and row headings

A title at the top

Reporting units included

Smith River Median Orthophosphate Results for 2013 (mg/L)

Sites  Median
1      0.02
2      0.02
3      0.12
4      0.12
5      0.11
6      0.04

A good graph has…..

A clear title

Simple clear labels on axes

A scale that reveals trends

A legend that explains the elements on graph

Clearly shown reporting units

A story that is apparent from the graph

Information that allows the reader to get the point, e.g. levels of concern

The minimum number of elements to tell the story – avoid clutter

[Figure. Title: "A summary of bacteria levels collected from the Smith River, Annapolis County, Nova Scotia by Tim Timmins with his friend Anna, for the Golden Valley Water Coalition, a not-for-profit group committed to the well-being of Golden Valley". X-axis: Sample Site (1 to 6). Y-axis: Bacteria (0 to 250).]

[Figure. Title: "Geometric mean of E. coli bacteria values, Smith River 2012". X-axis: Sample Site (1 to 6), annotated "Upstream" and "Downstream". Y-axis: E. coli (MPN), 0 to 250. A "Threshold of concern" is marked.]

[Figure. Title: "Geometric mean of E. coli bacteria values, Smith River 2012". X-axis: Sample Site (1 to 6). Y-axis: E. coli (MPN), 0 to 250.]

Sample sites 2 km apart, except sites 5 & 6, which are 20 km apart.

Graph implies a connection between each point on line & trend up or down between sites. This may not be appropriate in all cases

[Figure. Title: "Geometric mean of E. coli bacteria values, Smith River 2012". X-axis: Sample Site (1 to 6). Y-axis: E. coli (MPN), 0 to 250.]

[Figure. Same title and x-axis, with the y-axis rescaled: E. coli (MPN), 0 to 2000.]

[Figure. Title: "Dissolved Oxygen (% Saturation) at Site 3: 2013". X-axis: sampling dates, 14/4/2013 to 21/9/2013. Y-axis: Dissolved Oxygen (% sat), shown at two scales for comparison: 0 to 100 and 70 to 100.]

[Figure. Title: "Summary of 2012 DO Values for selected Smith River sites". Legend: Average, Median. X-axis: Sample Site. Y-axis: Dissolved Oxygen (mg/L), 2.0 to 12.0.]

[Figure. Title: "Summary of 2012 Dissolved Oxygen Values for selected Smith River sites". Legend: Mean. X-axis: Site 1 to Site 4. Y-axis: Dissolved Oxygen (mg/L), 0.00 to 12.00.]

Creating Box and Whisker Plots

Proprietary 3rd party graphing software (e.g. Grapher)

Some Statistics packages

Not standard with MS Excel

Excel instructions at:

http://peltiertech.com/WordPress/excel-box-and-whisker-diagrams-box-plots/

Reporting Variability

Sample Standard Deviation

$SD = \sqrt{\dfrac{\sum (x - \text{mean})^2}{n - 1}}$

SD = sample standard deviation

x = individual sample value

mean = arithmetic mean of all values

n = number of sample values

A measure of the amount of variability within a data set.
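A minimal sketch of the sample standard deviation, written out term by term and compared with the standard library's `statistics.stdev`. The helper name `sample_sd` is hypothetical, and the Site 3 DO readings from the data table are used as example input.

```python
from math import sqrt
from statistics import mean, stdev


def sample_sd(values):
    """Square root of the summed squared deviations divided by (n - 1)."""
    centre = mean(values)
    squared_deviations = sum((x - centre) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))


# Site 3 dissolved oxygen (mg/L), 2012 then 2013.
site3_do = [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2]

by_hand = sample_sd(site3_do)
from_library = stdev(site3_do)
print(f"SD by hand = {by_hand:.3f}, statistics.stdev = {from_library:.3f}")
```

Dividing by n - 1 rather than n is what makes this the sample (not population) standard deviation.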

Estimating precision

Standard Error

$SE = \dfrac{SD}{\sqrt{n}}$

SE = standard error

SD = sample standard deviation

n = sample size

Quantifies the certainty with which the mean computed from a random sample estimates the true mean of the population from which the sample was drawn.
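The standard error formula can be sketched as follows. The helper name `standard_error` is hypothetical, and the Site 4 DO readings from the data table serve as example input.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(values) / sqrt(len(values))


# Site 4 dissolved oxygen (mg/L), 2012 then 2013.
site4_do = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

site4_se = standard_error(site4_do)
print(f"Site 4 mean DO = {mean(site4_do):.2f} mg/L, SE = {site4_se:.2f} mg/L")
```

Because the divisor grows with the square root of n, quadrupling the number of samples only halves the standard error for the same spread.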

Estimating precision

Co-efficient of Variation

$CV = \dfrac{SD}{\text{sample mean}} \times 100$

CV does not depend on magnitude of values and units.

This allows comparison of different studies and different sampling designs
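The unit-independence claim can be illustrated with a short sketch. The helper name is hypothetical, and the mg/L to µg/L conversion is chosen only to show that rescaling every value leaves the CV unchanged.

```python
from math import isclose
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100


# Site 3 dissolved oxygen in mg/L, and the same readings expressed in µg/L.
site3_do_mg = [8.3, 7.1, 8.2, 6.3, 6.3, 5.6, 8.2, 6.3, 8.2, 8.3, 7.2, 6.2]
site3_do_ug = [value * 1000 for value in site3_do_mg]

cv_mg = coefficient_of_variation(site3_do_mg)
cv_ug = coefficient_of_variation(site3_do_ug)
print(f"CV in mg/L = {cv_mg:.1f}%, CV in µg/L = {cv_ug:.1f}%")
```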

[Figure. Title: "Summary of 2012 Mean DO Values for selected Smith River sites", shown without and then with standard error. X-axis: Sample Site. Y-axis: Dissolved Oxygen (mg/L), 2.0 to 10.0.]

You have your data, but what does it mean?

Do your values show a problem or not?

It helps to have a point of reference.

Variable  Guideline                 Units       Water Quality Objective                      Notes                                       Reference
DO        6.5 to 9.5                mg/L        Freshwater aquatic life                      cold water biota                            CCME 2002
pH        6.5 to 9.0                            Freshwater aquatic life                                                                  CCME 2002
Temp.     <20; <24                  °C          Stress to salmonids; Mortality to salmonids                                              MacMillan et al 2005
Total P   0.03; 0.03; 0.02 to 0.07  mg/L        Protection from eutrophication                                                           OMEE; Mackie 2004; Dodds and Welch 2000
Total N   0.25 to 3.0               mg/L        Protection from eutrophication                                                           Dodds and Welch 2000
E. coli   <200                      cfu/100 ml  Human recreational contact                   Geomean of 5 samples taken within 30 days   Health Canada 2012
TSS       25                        mg/L        Clear flow, short term                       Max increase from background                CCME 2002
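As a sketch of how readings might be screened against a guideline, the code below counts Site 4 DO readings that fall under the 6.5 mg/L lower bound listed in the table. The constant and variable names are illustrative, and treating 6.5 mg/L as a strict minimum is an assumption made for the example.

```python
# Lower bound of the DO guideline from the table (assumed to be a strict minimum).
DO_MINIMUM_MG_L = 6.5

# Site 4 dissolved oxygen (mg/L), 2012 then 2013.
site4_do = [7.5, 7.2, 7.3, 6.0, 6.0, 2.1, 7.2, 7.3, 6.1, 7.0, 2.3, 6.5]

# Readings below the guideline, and how far the worst one falls short.
below_guideline = [value for value in site4_do if value < DO_MINIMUM_MG_L]
largest_shortfall = round(DO_MINIMUM_MG_L - min(site4_do), 1)

print(f"{len(below_guideline)} of {len(site4_do)} readings below {DO_MINIMUM_MG_L} mg/L")
print(f"Largest shortfall: {largest_shortfall} mg/L")
```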

[Figure. Title: "Summary of 2012 Dissolved Oxygen Values for selected Smith River sites". Legend: Mean, Low DO Threshold. X-axis: Site 1 to Site 4. Y-axis: Dissolved Oxygen (mg/L), 0.00 to 12.00.]

Other sources of WQ reference values

www.lakes.chebucto.org/lakecomp.html

(reference and historical values for NS lakes)

http://novascotia.ca/nse/surface.water/automatedqualitymonitoringdata.asp

(automated data collection – NS surface water quality monitoring network)

Questions to ask of your data

Which sites consistently did not meet WQO? By how much?

Were there sampling dates on which most or all of the sites did not meet the criteria?

Do levels increase or decrease in a consistent manner up or downstream?

If monitoring a pollution source, are results different above/below?

Does change in an indicator coincide with changes in another? e.g. DO & temperature

Dates, 1995

Human alterations or Natural conditions??

Might natural up/downstream changes in river account for results? (benthic invert drift/turbidity)

Does weather influence results? (heavy rain, elevated temp)

Do problem levels coincide with rising flow? (consider dam releases or flow management)

Does presence of specific sources explain results? (WWTP, failing septic)

Human alterations or Natural conditions?? (cont'd)

Do changes in an indicator appear to explain changes in another? (low DO/high temp)

Do visual observations explain results? (strange pipes, eroding banks, dry weather seeps etc.)

If monitoring impact of a pollution source, could multiple point sources be confusing results?

More questions to keep in the back of your mind

Could flaws in field/lab techniques explain results? (sample contamination/sampling error)

For episodic discharges, did sampling coincide with discharge?

Were analytical methods sensitive enough to detect levels of concern?

Time of day of sampling (diurnal DO cycling)

Summarizing QA/QC Results

You need to prove the:

Precision

Accuracy

Representativeness

Comparability

Completeness

of your data and conclusions

Is this sampling representative of environmental conditions in this lake?

DO, pH & Temperature collected here once per year

Volunteer DO (mg/L) results from training day (same time and place)

Volunteer  DO (mg/L)
Tom        8.9
Jon        6.8
Jill       9.0
Geoff      8.8

Are results comparable between volunteers, at different times and at different locations?

Monitoring Plan

- Sample DO, pH, Spec. Cond. & Turbidity within 12 hours of >15mm precipitation events, on Sackville River between April and October

Monitoring Results

- Samples collected at 4 of 9 precipitation events

Are results complete?

Collect Replicates to Evaluate Precision

Observation                DO (mg/L)
1                          9.8
2                          9.9
3                          10.1
4                          10.1
Mean                       9.98
Standard Deviation         0.15
Co-efficient of Variation  1.50

Samples collected by the same individual at same location and time

Set threshold for maximum co-efficient of variation?
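The replicate summary above can be reproduced with the sketch below. The 5% limit is a hypothetical threshold chosen only for illustration; the slides pose the threshold as an open question and do not give a value.

```python
from statistics import mean, stdev

# Hypothetical acceptance limit for replicate precision (not from the source).
MAX_CV_PERCENT = 5.0

# Four replicate DO readings (mg/L) from the table above.
replicates = [9.8, 9.9, 10.1, 10.1]

rep_mean = mean(replicates)
rep_sd = stdev(replicates)
rep_cv = rep_sd / rep_mean * 100
within_limit = rep_cv <= MAX_CV_PERCENT

print(f"Mean = {rep_mean:.2f}, SD = {rep_sd:.2f}, CV = {rep_cv:.2f}%")
print("Within precision limit" if within_limit else "Exceeds precision limit")
```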

Collect Replicates to Evaluate accuracy

Site  Date       Volunteer DO (mg/L)  QA/QC DO (mg/L)  Difference  % Difference
10    13/6/2013  8                    9                1           11.1
20    22/8/2013  11.3                 11.4             0.1         0.9
30    23/8/2013  7.4                  8.4              1           11.9
40    15/9/2013  8.8                  9.1              0.3         3.3
50    16/9/2013  7.2                  9.1              1.9         20.9
60    1/10/2013  10.4                 8.4              -2          -23.8

Single sample split and tested by volunteer and program coordinator using same method.

Which volunteer(s) need retraining on analysis technique?

Set threshold for maximum percent difference?
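The percent differences in the table are consistent with (QA/QC result minus volunteer result) divided by the QA/QC result. That formula is inferred from the tabulated values rather than stated in the slides. The sketch below applies it and flags sites against a hypothetical 10% limit, which is an illustrative choice only.

```python
# Hypothetical acceptance limit for split-sample agreement (not from the source).
MAX_PERCENT_DIFFERENCE = 10.0

# (site, volunteer DO, QA/QC DO) in mg/L, from the table above.
split_samples = [
    (10, 8.0, 9.0),
    (20, 11.3, 11.4),
    (30, 7.4, 8.4),
    (40, 8.8, 9.1),
    (50, 7.2, 9.1),
    (60, 10.4, 8.4),
]


def percent_difference(volunteer, qaqc):
    """Signed difference relative to the QA/QC result, in percent."""
    return (qaqc - volunteer) / qaqc * 100


# Sites whose absolute percent difference exceeds the limit.
flagged_sites = [
    site
    for site, volunteer, qaqc in split_samples
    if abs(percent_difference(volunteer, qaqc)) > MAX_PERCENT_DIFFERENCE
]
print(f"Sites exceeding {MAX_PERCENT_DIFFERENCE}% difference: {flagged_sites}")
```

Using the absolute value means a volunteer reading that is too high (Site 60) is flagged in the same way as one that is too low.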