Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don Edwards

Upload: stuart-cross

Post on 13-Jan-2016


Quality Assurance & Quality Control

University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux

University of South Carolina Don Edwards

Instruction Materials: Primary References

Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell Science.
    Edwards (2000) Ch. 4
    Brunt (2000) Ch. 2
    Michener (2000) Ch. 7

Dux, J.P. 1986. Handbook of Quality Assurance for the Analytical Chemistry Laboratory. Van Nostrand Reinhold Company.

Mullins, E. 1994. Introduction to Control Charts in the Analytical Laboratory: Tutorial Review. Analyst 119: 369–375.

Outline:

Define QA/QC
QC procedures
    Designing data sheets
    Data entry using validation rules, filters, lookup tables
QA procedures
    Graphics and Statistics
    Outlier detection
        Samples
        Simple linear regression

QA/QC

“mechanisms [that] are designed to prevent the introduction of errors into a data set, a process known as data contamination”

Brunt 2000

Errors (2 types)

Commission: incorrect or inaccurate data in a dataset
    Can be easy to find
    Malfunctioning instrumentation
        Sensor drift
        Low batteries
        Damage
        Animal mischief
    Data entry errors

Omission
    Difficult or impossible to find
    Inadequate documentation of data values, sampling methods, anomalies in field, human errors

Quality Control

“mechanisms that are applied in advance, with a priori knowledge to ‘control’ data quality during the data acquisition process”

Brunt 2000

Quality Assurance

“mechanisms [that] can be applied after the data have been collected, entered in a computer and analyzed to identify errors of omission and commission”
    graphics
    statistics

Brunt 2000

QA/QC Activities

Defining and enforcing standards for formats, codes, measurement units and metadata.

Checking for unusual or unreasonable patterns in data.

Checking for comparability of values between data sets.

Brunt 2000


Flowering Plant Phenology – Data Entry Form Design

Four sites, each with 3 transects
Each species will have phenological class recorded

Data Collection Form Development:

Plant          Life Stage
_____________________________
_____________________________
_____________________________

What’s wrong with this data sheet?

PHENOLOGY DATA SHEET

Collectors: _________________________________
Date: ___________________  Time: _________
Location: black butte, deep well, five points, goat draw
Transect: 1 2 3
Notes: _________________________________________

Plant    Life Stage
ardi     P/G  V  B  FL  FR  M  S  D  NP
arpu     P/G  V  B  FL  FR  M  S  D  NP
atca     P/G  V  B  FL  FR  M  S  D  NP
bamu     P/G  V  B  FL  FR  M  S  D  NP
zigr     P/G  V  B  FL  FR  M  S  D  NP
_____    P/G  V  B  FL  FR  M  S  D  NP
_____    P/G  V  B  FL  FR  M  S  D  NP

P/G = perennating or germinating    M = dispersing
V = vegetating                      S = senescing
B = budding                         D = dead
FL = flowering                      NP = not present
FR = fruiting
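
Because the sheet restricts location, transect and life stage to fixed choices, the same lists can serve as lookup tables when the sheet is keyed in. The sketch below is only an illustration: the record layout and the names `LIFE_STAGES`, `LOCATIONS`, `TRANSECTS` and `validate_record` are assumptions, not part of the original course material.

```python
# Hypothetical lookup tables, copied from the choices printed on the data sheet.
LIFE_STAGES = {"P/G", "V", "B", "FL", "FR", "M", "S", "D", "NP"}
LOCATIONS = {"black butte", "deep well", "five points", "goat draw"}
TRANSECTS = {1, 2, 3}


def validate_record(record):
    """Return a list of problems found in one keyed-in record (empty if none)."""
    problems = []
    if record.get("location") not in LOCATIONS:
        problems.append(f"unknown location: {record.get('location')!r}")
    if record.get("transect") not in TRANSECTS:
        problems.append(f"unknown transect: {record.get('transect')!r}")
    if record.get("life_stage") not in LIFE_STAGES:
        problems.append(f"unknown life stage: {record.get('life_stage')!r}")
    return problems


# A correctly keyed record and one with a misspelled location and a bad transect.
good = {"location": "deep well", "transect": 2, "species": "ardi", "life_stage": "FL"}
bad = {"location": "deep wel", "transect": 4, "species": "ardi", "life_stage": "FL"}
print(validate_record(good))
print(validate_record(bad))
```

Rejecting a value at entry time is cheaper than finding it later, which is the point of building the legal choices into the form itself.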


Other methods for preventing data contamination

Double-keying of data by independent data entry technicians followed by computer verification for agreement

Randomly check 10% of data to calculate frequency of errors

Use text-to-speech program to read data back

Filters for “illegal” data
    Computer/database programs
    Legal range of values
    Sanity checks

Edwards 2000
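
The computer verification step of double-keying can be sketched as a cell-by-cell comparison of the two keyed files. This is a minimal sketch under assumptions: each file is taken to be a list of dictionaries in the same row order, and the function name `compare_double_keyed` and the sample records are invented for illustration.

```python
def compare_double_keyed(first_pass, second_pass):
    """Return (row, column, value_1, value_2) for every cell where the two keyings disagree."""
    if len(first_pass) != len(second_pass):
        raise ValueError("the two files have different numbers of rows")
    mismatches = []
    for row_number, (row_a, row_b) in enumerate(zip(first_pass, second_pass), start=1):
        # Compare every column that appears in either version of the row.
        for column in sorted(set(row_a) | set(row_b)):
            if row_a.get(column) != row_b.get(column):
                mismatches.append((row_number, column, row_a.get(column), row_b.get(column)))
    return mismatches


# Hypothetical records keyed by two technicians from the same field sheets.
technician_1 = [{"species": "ardi", "stage": "FL"}, {"species": "bamu", "stage": "V"}]
technician_2 = [{"species": "ardi", "stage": "FR"}, {"species": "bamu", "stage": "V"}]
for mismatch in compare_double_keyed(technician_1, technician_2):
    print(mismatch)
```

Each reported mismatch is then resolved against the original paper sheet, not by trusting either keying.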

Flow of Information in Filtering “Illegal” Data

[Diagram boxes: Raw Data File; Table of Possible Values and Ranges; Illegal Data Filter; Report of Probable Errors]

Edwards 2000

Illegal-data filter in SAS

Data Checkum; Set all;
  /* Start each record with a blank 40-character message. */
  Message = repeat(" ", 39);
  /* Range check: Y1 must lie on the interval [0,1]. */
  If Y1 < 0 or Y1 > 1 then do; message = "Y1 is not on the interval [0,1]"; output; end;
  /* Consistency check: Y3 must not exceed Y4. */
  If Y3 > Y4 then do; message = "Y3 is larger than Y4"; output; end;
  /* ... (add as many checks as desired) */
  If message NE repeat(" ", 39);
  Keep ID message;
Run;

Proc Print Data=Checkum;
Run;
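
For readers without SAS, the same two checks can be sketched in Python. This is an illustrative analogue, not a translation endorsed by the source: the row layout (dictionaries with `ID`, `Y1`, `Y3`, `Y4`), the function name `illegal_data_report` and the sample rows are assumptions.

```python
def illegal_data_report(rows):
    """Return (ID, message) pairs for every check that a row fails."""
    report = []
    for row in rows:
        # Range check: Y1 must lie on the interval [0, 1].
        if row["Y1"] < 0 or row["Y1"] > 1:
            report.append((row["ID"], "Y1 is not on the interval [0,1]"))
        # Consistency check: Y3 must not exceed Y4.
        if row["Y3"] > row["Y4"]:
            report.append((row["ID"], "Y3 is larger than Y4"))
    return report


# Hypothetical raw data: row 2 fails the range check, row 3 fails the consistency check.
raw = [
    {"ID": 1, "Y1": 0.4, "Y3": 2.0, "Y4": 5.0},
    {"ID": 2, "Y1": 1.7, "Y3": 1.0, "Y4": 3.0},
    {"ID": 3, "Y1": 0.9, "Y3": 8.0, "Y4": 6.0},
]
for entry in illegal_data_report(raw):
    print(entry)
```

As in the SAS version, a row that fails several checks appears once per failed check, so the report reads as a list of probable errors and not as a list of bad rows.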



Identifying Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER


Identification of Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER


QA/QC in the Lab: Using Control Charts

Laboratory quality control using statistical process control charts:

Determine whether analytical system is “in control” by examining
    Mean
    Variability (range)


All control charts have three basic components:

a centerline, usually the mathematical average of all the samples plotted.

upper and lower statistical control limits that define the constraints of common cause variations.

performance data plotted over time.

X-Bar Control Chart

[Figure: measurements plotted against Time, with the Mean (centerline), LCL and UCL drawn as horizontal lines.]


Constructing an X-Bar control chart

Each point represents a check standard run with each group of 20 samples, for example

UCL = mean + 3*standard deviation

LCL = mean - 3*standard deviation
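
The limits can be computed directly from a run of check-standard results. The sketch below assumes the sample standard deviation of those results is the one intended; the function name `xbar_limits` and the measurements are hypothetical.

```python
import statistics


def xbar_limits(check_standards):
    """Return (LCL, centerline, UCL) as mean -/+ 3 sample standard deviations."""
    center = statistics.mean(check_standards)
    sd = statistics.stdev(check_standards)
    return center - 3 * sd, center, center + 3 * sd


# Hypothetical check-standard results, one per group of samples.
measurements = [20.1, 19.8, 20.4, 19.9, 20.2, 20.0, 19.6, 20.3]
lcl, center, ucl = xbar_limits(measurements)
print(f"LCL={lcl:.2f}  mean={center:.2f}  UCL={ucl:.2f}")
```

In practice the limits are usually fixed from a baseline period judged to be in control and then applied to later runs, so that a drifting instrument cannot widen its own limits.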

Things to look for in a control chart:

The point of making control charts is to look at variation, seeking patterns or statistically unusual values. Look for:
    1 data point falling outside the control limits
    6 or more points in a row steadily increasing or decreasing
    8 or more points in a row on one side of the centerline
    14 or more points alternating up and down
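
The four patterns listed above can each be checked mechanically. The following is a minimal sketch; the function names are invented, and the handling of ties (equal consecutive values, or points exactly on the centerline, break a run) is an assumption the slides do not address.

```python
def outside_limits(values, lcl, ucl):
    """Indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def has_trend(values, run=6):
    """True if `run` or more points in a row steadily increase or steadily decrease."""
    up = down = 1  # number of points in the current rising / falling stretch
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run or down >= run:
            return True
    return False


def has_one_sided_run(values, center, run=8):
    """True if `run` or more points in a row lie on one side of the centerline."""
    streak, side = 0, 0
    for v in values:
        s = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            streak += 1
        else:
            side = s
            streak = 1 if s != 0 else 0
        if streak >= run:
            return True
    return False


def has_alternation(values, run=14):
    """True if `run` or more points in a row alternate up and down."""
    streak, last_direction = 1, 0
    for prev, cur in zip(values, values[1:]):
        direction = (cur > prev) - (cur < prev)
        if direction != 0 and direction == -last_direction:
            streak += 1      # the zigzag continues
        elif direction != 0:
            streak = 2       # a new stretch starts with these two points
        else:
            streak = 1       # a tie breaks the pattern
        last_direction = direction
        if streak >= run:
            return True
    return False


print(outside_limits([20, 24, 19], lcl=17, ucl=23))
print(has_trend([1, 2, 3, 4, 5, 6]))
```

A flagged pattern is a prompt to inspect the instrument or procedure, not proof by itself that the analytical system is out of control.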

Linear trend

[Figure: control chart values plotted against time, showing a linear trend.]


Outliers

An outlier is “an unusually extreme value for a variable, given the statistical model in use”

The goal of QA is NOT to eliminate outliers! Rather, we wish to detect unusually extreme values.

Edwards 2000

Outlier Detection

“the detection of outliers is an intermediate step in the elimination of [data] contamination”

Attempt to determine if contamination is responsible and, if so, flag the contaminated value.

If not, formally analyse with and without “outlier(s)” and see if results differ.

Edwards 2000

Methods for Detecting Outliers

Graphics
    Scatter plots
    Box plots
    Histograms
    Normal probability plots

Formal statistical methods
    Grubbs’ test

Edwards 2000


X-Y scatter plots of gopher tortoise morphometrics

Michener 2000

Example of exploratory data analysis using SAS Insight®

Michener 2000

Box Plot Interpretation

[Figure: annotated box plot. Labels, top to bottom: Pts. > upper adjacent value; Upper adjacent value; Upper quartile; Median; Lower quartile; Lower adjacent value; Pts. < lower adjacent value. The box spans the inter-quartile range.]

Box Plot Interpretation

Inter-quartile range: IQR = Q(75%) – Q(25%)

Upper adjacent value = largest observation < (Q(75%) + (1.5 × IQR))

Lower adjacent value = smallest observation > (Q(25%) - (1.5 × IQR))
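
The adjacent values follow mechanically from the quartiles. The sketch below uses `statistics.quantiles` with the "inclusive" method; quartile conventions differ between packages, so that choice, the function name `adjacent_values` and the sample readings are assumptions.

```python
import statistics


def adjacent_values(data):
    """Return (lower adjacent value, upper adjacent value, points beyond them)."""
    q25, _median, q75 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q75 - q25
    upper_fence = q75 + 1.5 * iqr
    lower_fence = q25 - 1.5 * iqr
    upper_adjacent = max(x for x in data if x < upper_fence)
    lower_adjacent = min(x for x in data if x > lower_fence)
    # Points beyond the adjacent values are the ones a box plot draws individually.
    beyond = sorted(x for x in data if x > upper_adjacent or x < lower_adjacent)
    return lower_adjacent, upper_adjacent, beyond


# Hypothetical readings with one unusually large value.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(adjacent_values(readings))
```

The points returned in the third element are candidates for inspection, in keeping with the goal of detecting unusual values instead of eliminating them.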


Box Plots Depicting Statistical Distribution of Soil Temperature


Statistical tests for outliers assume that the data are normally distributed….

CHECK THIS ASSUMPTION!

Normal density and Cumulative Distribution Functions

[Figure, two panels plotted against Y from -3 to 3: a. The Normal Frequency Curve; b. Cumulative Normal Curve Areas]

Edwards 2000

Normal Plot of 30 Observations from a Normal Distribution

[Figure: Y plotted against Quantiles of Standard Normal.]

Edwards 2000

Normal Plots from Non-normally Distributed Data

[Figure, four panels, each plotting Y against Quantiles of Standard Normal: a. a right skewed distribution; b. a left-truncated Normal distribution; c. a heavy-tailed distribution; d. a contaminated Normal distribution]

Edwards 2000

Grubbs’ test is a statistical test to determine if a point is an outlier.

Grubbs’ test for outlier detection:

$T_n = (Y_n - \bar{Y}) / S$

where $Y_n$ is the possible outlier, $\bar{Y}$ is the mean of the sample, and $S$ is the standard deviation of the sample.

Contamination exists if $T_n$ is greater than $T_{.01}[n]$.

Example of Grubbs’ test for outliers: rainfall in acre-feet from seeded clouds (Simpson et al. 1975)

   4.1     7.7    17.5    31.4
  32.7    40.6    92.4   115.3
 118.3   119.0   129.6   198.6
 200.7   242.5   255.0   274.7
 274.7   302.8   334.1   430.0
 489.1   703.4   978.0  1656.0
1697.8  2745.6

T26 = 3.539 > 3.029; Contaminated

Edwards 2000

But Grubbs’ test is sensitive to non-normality…
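
The statistic can be recomputed from the 26 values. In this sketch the critical value 3.029 is simply the figure quoted on the slide, not something the code derives, and the names `RAINFALL`, `CRITICAL_VALUE` and `grubbs_statistic` are illustrative.

```python
import statistics

# Rainfall in acre-feet from seeded clouds, as listed on the slide.
RAINFALL = [
    4.1, 7.7, 17.5, 31.4, 32.7, 40.6, 92.4, 115.3, 118.3, 119.0,
    129.6, 198.6, 200.7, 242.5, 255.0, 274.7, 274.7, 302.8, 334.1,
    430.0, 489.1, 703.4, 978.0, 1656.0, 1697.8, 2745.6,
]

CRITICAL_VALUE = 3.029  # critical value for n = 26, taken from the slide


def grubbs_statistic(sample):
    """Grubbs' statistic for the largest observation: (Y_n - mean) / S."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    return (max(sample) - mean) / sd


t26 = grubbs_statistic(RAINFALL)
print(f"T26 = {t26:.3f}; contaminated: {t26 > CRITICAL_VALUE}")
```

The result should land close to the 3.539 quoted above; any small difference in the last digit comes from rounding versus truncation.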

Checking Assumptions on Rainfall Data

[Figure, four panels: a. rainfall (histogram); b. Normal plot of rainfall against Quantiles of Standard Normal; c. log10(rainfall) (histogram); d. Normal plot of log10(rainfall) against Quantiles of Standard Normal. Annotations on the figure: Contaminated; Normally distributed.]

Edwards 2000

Simple Linear Regression: check for model-based…
    Outliers
    Influential (leverage) points

Influential points in simple linear regression:

A “leverage” point is a point with an unusual regressor value that has more weight in determining regression coefficients than the other data values.

Edwards 2000

Influential Data Points in a Simple Linear Regression

[Figure: scatter plot of y against x with three labeled points: outlier; leverage pt; outlier and leverage pt.]

Edwards 2000


Brain weight vs. body weight, 63 species of terrestrial mammals

[Figure: scatter plot of brain weight (g) against body weight (kg). Labeled points: African Elephant, Asian Elephant, Man. Annotations: Leverage pts.; “Outliers”.]

Edwards 2000

Logged brain weight vs. logged body weight

[Figure: scatter plot of log10 brain weight (g) against log10 body weight (kg). Labeled points: Elephants, Man, Chinchilla, mispunched point. Annotation: “Outliers”.]

Edwards 2000

Outliers in simple linear regression

[Figure: scatter plot of Phosphatase against Beta-Glucosidase, with Observation 62 labeled.]

Outliers: identify using studentized residuals

Contamination may exist if $|r_i| > t_{\alpha/2,\, n-3}$, with $\alpha = 0.01$

Simple linear regression: Outlier identification

n = 86

$t_{\alpha/2,\, 83} = 1.98$

Simple linear regression: detecting leverage points

$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{(n-1) S_x^2}$

A point is a leverage point if $h_i > 4/n$, where n is the number of points used to fit the regression
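
The leverage formula and the 4/n cutoff translate directly into code. This is a minimal sketch for a single regressor; the function names and the regressor values are hypothetical.

```python
import statistics


def leverage_values(x):
    """h_i = 1/n + (x_i - xbar)^2 / ((n - 1) * S_x^2) for each regressor value."""
    n = len(x)
    xbar = statistics.mean(x)
    sum_sq = (n - 1) * statistics.variance(x)  # (n - 1) * S_x^2
    return [1 / n + (xi - xbar) ** 2 / sum_sq for xi in x]


def leverage_points(x):
    """Indices of points whose leverage exceeds the 4/n cutoff."""
    cutoff = 4 / len(x)
    return [i for i, h in enumerate(leverage_values(x)) if h > cutoff]


# Hypothetical regressor values; the last one sits far from the rest.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 40.0]
print(leverage_points(x_values))
```

Leverage depends only on the regressor values, so a point can be flagged here without being an outlier in the response.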

Regression with leverage point: Soil nitrate vs. soil moisture


Regression without leverage point

Observation 46

Output from SAS: Leverage points

n = 336

$h_i$ cutoff value: 4/336 = 0.012

The process from raw data to verified data to validated data (reviewed by a qualified scientist) is called data maturity, and implies increasing confidence in the quality of the data through time.

[Figure: Data Quality (Confidence in the Data) plotted against Data Maturity, through the stages Raw Data, Verified Data, Validated Data.]