Statistical Analysis of Left-Censored Geochemical Data
DESCRIPTION
Geochemical datasets frequently contain left-censored data, i.e., results whose actual concentration falls somewhere between 0 and the detection limit (DL). These data are referred to as nondetects (NDs). An ND does not necessarily mean the analyte was absent; if it was present, its concentration was below the DL. In addition to NDs, contract labs often report estimated values (often flagged with a “J”) that lie between the DL and the reporting limit (RL). The RL is the level at or above which the lab will state that the result is quantitative.
A common approach to statistically analyzing left-censored data is substitution (e.g., ½DL), but substitution can introduce bias into statistical analyses. Fortunately, a number of statistical techniques are specifically designed to handle left-censored data without resorting to substitution. All of these techniques work with NDs, and some also work with estimated data.
Techniques for calculating summary statistics for left-censored data include nonparametric Kaplan-Meier survival statistics, regression on order statistics (ROS), and the Turnbull interval-censored method. As the name implies, the Turnbull method works with interval-censored data (i.e., quantitative data ≥RL, estimated data in DL-RL, and NDs in 0-DL). In the latter two cases an interval is used: the true value lies somewhere within the interval, and picking a single value such as ½DL is not required. Interval-censored data can also be used with multivariate ordination techniques such as nonmetric multidimensional scaling (NMDS) and with the interval-censored score test, an analog of the generalized Wilcoxon test.
Kendall’s tau (τ) is a nonparametric correlation analysis that can be applied to left-censored data. For this test, the estimated (J-flagged) values are used.
Kendall’s τ is analogous to the familiar parametric Pearson’s r and, like Pearson’s r, the test for Kendall’s τ also provides a measure of the correlation significance. The case study for this presentation will include the geochemical data and statistical results from the Hawaiʻi Ordnance Reef Follow-Up investigation of the U.S. Army’s Remotely Operated Underwater Munitions Recovery System.
TRANSCRIPT
Statistical Analysis of Left-Censored Geochemical Data
Michael S. Tomlinson & Eric H. De Carlo
Case Study – Ordnance Reef, Oʻahu, Hawaiʻi
Oʻahu
Hawaiʻi
Maui
Kauaʻi
Molokaʻi
Lānaʻi
Kahoʻolawe
Niʻihau
Ordnance Reef Pre- & Post-ROUMRS Investigations
Diamond Head
Honolulu
Barbers Point
Kaʻena Point
Kahuku Point
Makapuʻu Point
Kailua
Kāneʻohe
Māmala Bay
Waiʻanae
What is the problem?
Disposed Military Munitions or DMM (conventional)
How extensive is the problem?
What is the U.S. Army doing about it?
ROUMRS–Remotely Operated Underwater Munitions Recovery System
Did DMM recovery improve conditions and how was this determined?
• Sediments & biota were sampled and analyzed for energetics & elements in 2009 (Pre-ROUMRS)
• Sediments & biota were again sampled and analyzed for energetics & elements in 2011-2013 (Post-ROUMRS)
• Statistical analyses were conducted to characterize & compare pre- & post-ROUMRS data and identify possible analyte sources
This is what we are talking about today
Lab sends data – now what?
Note: It is highly unlikely a contract lab would send data in this format
No Information!
Nondetects (NDs) are real data!
(the partial table below is a better format for geochemical data)
The “U” data qualifier inserted by the data validator is redundant with “ND”, and “ND” provides NO information without the detection limit (DL)
So what do you do with nondetects (NDs)?
Ignore
0
½DL
DL
RL
Read countless articles on statistics or…
buy this book, which has an excellent compilation of these methods and an accompanying website: www.practicalstats.com
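To see why the choice among the substitution options above matters, here is a minimal sketch with made-up numbers. The DL, RL, and results are hypothetical and are not taken from the Ordnance Reef data. It only illustrates that the mean of one dataset shifts with whatever value is substituted for the NDs.

```python
from statistics import mean

# Hypothetical values (mg/kg); None marks a nondetect.
DL, RL = 1.0, 5.0
results = [None, None, None, 2.0, 6.0, 9.0]


def substituted_mean(values, fill):
    """Mean after replacing each nondetect with `fill`.

    If `fill` is None the nondetects are dropped ("ignore").
    """
    if fill is None:
        kept = [v for v in values if v is not None]
    else:
        kept = [fill if v is None else v for v in values]
    return mean(kept)


# The five options listed on the slide.
options = {"ignore": None, "0": 0.0, "1/2 DL": DL / 2, "DL": DL, "RL": RL}
means = {name: substituted_mean(results, fill) for name, fill in options.items()}

for name, m in means.items():
    print(f"{name:>7}: mean = {m:.2f}")
```

With half of the results censored, the five options give five different means, which is the arbitrariness the censored-data methods are meant to avoid.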
Format your data for these methods
• There are several methods but we will talk about two:
  – Interval Censored
    • 0 – DL, DL – RL, & quantitative result (i.e., ≥ RL)
  – Indicator Variable
    • < DL = 1
    • ≥ DL = 0
Don’t worry – examples on next slide
Data Input Formats
(2 examples)
IC
IV
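The two input formats can be sketched in code as follows. This is an illustration only: the `LabResult` record and both helper functions are hypothetical names, and the column layout a given statistics package expects may differ. It assumes a nondetect arrives as a missing value alongside its DL and RL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabResult:
    value: Optional[float]  # reported concentration; None for a nondetect
    dl: float               # detection limit
    rl: float               # reporting limit


def to_interval(r):
    """Interval-censored (IC) format: (lower, upper) bounds on the true value."""
    if r.value is None or r.value < r.dl:
        return (0.0, r.dl)         # nondetect: somewhere in 0 - DL
    if r.value < r.rl:
        return (r.dl, r.rl)        # estimated (J-flagged): somewhere in DL - RL
    return (r.value, r.value)      # quantitative: both bounds equal the result


def to_indicator(r):
    """Indicator-variable (IV) format: (value, flag); 1 means < DL, 0 means >= DL."""
    if r.value is None or r.value < r.dl:
        return (r.dl, 1)           # store the DL itself, flagged as censored
    return (r.value, 0)            # estimated values are kept as reported


data = [LabResult(None, 1.0, 5.0), LabResult(2.0, 1.0, 5.0), LabResult(7.5, 1.0, 5.0)]
ic_rows = [to_interval(r) for r in data]
iv_rows = [to_indicator(r) for r in data]
```

The point of the IC format is that the estimated result keeps its uncertainty as a DL – RL interval, while the IV format carries a single number plus a flag.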
Summary Statistics
• No NDs: “Standard” statistics
• < 50% NDs: Kaplan-Meier (K-M) statistics (IV) or Turnbull interval-censored method (IC)
• ≥ 50% & < 80% NDs: Regression on order statistics (ROS, IV)
• ≥ 80% NDs: Maximum and # & % NDs
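The selection rule on this slide can be written as a small lookup. This is a sketch under the assumption that the percentage thresholds apply exactly as listed; `choose_method` and its arguments are hypothetical names, not part of any package.

```python
def choose_method(n_total, n_nd, interval_format=False):
    """Pick a summary-statistics method from the share of nondetects.

    n_total: number of results for the analyte
    n_nd: number of those results that are nondetects
    interval_format: True if the data are in interval-censored (IC) format
    """
    if n_total <= 0 or not 0 <= n_nd <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_nd <= n_total")
    pct_nd = 100.0 * n_nd / n_total
    if n_nd == 0:
        return "standard statistics"
    if pct_nd < 50:
        return "Turnbull" if interval_format else "Kaplan-Meier"
    if pct_nd < 80:
        return "ROS"
    return "report maximum and number/percent of NDs"
```

For example, `choose_method(20, 12)` falls in the 50–80% band and returns `"ROS"`.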
Summary Statistics Table (partial)
Statistical method used
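As an illustration of one of the methods named in the table, here is a minimal pure-Python sketch of a Kaplan-Meier estimate for left-censored data in the indicator-variable format. It is not the authors' code or any package's API, and `km_left_censored` is a hypothetical name. It assumes that any probability left below the smallest detected value is assigned to the smallest value in the dataset, which tends to bias the mean high.

```python
def km_left_censored(values, censored):
    """Kaplan-Meier estimate for left-censored data.

    values: detected concentration, or the DL for a nondetect
    censored: True where the entry is a nondetect (true value < DL)
    Returns (steps, mean). `steps` lists (x, P(X < x)) for each distinct
    detected x, from largest to smallest.
    """
    if len(values) != len(censored):
        raise ValueError("values and censored must have the same length")
    detected = sorted({v for v, c in zip(values, censored) if not c}, reverse=True)
    if not detected:
        raise ValueError("at least one detected value is required")

    steps = []
    prob_below = 1.0  # running estimate of P(X < x)
    mean = 0.0
    for x in detected:
        # At risk: everything known to be at or below x (nondetects with DL <= x count).
        at_risk = sum(1 for v in values if v <= x)
        events = sum(1 for v, c in zip(values, censored) if v == x and not c)
        new_prob = prob_below * (1 - events / at_risk)
        mean += x * (prob_below - new_prob)  # probability mass placed at x
        prob_below = new_prob
        steps.append((x, prob_below))

    # Assumption: leftover mass goes to the smallest value (a DL) in the data.
    mean += min(values) * prob_below
    return steps, mean


# Hypothetical data: <1, 2, <3, 4, 6
km_steps, km_mean = km_left_censored(
    [1.0, 2.0, 3.0, 4.0, 6.0], [True, False, True, False, False]
)
```

For this toy dataset the estimated mean is about 2.9, and no value is ever substituted for the two nondetects.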
Censored boxplots: visualizing data distribution & comparing data
No peeking below red line!
CENSORED
Censored boxplots use variation of the indicator variable format
Analog of nonparametric Wilcoxon test (different data format)
Possible sources of analytes? Try nonmetric multidimensional scaling
Notice how DMM analytes cluster with DMM samples
And, notice how terrestrial elements cluster with control samples
How strong is the relationship between the various post-ROUMRS analytes?
Correlation matrix (partial) using nonparametric Kendall’s τ; bold green = sig. + correlation & bold red = sig. - corr. at α = 0.05
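A rough sketch of how Kendall's τ can accommodate nondetects: a pair of observations counts only when its order is unambiguous, and any other pair is treated as a tie. This is a simplified τ-b written for illustration with hypothetical function names. It omits the significance test shown in the matrix and is not the routine used to produce it. Estimated (J-flagged) values are entered as ordinary detected values.

```python
import math
from itertools import combinations


def pair_order(a, b):
    """Sign of (a - b) when the order is unambiguous, otherwise 0 (a tie).

    Each observation is (value, censored); censored=True means the true
    value is somewhere below `value` (a nondetect at that DL).
    """
    (va, ca), (vb, cb) = a, b
    if not ca and not cb:
        return (va > vb) - (va < vb)
    if ca and not cb:
        return -1 if vb >= va else 0   # a < DL_a <= b, so a is smaller
    if cb and not ca:
        return 1 if va >= vb else 0    # b < DL_b <= a, so a is larger
    return 0                           # both censored: order unknown


def censored_kendall_tau_b(x, y):
    """Kendall's tau-b for two paired lists of (value, censored) tuples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    s = nx = ny = 0
    for i, j in combinations(range(len(x)), 2):
        ox = pair_order(x[i], x[j])
        oy = pair_order(y[i], y[j])
        s += ox * oy          # +1 concordant, -1 discordant, 0 if either is tied
        nx += (ox != 0)       # pairs with a known order in x
        ny += (oy != 0)       # pairs with a known order in y
    if nx == 0 or ny == 0:
        return float("nan")
    return s / math.sqrt(nx * ny)


# Hypothetical paired analytes; True marks a nondetect at the stated DL.
x = [(1.0, True), (1.0, True), (2.0, False), (4.0, False)]
y = [(3.0, False), (0.5, True), (2.0, False), (5.0, False)]
tau = censored_kendall_tau_b(x, y)
```

Because only the ordering of pairs is used, a nondetect contributes whatever ordering information its DL allows, with no substituted value needed.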
Conclusions
• There are a number of statistical routines that can work with left-censored data
• Substitution (e.g., ½DL) is neither necessary nor recommended
• Even with left-censored data you can:
  – Calculate summary statistics
  – Visualize data distributions with boxplots
  – Compare datasets
  – Use exploratory methods to look for patterns
  – Calculate the strength of correlations
• There were some significant changes but they could not be attributed to ROUMRS
What’s next?
Hawaii Undersea Military Munitions Assessment
• South Oʻahu - chemical munitions (16,000 100-lb mustard bombs) dumped in >500-m deep water
• Arsenic-containing chemical agent Lewisite dumped in deeper water west of Oʻahu
• Biological effects using multivariate statistics
• Geostatistics to determine possible sources of arsenic