lester l. yuan 1 jeroen gerritsen 2 james b. stribling 2
DESCRIPTION
Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2 1 Office of Research and Development, US EPA, Washington, DC 2 TetraTech, Inc., Owings Mill, MD. How often are we wrong? A Bayesian assessment of taxonomic identifications for the National Wadeable Streams Assessment. Basic Question. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/1.jpg)
Lester L. Yuan1
Jeroen Gerritsen2
James B. Stribling2
1Office of Research and Development, US EPA, Washington, DC2TetraTech, Inc., Owings Mill, MD
How often are we wrong? A Bayesian assessment of taxonomic identifications
for the National Wadeable Streams Assessment
![Page 2: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/2.jpg)
Basic Question
• How consistent are observations of the presence and absence of different invertebrate taxa between different taxonomists?
Why focus on presence/absence?
Presence/absence information is inherent to richness metrics and predictive models (e.g., RIVPACS).
![Page 3: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/3.jpg)
More refined question
• If a taxonomist tells you that a certain taxon is in a sample, what is the probability that the taxon is truly in the sample?
• We would like to estimate a value for:
P(present | observed)
![Page 4: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/4.jpg)
Data
• National Wadeable Stream Assessment (NWSA) benthic samples were counted by 26 taxonomists in 11 different labs.
• 72 samples were randomly selected and re-identified by a single, QC taxonomist.
![Page 5: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/5.jpg)
Estimating identification error rates
True Positive Rate: proportion of times that a taxon was observed, and was actually present.
False Positive Rate: proportion of times that a taxon was observed, but was actually absent.
![Page 6: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/6.jpg)
Major Assumption
• QC taxonomist identifications are “truth” (for this analysis).
• All identification errors are committed by primary taxonomists.
Consistent identifications are the goal.
![Page 7: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/7.jpg)
Example: Hydropsyche
QC taxonomist
Present Absent
Primary taxonomist
Observed 16 6Not
observed 7 43
True positive rate: 16/(16 + 7) = 0.70
False positive rate: 6/(6 + 43) = 0.12
![Page 8: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/8.jpg)
Uncertainty in error rates (Hydropsyche)
0.52 0.82 0.07 0.22
![Page 9: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/9.jpg)
What do identification error rates tell us?
• True positive rate can be restated: Given that a taxon is present, what is the
probability that the taxon is observed in the sample?
• True positive rate = P(observed | present)
• We want to know P(present | observed).
![Page 10: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/10.jpg)
NWSA data
• Estimate frequencies of occurrence of different taxa from 750 NWSA benthic samples.
• Major assumption: Assume primary taxonomist identifications
match those of the QC taxonomist.
![Page 11: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/11.jpg)
Frequencies of occurrence in WSA
Hydropsyche frequency of occurrence = 0.37
![Page 12: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/12.jpg)
Computing the posterior probability
present
absent
observed
observed
not observed
not observed
0.37
0.63
0.70
0.12
0.88
0.30
present & observed: 0.259
present & not observed: 0.111
absent & observed: 0.076
absent & not observed: 0.554
FALSE POSITIVE RATEP(observed | absent)
TRUE POSITIVE RATEP(observed | present)
![Page 13: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/13.jpg)
Computing the posterior probability (2)
observed
not observed
present
present
absent
absent
0.335
0.665
0.773
0.167
0.833
0.227
present & observed: 0.259
present & not observed: 0.111
absent & observed: 0.076
???
absent & not observed: 0.554
P(present | observed)
![Page 14: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/14.jpg)
Posterior probabilities:
Baetidae
Acentrella, Plauditus, Baetis, and Acerpenna are consistently identified.
Centroptilum, Procloeon, and Cloeon are not.
![Page 15: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/15.jpg)
Summary Statistics
• Over 600 distinct genera in the NWSA• 427 total genera analyzed from the QC• 132 genera identified consistently.• Identification of 62 genera differed significantly
between primary and QC taxonomists.
• Many of the poorly identified genera could be handled by increasing the target taxonomic level. 11 worms/leeches 7 ceratopogonids 5 mites 5 mollusks
![Page 16: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/16.jpg)
Combining genera…
![Page 17: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/17.jpg)
Summary
• QC data and analysis provide useful insights into errors originating from taxonomic identification.
• Many errors can be mitigated by specifying more coarse target taxonomic levels (“lumping”).
![Page 18: Lester L. Yuan 1 Jeroen Gerritsen 2 James B. Stribling 2](https://reader035.vdocuments.mx/reader035/viewer/2022062423/568145ab550346895db2a3a1/html5/thumbnails/18.jpg)
Summary
• Identification consistency could only be determined for relatively common genera. ~200 genera were observed in WSA data but not
in the QC data. ~230 genera were observed in the QC data so
infrequently that the uncertainty in estimated error rates was greater than 50%.
• Identification of “problem” genera can help inform future assessments and training.