What you don’t know can hurt you: uncertainties in georeferencing
John Wieczorek
Museum of Vertebrate Zoology
University of California, Berkeley
Uncertainties
What comes out of a system depends on:
a) what goes into it
b) what you ask of it
c) what happens in between
Problem: most original data are in textual form
Problem: collection resources are scarce and can’t support large-scale digitization
What species occur where?
occurrence location
What can Biodiversity Informatics do?
Taxonomic Resolution Services
What species occur where?
Georeferencing Services
ID  Species         Locality
1   Lynx rufus      Dawson Rd. N Whitehorse
2   Pudu puda       cerca de Valdivia
3   Canis lupus     20 mi NW Duluth
4   Felis concolor  Pichi Trafúl
5   Lama alpaca     near Cuzco
6   Panthera leo    San Diego Zoo
7   Sorex lyelli    Lyell Canyon, Yosemite
8   Orcinus orca    1 mi W San Juan Island
9   Ursus arctos    Bear Flat, Haines Junction
What we have: Localities we can read
“Davis, Yolo County, California”
“point method”
Coordinates: 38.5463 -121.7425
Horizontal Geodetic Datum: NAD27
What is an acceptable georeference?
A numerical description of a place that can be mapped
and that describes the spatial extent of a locality
and its associated uncertainties.
1) Map inaccuracy
2) Extent of the reference
3) Coordinate imprecision
4) Undocumented datum
5) Distance imprecision
6) Direction imprecision
Scale Uncertainty (ft) Uncertainty (m)
1:1,200 3.3 ft 1.0 m
1:2,400 6.7 ft 2.0 m
1:4,800 13.3 ft 4.1 m
1:10,000 27.8 ft 8.5 m
1:12,000 33.3 ft 10.2 m
1:24,000 40.0 ft 12.2 m
1:25,000 41.8 ft 12.8 m
1:63,360 106 ft 32.2 m
1:100,000 167 ft 50.9 m
1:250,000 417 ft 127 m
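The values in this table can be reproduced with a short calculation. The sketch below is an assumption about where the numbers come from, not something stated on the slide: it takes the map error to be 1/30 inch on the map for scales larger than 1:20,000 and 1/50 inch otherwise, which matches most rows to within rounding. The function names are illustrative only.

```python
METERS_PER_FOOT = 0.3048
INCHES_PER_FOOT = 12


def map_uncertainty_ft(scale_denominator: int) -> float:
    """Ground uncertainty in feet for a map at 1:scale_denominator.

    Assumed rule (not stated in the source): 1/30 inch on the map for
    scales larger than 1:20,000, and 1/50 inch for 1:20,000 or smaller.
    """
    inches_on_map = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return scale_denominator * inches_on_map / INCHES_PER_FOOT


def map_uncertainty_m(scale_denominator: int) -> float:
    """Same uncertainty converted to meters."""
    return map_uncertainty_ft(scale_denominator) * METERS_PER_FOOT


if __name__ == "__main__":
    for scale in (1_200, 10_000, 24_000, 63_360, 250_000):
        ft = map_uncertainty_ft(scale)
        m = map_uncertainty_m(scale)
        print(f"1:{scale:,}  {ft:.1f} ft  {m:.1f} m")
```

Under these assumptions a 1:24,000 map contributes about 40 ft (12.2 m) of uncertainty, in line with the table row for that scale.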
Sources of uncertainty
“Davis, Yolo County, California”
“bounding-box method”
Coordinates: 38.5486 -121.7542
38.5450 -121.7394
Horizontal Geodetic Datum: NAD27
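A bounding box makes spatial queries simple: a containment test is four comparisons. The sketch below is illustrative only. It assumes the two coordinate pairs above are the northwest and southeast corners of the box, ignores boxes that cross the antimeridian, and uses a hypothetical BoundingBox class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Latitude/longitude box in decimal degrees (assumed NW and SE corners)."""
    north: float
    west: float
    south: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Plain range checks; does not handle boxes spanning the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Corner values taken from the slide; their NW/SE interpretation is an assumption.
davis = BoundingBox(north=38.5486, west=-121.7542, south=38.5450, east=-121.7394)

if __name__ == "__main__":
    # The "point method" coordinates for Davis fall inside this box.
    print(davis.contains(38.5463, -121.7425))
```

The box says where the locality might be, but not how good the georeference is, which is why quality assessment is harder with this method.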
“Davis, Yolo County, California”
“point-radius method”
Coordinates: 38.5468 -121.7469
Horizontal Geodetic Datum: NAD27
Maximum Uncertainty: 8325 m
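With a point-radius georeference, asking whether a site could match the locality means computing a distance and comparing it with the maximum uncertainty. The sketch below is one way to do that, under stated assumptions: it uses the haversine formula on a sphere with a mean Earth radius of 6,371,008.8 m and ignores datum differences (NAD27 versus others). The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius (spherical model)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_uncertainty(center_lat: float, center_lon: float, radius_m: float,
                       lat: float, lon: float) -> bool:
    """True if (lat, lon) lies inside the point-radius circle."""
    return haversine_m(center_lat, center_lon, lat, lon) <= radius_m


if __name__ == "__main__":
    # Center and radius from the slide for "Davis, Yolo County, California".
    print(within_uncertainty(38.5468, -121.7469, 8325, 38.5463, -121.7425))
```

The radius doubles as a quality measure: records can be filtered by maximum uncertainty directly, at the cost of a distance computation for every spatial query.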
What is an ideal georeference?
A numerical description of a place that can be mapped
and that describes the spatial extent of a locality
and its associated uncertainties as well as possible.
Method Comparison
Method        Advantage                Disadvantage
point         easy to produce          no data quality
bounding-box  simple spatial queries   difficult quality assessment
point-radius  easy quality assessment  difficult spatial queries
shape         accurate representation  complex, uniform
probability   accurate representation  complex, non-uniform
[Figure: panels (a) to (d)]
Rowe, 2005. Elevational gradient analysis of historical museum specimens: a cautionary tale
What species occur where?
Conclusions:
1) We can help users find relevant records
2) We can help users assess data quality and fitness for use
3) In the end, users must exercise due diligence. Without 1) and 2), they can’t.