Spatial Data Uncertainty in Geostatistics and Machine Learning: A Case Study
MINERAL RESOURCES
Francky Fouedjio and Jens Klump | Science Leader Earth Science Informatics | 30 May 2018
Introduction
• Machine learning can be a great tool, but how does it compare with established, well-understood tools?
• Specific questions:
  • Reporting and interpretation of uncertainty
  • Spatially correlated variables
• Methods:
  • Real-world case of geochemical and auxiliary data
  • Simulation of geochemical and auxiliary data
Spatial Data Uncertainty | Fouedjio & Klump2 |
Case Study: Soil Geochemistry
• Data: soil geochemistry of southwest England (source: C. Kirkwood, BGS G-BASE)
• Elements used in this study: Al, Ba, Br, Ca, Ce, Co, Cr, Cs, Fe, Ga, Ge, Hf, K, La, Mg, Mn, Mo, Na, Nb, Nd, Ni, P, Rb, Sc, Se, Si, Sm, Sr, Ta, Th, Ti, U, V, Y, Zr
• Other elements were excluded due to their hydrothermal mobility or concentrations below detection limits.
• Auxiliary data: Gravity, geomorphology, radiometrics, IR (LANDSAT)
• Geographically sparse data
• Aim: geochemical exploration (outliers)
Study Area
Auxiliary Variables
Prediction – Kriging vs. Random Forest
[Figure: prediction and uncertainty maps, Kriging and Random Forest]
KED vs. QRF in real-world data
• Prediction uncertainty maps provided by Kriging with External Drift (KED) vary less across the study area than those provided by Quantile Regression Random Forest (QRF).
• The largest prediction uncertainties given by KED are concentrated in those areas not surveyed or where the sampling was too sparse.
• Prediction uncertainty provided by KED depends mainly on the data configurations.
• Prediction uncertainty maps provided by QRF show spatial patterns not related to the density of sampling locations but to the distribution of particular auxiliary variables.
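The point that kriging uncertainty depends mainly on the data configuration can be illustrated with a minimal sketch. It uses simple kriging with an assumed exponential covariance rather than KED itself, and every name and parameter value (`exp_cov`, `sill`, `corr_range`, the sample coordinates) is hypothetical. The kriging variance is computed from coordinates alone; measured values never enter the calculation.

```python
import math

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance model (illustrative parameters)."""
    return sill * math.exp(-h / corr_range)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def simple_kriging_variance(samples, target):
    """Simple kriging variance at `target`, using sample coordinates only."""
    K = [[exp_cov(math.dist(p, q)) for q in samples] for p in samples]
    k = [exp_cov(math.dist(p, target)) for p in samples]
    weights = solve(K, k)
    return exp_cov(0.0) - sum(w * ki for w, ki in zip(weights, k))

# Hypothetical sampling layout: four samples on a small square.
samples = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
var_inside = simple_kriging_variance(samples, (2.5, 2.5))     # well surveyed
var_outside = simple_kriging_variance(samples, (40.0, 40.0))  # far from any sample
```

In this sketch the variance far from the samples approaches the sill, while the variance at a sampled location is zero, which mirrors the pattern described for the KED uncertainty maps.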
Real-world vs. simulated data
• Geological data are notoriously sparse and incomplete because data are not easily obtained.
• To overcome the shortcomings of real-world geological data, we decided to test KED and QRF on synthetic data sets.
Simulated Data
Simulated Data
• Case 1: linear relationship between Y and X, where Y has a weak spatial correlation.
• Case 2: linear relationship between Y and X, where Y has a strong spatial correlation.
• Case 3: complex nonlinear relationship between Y and X, where Y has a weak spatial correlation.
• Case 4: complex nonlinear relationship between Y and X, where Y has a strong spatial correlation.
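The four cases can be sketched as follows. This is not the authors' simulation setup, which the slides do not specify: it assumes a 1-D transect, an exponential covariance, a spatially correlated residual whose correlation range sets "weak" versus "strong" spatial correlation, and arbitrary linear and nonlinear trend functions. All names (`gaussian_field`, `simulate_case`, `lag1_corr`) and parameter values are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gaussian_field(coords, corr_range, rng):
    """Zero-mean, unit-variance Gaussian random field on a 1-D transect,
    with an assumed exponential covariance exp(-|h| / corr_range)."""
    n = len(coords)
    cov = [[math.exp(-abs(coords[i] - coords[j]) / corr_range)
            for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def simulate_case(coords, linear, strong, rng):
    """Return (X, Y): auxiliary variable X and target Y = f(X) + residual."""
    x = gaussian_field(coords, corr_range=20.0, rng=rng)
    # Assumption: the residual's correlation range controls weak vs. strong.
    resid = gaussian_field(coords, corr_range=30.0 if strong else 1.0, rng=rng)
    if linear:
        trend = [2.0 * xi for xi in x]                     # arbitrary linear f
    else:
        trend = [math.sin(3.0 * xi) + xi ** 2 for xi in x]  # arbitrary nonlinear f
    return x, [t + r for t, r in zip(trend, resid)]

def lag1_corr(v):
    """Sample lag-1 autocorrelation, a rough check of spatial correlation."""
    m = sum(v) / len(v)
    num = sum((a - m) * (b - m) for a, b in zip(v, v[1:]))
    den = sum((a - m) ** 2 for a in v)
    return num / den

rng = random.Random(42)
coords = [float(i) for i in range(100)]
cases = {
    1: simulate_case(coords, linear=True, strong=False, rng=rng),
    2: simulate_case(coords, linear=True, strong=True, rng=rng),
    3: simulate_case(coords, linear=False, strong=False, rng=rng),
    4: simulate_case(coords, linear=False, strong=True, rng=rng),
}
```

Calling `lag1_corr` on fields drawn with a long and a short correlation range gives a quick check that the two settings differ as intended.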
Simulations of the auxiliary variables
An example of simulated auxiliary variables using Gaussian Random Fields.
Simulations of the target variables
An example of the simulated target variables under different cases.
Case 2: Linear response, spatial correlation
• As expected, Kriging with External Drift (KED) outperforms Quantile Regression Random Forest (QRF).
Case 3: non-linear, weak spatial correlation
• As expected, the Root Mean Square Error (RMSE) shows that QRF performs better than KED.
• The goodness of fit, accuracy and probability interval width show that KED provides better prediction uncertainty than QRF.
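The kinds of measures named above can be sketched as follows. The slides do not give the authors' exact definitions, so this is an illustration under assumptions: RMSE in its standard form, "accuracy" taken as the fraction of observations falling inside a prediction interval, and width taken as the mean interval width. The function names and the sample numbers are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error of point predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def interval_coverage(observed, lower, upper):
    """Fraction of observations inside their prediction interval."""
    hits = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)

def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)

# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 5.0]
lower = [0.5, 1.0, 2.0, 4.5]
upper = [2.0, 3.0, 3.5, 6.0]

error = rmse(observed, predicted)
coverage = interval_coverage(observed, lower, upper)
width = mean_interval_width(lower, upper)
```

A method can have the lower RMSE and still report intervals that are too narrow or too wide, which is why point accuracy and uncertainty quality are assessed separately.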
Conclusions
• We compared the predictive capacity and uncertainty of KED and QRF using both real-world and synthetic data.
• In a direct comparison of KED and QRF, both methods produced similar predictive maps.
• As expected, KED is able to exploit linear spatial dependencies while QRF is better at handling non-linear dependencies.
• Surprisingly, KED always outperformed QRF with respect to measures of uncertainty.
• Applications must weigh the benefits of better uncertainty in KED against better handling of non-linearity in QRF.
Mineral Resources
Jens Klump
Science Leader
t +61 8 6436 8828
e [email protected]
people.csiro.au/Jens-Klump
Stanford University
Francky Fouedjio
t +61 2 9123 4567
e [email protected]
www.csiro.au/lorem
Thank you