Validation of uncertainty predictions in digital soil mapping

It is quite common in digital soil mapping (DSM) to quantify the uncertainty of issued predictions, that is, to make probabilistic predictions. Yet, little attention has been paid to the validation of these predictions. Probabilistic predictions are only of value to end users if they are reliable and ideally also sharp. Reliability refers to the consistency between predicted conditional probabilities and observed frequencies of independent test data. Sharpness refers to the concentration of a conditional probability distribution function, i.e. its narrowness. The prediction interval coverage probability (PICP) is currently used in DSM to validate the reliability of prediction intervals, but it is insensitive to a potential one-sided bias of the interval boundaries. Therefore, we propose to extend the current validation procedure with metrics used in the broader probabilistic literature. These metrics evaluate probabilistic predictions not only in prediction interval format but also as quantiles or full conditional probability distributions. We suggest the quantile coverage probability (QCP) and the probability integral transform (PIT) histogram as alternatives to PICP, and proper scoring rules for relative comparisons of competing probabilistic models. As scoring rules, we present the interval score (IS) and the continuous ranked probability score (CRPS), which can be decomposed into a reliability part (RELI). We illustrated the use of these metrics in a case study on soil pH and soil organic carbon from the LUCAS-soil database, in which probabilistic predictions of five different models were compared: a reference null model (NM), quantile regression forest (QRF), quantile regression post-processing of a random forest (QRPP RF), kriging with external drift (KED) and quantile regression neural network (QRNN). For KED and QRNN, one-sided bias was found. This was not apparent from PICP but was revealed by the PIT histogram and QCP.
RELI summarized the trends found in QCP, PICP and PIT histograms into a single numerical value. CRPS and IS penalized outliers and low sharpness especially harshly. According to CRPS and IS, the best probabilistic predictions were obtained by QRF and QRPP RF and the worst by NM.
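As an illustration of the reliability metrics discussed in the abstract, the following sketch computes PICP and QCP on a synthetic Gaussian example. All data, quantile levels and bound values are made up for illustration and do not come from the paper; the point is only to show how QCP can expose a one-sided bias that PICP hides.

```python
import numpy as np

# Illustrative sketch (synthetic data, not from the paper): PICP checks the
# coverage of a whole prediction interval, while QCP checks each quantile
# separately and can therefore reveal a one-sided bias of the bounds.

rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=1000)   # hypothetical held-out test observations

# A deliberately biased model: both interval bounds are shifted upward.
# The true 0.05 and 0.95 quantiles of N(0, 1) are about -1.645 and 1.645.
lower = np.full_like(y, -1.3)
upper = np.full_like(y, 2.0)

# PICP: fraction of observations inside the nominal 90% prediction interval.
picp = np.mean((y >= lower) & (y <= upper))

# QCP: fraction of observations at or below each predicted quantile.
qcp_lower = np.mean(y <= lower)   # close to 0.05 if the 5% quantile is reliable
qcp_upper = np.mean(y <= upper)   # close to 0.95 if the 95% quantile is reliable

print(f"PICP {picp:.2f}, QCP(0.05) {qcp_lower:.2f}, QCP(0.95) {qcp_upper:.2f}")
```

Here PICP remains close to its nominal 0.90 even though both bounds are biased upward, while the two QCP values deviate from 0.05 and 0.95 in the same direction; this is the kind of one-sided bias the abstract reports for KED and QRNN.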
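The two scoring rules can likewise be sketched in a few lines. The formulas below follow the general probabilistic-forecasting literature (sample-based CRPS and the interval score for a central prediction interval); function names and data are illustrative, not taken from the paper.

```python
import numpy as np

def crps_sample(ens, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|,
    where X, X' are independent draws from the predictive distribution."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def interval_score(lower, upper, y, alpha=0.1):
    """Interval score for a central (1 - alpha) prediction interval.
    Rewards narrow intervals and penalizes observations falling outside."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)
    above = (2.0 / alpha) * max(y - upper, 0.0)
    return width + below + above

rng = np.random.default_rng(0)
ens = rng.normal(0.0, 1.0, size=500)    # predictive sample for one location

print(round(crps_sample(ens, 0.0), 2))  # small: observation is central
print(interval_score(-1.0, 1.0, 2.0))   # large: observation misses the interval
```

Both scores grow quickly when an observation falls far outside the predictive distribution, which matches the abstract's remark that CRPS and IS are especially harsh on outliers and low sharpness.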

Bibliographic Details
Main Authors: Schmidinger, Jonas, Heuvelink, Gerard B.M.
Format: Article/Letter to editor
Language: English
Subjects: Digital soil mapping, Machine learning, Proper scoring rules, Quantile regression, Uncertainty, Validation
Online Access:https://research.wur.nl/en/publications/validation-of-uncertainty-predictions-in-digital-soil-mapping