Spatial validation reveals poor predictive performance of large-scale ecological mapping models

Mapping aboveground forest biomass is central to assessing the global carbon balance. However, current large-scale maps show strong disparities, despite good validation statistics of their underlying models. Here, we attribute this contradiction to a flaw in the validation methods, which ignore spatial autocorrelation (SAC) in data, leading to overoptimistic assessment of model predictive power. To illustrate this issue, we reproduce the approach of large-scale mapping studies using a massive forest inventory dataset of 11.8 million trees in central Africa to train and validate a random forest model based on multispectral and environmental variables. A standard nonspatial validation method suggests that the model predicts more than half of the forest biomass variation, while spatial validation methods accounting for SAC reveal quasi-null predictive power. This study underscores how a common practice in big-data mapping studies can show apparently high predictive power, even when predictors have poor relationships with the ecological variable of interest, thus possibly leading to erroneous maps and interpretations.
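The contrast between nonspatial and spatial validation can be illustrated with a small synthetic experiment. The sketch below is not the authors' pipeline and uses none of their data: it simulates a response and several predictors as independent, spatially autocorrelated AR(1) fields along a hypothetical one-dimensional transect, so the predictors carry no information about the response other than shared spatial structure. To stay dependency-free it substitutes k-nearest-neighbour regression for the random forest used in the study; all function names and parameter values (`run_demo`, `phi`, `n_folds`, and so on) are illustrative assumptions. Random k-fold assignment lets test sites borrow information from their spatial neighbours in the training set, whereas contiguous spatial blocks remove that shortcut.

```python
import random


def ar1_transect(n, phi, rng):
    """Simulate a spatially autocorrelated AR(1) field along a 1D transect."""
    values = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values


def knn_predict(train_x, train_y, query, k):
    """Average the response of the k nearest training sites in predictor space."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, query)), yi)
        for xi, yi in zip(train_x, train_y)
    )
    return sum(yi for _, yi in dists[:k]) / k


def r_squared(observed, predicted):
    """Coefficient of determination of pooled out-of-fold predictions."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def cross_validate(x, y, folds, k=5):
    """Pooled out-of-fold R^2 for a given assignment of sites to folds."""
    predicted = [0.0] * len(y)
    for fold in set(folds):
        train = [i for i, f in enumerate(folds) if f != fold]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i, f in enumerate(folds):
            if f == fold:
                predicted[i] = knn_predict(train_x, train_y, x[i], k)
    return r_squared(y, predicted)


def run_demo(n=600, n_predictors=5, n_folds=5, phi=0.98, seed=1):
    """Return (random k-fold R^2, spatial block R^2) on synthetic data."""
    rng = random.Random(seed)
    # Response and predictors are simulated independently (hypothetical setup):
    # any apparent skill can only come from shared spatial autocorrelation.
    y = ar1_transect(n, phi, rng)
    layers = [ar1_transect(n, phi, rng) for _ in range(n_predictors)]
    x = [tuple(layer[i] for layer in layers) for i in range(n)]

    # Nonspatial validation: sites assigned to folds at random.
    random_folds = [i % n_folds for i in range(n)]
    rng.shuffle(random_folds)
    # Spatial validation: contiguous blocks along the transect.
    block_size = n // n_folds
    block_folds = [min(i // block_size, n_folds - 1) for i in range(n)]

    return cross_validate(x, y, random_folds), cross_validate(x, y, block_folds)


if __name__ == "__main__":
    random_r2, spatial_r2 = run_demo()
    print(f"random k-fold R^2: {random_r2:.2f}")
    print(f"spatial block R^2: {spatial_r2:.2f}")
```

Under these assumptions the random k-fold score is typically high while the block score drops to around zero or below, mirroring in miniature the gap the abstract describes. The block size relative to the autocorrelation range (set here by `phi`) controls how strict the spatial validation is.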

Bibliographic Details
Main Authors: Ploton, Pierre, Mortier, Frédéric, Rejou-Mechain, Maxime, Barbier, Nicolas, Picard, Nicolas, Rossi, Vivien, Dormann, Carsten F., Cornu, Guillaume, Viennois, Gaëlle, Bayol, Nicolas, Lyapustin, Alexei I., Gourlet-Fleury, Sylvie, Pélissier, Raphaël
Format: article
Language: English
Subjects: P01 - Nature conservation and land resources, K01 - Forestry - General aspects, U30 - Research methods, tropical forests, ecology, forecasting, cartography, mathematical models, simulation models, quality, defects, http://aims.fao.org/aos/agrovoc/c_24904, http://aims.fao.org/aos/agrovoc/c_2467, http://aims.fao.org/aos/agrovoc/c_3041, http://aims.fao.org/aos/agrovoc/c_1344, http://aims.fao.org/aos/agrovoc/c_24199, http://aims.fao.org/aos/agrovoc/c_24242, http://aims.fao.org/aos/agrovoc/c_6400, http://aims.fao.org/aos/agrovoc/c_24158, http://aims.fao.org/aos/agrovoc/c_1432
Online Access: http://agritrop.cirad.fr/596514/
http://agritrop.cirad.fr/596514/1/ploton_NC20.pdf