Local interpretation of machine learning models in remote sensing with SHAP: the case of global climate constraints on photosynthesis phenology

Data-driven machine learning models are widely used in remote sensing applications such as the retrieval of biophysical variables and land cover classification. However, these models behave as a ‘black box’: the relationships between the input and predicted variables are hard to interpret. Recent regression models that downscale sun-induced fluorescence (SIF) with MODIS and weather variables are an example; the impact of the weather variables on the predicted SIF in these models is unknown. Explaining such weather–SIF relationships would aid the understanding of climate-related constraints on photosynthesis phenology, since SIF is a proxy of gross primary productivity. Here, we used SHapley Additive exPlanations (SHAP), a novel technique based on game theory, to explain the contribution of input variables to the individual predictions of a machine learning model. We explored the capabilities of this technique with a weather–SIF model. The regression model predicted ESA-TROPOSIF measurements from ERA5-Land air temperature, shortwave radiation, and vapour-pressure-deficit (VPD) data. The SHAP values of the model were estimated at the start and end of the growing season for the entire globe. These values depicted the global constraints of the three climate variables on the photosynthetically active season and confirmed existing knowledge on the limiting factors of terrestrial photosynthesis with unprecedented spatial detail. Radiation was the limiting factor in tropical rainforests, and VPD constrained the start and end of the growing season in tropical dryland ecosystems. In extra-tropical regions, temperature was the main limiting factor at the start of the growing season, but both temperature and radiation constrained photosynthesis at the end of the growing season. This technique may help future remote sensing studies that rely on non-interpretable machine learning regression models by explaining how input variables contribute to the model prediction in a spatiotemporally explicit manner.
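
To make the idea of a local, per-prediction explanation concrete, the following is a minimal sketch of exact Shapley values for a three-variable regression. It is not the authors' model or pipeline: `toy_sif_model`, its coefficients, the background rows, and the sample `x` are all hypothetical, and features outside a coalition are filled in from the background rows, which is one common convention. Reading the variable with the most negative contribution as the "limiting factor" is likewise an assumption made here for illustration.

```python
from itertools import combinations
from math import factorial

FEATURES = ("temperature", "radiation", "vpd")


def toy_sif_model(x):
    """Hypothetical stand-in for a trained weather-SIF regressor (made-up response)."""
    temperature, radiation, vpd = x
    # SIF rises with warmth and light and is suppressed by dry air (illustrative only).
    return max(0.0, 0.04 * temperature) * (radiation / 300.0) * max(0.0, 1.0 - 0.2 * vpd)


def coalition_value(model, x, background, coalition):
    """Mean prediction with coalition features taken from x, the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in coalition else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (feasible for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                with_i = coalition_value(model, x, background, s | {i})
                without_i = coalition_value(model, x, background, s)
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background samples: (air temperature in deg C, shortwave radiation in W/m2, VPD in kPa).
background = [(25.0, 250.0, 1.0), (10.0, 150.0, 0.5), (2.0, 80.0, 0.2), (30.0, 280.0, 3.0)]
# Hypothetical cold, fairly bright, humid sample, e.g. a pixel early in the season.
x = (3.0, 200.0, 0.3)

phi = shapley_values(toy_sif_model, x, background)
base_value = coalition_value(toy_sif_model, x, background, set())
prediction = toy_sif_model(x)

# Assumed reading: the variable with the most negative contribution limits the prediction.
limiting = FEATURES[min(range(len(phi)), key=lambda i: phi[i])]

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.4f}")
print("base value + contributions:", base_value + sum(phi), "prediction:", prediction)
print("limiting variable:", limiting)
```

The contributions sum to the difference between the prediction and the mean background prediction, which is the additivity property that lets each value be read as one variable's share of a single prediction. Enumerating coalitions is only practical for a handful of inputs; with more variables an approximation would be needed.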

Bibliographic Details
Main Authors: Descals, Adrià; Verger, Aleixandre; Yin, Gaofei; Filella, Iolanda; Peñuelas, Josep
Other Authors: Ministerio de Ciencia e Innovación (España)
Format: library article
Published: Taylor & Francis 2023
Subjects: SHapley Additive exPlanations, Explainable machine learning, Local interpretation, Sun-induced fluorescence, Vegetation phenology, Climate constraints, Photosynthesis dynamics
Online Access: http://hdl.handle.net/10261/339744