Beyond assimilation of leaf area index: Leveraging additional spectral information using machine learning for site-specific soybean yield prediction

Assimilating external observations of crop state into cropping system models is essential for making spatially explicit predictions of crop variables relevant to precision agriculture. Satellite-based leaf area index (LAI) estimates have been the variable most frequently used as a proxy for actual crop growth. However, additional information beyond LAI, such as canopy N content, water content, and structure, can be retrieved from satellite observations. Including such variables directly through data assimilation is difficult because many crop models do not have corresponding state variables, or because the relationship between the observations and the processes that regulate crop growth is unclear. Other approaches are therefore required to include such information. In this study, we investigate the improvement in predicted yield, and the impact of individual features on model outputs, obtained with a hybrid approach that combines Sentinel-1 and Sentinel-2 time series with the outputs of a process-based model embedded in a data assimilation framework, and that uses a gradient-boosted trees regressor (GBTR) as the predictive model. We used two regions with soybean fields: the US (13 K points) and Uruguay (400 K points). Using the GBTR as the predictive model gave an advantage over data assimilation (RRMSE reduced by ∼16%). Adding vegetation indices yielded only a marginal improvement (RRMSE reduced by ∼1%), while adding reflectance and backscatter values had a negative impact. The satellite-based features had very small importance scores, and their impact on the predictions was predominantly unclear, which explains the marginal predictive power they added. Features from the reproductive stages had the highest importance, while the importance of an index related to drought stress (NMDI) across the growing season provided insights for further improvement of data assimilation methods. However, more studies are required to better disentangle pathways towards further improvement in constraining crop models by ingesting satellite observations.
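
The abstract reports results as reductions in RRMSE and singles out the drought-related index NMDI, but does not define either. The sketch below illustrates both under stated assumptions: RRMSE is taken as RMSE divided by the mean observed yield, in percent, and NMDI follows the commonly cited form using one near-infrared and two shortwave-infrared reflectances. The function names, band choices, and yield values are hypothetical and are not taken from the study.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent (assumed definition: RMSE / mean observed)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))


def nmdi(nir, swir1, swir2):
    """NMDI from NIR (~860 nm) and two SWIR (~1640 nm, ~2130 nm) reflectances.

    Assumed form: (NIR - (SWIR1 - SWIR2)) / (NIR + (SWIR1 - SWIR2)).
    """
    diff = swir1 - swir2
    return (nir - diff) / (nir + diff)


# Made-up yields in t/ha, for illustration only (not data from the study).
observed = [3.0, 3.5, 4.0, 2.5]
baseline = [3.6, 3.0, 4.5, 3.1]  # stands in for a data-assimilation estimate
hybrid = [3.2, 3.4, 4.2, 2.6]    # stands in for a GBTR-based estimate

print(f"baseline RRMSE: {rrmse(observed, baseline):.1f}%")
print(f"hybrid RRMSE:   {rrmse(observed, hybrid):.1f}%")
print(f"NMDI example:   {nmdi(0.4, 0.3, 0.2):.2f}")
```

Normalizing RMSE by the mean observed yield makes errors comparable across regions with different yield levels, which is presumably why a relative metric suits a comparison spanning the US and Uruguay.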

Bibliographic Details
Main Authors: Gaso, Deborah; Paudel, Dilli; de Wit, Allard; Puntel, Laila; Mullissa, Adugna; Kooistra, Lammert
Format: Article/Letter to editor
Language: English
Subjects: Life Science
Online Access: https://research.wur.nl/en/publications/beyond-assimilation-of-leaf-area-index-leveraging-additional-spec