Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

Silva, J.V.; Heerwaarden, J.; Reidsma, P.; Laborte, A.G.; Fantaye, K.T.; van Ittersum, M.K.

Context: Collecting and analyzing large volumes of on-farm production data is widely seen as key to understanding yield variability among farmers and to improving resource-use efficiency.
Objective: This study assessed the performance of statistical and machine learning methods in explaining and predicting crop yield across thousands of farmers' fields in contrasting farming systems worldwide.
Methods: A database of 10,940 field-year combinations from three countries at different stages of agricultural intensification was analyzed. Random effects models were used to partition crop yield variability, and random forest models were used to explain and predict crop yield within a cross-validation scheme that re-sampled the data over space and time.
Results: Relative yield variability was smallest for wheat and barley in the Netherlands and for wheat in Ethiopia, intermediate for rice in the Philippines, and greatest for maize in Ethiopia. Random forest models with a total of 87 variables explained at most 65% of cereal yield variability in the Netherlands and less than 45% in Ethiopia and the Philippines. Crop management variables were important for explaining and predicting cereal yields in Ethiopia, whereas predictive climatic variables (i.e., known before the growing season) were most important in the Philippines and explanatory climatic variables (i.e., known during or after the growing season) were most important in the Netherlands. Finally, cross-validation on regions or years not seen during model training reduced R² considerably for most crop × country combinations; for wheat in the Netherlands, the effect depended on the model.
Conclusion: Big data from farmers' fields can explain on-farm yield variability to some extent, but cannot predict it across time and space.
Significance: The results call for moderate expectations of big data and machine learning in agronomic studies, particularly for smallholder farms in the tropics, where model performance was poorest regardless of the variables considered and the cross-validation scheme used.
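The central methodological contrast in this abstract is between cross-validation that resamples observations at random and cross-validation that holds out whole regions or years. The following stdlib-only Python sketch illustrates the two resampling schemes. It is not the authors' code: a k-nearest-neighbour regressor (`knn_predict`) stands in for the random forest, and the synthetic data, covariate names (`soil`, `n_rate`) and function names are all hypothetical.

```python
"""Sketch: random K-fold vs. leave-one-group-out cross-validation.

Hypothetical illustration, not the authors' code. A k-nearest-neighbour
regressor stands in for the random forest, and all data are synthetic.
"""
import math
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)
Point = Sequence[float]


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Split]:
    """Hold out one group (e.g. a region or a year) at a time."""
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield train, test


def random_k_fold(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Randomly partition n observations into k folds, ignoring groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for f in range(k):
        test = sorted(idx[f::k])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train: Sequence[Point], y_train: Sequence[float],
                x: Point, k: int = 5) -> float:
    """Stand-in model: mean target of the k nearest training points."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def cross_validated_r2(X: Sequence[Point], y: Sequence[float],
                       splits: Iterator[Split], k: int = 5) -> float:
    """Pool out-of-fold predictions from every split, then score with R^2."""
    pred = [0.0] * len(y)
    for train, test in splits:
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in test:
            pred[i] = knn_predict(X_tr, y_tr, X[i], k)
    return r_squared(y, pred)


if __name__ == "__main__":
    rng = random.Random(42)
    X, y, regions = [], [], []
    for r in range(6):
        soil = rng.uniform(0, 1)      # hypothetical region-level covariate
        offset = rng.gauss(0, 1.5)    # region effect, drawn independently of soil
        for _ in range(40):
            n_rate = rng.uniform(0, 1)  # hypothetical field-level input
            X.append((soil, n_rate))
            y.append(3.0 + 2.0 * n_rate + offset + rng.gauss(0, 0.5))
            regions.append(f"region_{r}")
    r2_random = cross_validated_r2(X, y, random_k_fold(len(y), 5))
    r2_region = cross_validated_r2(X, y, leave_one_group_out(regions))
    print(f"random 5-fold R^2:    {r2_random:.2f}")
    print(f"leave-region-out R^2: {r2_region:.2f}")
```

Because `soil` is constant within each synthetic region, random folds let the stand-in model recover a region's offset from neighbours in the same region, whereas leave-region-out validation removes that shortcut. This is one plausible mechanism behind lower R² for unseen regions or years; the study's own models and data are far richer. With scikit-learn available, `RandomForestRegressor` combined with `LeaveOneGroupOut` or `GroupKFold` from `sklearn.model_selection` would be the more usual choice.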

Bibliographic Details
Main Authors:	Silva, J.V., Heerwaarden, J., Reidsma, P., Laborte, A.G., Fantaye, K.T., van Ittersum, M.K.
Format:	Article
Language:	English
Published:	Elsevier B.V. 2023
Subjects:	AGRICULTURAL SCIENCES AND BIOTECHNOLOGY, Model Accuracy, Model Precision, Linear Mixed Models, MACHINE LEARNING, SUSTAINABLE INTENSIFICATION, BIG DATA, YIELDS, MODELS, AGRONOMY, Sustainable Agrifood Systems
Online Access:	https://hdl.handle.net/10883/22678

Journal:	Field Crops Research, vol. 302, article 109063 (ISSN 0378-4290)
DOI:	10.1016/j.fcr.2023.109063
Publisher location:	Amsterdam (Netherlands)
Version:	Published version (Open Access)
Funders:	Netherlands Science Foundation; Bill & Melinda Gates Foundation (BMGF)
CGIAR themes:	Nutrition, health & food security; Poverty reduction, livelihoods & jobs; Excellence in Agronomy; Resilient Agrifood Systems
Related record:	https://hdl.handle.net/10568/131409
Rights:	CIMMYT manages Intellectual Assets as International Public Goods. The user is free to download, print, store and share this work. In case you want to translate or create any other derivative work and share or distribute such translation/derivative work, please contact CIMMYT-Knowledge-Center@cgiar.org indicating the work you want to use and the kind of use you intend; CIMMYT will contact you with the suitable license for that purpose.
Institution:	CIMMYT Library (DSpace collection), México
Record ID:	dig-cimmyt-10883-22678