Integrating heterogeneous across-country data for proxy-based random forest prediction of enteric methane in dairy cattle

Direct measurements of methane (CH4) from individual animals are difficult and expensive, so predictions based on proxies for CH4 are a viable alternative. Most prediction models use multiple linear regression (MLR) and rely on predictor variables that are not routinely available on commercial farms, such as dry matter intake (DMI) and diet composition. The use of machine learning (ML) algorithms to predict CH4 emissions from heterogeneous across-country data sets has not been reported.

The objectives were to compare the performance of the ML ensemble algorithm random forest (RF) with that of MLR models in predicting CH4 emissions from proxies in dairy cows, and to assess the effects of imputing missing data points on prediction accuracy. Data on CH4 emissions and proxies for CH4 from 20 herds were provided by 10 countries. The integrated data set contained 43,519 records from 3,483 cows, and the 18.7% of data points that were missing were imputed using k-nearest neighbor imputation. Three data sets were created: 3k (no missing records), 21k (missing DMI imputed from milk, fat, protein, and body weight), and 41k (missing DMI, milk fat, and protein records imputed). These data sets were used to test scenarios (with or without DMI; imputed vs. nonimputed DMI, milk fat, and protein) and prediction models (RF vs. MLR). Model predictive ability was evaluated within and between herds through 10-fold cross-validation. Prediction accuracy was measured as the correlation between observed and predicted CH4, the root mean squared error (RMSE), and the mean normalized discounted cumulative gain (NDCG).

Compared with models without DMI, including DMI improved within- and between-herd prediction accuracy to 0.77 (RMSE = 23.3%) and 0.58 (RMSE = 31.9%), respectively, in RF, and to 0.50 (RMSE = 0.327) and 0.13 (RMSE = 42.71), respectively, in MLR. When missing DMI records were imputed, within- and between-herd accuracy increased to 0.84 (RMSE = 18.5%) and 0.63 (RMSE = 29.9%), respectively. In all scenarios, RF models outperformed MLR models. The results suggest that variables routinely measured on dairy farms can be used to develop globally robust prediction models for CH4 if they are coupled with state-of-the-art imputation techniques and advanced ML algorithms for predictive modeling.
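The abstract reports that missing data points were filled by k-nearest neighbor imputation but does not give the distance measure, the value of k, or the scaling used. The following is a minimal numpy sketch of the general technique, not the authors' implementation. It assumes Euclidean (RMS) distance on z-scored traits, computed over the columns that the recipient and the donor both observe, and it fills each gap with the mean of the k nearest donors. The function name `knn_impute` and the default `k=5` are hypothetical.

```python
import numpy as np


def knn_impute(X, k=5):
    """Hypothetical helper: fill NaN cells with the mean of the k nearest donors.

    Assumptions (not from the source): traits are z-scored, distance is the
    RMS difference over columns observed in both rows, and donors must have
    the target column observed. Imputed values are never reused as donors.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Standardize columns so traits on different scales (e.g. milk yield
    # vs. body weight) contribute comparably to the distance.
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    observed = ~np.isnan(X)

    for i in np.where(~observed.all(axis=1))[0]:
        for j in np.where(~observed[i])[0]:
            # Candidate donors: rows whose value for column j is observed.
            donors = np.where(observed[:, j])[0]
            dists = []
            for d in donors:
                shared = observed[i] & observed[d]
                if not shared.any():
                    continue
                diff = Z[i, shared] - Z[d, shared]
                # RMS distance, so rows sharing fewer columns are comparable.
                dists.append((np.sqrt(np.mean(diff ** 2)), d))
            if not dists:
                continue  # no usable donor: leave the cell missing
            dists.sort()
            nearest = [d for _, d in dists[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled
```

The loop is quadratic in the number of records. For data sets of the size described above, an optimized implementation such as scikit-learn's `KNNImputer` would be the more practical choice. However, its defaults (NaN-aware Euclidean distance, uniform weights, no scaling) differ from the assumptions made here.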
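The distinction between within-herd and between-herd validation matters because random record-level folds let a model see other records from the same herd, and often from the same cow, during training. The sketch below shows one plausible reading of the two 10-fold schemes; the abstract does not define them precisely. Within-herd folds shuffle individual records, while between-herd folds assign whole herds to folds. Out-of-fold predictions come from an MLR baseline fitted with `numpy.linalg.lstsq`. All function names are hypothetical, and an RF regressor (for example scikit-learn's `RandomForestRegressor`) would replace the least-squares fit in the loop.

```python
import numpy as np


def within_herd_folds(n_records, n_folds=10, seed=0):
    # Records are shuffled into folds at random, so records from the same
    # herd (and cow) can appear in both the training and the test set.
    rng = np.random.default_rng(seed)
    folds = np.arange(n_records) % n_folds
    rng.shuffle(folds)
    return folds


def between_herd_folds(herd_ids, n_folds=10, seed=0):
    # Whole herds are assigned to folds, so each test fold contains only
    # herds that were absent from the corresponding training set.
    herd_ids = np.asarray(herd_ids)
    herds = np.unique(herd_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(herds)
    herd_to_fold = {h: i % n_folds for i, h in enumerate(herds)}
    return np.array([herd_to_fold[h] for h in herd_ids])


def cross_validated_predictions(X, y, folds):
    # Out-of-fold predictions from an MLR baseline (intercept + slopes)
    # fitted by ordinary least squares on the training folds only.
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    pred = np.empty_like(y)
    for f in np.unique(folds):
        test = folds == f
        beta, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)
        pred[test] = X[test] @ beta
    return pred
```

With 20 herds and 10 folds, each between-herd test fold holds out two complete herds. This is why between-herd accuracy is the harder, and more realistic, test of whether a model transfers to new farms.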
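The abstract measures accuracy as correlation, RMSE (mostly reported as a percentage), and mean NDCG, but it gives no formulas. The sketch below rests on stated assumptions. RMSE% is taken as RMSE divided by the observed mean. NDCG ranks animals by predicted CH4 and uses observed CH4 as the relevance gain with a log2 position discount, normalized by the ideal ranking. On this reading, "mean" NDCG would be the average over cross-validation folds. The function names and the optional cutoff `k` are hypothetical.

```python
import numpy as np


def pearson_r(obs, pred):
    # Correlation between observed and predicted CH4.
    return float(np.corrcoef(obs, pred)[0, 1])


def rmse_percent(obs, pred):
    # RMSE expressed as a percentage of the observed mean (assumed definition).
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(100 * np.sqrt(np.mean((obs - pred) ** 2)) / obs.mean())


def ndcg(obs, pred, k=None):
    # Rank animals by predicted CH4 (highest first) and score how much of the
    # ideal, observed-order gain that ranking recovers. Relevance = observed
    # CH4 is an assumption; 1.0 means the predicted order is perfect.
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    k = len(obs) if k is None else min(k, len(obs))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum(obs[np.argsort(-pred)][:k] * discounts)
    idcg = np.sum(np.sort(obs)[::-1][:k] * discounts)
    return float(dcg / idcg)
```

For example, `rmse_percent([10, 10], [11, 9])` returns 10.0, because an RMSE of 1 is 10% of the observed mean of 10. NDCG complements correlation by rewarding a model that correctly places the highest emitters at the top of the ranking, which is what matters when selecting animals.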

Bibliographic Details
Main Authors: Negussie, Enyew, González-Recio, Oscar, Battagin, Mara, Bayat, Ali Reza, Boland, Tommy, de Haas, Yvette, Garcia-Rodriguez, Aser, Garnsworthy, Philip C., Gengler, Nicolas, Kreuzer, Michael, Kuhla, Björn, Lassen, Jan, Peiren, Nico, Pszczola, Marcin, Schwarm, Angela, Soyeurt, Hélène, Vanlierde, Amélie, Yan, Tianhai, Biscarini, Filippo
Format: Article/Letter to editor
Language: English
Subjects: enteric methane, machine learning, prediction models, proxies for methane
Online Access: https://research.wur.nl/en/publications/integrating-heterogeneous-across-country-data-for-proxy-based-ran
Published in: Journal of Dairy Science 105 (2022) 6
ISSN: 0022-0302
DOI: 10.3168/jds.2021-20158
Full text (PDF): https://edepot.wur.nl/569198
License: https://creativecommons.org/licenses/by/4.0/
Repository: Wageningen University & Research