Attribute selection impact on linear and nonlinear regression models for crop yield prediction

Efficient cropping requires a yield estimate for each crop involved, and data-driven models are commonly used to produce it. In recent years, several comparisons of data-driven modeling techniques have been made in search of the best model for yield prediction. However, attributes are usually selected based on expert assessment or on dimensionality reduction algorithms. A fairer comparison should use the best subset of features for each regression technique, and an evaluation covering several crops is preferable. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5' regression trees, and artificial neural networks (ANN) were ranked. The models were built using real data for eight crops sown in an irrigation module in Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE), relative mean absolute error (RMAE), and correlation factor (R). The results show that ANNs are more consistent in the composition of the best attribute subset between the learning and the training stages, obtaining the lowest average RRSE (86.04%), the lowest average RMAE (8.75%), and the highest average correlation factor (0.63).
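
The three accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are assumptions: RRSE follows the usual root relative squared error (error relative to always predicting the mean), RMAE is taken here as the mean absolute error relative to each observed value, and R is the Pearson correlation. The function names and sample numbers are hypothetical and are not taken from the paper.

```python
import math

def rrse(actual, predicted):
    """Root relative squared error, in percent (assumed standard definition)."""
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)

def rmae(actual, predicted):
    """Relative mean absolute error, in percent.

    ASSUMPTION: mean of |error| / |observed value|; the paper's exact
    definition may differ. Requires nonzero observed values.
    """
    n = len(actual)
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / n

def correlation(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical yields (t/ha), for illustration only.
observed = [2.0, 4.0, 6.0]
estimated = [3.0, 4.0, 5.0]
print(rrse(observed, estimated), rmae(observed, estimated), correlation(observed, estimated))
```

Under these assumed definitions, an RRSE of 100% means the model is no better than predicting the mean observed yield, so lower values of RRSE and RMAE and higher values of R indicate a better model.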

Bibliographic Details
Main Authors: JUAN FRAUSTO SOLIS, WALDO OJEDA BUSTAMANTE
Format: info:eu-repo/semantics/article biblioteca
Language:eng
Published: Hindawi Publishing Corporation
Subjects:info:eu-repo/classification/Autor/Cultivos alimenticios, info:eu-repo/classification/Autor/Predicción, info:eu-repo/classification/Autor/Modelos matemáticos, info:eu-repo/classification/cti/7
Online Access:http://hdl.handle.net/20.500.12013/1969
id dig-imta-mx-20.500.12013-1969
record_format koha
spelling dig-imta-mx-20.500.12013-1969 2024-12-20T01:49:21Z 2014 info:eu-repo/semantics/article http://hdl.handle.net/20.500.12013/1969 eng info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-nd/4.0 application/pdf Hindawi Publishing Corporation The Scientific World Journal (1537-744X), ID 509429
institution IMTA MX
collection DSpace
country Mexico
countrycode MX
component Bibliographic
access Online
databasecode dig-imta-mx
tag biblioteca
region North America
libraryname Biblioteca del IMTA de México
language eng
topic info:eu-repo/classification/Autor/Cultivos alimenticios
info:eu-repo/classification/Autor/Predicción
info:eu-repo/classification/Autor/Modelos matemáticos
info:eu-repo/classification/cti/7
format info:eu-repo/semantics/article
author JUAN FRAUSTO SOLIS
WALDO OJEDA BUSTAMANTE
title Attribute selection impact on linear and nonlinear regression models for crop yield prediction
publisher Hindawi Publishing Corporation
url http://hdl.handle.net/20.500.12013/1969