Kennard-Stone method outperforms the Random Sampling in the selection of calibration samples in SNPs and NIR data

Ferreira, Roberta de Amorim; Teixeira, Gabriely; Peternelli, Luiz Alexandre

ABSTRACT: Splitting a dataset into training and testing subsets is a crucial step in optimizing predictive models. This study evaluated how the choice of the training subset influences the construction of predictive models and their validation. To this end, we assessed the Kennard-Stone (KS) and Random Sampling (RS) methods on near-infrared (NIR) spectroscopy data and on single nucleotide polymorphism (SNP) marker data. To our knowledge, there are no reports in the literature of the KS method being applied to SNP data. For model construction and validation, partial least squares (PLS) regression proved more efficient for the NIR data and the Bayesian Lasso (BLASSO) for the SNP marker data. The predictive capacity of the models obtained after data partitioning was evaluated through the correlation between predicted and observed values and the corresponding root mean squared error of prediction. For both datasets, the results from the KS and RS methods differed significantly by the F test (P-value < 0.01). The KS method was more efficient than RS in practically all repetitions. In addition, the KS method is easy and fast to apply and always selects the same samples, which benefits subsequent analyses.
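
The abstract names the Kennard-Stone selection and the two evaluation measures without showing them, so a minimal sketch may help. It follows the commonly described max-min form of Kennard-Stone with Euclidean distance; the distance metric, the function names (`kennard_stone`, `rmsep`, `prediction_correlation`), and the use of NumPy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return row indices of X chosen by the Kennard-Stone max-min rule.

    Assumption: Euclidean distance on the raw predictor matrix X.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances computed from the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, track the distance to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -1.0  # already-selected samples can never win

    # Repeatedly add the sample farthest from the current selection.
    while len(selected) < n_select:
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[k] = -1.0
    return selected


def rmsep(y_obs, y_pred):
    """Root mean squared error of prediction."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))


def prediction_correlation(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])
```

Because the selection has no random component, repeated calls on the same matrix return the same indices, which is the reproducibility property the abstract points to. A Random Sampling baseline, by contrast, would draw the indices with a random number generator and so vary between runs unless a seed is fixed.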

Bibliographic Details
Main Authors:	Ferreira, Roberta de Amorim; Teixeira, Gabriely; Peternelli, Luiz Alexandre
Format:	Digital journal
Language:	English
Published:	Universidade Federal de Santa Maria 2022
Online Access:	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-84782022000500202

id	oai:scielo:S0103-84782022000500202
record_format	ojs
spelling	oai:scielo:S0103-84782022000500202; 2021-10-27; data division; PLS regression; BLASSO; chemometrics; prediction power; info:eu-repo/semantics/openAccess; Universidade Federal de Santa Maria; Ciência Rural v.52 n.5 2022; 2022-01-01; info:eu-repo/semantics/article; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-84782022000500202; en; 10.1590/0103-8478cr20201072
institution	SCIELO
collection	OJS
country	Brazil
countrycode	BR
component	Journal
access	Online
databasecode	rev-scielo-br
tag	revista
region	South America
libraryname	SciELO
language	English
format	Digital
author	Ferreira, Roberta de Amorim; Teixeira, Gabriely; Peternelli, Luiz Alexandre
title	Kennard-Stone method outperforms the Random Sampling in the selection of calibration samples in SNPs and NIR data
publisher	Universidade Federal de Santa Maria
publishDate	2022
url	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-84782022000500202