Near infrared spectroscopy predictions on heterogeneous databases

Faced with questions about heterogeneous databases, a NIR user is often told: "you should work on more homogeneous data-sets". Nevertheless, because heterogeneity and variability are widespread in many areas of agriculture, it is not always possible to obtain subsets that are both homogeneous and large enough for calibration. It is therefore worth attempting calibration on heterogeneous databases before declaring it impossible. The main objective was to compare different strategies for NIR predictions: on the one hand, models built from a data-set comprising several data-subsets and, on the other hand, models based on the 'pure' data-subsets. The raw materials studied were industrially pre-processed plant residues and other tropical plant residues, potentially usable in composting. The pure data-sets were (i) wet grape skins, (ii) dry grape skins, (iii) de-oiled grape pips, (iv) coffee cake, (v) cocoa cake, (vi) olive pulp and (vii) tropical plant residue samples. The parameters measured were Organic Matter (OM, n = 30 to 56) and Total Kjeldahl Nitrogen (TN, n = 32 to 55) for the pure data-sets. The compiled data-set comprised 327 OM and 283 TN analyses. All samples were dried (40°C), ground (<1 mm sieve) and scanned in ring cups on a NIRS 6500 (Foss NIRSystems). Spectra were corrected with the SNVD 2,5,5 mathematical pre-treatment (WIN-ISI) and calibrations were performed using modified partial least squares regression (mPLS, WIN-ISI). For OM, the Standard Error of Calibration (SEC) ranged from 0.28 to 0.75 g 100 g-1 d.m. for the pure data-sets and was 0.94 g 100 g-1 d.m. for the compiled data-set. For TN, SEC ranged from 0.10 to 0.15 g 100 g-1 d.m. for the pure data-sets and was 0.16 g 100 g-1 d.m. for the compiled data-set. The Standard Error of Cross Validation (SECV) for OM ranged from 0.44 to 1.27 g 100 g-1 d.m. for the pure data-sets and was 1.07 g 100 g-1 d.m. for the compiled data-set, whereas for TN it ranged from 0.12 to 0.49 g 100 g-1 d.m. and was 0.17 g 100 g-1 d.m., respectively.
The corresponding SD/SECV ratios for OM ranged from 1.3 to 3.9 for the pure data-sets and equalled 2.8 for the compiled data-set. Those for TN ranged from 1.4 to 3.7 for the pure data-sets and equalled 3.1 for the compiled data-set. Calibrations on the pure data-sets seem to perform slightly better than the calibration on the compiled data-set. Nevertheless, the models developed on the global data-set (a compilation of the subsets, and thus heterogeneous) had an acceptable predictive capacity, so this strategy is a useful option when homogeneous subsets are too small for calibration.
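The SD/SECV ratio used above can be illustrated with a minimal sketch. It assumes that SECV is taken as the root mean square of the cross-validation residuals and that SD is the sample standard deviation of the reference values; the abstract does not say how WIN-ISI computes either quantity, so this is one common definition and not necessarily the authors' exact procedure. The function names and the numbers are hypothetical and are not data from the study.

```python
import math
import statistics


def secv(reference, cv_predicted):
    """Root mean square of the cross-validation residuals (assumed SECV definition)."""
    if len(reference) != len(cv_predicted):
        raise ValueError("reference and cv_predicted must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, cv_predicted)]
    return math.sqrt(sum(squared) / len(squared))


def sd_secv_ratio(reference, cv_predicted):
    """Sample standard deviation of the reference values divided by SECV."""
    return statistics.stdev(reference) / secv(reference, cv_predicted)


# Made-up values for illustration only (g 100 g-1 d.m.), not taken from the study.
reference = [90.0, 92.0, 94.0, 96.0, 98.0]      # laboratory reference values
cv_predicted = [91.0, 91.0, 95.0, 95.0, 99.0]   # cross-validated NIR predictions

print(f"SECV    = {secv(reference, cv_predicted):.2f}")           # SECV    = 1.00
print(f"SD/SECV = {sd_secv_ratio(reference, cv_predicted):.2f}")  # SD/SECV = 3.16
```

A higher ratio means that the cross-validation error is small relative to the spread of the reference values. Because the ratio depends on that spread, a compiled data-set with a wide range of values can reach a ratio comparable to the pure data-sets even when its SECV is somewhat larger.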

Bibliographic Details
Main Authors: Thuriès, Laurent, Bastianelli, Denis, Bonnal, Laurent, Davrieux, Fabrice
Format: conference_item biblioteca
Language: eng
Published: IM Publications
Subjects: U10 - Informatique, mathématiques et statistiques, Q70 - Traitement des déchets agricoles, compost, matière organique, fertilité du sol, http://aims.fao.org/aos/agrovoc/c_1795, http://aims.fao.org/aos/agrovoc/c_5387, http://aims.fao.org/aos/agrovoc/c_7170
Online Access: http://agritrop.cirad.fr/530998/
http://agritrop.cirad.fr/530998/1/document_530998.pdf