Soil classification based on physical and chemical properties using random forests

Soil classification is a method of encoding the most relevant information about a given soil, namely its composition and characteristics, in a single class, to be used in areas like agriculture and forestry. In this paper, we evaluate how confidently we can predict soil classes, following the World Reference Base classification system, based on the physical and chemical characteristics of its layers. The Random Forests classifier was used with data consisting of 6 760 soil profiles composed by 19 464 horizons, collected in Mexico. Four methods of modelling the data were tested (i.e., standard depths, n first layers, thickness, and area weighted thickness). We also fine-tuned the best parameters for the classifier and for a k-NN imputation algorithm, used for addressing problems of missing data. Under-represented classes showed significantly worse results, by being repeatedly predicted as one of the majority classes. The best method to model the data was found to be the n first layers approach, with missing values being imputed with k-NN ($$k=1$$ ). The results present a Kappa value from 0.36 to 0.48 and were in line with the state of the art methods, which mostly use remote sensing data.

Saved in:
Bibliographic Details
Main Authors: Dias, Didier, Martins, Bruno, Pires, João, de Sousa, Luís M., Estima, Jacinto, Damásio, Carlos V.
Format: Article in monograph or in proceedings biblioteca
Language:English
Published: Springer
Subjects:Ensemble learning, Machine learning, Random Forests, Soil classification, Soil properties,
Online Access:https://research.wur.nl/en/publications/soil-classification-based-on-physical-and-chemical-properties-usi
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Soil classification is a method of encoding the most relevant information about a given soil, namely its composition and characteristics, in a single class, to be used in areas like agriculture and forestry. In this paper, we evaluate how confidently we can predict soil classes, following the World Reference Base classification system, based on the physical and chemical characteristics of its layers. The Random Forests classifier was used with data consisting of 6 760 soil profiles composed by 19 464 horizons, collected in Mexico. Four methods of modelling the data were tested (i.e., standard depths, n first layers, thickness, and area weighted thickness). We also fine-tuned the best parameters for the classifier and for a k-NN imputation algorithm, used for addressing problems of missing data. Under-represented classes showed significantly worse results, by being repeatedly predicted as one of the majority classes. The best method to model the data was found to be the n first layers approach, with missing values being imputed with k-NN ($$k=1$$ ). The results present a Kappa value from 0.36 to 0.48 and were in line with the state of the art methods, which mostly use remote sensing data.