An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil
Digital information creates the possibility of a high degree of redundancy in the data available for fitting the predictive models used in Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied because of its capacity to handle large datasets. The purpose of this study was to evaluate how the volume of data used to generate DT models affects the quality of soil maps. An area of 889.33 km² was chosen in the northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the study area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the soil formation factors relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models that were less able to capture the complexity of the spatial distribution of soils in the study area. The trade-off between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated a predictive mapping accuracy close to 70 %.
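The sampling design described in the abstract can be illustrated with a minimal sketch. It only shows how training subsets at the reported fractions (1, 3, 5, 10, 15, 20 and 25 % of the total data volume) might be drawn before model fitting. The function name `draw_training_subsets`, the total of 10,000 records and the use of simple random sampling without replacement are all assumptions made for illustration; the abstract does not say how the subsets were drawn, and the study itself fitted its models in WEKA, not in Python.

```python
import random

# Sample fractions reported in the abstract (percent of the total data volume).
FRACTIONS_PERCENT = [1, 3, 5, 10, 15, 20, 25]


def draw_training_subsets(n_records, fractions_percent=FRACTIONS_PERCENT, seed=0):
    """Return {percent: sorted list of record indices} for each sample fraction.

    Assumption: simple random sampling without replacement; the abstract does
    not state how the subsets were actually drawn.
    """
    rng = random.Random(seed)
    subsets = {}
    for pct in fractions_percent:
        # At least one record per subset, even for very small totals.
        size = max(1, round(n_records * pct / 100))
        subsets[pct] = sorted(rng.sample(range(n_records), size))
    return subsets


if __name__ == "__main__":
    # Hypothetical total of 10,000 records, chosen only for illustration.
    for pct, idx in draw_training_subsets(10_000).items():
        print(f"{pct:>2} % -> {len(idx)} training records")
```

Each subset would then be passed to a decision tree learner, and the resulting maps compared against field data, which is the comparison the abstract reports.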
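Main Authors: | Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da |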
---|---|
Format: | Digital journal |
Language: | English |
Published: | Sociedade Brasileira de Ciência do Solo, 2013 |
Online Access: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007 |
id |
oai:scielo:S0100-06832013000200007 |
---|---|
record_format |
ojs |
spelling |
oai:scielo:S0100-06832013000200007; 2013-06-03; An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil; Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da; decision tree; pedometry; soil survey; mapping unit; info:eu-repo/semantics/openAccess; Sociedade Brasileira de Ciência do Solo; Revista Brasileira de Ciência do Solo v.37 n.2 2013; 2013-04-01; info:eu-repo/semantics/article; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007; en; 10.1590/S0100-06832013000200007 |
institution |
SCIELO |
collection |
OJS |
country |
Brazil |
countrycode |
BR |
component |
Journal |
access |
Online |
databasecode |
rev-scielo-br |
tag |
revista |
region |
South America |
libraryname |
SciELO |
language |
English |
format |
Digital |
author |
Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da |
spellingShingle |
Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da; An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
author_facet |
Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da |
author_sort |
Caten,Alexandre ten |
title |
An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
title_short |
An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
title_full |
An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
title_fullStr |
An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
title_full_unstemmed |
An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil |
title_sort |
appropriate data set size for digital soil mapping in erechim, rio grande do sul, brazil |
publisher |
Sociedade Brasileira de Ciência do Solo |
publishDate |
2013 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007 |
work_keys_str_mv |
AT catenalexandreten anappropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT dalmolinricardosimaodiniz anappropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT pedronfabriciodearaujo anappropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT ruizluisfernandochimelo anappropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT silvacarlosantonioda anappropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT catenalexandreten appropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT dalmolinricardosimaodiniz appropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT pedronfabriciodearaujo appropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT ruizluisfernandochimelo appropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil AT silvacarlosantonioda appropriatedatasetsizefordigitalsoilmappinginerechimriograndedosulbrazil |
_version_ |
1756385094439796736 |