An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil

Digital information creates the possibility of a high degree of redundancy in the data available for fitting the predictive models used in Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied because of its capacity to handle large datasets. The purpose of this study was to evaluate how the volume of data used to generate DT models affects the quality of the resulting soil maps. A study area of 889.33 km² was chosen in the northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the study area and from the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the soil formation factors relief and organisms, together with data sets comprising 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % produced models that were less able to capture the complexity of the spatial distribution of the soils in the study area. The trade-off between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated a predictive mapping accuracy close to 70 %.
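The sample densities listed in the abstract can be illustrated with a minimal sketch of how training subsets of increasing size might be drawn from a full data set. This is not the authors' procedure: simple random sampling without replacement is an assumption (the abstract does not state the sampling scheme), and the function name `draw_training_subsets` and the value of `n_total` are hypothetical.

```python
import random

# Sample densities reported in the abstract, as percentages of the full data volume.
SAMPLE_PERCENTAGES = [1, 3, 5, 10, 15, 20, 25]


def draw_training_subsets(n_total, percentages=SAMPLE_PERCENTAGES, seed=0):
    """Return {percentage: sorted list of sampled row indices}.

    Assumption: simple random sampling without replacement. The abstract
    does not say which sampling scheme the study actually used.
    """
    rng = random.Random(seed)
    subsets = {}
    for pct in percentages:
        # Integer arithmetic avoids floating-point rounding surprises.
        k = n_total * pct // 100
        subsets[pct] = sorted(rng.sample(range(n_total), k))
    return subsets


if __name__ == "__main__":
    # n_total is a hypothetical count of labelled raster cells,
    # not a figure taken from the study.
    subsets = draw_training_subsets(n_total=100_000)
    for pct, indices in subsets.items():
        print(f"{pct:>2} % -> {len(indices)} training cells")
```

Each subset could then be exported and used to train one decision-tree model, so that map quality can be compared across sample densities.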

Bibliographic Details
Main Authors: Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da
Format: Digital journal
Language: English
Published: Sociedade Brasileira de Ciência do Solo 2013
Online Access: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007
id oai:scielo:S0100-06832013000200007
record_format ojs
spelling oai:scielo:S0100-06832013000200007; 2013-06-03; An appropriate data set size for digital soil mapping in Erechim, Rio Grande do Sul, Brazil; Caten, Alexandre ten; Dalmolin, Ricardo Simão Diniz; Pedron, Fabrício de Araújo; Ruiz, Luis Fernando Chimelo; Silva, Carlos Antônio da; decision tree; pedometry; soil survey; mapping unit; info:eu-repo/semantics/openAccess; Sociedade Brasileira de Ciência do Solo; Revista Brasileira de Ciência do Solo v.37 n.2 2013; 2013-04-01; info:eu-repo/semantics/article; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007; en; 10.1590/S0100-06832013000200007
institution SCIELO
collection OJS
country Brazil
countrycode BR
component Journal
access Online
databasecode rev-scielo-br
tag journal
region South America
libraryname SciELO
language English
format Digital
author Caten, Alexandre ten
Dalmolin, Ricardo Simão Diniz
Pedron, Fabrício de Araújo
Ruiz, Luis Fernando Chimelo
Silva, Carlos Antônio da
publisher Sociedade Brasileira de Ciência do Solo
publishDate 2013
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0100-06832013000200007