Roadside collection of training data for cropland mapping is viable when environmental and management gradients are surveyed

Cropland maps derived from satellite imagery have become a common source of information for estimating food production, supporting land use policies, and measuring the environmental impacts of agriculture. Cropland classification models are typically calibrated with data collected during roadside surveys, which allow large areas to be sampled at relatively low cost. However, such data risk being biased, because road networks may not fully capture environmental and management gradients, which would violate the assumption that calibration data are representative. Although roadside sampling is widely adopted, its potential biases have so far not been thoroughly addressed. In this study, we looked for evidence of these biases by comparing three sampling strategies: Random sampling, Roadside sampling, and Transect sampling, a spatially constrained variant of Roadside sampling. In all three strategies, non-cropland samples are distributed randomly, since they can be photo-interpreted. Using 30 m reference maps for four study sites, we followed a Monte Carlo approach to generate multiple realizations of each sampling strategy for ten sample sizes. The effect of the sampling strategy was then assessed in terms of the representativeness of the collected data sets and the accuracy of the resulting maps. Results showed that data sets obtained from Roadside sampling were significantly less representative than those obtained from Random sampling, but the resulting maps were only marginally less accurate (2% difference). Transect sampling systematically captured less variability than Random or Roadside sampling, which led to differences in accuracy as large as 15%. The effect of sample size on accuracy varied across sites but generally leveled off after reaching 3000 pixels. Increasing the size of Transect samples improved classification accuracy, but not enough to match the performance of the other sampling strategies.
Finally, we found that Random and Roadside training sets with similar representativeness yield comparable accuracy. We therefore conclude that roadside sampling can be a viable source of training data for cropland mapping if the range of environmental and management gradients is surveyed. This underlines the importance of survey planning to identify the routes that capture the most variability.
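The abstract does not say how realizations were drawn or how representativeness was scored. As a rough illustration only, the following sketch runs a Monte Carlo comparison of Random, Roadside and Transect sampling on a synthetic landscape. Every concrete choice in it is an assumption and not the study's design: the 60 x 60 grid, the hypothetical road rows and buffer, a single north-south covariate standing in for an environmental gradient, and the two-sample Kolmogorov-Smirnov distance used as a stand-in representativeness score (the paper's own metric may differ). It fits no classifier, so it says nothing about map accuracy.

```python
"""Monte Carlo comparison of sampling strategies on a synthetic landscape.

Hypothetical sketch: the grid, road layout, covariate and the KS-based
representativeness score are illustrative assumptions, not the study's design.
"""
import random
from statistics import mean

SIZE = 60                     # synthetic landscape of SIZE x SIZE pixels (assumption)
ROAD_ROWS = [5, 20, 35, 50]   # hypothetical east-west roads (assumption)
BUFFER = 1                    # pixels within 1 row of a road count as "roadside"


def make_landscape(rng):
    """Return {(row, col): covariate}; the covariate increases from north to south."""
    return {(r, c): r + rng.random() for r in range(SIZE) for c in range(SIZE)}


def road_pixels(landscape, roads):
    """Covariate values of pixels within BUFFER rows of any road in `roads`."""
    return [v for (r, _), v in landscape.items()
            if any(abs(r - road) <= BUFFER for road in roads)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between the ECDFs of a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def draw(strategy, landscape, k, rng):
    """One realization of a sampling strategy: k covariate values without replacement."""
    if strategy == "random":
        pool = list(landscape.values())
    elif strategy == "roadside":
        pool = road_pixels(landscape, ROAD_ROWS)
    elif strategy == "transect":   # spatially constrained: a single randomly chosen road
        pool = road_pixels(landscape, [rng.choice(ROAD_ROWS)])
    else:
        raise ValueError(strategy)
    return rng.sample(pool, k)


def monte_carlo(k=150, realizations=50, seed=0):
    """Mean KS distance to the full landscape per strategy (lower = more representative)."""
    rng = random.Random(seed)
    landscape = make_landscape(rng)
    population = list(landscape.values())
    return {s: mean(ks_statistic(draw(s, landscape, k, rng), population)
                    for _ in range(realizations))
            for s in ("random", "roadside", "transect")}


if __name__ == "__main__":
    for strategy, score in monte_carlo().items():
        print(f"{strategy:9s} mean KS distance = {score:.3f}")
```

On this toy layout, Transect sampling is confined to one road and so covers only a narrow band of the gradient. It should therefore score much worse than Roadside sampling, which in turn scores worse than Random sampling. This is qualitatively the same ordering of representativeness that the abstract reports. The sample size `k` must not exceed the pixel count of a single road buffer (180 here), because sampling is without replacement.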

Bibliographic Details
Main Authors: Waldner, François, Bellemans, Nicolas, Hochman, Zvi, Newby, Terrence, de Abelleyra, Diego, Verón, Santiago R., Bartalev, Sergey, Lavreniuk, Mykola, Kussul, Nataliia, Le Maire, Guerric, Simoes, Margareth, Skakun, Sergii, Defourny, Pierre
Format: article (library)
Language: English
Published: Elsevier
Subjects: U30 - Research methods, A01 - Agriculture - General aspects, U10 - Computer science, mathematics and statistics, P01 - Nature conservation and land resources, agricultural land, land use, satellite imagery, data, random sampling, planning, http://aims.fao.org/aos/agrovoc/c_2808, http://aims.fao.org/aos/agrovoc/c_4182, http://aims.fao.org/aos/agrovoc/c_36761, http://aims.fao.org/aos/agrovoc/c_49816, http://aims.fao.org/aos/agrovoc/c_24499, http://aims.fao.org/aos/agrovoc/c_5951, http://aims.fao.org/aos/agrovoc/c_15070, http://aims.fao.org/aos/agrovoc/c_870, http://aims.fao.org/aos/agrovoc/c_7252, http://aims.fao.org/aos/agrovoc/c_33240
Online Access: http://agritrop.cirad.fr/592791/
http://agritrop.cirad.fr/592791/1/2019Waldner_IJAG_Roadside.pdf