SNEToolkit: Spatial named entities disambiguation toolkit

“Can you tell me where San Jose is located?” “Uh! Do you know that there are more than 1700 locations named San Jose in the world?” The official name of a location is often not the name we are familiar with. Spatial named entity (SNE) disambiguation is the process of identifying a place name mentioned in a text and assigning precise coordinates to it. This task is not always straightforward, especially when the place name is ambiguous. Here, we focus on disambiguating, at the country level, the spatial named entities identified in a textual document. The proposed solution combines a set of techniques that disambiguate a spatial entity using the context in which it is mentioned, drawing on characteristics specific to that entity. It takes a textual document as input, extracts the named entities identified therein, and associates them with the correct coordinates. SNE disambiguation is designed to support fast exploration and analysis of spatiotemporal data, most often for event tracking. The proposed approach was tested on 1360 SNEs extracted from the GeoVirus dataset. The results show that SNEToolkit outperformed the baseline, the standard Geonames geocoder, with a recall of 0.911 versus 0.871 for the baseline. A flexible Python package is provided for end users.
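
To make the idea of context-based disambiguation concrete, here is a minimal Python sketch. It is not SNEToolkit's API or the authors' method: the `Candidate` class, the `GAZETTEER` dictionary, and the `disambiguate` function are hypothetical names invented for illustration, and the coordinates and populations are rough values. The sketch assumes a simple rule: prefer a candidate whose country is mentioned in the text, and otherwise fall back to the most populous candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One possible referent of an ambiguous place name (hypothetical type)."""
    name: str
    country: str
    lat: float
    lon: float
    population: int


# Hypothetical two-entry gazetteer; values are approximate and illustrative.
GAZETTEER = {
    "san jose": [
        Candidate("San Jose", "United States", 37.34, -121.89, 1_000_000),
        Candidate("San José", "Costa Rica", 9.93, -84.08, 340_000),
    ]
}


def disambiguate(place: str, text: str) -> Optional[Candidate]:
    """Pick one candidate for `place` using country mentions in `text`."""
    candidates = GAZETTEER.get(place.lower(), [])
    if not candidates:
        return None
    lowered = text.lower()
    # Context rule (an assumption): keep candidates whose country is named.
    in_context = [c for c in candidates if c.country.lower() in lowered]
    # Fall back to all candidates when the text gives no country clue.
    pool = in_context or candidates
    return max(pool, key=lambda c: c.population)


if __name__ == "__main__":
    hit = disambiguate("San Jose", "Floods hit San Jose, the capital of Costa Rica.")
    print(hit.country, hit.lat, hit.lon)
```

A real system would draw candidates from a full gazetteer such as Geonames and use richer context features than a literal country-name match.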

Bibliographic Details
Main Authors: Kafando, Rodrique, Decoupes, Rémy, Roche, Mathieu, Teisseire, Maguelonne
Format: article biblioteca
Language:eng
Subjects: U10 - Computer science, mathematics and statistics, C30 - Documentation and information, B10 - Geography, text mining, data analysis, spatial data, http://aims.fao.org/aos/agrovoc/c_dca12b72, http://aims.fao.org/aos/agrovoc/c_15962, http://aims.fao.org/aos/agrovoc/c_379bbe9f
Online Access:http://agritrop.cirad.fr/605738/
http://agritrop.cirad.fr/605738/1/Kafando_et_al_SoftwareX.pdf
id dig-cirad-fr-605738
record_format koha
spelling dig-cirad-fr-605738 2024-01-29T04:38:57Z http://agritrop.cirad.fr/605738/ SNEToolkit: Spatial named entities disambiguation toolkit. Kafando Rodrique, Decoupes Rémy, Roche Mathieu, Teisseire Maguelonne. 2023. SoftwareX, 23:101480, 11 p. https://doi.org/10.1016/j.softx.2023.101480 eng
article info:eu-repo/semantics/article Journal Article info:eu-repo/semantics/publishedVersion http://agritrop.cirad.fr/605738/1/Kafando_et_al_SoftwareX.pdf text cc_by info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/4.0/ 10.1016/j.softx.2023.101480 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.softx.2023.101480 info:eu-repo/semantics/altIdentifier/purl/https://doi.org/10.1016/j.softx.2023.101480 info:eu-repo/grantAgreement/EC/H2020/874850//(EU) MOnitoring Outbreak events for Disease surveillance in a data science context/MOOD info:eu-repo/grantAgreement/EC/H2020/ANR-20-PCPA-0002//(FRA) Building epidemiological surveillance and prophylaxis with observations both near and distant/BEYOND
institution CIRAD FR
collection DSpace
country France
countrycode FR
component Bibliographic
access Online
databasecode dig-cirad-fr
tag biblioteca
region Western Europe
libraryname Biblioteca del CIRAD Francia
language eng
topic U10 - Computer science, mathematics and statistics
C30 - Documentation and information
B10 - Geography
text mining
data analysis
spatial data
http://aims.fao.org/aos/agrovoc/c_dca12b72
http://aims.fao.org/aos/agrovoc/c_15962
http://aims.fao.org/aos/agrovoc/c_379bbe9f
spellingShingle U10 - Computer science, mathematics and statistics
C30 - Documentation and information
B10 - Geography
text mining
data analysis
spatial data
http://aims.fao.org/aos/agrovoc/c_dca12b72
http://aims.fao.org/aos/agrovoc/c_15962
http://aims.fao.org/aos/agrovoc/c_379bbe9f
Kafando, Rodrique
Decoupes, Rémy
Roche, Mathieu
Teisseire, Maguelonne
SNEToolkit: Spatial named entities disambiguation toolkit
format article
topic_facet U10 - Computer science, mathematics and statistics
C30 - Documentation and information
B10 - Geography
text mining
data analysis
spatial data
http://aims.fao.org/aos/agrovoc/c_dca12b72
http://aims.fao.org/aos/agrovoc/c_15962
http://aims.fao.org/aos/agrovoc/c_379bbe9f
author Kafando, Rodrique
Decoupes, Rémy
Roche, Mathieu
Teisseire, Maguelonne
author_facet Kafando, Rodrique
Decoupes, Rémy
Roche, Mathieu
Teisseire, Maguelonne
author_sort Kafando, Rodrique
title SNEToolkit: Spatial named entities disambiguation toolkit
title_short SNEToolkit: Spatial named entities disambiguation toolkit
title_full SNEToolkit: Spatial named entities disambiguation toolkit
title_fullStr SNEToolkit: Spatial named entities disambiguation toolkit
title_full_unstemmed SNEToolkit: Spatial named entities disambiguation toolkit
title_sort snetoolkit: spatial named entities disambiguation toolkit
url http://agritrop.cirad.fr/605738/
http://agritrop.cirad.fr/605738/1/Kafando_et_al_SoftwareX.pdf
work_keys_str_mv AT kafandorodrique snetoolkitspatialnamedentitiesdisambiguationtoolkit
AT decoupesremy snetoolkitspatialnamedentitiesdisambiguationtoolkit
AT rochemathieu snetoolkitspatialnamedentitiesdisambiguationtoolkit
AT teisseiremaguelonne snetoolkitspatialnamedentitiesdisambiguationtoolkit
_version_ 1792500593482268672