Could spatial features help the matching of textual data?

Textual data is increasingly available through different media (social networks, company data, data catalogues, etc.). New information extraction methods are needed since these new resources are highly heterogeneous. In this article, we propose a text matching process based on spatial features and assessed on heterogeneous textual data. Besides being compatible with heterogeneous data, it comprises two contributions: first, spatial information is extracted for comparison purposes and stored in a dedicated spatial textual representation (STR); second, two transformations are applied to the STR to improve the spatial similarity estimation. This article outlines the proposed approach with new contributions: (i) a new geocoding method using general co-occurrences between entities, (ii) a thorough evaluation, and (iii) an in-depth discussion. The results obtained on two corpora demonstrate that good spatial matches (≈ 80% precision on major criteria) can be obtained between the most similar STRs, with further enhancement achieved via STR transformation.
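The general idea of matching texts on spatial features can be illustrated with a minimal sketch. It is not the authors' method: the article's STR, its two transformations, and its co-occurrence-based geocoding are not reproduced here. The sketch assumes a tiny hypothetical gazetteer (`TOY_GAZETTEER`), reduces each text to the flat set of place names it mentions, and swaps in a plain Jaccard overlap as the similarity measure; the function names are illustrative only.

```python
import re

# Hypothetical toy gazetteer (assumption): a real system would rely on a
# full gazetteer and a geocoding step to resolve ambiguous place names.
TOY_GAZETTEER = {"montpellier", "paris", "dakar", "madagascar"}


def extract_places(text):
    """Return the set of gazetteer place names mentioned in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return tokens & TOY_GAZETTEER


def spatial_similarity(text_a, text_b):
    """Jaccard overlap of the two texts' place sets (0.0 if neither has any)."""
    places_a = extract_places(text_a)
    places_b = extract_places(text_b)
    union = places_a | places_b
    if not union:
        return 0.0
    return len(places_a & places_b) / len(union)


if __name__ == "__main__":
    doc_a = "Floods were reported near Montpellier and Paris."
    doc_b = "Delegates from Paris and Dakar attended the meeting."
    # The two texts share one place (Paris) out of three distinct places.
    print(spatial_similarity(doc_a, doc_b))
```

A flat set ignores how places relate to one another (for example, inclusion or adjacency), which is the kind of information a richer spatial representation is meant to capture.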

Bibliographic Details
Main Authors: Fize, Jacques; Roche, Mathieu; Teisseire, Maguelonne
Format: article biblioteca
Language: eng
Subjects: C30 - Documentation and information; U10 - Computer science, mathematics and statistics; B10 - Geography; data analysis; data; information processing; spatial data; text mining; spatial analysis; http://aims.fao.org/aos/agrovoc/c_15962, http://aims.fao.org/aos/agrovoc/c_49816, http://aims.fao.org/aos/agrovoc/c_3862, http://aims.fao.org/aos/agrovoc/c_379bbe9f, http://aims.fao.org/aos/agrovoc/c_dca12b72, http://aims.fao.org/aos/agrovoc/c_40da9d3b
Online Access: http://agritrop.cirad.fr/596819/
http://agritrop.cirad.fr/596819/1/ida-24-ida194749.pdf