Xart system: discovering and extracting correlated arguments of n-ary relations from text

In this paper, we present the Xart system, which is based on a hybrid method combining data mining approaches and syntactic analysis to automatically discover and extract relevant information modeled as n-ary relations from text. An n-ary relation links a studied object with its features, which are considered as several arguments. Our work focuses on extracting quantitative arguments associated with their attributes, i.e. a numerical value and a unit of measure, in order to populate a domain Ontological and Terminological Resource (OTR) with new instances. The proposed method relies on the discovery of correlated arguments in text using sequential pattern mining and the OTR. These Ontological Sequential Patterns (OSP) are then enriched with specific syntactic relations in order to construct Ontological Linguistic Sequential Patterns (OLSP), in which the arguments are expressed at different levels of term abstraction (term, grammatical category and concept). We conducted conclusive experiments on a corpus from the food packaging domain, where the relevant data to be extracted are experimental results on packaging. We were able to extract up to 4 correlated arguments with an F-measure ranging from 0.6 to 0.8.
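The abstract names sequential pattern mining as the step that discovers correlated arguments. The following is a minimal Python sketch of that general idea, not the Xart implementation. It assumes that sentences have already been abstracted into sequences of items, with argument terms replaced by hypothetical OTR-style concept labels (`thickness`, `permeability`, `temperature`) and quantities replaced by `VALUE`/`UNIT` placeholders. A pattern's support is the number of sentences that contain it as an ordered subsequence, with gaps allowed. The function names and the toy sentences are illustrative. The sketch also omits the syntactic enrichment that turns OSP into OLSP.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each sentence is assumed to be pre-abstracted, with
# argument terms mapped to OTR-style concept labels and quantities replaced by
# VALUE/UNIT placeholders. These labels are illustrative, not from the paper.
SENTENCES = [
    ["film", "thickness", "VALUE", "UNIT", "permeability", "VALUE", "UNIT"],
    ["thickness", "VALUE", "UNIT", "temperature", "VALUE", "UNIT"],
    ["thickness", "VALUE", "UNIT", "and", "permeability", "VALUE", "UNIT"],
]
ARGUMENT_CONCEPTS = {"thickness", "permeability", "temperature"}


def frequent_sequential_patterns(sequences, min_support, max_len=4):
    """Brute-force sequential pattern mining.

    A pattern is an ordered tuple of items that may skip items in a sentence
    (gaps allowed). Its support is the number of sentences containing it, so
    each sentence contributes at most 1 to a pattern's count.
    Enumeration is exponential in max_len; fine for short toy sentences only.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()  # distinct patterns found in this sentence
        for k in range(1, max_len + 1):
            # combinations() yields increasing positions, preserving word order
            for positions in combinations(range(len(seq)), k):
                seen.add(tuple(seq[i] for i in positions))
        counts.update(seen)  # +1 per pattern per sentence
    return {p: s for p, s in counts.items() if s >= min_support}


def correlated_argument_patterns(patterns, argument_concepts, min_arguments=2):
    """Keep patterns mentioning at least `min_arguments` argument concepts."""
    return {
        p: s
        for p, s in patterns.items()
        if sum(item in argument_concepts for item in p) >= min_arguments
    }


if __name__ == "__main__":
    frequent = frequent_sequential_patterns(SENTENCES, min_support=2)
    correlated = correlated_argument_patterns(frequent, ARGUMENT_CONCEPTS)
    for pattern, support in sorted(correlated.items(), key=lambda kv: (-kv[1], kv[0])):
        print(support, " -> ".join(pattern))
```

Running the module lists gapped patterns such as `thickness -> permeability` that occur in at least two of the toy sentences and involve at least two argument concepts. Counting each pattern at most once per sentence follows the usual sequence-database definition of support. A practical system would use a dedicated miner (e.g. PrefixSpan-style projection) rather than this exhaustive enumeration.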

Bibliographic Details
Main Authors: Berrahou, Soumia Lilia, Buche, Patrice, Dibie-Barthélemy, Juliette, Roche, Mathieu
Format: Conference item
Language: English
Published: ACM
Subjects: C30 - Documentation and information, 000 - Other themes, U10 - Computer science, mathematics and statistics, U30 - Research methods
Online Access: http://agritrop.cirad.fr/580893/
http://agritrop.cirad.fr/580893/1/WIMS16.pdf