Shallow text clustering does not mean weak topics: how topic identification can leverage bigram features

Text clustering and topic learning are two closely related tasks. In this paper, we show that topics can be learnt without requiring an exact categorization. In particular, experiments on two real case studies, using a vocabulary based on bigram features, extract readable topics that cover most of the documents. Precision at 10 reaches up to 74% on a dataset of scientific abstracts with 10,000 features, which is 4% lower than with unigrams only, but the resulting topics are more interpretable.
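As an illustration of the two technical notions the abstract relies on, the following minimal sketch builds a bigram vocabulary capped at a fixed number of features and computes precision at k. It is not the authors' implementation: the function names, the whitespace tokenization, the frequency-based feature selection, and the reading of precision at 10 as the share of relevant documents among the top 10 ranked ones are all assumptions made for this example.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent word pairs, joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent bigrams over all documents.

    Assumption: plain lowercasing and whitespace tokenization; the
    record does not describe the actual preprocessing.
    """
    counts = Counter()
    for doc in docs:
        counts.update(bigrams(doc.lower().split()))
    return [term for term, _ in counts.most_common(max_features)]


def precision_at_k(ranked_docs, relevant_docs, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_docs[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant_docs) / len(top_k)
```

With `max_features=10000`, `build_vocabulary` would mirror the vocabulary size quoted in the abstract, and `precision_at_k(ranked, relevant, k=10)` would give a precision-at-10 score for one topic under the assumptions stated above.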

Bibliographic Details
Main Authors: Velcin, Julien, Roche, Mathieu, Poncelet, Pascal
Format: conference_item biblioteca
Language: eng
Published: CEUR-WS
Subjects: C30 - Documentation and information, U30 - Research methods
Online Access: http://agritrop.cirad.fr/581229/
http://agritrop.cirad.fr/581229/1/DMNLP16_paper4.pdf