Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews

A Systematic Literature Review (SLR) is a means of synthesizing relevant, high-quality studies related to a specific topic or set of research questions. In the Primary Selection stage of an SLR, studies are usually selected manually by reading the title, abstract, and keywords of each one. In recent years, the number of published scientific studies has grown, increasing the effort required to perform this sort of review. In this paper, we propose strategies to detect non-papers and duplicated references in results exported by search engines, and strategies to rank the references in decreasing order of importance for an SLR with respect to the terms in the search string. These strategies are based on Information Retrieval techniques. We implemented the strategies and carried out an experimental evaluation of their applicability using two real datasets. In the results, the strategy to detect non-papers achieved 100% precision and 50% recall; the strategy to detect duplicates found more duplicates than manual inspection; and one of the strategies to rank relevant references achieved 50% precision and 80% recall. Therefore, the results show that the proposed strategies can minimize the effort in the Primary Selection stage of an SLR.
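This record does not describe the paper's actual strategies, so the following is only a minimal sketch of the general idea, under stated assumptions. It flags likely duplicate references by comparing normalized titles with Python's standard `difflib.SequenceMatcher` (a stand-in similarity measure, not necessarily the one the authors used), ranks references by how often single-word search terms occur in their text, and computes precision and recall as reported in the abstract. The function names, the 0.9 threshold, and the sample titles are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and replace punctuation runs with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def find_duplicates(titles, threshold=0.9):
    """Return index pairs (i, j), i < j, whose normalized titles are near-identical.

    The threshold is an assumed value, not one taken from the paper.
    """
    norm = [normalize(t) for t in titles]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if SequenceMatcher(None, norm[i], norm[j]).ratio() >= threshold:
                pairs.append((i, j))
    return pairs


def rank_by_terms(references, terms):
    """Sort reference texts by total occurrences of the (single-word) search terms."""
    def score(text):
        words = normalize(text).split()
        return sum(words.count(t.lower()) for t in terms)
    return sorted(references, key=score, reverse=True)


def precision_recall(selected, relevant):
    """Precision = hits / selected, recall = hits / relevant."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    titles = [
        "Feature-Oriented Software Development: A Survey",
        "Feature oriented software development - a survey.",
        "Code Smells in Java",
    ]
    print(find_duplicates(titles))
    print(rank_by_terms(titles, ["code", "smells"]))
    print(precision_recall(["a", "b"], ["a"]))
```

Comparing every pair of titles is quadratic in the number of references, which is acceptable for an illustration but may need blocking or indexing for large exports.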

Bibliographic Details
Main Authors: Abilio, Ramon; Morais, Flávio; Vale, Gustavo; Oliveira, Claudiane; Pereira, Denilson; Costa, Heitor
Format: Digital journal
Language: English
Published: Centro Latinoamericano de Estudios en Informática 2015
Online Access: http://www.scielo.edu.uy/scielo.php?script=sci_arttext&pid=S0717-50002015000200003