Finding the best learning to rank algorithms for effort-aware defect prediction

Effort-Aware Defect Prediction (EADP) ranks software modules or changes based on their predicted number of defects (i.e., considering modules or changes as effort) or their predicted defect density (i.e., considering LOC as effort) by using learning to rank algorithms. Ranking instability refers to the inconsistent conclusions produced by existing empirical studies of EADP. The major reason is poor experimental design, such as comparing only a few learning to rank algorithms, using a small number of datasets or datasets that do not report the number of defects, and evaluating with inappropriate or too few metrics. Objective: To obtain a stable ranking of learning to rank algorithms and identify the best ones for EADP. Method: We examine the practical effects of 34 learning to rank algorithms on 49 datasets for EADP. We measure the performance of these algorithms using 7 module-based and 7 LOC-based metrics, and run experiments under both cross-release and cross-project settings. Finally, we obtain the ranking of these algorithms by performing the Scott-Knott ESD test. Results: (1) When module is used as effort, random forest regression performs the best under the cross-release setting and linear regression performs the best under the cross-project setting; (2) when LOC is used as effort, LTR-linear (Learning-to-Rank with a linear model) performs the best under the cross-release setting and Ranking SVM performs the best under the cross-project setting. Conclusion: This comprehensive experimental procedure allows us to discover a stable ranking of the studied algorithms and to select the best ones according to the requirements of software projects.
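To make the ranking idea concrete, the sketch below shows how a module-based EADP ranking could be produced with random forest regression (the learner the study reports as best under the cross-release setting when module is used as effort), and how a LOC-based ranking follows by dividing predicted defects by module size. The data, feature dimensions, and variable names are synthetic placeholders, not the paper's datasets or implementation; this is a minimal illustration of the general technique only.

```python
# Minimal EADP ranking sketch (illustrative only, not the paper's code).
# Features, defect counts, and LOC values are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training release: static code metrics X, known defect counts y.
X_train = rng.random((200, 10))
y_train = rng.poisson(1.0, size=200)

# Hypothetical target release to be ranked.
X_test = rng.random((50, 10))
loc_test = rng.integers(50, 2000, size=50)  # lines of code per module

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred_defects = model.predict(X_test)

# Module as effort: rank modules by predicted number of defects.
rank_by_module = np.argsort(-pred_defects)

# LOC as effort: rank modules by predicted defect density (defects per LOC).
rank_by_loc = np.argsort(-(pred_defects / loc_test))

print("Top 5 modules (module as effort):", rank_by_module[:5])
print("Top 5 modules (LOC as effort):", rank_by_loc[:5])
```

In practice the two rankings can differ substantially, which is why the study evaluates the algorithms with separate module-based and LOC-based metric suites.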

Bibliographic Details
Main Authors: Yu, Xiao, Dai, Heng, Li, Li, Gu, Xiaodong, Keung, Jacky Wai, Bennin, Kwabena Ebo, Li, Fuyang, Liu, Jin
Format: Article / Letter to editor
Language: English
Subjects: Empirical study, Learning to rank, Ranking instability, Software defect prediction
Online Access: https://research.wur.nl/en/publications/finding-the-best-learning-to-rank-algorithms-for-effort-aware-def