Leveraging Big Data Analytics for Efficient Bug Localization in Large-Scale Software Projects

P. Naga Kavitha, K. Rajeswari, Lydia Marina, M. Maria Lavanya, S. Neha, Pooja Mahajan

doi:10.48165/bapas.2024.44.2.1

Published: Nov 3, 2024

Keywords:

Bug Localization; Random Forest; Text Mining; Software Engineering; Machine Learning; Natural Language Processing; Feature Engineering; Data Analytics; Software Repositories; Model Evaluation

Abstract

As software grows more complex, bug localization becomes both harder and more important for keeping large-scale projects reliable and efficient. This paper introduces a novel hybrid approach that combines Random Forest classification with text mining to improve the accuracy and efficiency of bug localization. Drawing on machine learning and natural language processing, the methodology processes both structured and unstructured data from software repositories. We present a detailed mathematical framework for integrating these techniques and evaluate the model using precision, recall, and F1-score. The results show that the model achieves high precision on non-bug instances but struggles to detect bug instances accurately, indicating that the feature set needs further refinement. Visual analysis of the simulated data reveals a nuanced relationship between code changes and bug occurrences, suggesting that additional context-aware features may improve model performance. The findings highlight the potential of combining data analytics techniques for bug localization and point to future work in feature engineering and model optimization.
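
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It computes precision, recall, and F1-score separately for the bug and non-bug classes, which is one way a model can look strong on non-bug instances while still missing most bugs. The labels below are hypothetical values invented for illustration, not data or results from the paper, and the helper name `per_class_metrics` is likewise an assumption.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical, made-up labels (1 = bug, 0 = non-bug); NOT taken from the paper.
# 90 non-bug instances followed by 10 bug instances.
y_true = [0] * 90 + [1] * 10
# Assumed predictions: 88 of the non-bugs are correct, only 2 of the bugs are found.
y_pred = [0] * 88 + [1] * 2 + [0] * 8 + [1] * 2

non_bug = per_class_metrics(y_true, y_pred, positive=0)
bug = per_class_metrics(y_true, y_pred, positive=1)

print("non-bug: precision=%.2f recall=%.2f f1=%.2f" % non_bug)
print("bug:     precision=%.2f recall=%.2f f1=%.2f" % bug)
```

With these invented labels, the non-bug class scores roughly 0.92 precision and 0.98 recall, while the bug class reaches only 0.50 precision and 0.20 recall. Reporting the metrics per class, rather than as a single aggregate, makes this kind of imbalance visible.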

Issue

Vol. 44 No. 3 (2024): LIB PRO. 44(3), JUL-DEC 2024 (Published: 31-07-2024)

Section

Articles
