Automatic, high accuracy prediction of reopened bugs

Published in: Automated Software Engineering

Abstract

Bug fixing is one of the most time-consuming and costly activities of the software development life cycle. In general, bugs are reported in a bug tracking system, validated by a triage team, assigned to a developer to fix, and finally verified and closed. However, in some cases bugs have to be reopened. Reopened bugs increase software maintenance cost, cause rework for already busy developers, and in some cases even delay the delivery of a future software release. Therefore, a few recent studies have focused on reopened bugs. However, these prior studies did not achieve high performance (in terms of precision and recall), required manual intervention, and used very simplistic techniques when dealing with textual data, which leads us to believe that further improvements are possible. In this paper, we propose ReopenPredictor, an automatic, high-accuracy predictor of reopened bugs. ReopenPredictor uses a number of features, including textual features, to achieve high-accuracy prediction of reopened bugs. As part of ReopenPredictor, we propose two algorithms that automatically estimate various thresholds to maximize prediction performance. To examine the benefits of ReopenPredictor, we perform experiments on three large open source projects, namely Eclipse, Apache HTTP, and OpenOffice. Our results show that ReopenPredictor outperforms prior work, achieving a reopened F-measure of 0.744, 0.770, and 0.860 for Eclipse, Apache HTTP, and OpenOffice, respectively. These results correspond to improvements over the reopened F-measure of the method proposed in the prior work by Shihab et al. of 33.33, 12.57, and 3.12 % for Eclipse, Apache HTTP, and OpenOffice, respectively.
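The reported gains follow directly from the F-measures. A minimal sketch of the arithmetic (the baseline values below are back-computed from the stated percentages for illustration; they are not quoted from Shihab et al.):

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_improvement(new_f: float, baseline_f: float) -> float:
    """Percentage improvement of new_f over baseline_f."""
    return (new_f - baseline_f) / baseline_f * 100

# ReopenPredictor's reported reopened F-measures (from the abstract).
reported = {"Eclipse": 0.744, "Apache HTTP": 0.770, "OpenOffice": 0.860}

# Hypothetical baseline F-measures, back-computed from the stated
# improvements (33.33, 12.57, 3.12 %); illustrative only.
baselines = {"Eclipse": 0.558, "Apache HTTP": 0.684, "OpenOffice": 0.834}

for project, f in reported.items():
    gain = relative_improvement(f, baselines[project])
    print(f"{project}: F = {f:.3f}, improvement = {gain:.2f} %")
```

Running this reproduces the abstract's improvement figures to two decimal places.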

Figures 1–10 appear in the full article.


Notes

  1. For more details, please refer to Sect. 4.

  2. For more details, please refer to Sect. 4.

  3. For more details of these three classifiers, please refer to Sect. 5.1.

  4. For more details of RPComposer, please refer to Sect. 5.2.

  5. http://sourceforge.net/projects/wvtool/

  6. http://www.cs.waikato.ac.nz/ml/weka/

  7. For more details on the terms used in the description and comments fields, please refer to Sect. 6.8.3.

References

  • Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. Eur. Conf. Mach. Learn. 2004, 39–50 (2004)

  • Anvik, J., Murphy, G.: Determining implementation expertise from bug reports. In: MSR (2007)

  • Anvik, J., Hiew, L., Murphy, G.C.: Coping with an open bug repository. In: ETX, pp. 35–39 (2005)

  • Anvik, J., Hiew, L., Murphy, G.C.: Who should fix this bug? In: ICSE, pp. 361–370. ACM, New York (2006)

  • Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)

  • Bettenburg, N., Just, S., Schröter, A., Weiss, C., Premraj, R., Zimmermann, T.: What makes a good bug report? In: FSE, pp. 308–318 (2008)

  • Bhattacharya, P., Neamtiu, I.: Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging. In: ICSM, pp. 1–10 (2010)

  • Canfora, G., De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Multi-objective cross-project defect prediction. In: IEEE Sixth International Conference on Software Testing, Verification and Validation (ICST), 2013, pp. 252–261. IEEE (2013)

  • Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  • Čubranić, D.: Automatic bug triage using text categorization. In: SEKE (2004)

  • Francis, P., Leon, D., Minch, M.: Tree-based methods for classifying software failures. In: ISSRE, pp. 451–462 (2004)

  • Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: an industrial case study. In: MSR, pp. 11–20 (2010)

  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The weka data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)

  • Han, J., Kamber, M.: Data mining: concepts and techniques. Morgan Kaufmann, San Francisco (2006)

  • He, H., Garcia, E.: Learning from imbalanced data. Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

  • Hooimeijer, P., Weimer, W.: Modeling bug report quality. In: Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering, pp. 34–43. ACM, New York (2007a)

  • Hooimeijer, P., Weimer, W.: Modeling bug report quality. In: ASE, pp. 34–43 (2007b)

  • Huang, L., Ng, V., Persing, I., Geng, R., Bai, X., Tian, J.: AutoODC: Automated generation of orthogonal defect classifications. In: ASE, pp. 412–415 (2011)

  • Jeong, G., Kim, S., Zimmermann, T.: Improving bug triage with bug tossing graphs. In: ESEC/FSE, pp. 111–120 (2009)

  • Jiang, Y., Cukic, B., Ma, Y.: Techniques for evaluating fault prediction models. Empir. Softw. Eng. 13(5), 561–595 (2008)

  • Kim, S., Whitehead, E.J., Zhang, Y.: Classifying software changes: clean or buggy? IEEE Trans. Softw. Eng. 34(2), 181–196 (2008)

  • Lamkanfi, A., Demeyer, S., Giger, E., Goethals, B.: Predicting the severity of a reported bug. In: MSR, pp. 1–10 (2010)

  • Lamkanfi, A., Demeyer, S., Soetens, Q., Verdonck, T.: Comparing mining algorithms for predicting the severity of a reported bug. In: CSMR, pp. 249–258 (2011)

  • Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5(1), 1–167 (2012)

  • Matter, D., Kuhn, A., Nierstrasz, O.: Assigning bug reports using a vocabulary-based expertise model of developers. In: MSR, pp. 131–140 (2009)

  • McCallum, A., Nigam, K., et al.: A comparison of event models for naive bayes text classification. In: AAAI-98 Workshop, Citeseer, vol. 752, pp. 41–48 (1998)

  • Menzies, T., Marcus, A.: Automated severity assessment of software defect reports. In: ICSM, pp. 346–355 (2008)

  • Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: ICSE, pp. 382–391. IEEE (2013)

  • Nguyen, A.T., Nguyen, T.T., Nguyen, H.A., Nguyen, T.N.: Multi-layered approach for recovering links between bug reports and fixes. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, p. 63. ACM, New York (2012)

  • Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008)

  • Panichella, A., Oliveto, R., De Lucia, A.: Cross-project defect prediction models: L’union fait la force. In: Software Evolution Week—IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014, pp. 164–173 (2014)

  • Peters, F., Menzies, T.: Privacy and utility for defect prediction: Experiments with morph. In: ICSE, pp. 189–199. IEEE (2012)

  • Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J., Wang, B.: Automated support for classifying software failure reports. In: ICSE, pp. 465–475 (2003)

  • Powers, D.M.: Evaluation: from precision, recall and f-measure to roc, informedness, markedness & correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)

  • Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: ICSE, pp. 499–510 (2007)

  • Sandusky, R.J., Gasser, L., Ripoche, G.: Bug report networks: Varieties, strategies, and impacts in a f/oss development community. In: MSR, Citeseer (2004)

  • Shihab, E., Ihara, A., Kamei, Y., Ibrahim, W., Ohira, M., Adams, B., Hassan, A., Matsumoto, K.: Predicting re-opened bugs: a case study on the Eclipse project. In: WCRE, Citeseer, pp. 249–258 (2010)

  • Shihab, E., Ihara, A., Kamei, Y., Ibrahim, W.M., Ohira, M., Adams, B., Hassan, A.E., Matsumoto, K.: Studying re-opened bugs in open source software. Empir. Softw. Eng., pp. 1–38 (2012)

  • Sun, C., Lo, D., Wang, X., Jiang, J., Khoo, S.C.: A discriminative model approach for accurate duplicate bug report retrieval. In: ICSE, pp. 45–54 (2010)

  • Sun, C., Lo, D., Khoo, S.C., Jiang, J.: Towards more accurate retrieval of duplicate bug reports. In: ASE, pp. 253–262 (2011)

  • Tamrawi, A., Nguyen, T., Al-Kofahi, J., Nguyen, T.: Fuzzy set and cache-based approach for bug triaging. In: CSMR, pp. 365–375. ACM, New York (2011)

  • Thung, F., Lo, D., Jiang, L.: Automatic defect categorization. In: 19th Working Conference on Reverse Engineering (WCRE), 2012, pp. 205–214. IEEE (2012)

  • Tian, Y., Lawall, J., Lo, D.: Identifying linux bug fixing patches. In: ICSE, pp. 386–396. IEEE (2012a)

  • Tian, Y., Lo, D., Sun, C.: Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In: WCRE, pp. 215–224 (2012b)

  • Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: ICSE, pp. 461–470 (2008)

  • Wu, R., Zhang, H., Kim, S., Cheung, S.C.: Relink: recovering links between bugs and changes. In: SIGSOFT FSE, pp. 15–25 (2011)

  • Wurst, M.: The word vector tool user guide operator reference developer tutorial (2007)

  • Xia, X., Lo, D., Wang, X., Yang, X., Li, S., Sun, J.: A comparative study of supervised learning algorithms for re-opened bug prediction. In: CSMR (2013)

  • Zaman, S., Adams, B., Hassan, A.E.: Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories, pp. 93–102. ACM, New York (2011)

  • Zhang, H., Gong, L., Versteeg, S.: Predicting bug-fixing time: an empirical study of commercial software projects. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 1042–1051. IEEE (2013)

  • Zheng, Z., Wu, X., Srihari, R.: Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newsl. 6(1), 80–89 (2004)

  • Zhou, J., Zhang, H., Lo, D.: Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In: ICSE, pp. 14–24. IEEE (2012)

  • Zimmermann, T., Nagappan, N., Guo, P., Murphy, B.: Characterizing and predicting which bugs get reopened. In: ICSE, pp. 1074–1083 (2012)

Acknowledgments

This research is sponsored in part by the NSFC Program (No. 61103032) and the National Key Technology R&D Program of the Ministry of Science and Technology of China (No. 2013BAH01B01). The code can be downloaded from: https://github.com/xinxia1986/reopenBug.

Author information

Corresponding author: Xinyu Wang.

About this article

Cite this article

Xia, X., Lo, D., Shihab, E. et al. Automatic, high accuracy prediction of reopened bugs. Autom Softw Eng 22, 75–109 (2015). https://doi.org/10.1007/s10515-014-0162-2
