Big data in software engineering: A systematic literature review

Selami Bagriyanik, Adem Karahoca

Abstract

Purpose of Study: We investigate big data studies that use batch and/or streaming data generated during the software development lifecycle. All phases of application development are in scope, including but not limited to elicitation, requirements analysis, design, software implementation, version control management, testing (unit, functional, regression, automated, performance, and stress), release management, application log monitoring, application usage monitoring, user complaint management, security and compliance management, and software problem management.

Methods: We apply a systematic literature review methodology established in software engineering research to find and analyse related studies published from January 2010 to October 2015. We synthesize the quantitative and qualitative outputs of the selected papers and report the results.

Findings and Results: Overall, studies in the literature are scarce. Relatively more papers address areas such as Software Quality, Development, Project Management and Human Computer Interaction. In contrast, research in fields such as Deployment, Requirements Engineering, Release Management and Mobile Applications is relatively sparse.

Conclusions & Recommendations: More studies are required to identify use cases, data attributes, measurements, and platform requirements, especially in the fields identified as under-studied. A holistic big data perspective is needed to support software engineering ecosystems in large and complex enterprises.

Keywords: Big Data, Software Engineering, Software Analytics, Data Mining, Software Development, Operational Intelligence, Software Archaeology
