Prediction of Cognitive Strain During the Development Phase of the Software Development Life Cycle Using Machine Learning Models
2025 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background: Cognitive strain among software developers during the coding phase of the Software Development Life Cycle (SDLC) can adversely affect productivity and well-being. Understanding the factors contributing to cognitive strain and developing effective strategies to manage it are crucial for optimizing performance in software development.
Objectives: This study aims to identify the key factors influencing cognitive strain during the coding phase of the SDLC and to quantify these factors for effective analysis. It also seeks to develop and evaluate machine learning models to predict cognitive strain, thereby exploring how identifying and managing it can improve productivity and developer well-being.
Methods: A systematic approach was employed, beginning with a theoretical foundation established through a literature review, followed by empirical data collection via a structured questionnaire. Insights from experienced developers refined the identified factors, which were then used to develop predictive machine learning models, including Random Forest, LSTM neural networks, Logistic Regression, and K-Nearest Neighbors. These models assessed cognitive strain and informed the formulation of evidence-based management strategies, although their practical implementation was beyond the study’s scope. By quantifying cognitive strain and leveraging predictive analytics, this research provides a structured methodology for identifying, analyzing, and mitigating cognitive strain, ultimately contributing to a more sustainable and productive software development process.
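As a rough illustration of the kind of predictive model named above, the following minimal sketch implements a K-Nearest Neighbors classifier in plain Python. Everything in it is an assumption made for illustration: the factor names, the 1-5 rating scale, the synthetic responses, and the rule used to label a response as high strain. It is not the thesis's dataset, feature encoding, or implementation.

```python
import math
import random
from collections import Counter

# Hypothetical questionnaire factors rated 1-5 (assumed, not the thesis dataset).
FACTORS = ["task_complexity", "work_hours", "poor_sleep",
           "multitasking", "deadline_pressure"]


def make_synthetic_responses(n, seed=0):
    """Generate illustrative responses; the labelling rule is a made-up assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.randint(1, 5) for _ in FACTORS]
        label = 1 if sum(x) >= 16 else 0  # 1 = high strain (arbitrary threshold)
        rows.append((x, label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


data = make_synthetic_responses(300)
train, test = data[:240], data[240:]  # simple 80/20 hold-out split
acc = accuracy(train, test)
print(f"held-out accuracy on synthetic data: {acc:.3f}")
```

The accuracy printed here reflects only the synthetic labelling rule and says nothing about the figures reported in the thesis.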
Results: Significant correlations were found between cognitive strain and factors such as high task complexity, extended work hours, poor sleep quality, frequent multitasking, and high deadline pressure. The Random Forest model achieved the highest performance, with an accuracy of approximately 0.991, indicating excellent predictive capability. LSTM performed moderately well with an accuracy of 0.808, while Logistic Regression (LR) and K-Nearest Neighbors (KNN) had lower accuracies of around 0.62. Based on these findings, strategies such as workload balancing, expertise-based task allocation, flexible scheduling, reducing multitasking, and providing stress management resources were proposed.
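To make the notion of a correlation between a factor and cognitive strain concrete, the sketch below computes a Pearson correlation coefficient in plain Python. The abstract does not state which correlation measure the thesis used, so Pearson's r is an assumption here, and the two rating lists are invented values, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented 1-5 ratings for illustration only (not the thesis data).
deadline_pressure = [1, 2, 2, 3, 4, 4, 5, 5]
strain_score = [1, 1, 2, 3, 3, 4, 4, 5]

r = pearson_r(deadline_pressure, strain_score)
print(f"Pearson r between deadline pressure and strain: {r:.3f}")
```

A value near +1 would indicate that higher ratings on the factor tend to accompany higher strain scores; a value near 0 would indicate no linear relationship.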
Conclusions: Identifying cognitive strain through predictive modeling enables organizations to implement targeted interventions that enhance productivity and developer well-being during the coding phase of the SDLC. The Random Forest model proved particularly effective in predicting cognitive strain. The proposed strategies, supported by empirical data and existing literature, offer actionable insights for proactively addressing cognitive strain. Future research should focus on testing these interventions in practical settings, expanding the dataset, and exploring additional factors influencing cognitive strain.
Place, publisher, year, edition, pages
2025. p. 67
Keywords [en]
Cognitive Strain, Software Development Life Cycle, Machine Learning, Productivity, Developer Well-being
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27491 OAI: oai:DiVA.org:bth-27491 DiVA, id: diva2:1941135
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAADA Master Qualification Plan in Software Engineering 120,0 hp
Examiners
2025-03-03 2025-02-27 2025-03-03 Bibliographically approved