Rebuilding Trust in Black-Box Models: Using Explainable Machine Learning (SHAP) to Analyze Feature Impact Across Models for Bankruptcy Prediction
2025 (English) Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
Abstract [en]
This thesis aims to enhance the interpretability of ensemble machine learning (ML) models for bankruptcy prediction using SHAP (SHapley Additive exPlanations). Traditional statistical models, such as logistic regression (LR), lack the ability to capture non-linear relationships in financial data, while ensemble models like Random Forest (RF) and XGBoost (XGB) excel in predictive accuracy but are difficult to interpret. The research bridges this gap by applying SHAP to make these black-box models more transparent, and therefore more actionable and trustworthy for financial institutions. Using a dataset of 10,696 companies from the Swedish hospitality sector (1998–2021), the study addresses class imbalance with SMOTE-ENN and evaluates LR, RF, and XGB. Results show that XGB achieves the highest predictive accuracy, outperforming RF and LR by capturing complex, non-linear feature interactions. SHAP analysis identifies key financial ratios, such as retained earnings to total assets and working capital to total assets, as the most influential predictors. It also identifies significant contributors among features with weaker or negative correlations, particularly in RF and XGB. In contrast, LR exhibits simpler, linear relationships that align more closely with traditional correlation metrics. This research underscores the value of explainable ML in enhancing decision-making, ensuring regulatory compliance, and fostering trust in ML-based bankruptcy prediction. By combining accuracy with interpretability, it provides a robust framework for analyzing high-dimensional, imbalanced datasets in financial analytics.
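The attribution idea behind SHAP can be illustrated with a minimal sketch. It computes exact Shapley values by enumerating every feature coalition for a single company, which is feasible only for a handful of features. This is not the thesis's implementation and does not use the SHAP library: the scoring function `toy_risk_score`, its coefficients, the feature names `re_ta` and `wc_ta` (standing in for retained earnings to total assets and working capital to total assets), and all numeric values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk_score(r):
    # Hypothetical non-linear score: two main effects plus an interaction term.
    return 0.5 - 2.0 * r["re_ta"] - 1.0 * r["wc_ta"] + 3.0 * r["re_ta"] * r["wc_ta"]


# Hypothetical company and reference point (for example, sector averages).
firm = {"re_ta": -0.30, "wc_ta": 0.05}
baseline = {"re_ta": 0.10, "wc_ta": 0.20}

phi = shapley_values(toy_risk_score, firm, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the company's score and the baseline score, which is the additive property that lets per-feature impacts be compared across models. For tree ensembles with many features, approximate or tree-specific algorithms are used in practice instead of full enumeration.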
Place, publisher, year, edition, pages
2025.
Keywords [en]
Bankruptcy Prediction, Machine Learning, XGBoost, Random Forest, Logistic Regression, SHAP, SMOTE-ENN, Altman’s Z-Score, Financial Ratios, Explainable ML, Class Imbalance, Theoretical Ranking, Data driven distribution, Feature importance
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:du-50136
OAI: oai:DiVA.org:du-50136
DiVA, id: diva2:1935353
Subject / course
Microdata Analysis
2025-02-06 2025-02-06