Digitala Vetenskapliga Arkivet

PREDICTING 100-DAY MORTALITY FROM CIRCULATORY SYSTEM DISEASES IN DIABETES MELLITUS PATIENTS VISITING THE EMERGENCY DEPARTMENT: A PATIENT-LEVEL RISK ANALYSIS
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Accurately predicting 100-day mortality caused by circulatory system diseases (ICD-10-CM Chapter IX) in diabetes mellitus patients visiting the emergency department (ED) is crucial for improving clinical decision-making, given their high mortality risk from these conditions. To identify contributing risk factors, we apply Gradient Boosting Decision Trees (GBDT) models such as XGBoost, LightGBM, and CatBoost.

This study develops and compares two predictive models. Model 1 includes real-time clinical features available during the ED visit, while Model 2 incorporates additional historical diagnosis features that are not available in real-time in the ED. 

To address class imbalance, we apply two techniques: class weighting, using the scale_pos_weight parameter in XGBoost and LightGBM and the auto_class_weights parameter in CatBoost, and oversampling with SMOTE-NC. Hyperparameters for each GBDT model are tuned using 5-fold cross-validation.
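As a rough illustration of the weighting and cross-validation steps described above, the sketch below computes the negative-to-positive class ratio, which is a common starting value for scale_pos_weight, and builds 5 stratified folds using only the standard library. The helper names, the toy labels, and the choice of stratified splitting are assumptions made for this example; the abstract does not state how the weights or folds were actually constructed.

```python
import random
from collections import Counter


def scale_pos_weight(labels):
    """Return n_negative / n_positive, a common starting value for scale_pos_weight."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples")
    return counts[0] / counts[1]


def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs that keep class proportions in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        valid = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, valid


# Hypothetical toy labels: 1 = died within 100 days, 0 = survived.
toy_labels = [1] * 10 + [0] * 90
toy_weight = scale_pos_weight(toy_labels)
toy_folds = list(stratified_kfold_indices(toy_labels, k=5))
```

When oversampling is used instead of weighting, it is generally applied to the training indices of each fold only, so that the validation fold keeps the original class distribution.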

The results indicate that, among all combinations of GBDT model and balancing method, CatBoost with class weighting performs best for both Model 1 and Model 2. Because the two models show similar predictive performance, Model 1 appears to capture the key risk factors effectively.

SHAP (SHapley Additive exPlanations) analysis further reveals that the inclusion of additional diagnostic features in Model 2 influences individual patient predictions. Although these features do not significantly improve overall model accuracy (as indicated by Model 2 performance metrics derived from CatBoost with class weighting), they provide valuable insights into patient-specific risk factors, highlighting the potential of interpretable machine learning in personalized risk assessment.
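To make the idea of per-patient attributions concrete, the sketch below computes exact Shapley values for a tiny hand-written risk function by enumerating all feature coalitions. It is a brute-force toy under stated assumptions, not the thesis method: the keywords list TreeExplainer, which works on tree ensembles and a background dataset, whereas here the function `toy_risk`, its coefficients, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only suitable for tiny examples.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score with an interaction between the first two features."""
    age, prior_dx, lab = z
    return 0.1 * age + 0.3 * prior_dx + 0.2 * age * prior_dx + 0.05 * lab


toy_x = [1.0, 1.0, 2.0]
toy_base = [0.0, 0.0, 0.0]
toy_phi = shapley_values(toy_risk, toy_x, toy_base)
```

The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additive property that lets a feature shift an individual prediction even when it barely changes aggregate metrics.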

Place, publisher, year, edition, pages
2025, p. 120
Keywords [en]
Machine learning, Classification, SHAP (SHapley Additive exPlanations), CatBoost, XGBoost, LightGBM, TreeExplainer, 100-day mortality, emergency department, diabetes mellitus, circulatory system diseases, data balancing, risk factors
National Category
Mathematical sciences
Identifiers
URN: urn:nbn:se:umu:diva-237300
OAI: oai:DiVA.org:umu-237300
DiVA, id: diva2:1950398
External cooperation
Region Dalarna
Presentation
2025-03-24, Umeå, 10:15 (English)
Supervisors
Examiners
Available from: 2025-04-08. Created: 2025-04-07. Last updated: 2025-04-11. Bibliographically approved.

Open Access in DiVA

fulltext (1957 kB), 26 downloads
File information
File name: FULLTEXT01.pdf
File size: 1957 kB
Checksum (SHA-512): 686b4b213a7c551eee23a3873fa89e00b004ce6f02781d5c0784e9767e523ee08bd1cd48f404523619aef0f77db6d6c97201685f40df65e2ee989e8118ac7c5b
Type: fulltext
Mimetype: application/pdf
