Digitala Vetenskapliga Arkivet

Towards Trustworthy Survival Analysis with Machine Learning Models
Halmstad University, School of Information Technology. ORCID iD: 0000-0001-9416-5647
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Survival analysis is a major sub-field of statistics that studies the time to an event, such as a patient's death or a machine's failure. This makes survival analysis crucial in critical applications like medical studies and predictive maintenance. In such applications, safety is critical, creating a demand for trustworthy models. Spurred by the growing volume of collected data, machine learning and deep learning techniques have come into use. While this direction holds promise for improving certain qualities, such as model performance, it also introduces new challenges in other areas, particularly model explainability. This challenge is general in machine learning due to the black-box nature of most machine learning models, especially deep neural networks (DNNs). However, survival models usually output functions rather than point estimates, as regression and classification models do, which makes explaining them an even more challenging task.

Other challenges also exist due to the nature of time-to-event data, such as censoring. Censoring happens for several reasons, most commonly limited study time, which leaves a considerable number of studied subjects without an observed event during the study. Moreover, in industrial settings, recorded events do not always correspond to actual failures, because companies tend to replace machine parts before they fail for safety or cost reasons, resulting in noisy event labels. Censoring and noisy labels make it challenging to build and evaluate survival models.

This thesis addresses these challenges by following two tracks, one focusing on explainability and the other on improving performance. The two tracks eventually merge, providing an explainable survival model that maintains the performance of its black-box counterpart.

In the explainability track, we propose two post-hoc explanation methods based on what we define as Survival Patterns. These are patterns in the predictions of the survival model that represent distinct survival behaviors in the studied population. We propose an algorithm for discovering the survival patterns upon which the two post-hoc explanation methods rely. The first method, SurvSHAP, utilizes a proxy classification model that learns the relationship between the input space and the discovered survival patterns. The proxy model is then explained using the SHAP method, resulting in per-pattern explanations. The second post-hoc method relies on finding counterfactual explanations that would change the decision of the survival model from one survival pattern to another. The algorithm uses Particle Swarm Optimization (PSO) with a tailored objective function to promote certain explanation qualities, namely plausibility and actionability.

On the performance track, we propose a Variational Encoder-Decoder model for estimating the survival function using a sampling-based approach. The model is trained using a regression-based objective function that accounts for censored instances, assisted by a differentiable lower bound of the concordance index (C-index). In the same work, we propose a decomposition of the C-index, showing that it can be expressed as a weighted harmonic mean of two quantities: one quantifies the concordance among the observed event cases and the other quantifies the concordance between observed events and censored cases. The two quantities are weighted by a factor that balances the contribution of event and censored cases to the total C-index. This decomposition uncovers hidden differences among survival models that seem equivalent based on the C-index. We also used genetic programming to search for a regression-based loss function for survival analysis with an improved concordance ability. The search results uncovered an interesting phenomenon, based on which we propose using the continuously differentiable Softplus function instead of the sharp-cut ReLU function for handling censored cases. Lastly in the performance track, we propose an algorithm for correcting erroneous observed event labels that can be caused by preventive maintenance activities. The algorithm adopts an iterative expectation-maximization-like approach, utilizing a genetic algorithm to search for better event labels that maximize a surrogate survival model's performance.

Finally, the two tracks merge and we propose CoxSE, a Cox-based deep neural network model that provides inherent explanations while maintaining the performance of its black-box counterpart. The model relies on Self-Explaining Neural Networks (SENN) and the Cox Proportional Hazards formulation. We also propose CoxSENAM, an enhancement of the Neural Additive Model (NAM) that adopts the NAM structure along with the SENN loss function and type of output. The CoxSENAM model demonstrated better explanations than the NAM-based model, with enhanced robustness to noise.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2025, p. 29
Series
Halmstad University Dissertations ; 128
National Category
Computer Sciences; Information Systems
Identifiers
URN: urn:nbn:se:hh:diva-55202
ISBN: 978-91-89587-72-4 (digital)
ISBN: 978-91-89587-73-1 (print)
OAI: oai:DiVA.org:hh-55202
DiVA, id: diva2:1925520
Public defence
2025-01-31, S3030, Högskolan i Halmstad, Kristian IV:s väg 3, Halmstad, 09:00 (English)
Opponent
Supervisors
Available from: 2025-01-10 Created: 2025-01-08 Last updated: 2025-10-01 Bibliographically approved
List of papers
1. SurvSHAP: A Proxy-Based Algorithm for Explaining Survival Models with SHAP
2022 (English) In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA) / [ed] Joshua Zhexue Huang; Yi Pan; Barbara Hammer; Muhammad Khurram Khan; Xing Xie; Laizhong Cui; Yulin He, Piscataway, NJ: IEEE, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Survival Analysis models usually output functions (survival or hazard functions) rather than point predictions like regression and classification models. This makes explaining such models a challenging task, especially using Shapley values. We propose SurvSHAP, a new model-agnostic algorithm to explain survival models that predict survival curves. The algorithm is based on discovering patterns in the predicted survival curves, the output of the survival model, that identify significantly different survival behaviors, and on utilizing a proxy model and the SHAP method to explain these distinct survival behaviors. Experiments on synthetic and real datasets demonstrate that SurvSHAP is able to capture the underlying factors of the survival patterns. Moreover, SurvSHAP results on the Cox Proportional Hazards model are compared with the weights of the model to show that we provide faithful overall explanations, with more fine-grained explanations of the sub-populations. We also illustrate the wrong model and explanations learned by a Cox model when applied to heterogeneous sub-populations. We show that a non-linear machine learning survival model with SurvSHAP can better model the data and provide better explanations than linear models.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2022
Keywords
SurvSHAP, Explainable AI, Survival Patterns, SHAP, Shapley values, Proxy Model, Survival Analysis, Machine Learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-49149 (URN); 10.1109/DSAA54385.2022.10032392 (DOI); 000967751000099 (); 2-s2.0-85148538187 (Scopus ID); 978-1-6654-7330-9 (ISBN); 978-1-6654-7331-6 (ISBN)
Conference
The 9th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2022), Shenzhen, China, October 13-16, 2022
Funder
Knowledge Foundation (KK-stiftelsen)
Note

Funding: This research was funded by the CHIST-ERA grant CHIST-ERA-19-XAI-012 and CAISR+ project funded by the Swedish Knowledge Foundation.

As manuscript in thesis.

Available from: 2023-02-10 Created: 2023-02-10 Last updated: 2025-10-01 Bibliographically approved
2. Understanding Survival Models through Counterfactual Explanations
2024 (English) In: Computational Science – ICCS 2024: 24th International Conference, Malaga, Spain, July 2–4, 2024, Proceedings, Part IV / [ed] Elisa Bertino; Wen Gao; Bernhard Steffen; Moti Yung, Cham: Springer Nature, 2024, p. 310-324. Conference paper, Published paper (Other academic)
Abstract [en]

The development of black-box survival models has created a need for methods that explain their outputs, just as in the case of traditional machine learning methods. Survival models usually predict functions rather than point estimates. This special nature of their output makes it more difficult to explain their operation. We propose a method to generate plausible counterfactual explanations for survival models. The method supports two options that handle the special nature of survival models' output. One option relies on the Survival Scores, which are based on the area under the survival function, which is more suitable for proportional hazard models. The other one relies on Survival Patterns in the predictions of the survival model, which represent groups that are significantly different from the survival perspective. This guarantees an intuitive well-defined change from one risk group (Survival Pattern) to another and can handle more realistic cases where the proportional hazard assumption does not hold. The method uses a Particle Swarm Optimization algorithm to optimize a loss function to achieve four objectives: the desired change in the target, proximity to the explained example, likelihood, and the actionability of the counterfactual example. Two predictive maintenance datasets and one medical dataset are used to illustrate the results in different settings. The results show that our method produces plausible counterfactuals, which increase the understanding of black-box survival models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
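
As a rough illustration of the Survival Score option mentioned above, the area under a predicted survival curve can be approximated with the trapezoidal rule. This is only a sketch: the function name `survival_score`, the sampling grid, and the absence of any normalization are assumptions made for illustration, and the paper's exact definition may differ.

```python
def survival_score(times, survival_probs):
    """Trapezoidal area under a survival curve sampled at increasing times.

    Hypothetical helper: a larger area means the subject is predicted
    to survive longer over the sampled horizon.
    """
    if len(times) != len(survival_probs) or len(times) < 2:
        raise ValueError("need at least two matching (time, probability) samples")
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Average of the two neighbouring probabilities times the interval width.
        area += 0.5 * (survival_probs[k] + survival_probs[k - 1]) * dt
    return area


# Two illustrative (made-up) predicted curves on the same time grid.
times = [0.0, 1.0, 2.0, 3.0]
low_risk = [1.0, 0.9, 0.8, 0.7]
high_risk = [1.0, 0.6, 0.3, 0.1]
print(survival_score(times, low_risk), survival_score(times, high_risk))
```

A counterfactual search in this setting would look for a nearby input whose score moves by the desired amount; that search itself is not sketched here.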

Place, publisher, year, edition, pages
Cham: Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14835
Keywords
Survival Analysis, Explainable Artificial Intelligence, Survival Patterns, Counterfactual Explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52260 (URN); 10.1007/978-3-031-63772-8_28 (DOI); 001279326500028 (); 2-s2.0-85199557114 (Scopus ID); 978-3-031-63771-1 (ISBN)
Conference
24th International Conference on Computational Science, ICCS 2024, Malaga, Spain, July 2–4, 2024
Funder
Knowledge Foundation (KK-stiftelsen), 20200001
Note

As manuscript in thesis

Available from: 2023-12-18 Created: 2023-12-18 Last updated: 2025-10-01 Bibliographically approved
3. The Concordance Index Decomposition: A Measure for a Deeper Understanding of Survival Prediction Models
2024 (English) In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 148, p. 1-10, article id 102781. Article in journal (Refereed) Published
Abstract [en]

The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model. This paper proposes a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases. This decomposition enables a more fine-grained analysis of the strengths and weaknesses of survival prediction methods. The usefulness of this decomposition is demonstrated through benchmark comparisons against state-of-the-art and classical models, together with a new variational generative neural-network-based method (SurVED), which is also proposed in this paper. Performance is assessed using four publicly available datasets with varying levels of censoring. The analysis using the C-index decomposition and synthetic censoring shows that deep learning models utilize the observed events more effectively than other models, allowing them to keep a stable C-index in different censoring levels. In contrast, classical machine learning models deteriorate when the censoring level decreases due to their inability to improve on ranking the events versus other events. © 2024 The Author(s)
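
The decomposition described above can be checked numerically with a small sketch. The code counts comparable pairs anchored on an observed event, splits them into event-vs-event and event-vs-censored groups, and verifies that the overall C-index equals a weighted harmonic mean of the two per-group indices. The weight used here (the share of concordant pairs that are event-vs-event) is the one that makes the identity hold algebraically; it is an assumption that this matches the paper's notation. The function name, the handling of ties, and the toy data are likewise illustrative.

```python
def cindex_decomposition(times, events, risk):
    """Return (c_index, c_ee, c_ec, alpha) for higher-risk-means-earlier-event scores.

    Simplifications (assumptions of this sketch): pairs with tied times are
    skipped, and ties in risk count as half-concordant.
    """
    n_ee = n_ec = 0      # comparable pairs per group
    c_ee = c_ec = 0.0    # concordant pairs per group
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier subject of a comparable pair must have an observed event
        for j in range(n):
            if i == j or times[i] >= times[j]:
                continue
            if risk[i] > risk[j]:
                score = 1.0
            elif risk[i] == risk[j]:
                score = 0.5
            else:
                score = 0.0
            if events[j]:
                n_ee += 1
                c_ee += score
            else:
                n_ec += 1
                c_ec += score
    if n_ee == 0 or n_ec == 0 or c_ee == 0 or c_ec == 0:
        raise ValueError("decomposition undefined: a group is empty or has no concordant pairs")
    c_index = (c_ee + c_ec) / (n_ee + n_ec)
    alpha = c_ee / (c_ee + c_ec)  # share of concordant pairs that are event-vs-event
    return c_index, c_ee / n_ee, c_ec / n_ec, alpha


# Toy data (made up): 1 = observed event, 0 = censored.
times = [2, 4, 5, 7, 8, 10]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.5, 0.7, 0.6, 0.2, 0.1]

c, ci_ee, ci_ec, alpha = cindex_decomposition(times, events, risk)
harmonic = 1.0 / (alpha / ci_ee + (1.0 - alpha) / ci_ec)
print(c, ci_ee, ci_ec, alpha, harmonic)
```

Two models with the same overall C-index can differ in the two per-group indices, which is the kind of hidden difference the decomposition is meant to expose.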

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024
Keywords
Survival Analysis, Evaluation Metric, Concordance Index, Variational Encoder-Decoder
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52259 (URN); 10.1016/j.artmed.2024.102781 (DOI); 001171816900001 (); 38325926 (PubMedID); 2-s2.0-85184733529 (Scopus ID)
Funder
Knowledge Foundation (KK-stiftelsen), 20200001
Note

As manuscript in thesis

Available from: 2023-12-18 Created: 2023-12-18 Last updated: 2025-10-01 Bibliographically approved
4. Improving Concordance Index in Regression-based Survival Analysis: Discovery of Loss Function for Neural Networks
2024 (English) In: GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, New York: Association for Computing Machinery (ACM), 2024, p. 1863-1869. Conference paper, Published paper (Other academic)
Abstract [en]

In this work, we use an Evolutionary Algorithm (EA) to discover a novel Neural Network (NN) regression-based survival loss function with the aim of improving the C-index performance. Our contribution is threefold. Firstly, we propose an evolutionary meta-learning algorithm, SAGA$_{loss}$, for optimizing a neural-network regression-based loss function that maximizes the C-index; our algorithm consistently discovers specialized loss functions that outperform MSCE. Secondly, based on our analysis of the evolutionary search results, we highlight a non-intuitive insight that signifies the importance of the non-zero gradient for the censored cases part of the loss function, a property that is shown to be useful in improving concordance. Finally, based on this insight, we propose MSCE$_{Sp}$, a novel survival regression loss function that can be used off-the-shelf and generally performs better than the Mean Squared Error for censored cases. We performed extensive experiments on 19 benchmark datasets to validate our findings. © 2024 Copyright held by the owner/author(s).
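
The non-zero-gradient insight can be illustrated with a small sketch that contrasts a ReLU-based and a Softplus-based penalty for censored cases. The loss form below is an assumption for illustration only: this record does not give the exact definitions of MSCE or MSCE$_{Sp}$, so the function `censored_mse` and its arguments are hypothetical. The sketch assumes that an observed event is penalized by the squared error, while a censored case is penalized only through a one-sided term on how far the prediction falls short of the censoring time.

```python
import math


def relu(z):
    return max(z, 0.0)


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def censored_mse(y_true, y_pred, events, one_sided=relu):
    """Mean squared error with a one-sided penalty for censored cases (assumed form)."""
    total = 0.0
    for t, p, e in zip(y_true, y_pred, events):
        diff = t - p
        if e:
            total += diff ** 2             # observed event: ordinary squared error
        else:
            total += one_sided(diff) ** 2  # censored: penalize predicting before the censoring time
    return total / len(y_true)


# A censored subject whose predicted time (8.0) already exceeds the censoring time (5.0).
y_true, y_pred, events = [5.0], [8.0], [0]
print(censored_mse(y_true, y_pred, events, relu))      # exactly zero, so no gradient signal
print(censored_mse(y_true, y_pred, events, softplus))  # small but non-zero
```

With ReLU, the censored term is flat once the prediction passes the censoring time, so that case stops contributing to training; with Softplus the term stays positive and smooth, so the case keeps a small gradient.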

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2024
Keywords
evolutionary meta-learning, loss function, neural networks, survival analysis, regression
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-52468 (URN); 10.1145/3638530.3664129 (DOI); 2-s2.0-85200800944 (Scopus ID); 979-8-4007-0495-6 (ISBN)
Conference
The Genetic and Evolutionary Computation Conference, Melbourne, Australia, July 14-18, 2024
Note

As manuscript in thesis

Available from: 2024-01-24 Created: 2024-01-24 Last updated: 2025-10-01 Bibliographically approved
5. Discovering Premature Replacements in Predictive Maintenance Time-to-Event Data
2023 (English) In: Proceedings of the Asia Pacific Conference of the PHM Society 2023 / [ed] Takehisa Yairi; Samir Khan; Seiji Tsutsumi, New York: The Prognostics and Health Management Society, 2023, Vol. 4. Conference paper, Published paper (Refereed)
Abstract [en]

Time-To-Event (TTE) modeling using survival analysis in industrial settings faces the challenge of premature replacements of machine components, which leads to bias and errors in survival prediction. Typically, TTE survival data contains information about components and if they had failed or not up to a certain time. For failed components, the time is noted, and a failure is referred to as an event. A component that has not failed is denoted as censored. In industrial settings, in contrast to medical settings, there can be considerable uncertainty in an event; a component can be replaced before it fails to prevent operation stops or because maintenance staff believe that the component is faulty. This shows up as “no fault found” in warranty studies, where a significant proportion of replaced components may appear fault-free when tested or inspected after replacement.

In this work, we propose an expectation-maximization-like method for discovering such premature replacements in survival data. The method is a two-phase iterative algorithm employing a genetic algorithm in the maximization phase to learn better event assignments on a validation set. The learned labels through iterations are accumulated and averaged to be used to initialize the following expectation phase. The assumption is that the more often the event is selected, the more likely it is to be an actual failure and not a “no fault found”.

Experiments on synthesized and simulated data show that the proposed method can correctly detect a significant percentage of premature replacement cases.

Place, publisher, year, edition, pages
New York: The Prognostics and Health Management Society, 2023
Series
Proceedings of the Asia Pacific Conference of the PHM Society, E-ISSN 2994-7219
Keywords
Survival Analysis, Predictive Maintenance, Early Replacements, Genetic Algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52105 (URN); 10.36001/phmap.2023.v4i1.3609 (DOI)
Conference
4th Asia Pacific Conference of the Prognostics and Health Management, Tokyo, Japan, September 11-14, 2023
Funder
Knowledge Foundation (KK-stiftelsen), 20200001
Note

As manuscript in thesis.

Available from: 2023-11-23 Created: 2023-11-23 Last updated: 2025-10-01 Bibliographically approved
6. CoxSE: Exploring the Potential of Self-Explaining Neural Networks with Cox Proportional Hazards Model for Survival Analysis
(English) Manuscript (preprint) (Other academic)
Abstract [en]

The Cox Proportional Hazards (CPH) model has long been the preferred survival model for its explainability. However, to increase its predictive power beyond its linear log-risk, it was extended to utilize deep neural networks, sacrificing its explainability. In this work, we explore the potential of self-explaining neural networks (SENN) for survival analysis. We propose a new locally explainable Cox proportional hazards model, named CoxSE, by estimating a locally-linear log-hazard function using the SENN. We also propose a modification of the Neural Additive Model (NAM) that hybridizes it with SENN, named CoxSENAM, which enables the control of the stability and consistency of the generated explanations.
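
To make the locally-linear idea concrete, the sketch below computes a log-risk as the dot product of input-dependent coefficients with the input, and evaluates the standard Cox negative log partial likelihood on those log-risks. It is a minimal illustration, not the CoxSE implementation: the coefficients would come from a neural network in practice, the SENN stability regularizer is omitted, tied event times get no special treatment, and all function names are hypothetical.

```python
import math


def log_risk(theta_x, x):
    """Locally-linear log-risk: coefficients theta(x) applied to the features x.

    theta_x stands in for the output of a network evaluated at x (assumption);
    each coefficient can be read as a local feature relevance.
    """
    return sum(t * v for t, v in zip(theta_x, x))


def neg_log_partial_likelihood(times, events, log_risks):
    """Mean Cox negative log partial likelihood over observed events."""
    n = len(times)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue
        # Risk set: everyone still under observation at the i-th event time.
        risk_set = [log_risks[j] for j in range(n) if times[j] >= times[i]]
        m = max(risk_set)
        # log-sum-exp over the risk set, shifted by m for numerical stability.
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_sum - log_risks[i]
        n_events += 1
    if n_events == 0:
        raise ValueError("no observed events")
    return total / n_events


# Made-up subjects: features, local coefficients, follow-up times, event flags.
xs = [[2.0, 1.0], [1.0, 0.0], [0.5, 2.0]]
thetas = [[0.5, -1.0], [0.3, 0.2], [-0.4, 0.1]]
scores = [log_risk(t, x) for t, x in zip(thetas, xs)]
print(neg_log_partial_likelihood([1.0, 2.0, 3.0], [1, 0, 1], scores))
```

Because the log-risk is linear in the features for a fixed set of coefficients, those coefficients serve as the explanation for that input, which is what distinguishes this setup from a black-box log-risk network.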

Several experiments using synthetic and real datasets are presented, benchmarking CoxSE and CoxSENAM against a NAM-based model, a DeepSurv model explained with SHAP, and a linear CPH model. The results show that, unlike the NAM-based model, the SENN-based model can provide more stable and consistent explanations while maintaining the predictive power of the black-box model. The results also show that, due to their structural design, NAM-based models demonstrate better robustness to non-informative features. Among the models, the hybrid model exhibits the best robustness.

Keywords
Self-Explaining Neural Networks, Cox Proportional Hazards, Survival Analysis, Interpretability, XAI, Neural Additive Models
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-55201 (URN); 10.48550/arXiv.2407.13849 (DOI)
Note

As manuscript in thesis

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-10-01 Bibliographically approved

Open Access in DiVA

Full text (1103 kB), 414 downloads
File information
File name: FULLTEXT02.pdf
File size: 1103 kB
Checksum (SHA-512):
6ae11b6b51c473edc0533da149cbe57cfb4de84578457f8407a7093d56a3ee1485e4e7e0e791d1cf9eecd11cff34d0bafc3aa54a678aa4bd6a51af2f5bb7e20c
Type: fulltext
Mimetype: application/pdf

Search further in DiVA

By author/editor
Alabdallah, Abdallah
By organisation
School of Information Technology
Computer Sciences; Information Systems
