In forensic autopsies, accurately estimating the postmortem interval (PMI) is crucial. Traditional methods, which rely on physical parameters and police data, often lack precision, particularly once approximately two days have passed since death. Newer methods increasingly focus on postmortem metabolomics, in which the metabolite profile of a biological system acts as a 'fingerprint' of ongoing processes influenced by both internal and external molecules. Because these metabolomic profiles carry a diverse range of information, from events preceding death to postmortem changes, careful analysis of them has the potential to yield more accurate PMI estimates. Until recently, the limited availability of real human data hindered comprehensive investigation. Large-scale metabolomic data collected by the National Board of Forensic Medicine (RMV, Rättsmedicinalverket) now present a unique opportunity for predictive analysis in forensic science, enabling innovative approaches to improving PMI estimation. However, the metabolomic data are large, complex, and potentially nonlinear, which makes them difficult to interpret. This underscores the importance of applying machine learning algorithms effectively to metabolomic data for PMI prediction, which is the primary focus of this project.
In this study, a dataset from the RMV consisting of 4,866 human samples and 2,304 metabolites was used to train models capable of predicting the PMI. Random Forest (RF) and Artificial Neural Network (ANN) models were employed for PMI prediction. Furthermore, feature selection and the incorporation of sex and age as additional inputs were explored to improve the neural network's performance.
This master's thesis shows that the ANN consistently outperforms RF in PMI estimation, achieving an R² of 0.68 and a mean absolute error (MAE) of 1.51 days, compared with RF's R² of 0.43 and MAE of 2.0 days, across the entire PMI range. Additionally, feature selection indicates that only 35% of the metabolites are needed to maintain comparable predictive accuracy. Furthermore, Principal Component Analysis (PCA) reveals that these informative metabolites are located primarily within a specific cluster on the first and second principal components (PCs), suggesting a need for further research into their biological context.
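For readers less familiar with the two reported metrics, the following minimal sketch shows how the coefficient of determination (R2) and the mean absolute error (MAE) are conventionally computed from observed and predicted PMI values. The function names and the example values are hypothetical and purely illustrative; they are not taken from the RMV dataset, and this is not necessarily the evaluation code used in the thesis.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted PMI (days)."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    # Residual sum of squares: error left after applying the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean PMI.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted PMI values in days (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.5, 4.5]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f} days, R2 = {r2:.2f}")
```

Read this way, an MAE of 1.51 days means that predictions deviate from the known PMI by about a day and a half on average, while an R2 of 0.68 means that the model accounts for roughly two thirds of the variance in PMI that a constant mean prediction would leave unexplained.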
In conclusion, the dataset has proven valuable for predicting PMI, which indicates significant potential for machine learning models to assist forensic pathologists in determining the time of death. Notably, the model shows promise in surpassing current methods and filling crucial gaps in the field, representing an important step towards accurate PMI estimation in forensic practice. This project suggests that machine learning will play a central role in determining time since death in the future.