Digitala Vetenskapliga Arkivet

Automated Identification of Protected Health Information in Swedish Clinical Texts - A Comparative Study of ScandiBERT and SweDeClin-BERT Models
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
Abstract [en]

In recent years, the healthcare sector has increasingly relied on technological advancements to enhance patient care and streamline medical processes. One critical aspect of this technological evolution is the automated identification of Protected Health Information (PHI) within clinical texts. Ensuring patient privacy and compliance with healthcare regulations are paramount concerns in this era of digital healthcare. This thesis delves into the evaluation and comparison of two Natural Language Processing (NLP) models, ScandiBERT and SweDeClin-BERT, for their effectiveness in automatically detecting PHI within Swedish clinical texts. By leveraging real electronic patient records, this study aims to address the knowledge gap surrounding the optimal approach for automating clinical coding processes in the Swedish healthcare context. Through rigorous evaluation metrics such as precision, recall, F1-score, and computation time, the performance of these models is analyzed across various clinical contexts. The results show that both models exhibit strong performance in identifying PHI. SweDeClin-BERT demonstrates slightly higher precision in certain categories, while ScandiBERT excels in recall for others. Notably, there was not a statistically significant difference between the performance of the two models, suggesting that both ScandiBERT and SweDeClin-BERT offer comparable capabilities in identifying PHI within Swedish clinical texts.
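
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of per-category precision, recall, and F1 computed at token level. The function name `per_label_scores`, the tag names (`NAME`, `DATE`, `LOCATION`), and the toy sequences are assumptions made for this example; the record does not state the thesis's tag set or whether it scores tokens or whole entities.

```python
from collections import Counter


def per_label_scores(gold, predicted, ignore_label="O"):
    """Return {label: (precision, recall, f1)} for two aligned tag sequences.

    Token-level scoring; `ignore_label` marks non-PHI tokens (hypothetical
    convention chosen for this sketch).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g != ignore_label:
                tp[g] += 1          # correct PHI tag
        else:
            if p != ignore_label:
                fp[p] += 1          # predicted a PHI tag that is wrong here
            if g != ignore_label:
                fn[g] += 1          # missed (or mislabeled) a gold PHI tag

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data, invented for illustration only.
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "LOCATION"]
pred = ["O", "NAME", "O", "O", "DATE", "DATE", "O"]
scores = per_label_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case one model-style output trades precision for recall differently per category, which is the kind of per-category contrast the abstract reports between the two models.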

Place, publisher, year, edition, pages
2024.
Keywords [en]
Natural Language Processing (NLP), Protected Health Information (PHI), BERT, Clinical Texts, ScandiBERT, SweDeClin-BERT, Data Privacy
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:su:diva-242654
OAI: oai:DiVA.org:su-242654
DiVA, id: diva2:1955545
Available from: 2025-04-30 Created: 2025-04-30

Open Access in DiVA

fulltext (1062 kB), 17 downloads
File information
File name: FULLTEXT01.pdf
File size: 1062 kB
Checksum SHA-512:
44242211d5bcafef1ab6be5452599c878f01eaf8f797efea118899199e9584f0d84539d3b390066a1cacb75ef6f963bdbc40381768e96a0334d7b195f3f3f697
Type: fulltext
Mimetype: application/pdf

By author/editor
Iliescu, Astrid