Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial
2009 (Engelska)Ingår i: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 78, nr 12, s. E19-E26Artikel i tidskrift (Refereegranskat) Published
Abstract [en]

Background: Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish. Methods: This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic deidentification trial. Results are measured with precision, recall and F-measure. Results: This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results. Conclusion: Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions on identifiable information is needed, as well as further developments both on the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.

Ort, förlag, år, upplaga, sidor
Elsevier, 2009. Vol. 78, nr 12, s. E19-E26
Nyckelord [en]
Medical informatics applications, Natural language processing, Medical record systems, Electronic patient records in Swedish, Protected health information, Ethical issues, Annotation
Nationell ämneskategori
Data- och informationsvetenskap
Identifikatorer
URN: urn:nbn:se:kth:diva-46888DOI: 10.1016/j.ijmedinf.2009.04.005ISI: 000272036200012Scopus ID: 2-s2.0-71549144430OAI: oai:DiVA.org:kth-46888DiVA, id: diva2:486832
Anmärkning
QC 20120131Tillgänglig från: 2012-01-31 Skapad: 2011-11-07 Senast uppdaterad: 2018-01-12Bibliografiskt granskad

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltextScopus

Sök vidare i DiVA

Av författaren/redaktören
Velupillai, SumithraDalianis, Hercules
I samma tidskrift
International Journal of Medical Informatics
Data- och informationsvetenskap

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetricpoäng

doi
urn-nbn
Totalt: 50 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf