Releasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2012 (English). In: Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), 2012, pp. 45-48. Conference paper, published paper (peer-reviewed)
Abstract [en]

Patient records contain valuable information in the form of both structured data and free text; however, this information is sensitive because it can reveal the identity of patients. To allow new methods and techniques to be developed and evaluated on real-world clinical data without revealing such sensitive information, researchers could be given access to de-identified records stripped of protected health information (PHI), such as names and telephone numbers. One approach to minimizing the risk of revealing PHI when releasing text corpora from such records is to include only features of the words instead of the words themselves. Such features may include part of speech, word length, and similar properties from which the sensitive information cannot be derived. To investigate what performance losses can be expected when specific words are replaced with features, an experiment is presented that compares two state-of-the-art machine learning methods, conditional random fields and random forests, on their ability to support de-identification, using the Stockholm EPR PHI corpus as a benchmark. The results indicate severe performance losses when the actual words are removed, leading to the conclusion that the chosen features are not sufficient for the suggested approach to be viable.
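
The idea of releasing features instead of words can be illustrated with a minimal sketch. It is not the authors' feature set or code: the abstract names only part of speech and word length, so the `shape` feature, the function names `word_shape` and `featurize`, and the example tags are assumptions made for illustration. The part-of-speech tags are assumed to come from an external tagger.

```python
from typing import Dict, List, Tuple


def word_shape(token: str) -> str:
    """Collapse a token into a coarse character-class pattern, e.g. 'Anna' -> 'Xx'."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = "-"
        # Collapse runs of the same class so the exact characters are not recoverable.
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def featurize(tagged_tokens: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Replace each (word, pos) pair with features only; the word itself is dropped."""
    return [
        {"pos": pos, "length": len(tok), "shape": word_shape(tok)}
        for tok, pos in tagged_tokens
    ]


# Hypothetical input: tokens paired with tags from some part-of-speech tagger.
features = featurize([("Anna", "PM"), ("ringde", "VB"), ("08-123456", "RG")])
```

A sequence of such feature dictionaries, rather than the tokens, is what a released corpus would contain under this approach. According to the abstract, classifiers trained on features alone performed markedly worse than those that also had access to the words.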

Place, publisher, year, edition, pages
2012. pp. 45-48.
Keywords [en]
de-identification, conditional random fields, random forests, Swedish clinical text
HSV category
Research programme
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-79527
OAI: oai:DiVA.org:su-79527
DiVA: diva2:549732
Conference
The Third Workshop on Building and Evaluating Resources for Biomedical Text Mining, 26th May 2012, Istanbul, Turkey
Available from: 2012-09-05. Created: 2012-09-05. Last updated: 2013-02-04. Bibliographically approved.

Open Access in DiVA

No full text available

Other links

http://rapidlibrary.com/source.php?file=ulcvcceew8i89on&url=http%3A%2F%2Fpeople.dsv.su.se%2F%7Ehercules%2Fpapers%2FDalianis_and_Bostrom_2012_Releasing_a_Swedish_clinical_corpus_after_removing_all_words-de-identification_experiments_with_conditional_random_fields_and_random_forests.pdf&sec=e04696ee7c89c415

Authors
Dalianis, Hercules; Boström, Henrik