Automatic Classification of Factuality Levels: A Case Study on Swedish Diagnoses and the Impact of Local Context
2011 (English). In: The Fourth International Symposium on Languages in Biology and Medicine, Singapore, 2011. Conference paper (Refereed)
Clinicians express different levels of knowledge certainty when reasoning about a patient’s status. Automatic extraction of relevant information is crucial in the clinical setting, which means that factuality levels need to be distinguished. We present an automatic classifier using Conditional Random Fields, which is trained and tested on a Swedish clinical corpus annotated for factuality levels at a diagnosis statement level: the Stockholm EPR Diagnosis-Factuality Corpus. Using simple local context features, the classifier obtains promising results: the best overall results are 0.699 average F-measure using all classes and 0.762 F-measure using merged classes. Preceding context is more useful than following context, although the best results are obtained using a window size of +/-4. Lower levels of certainty are more problematic than higher levels, which was also the case for the human annotators in creating the corpus. A manual error analysis shows that conjunctions and other higher-level features are common sources of errors.
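As an illustration of what "simple local context features" with a window of +/-4 might look like, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `window_features`, the feature naming scheme, the `<PAD>` marker, and the Swedish example sentence are all assumptions made for illustration. The sketch only builds a feature dictionary for one token position, of the kind commonly passed to a Conditional Random Fields toolkit.

```python
def window_features(tokens, index, before=4, after=4):
    """Build position-tagged context features for the token at `index`.

    Hypothetical helper: `before` and `after` set how many tokens of
    preceding and following context are included (4 and 4 corresponds
    to a +/-4 window).
    """
    features = {"word": tokens[index].lower()}
    for offset in range(-before, after + 1):
        if offset == 0:
            continue  # the target token itself is already covered by "word"
        pos = index + offset
        key = f"word[{offset:+d}]"
        if 0 <= pos < len(tokens):
            features[key] = tokens[pos].lower()
        else:
            # Positions outside the sentence get a padding marker.
            features[key] = "<PAD>"
    return features


# Invented example sentence, not taken from the corpus.
tokens = ["Patienten", "har", "troligen", "ingen", "pneumoni", "eller", "astma"]
target = tokens.index("pneumoni")

# An asymmetric window: two preceding tokens, one following token.
feats = window_features(tokens, target, before=2, after=1)
```

Varying `before` and `after` independently is one way to compare preceding and following context, for example `before=4, after=0` against `before=0, after=4`.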
Research subject: Computer and Systems Sciences
Identifiers: URN: urn:nbn:se:su:diva-68729; OAI: oai:DiVA.org:su-68729; DiVA: diva2:473284
Fourth International Symposium on Languages in Biology and Medicine, LBM 2011