Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Finding Synonyms in Medical Texts: Creating a system for automatic synonym extraction from medical texts
Linköping University, Department of Computer and Information Science.
2018 (English)Independent thesis Basic level (degree of Bachelor), 12 credits / 18 HE creditsStudent thesis
Abstract [en]

This thesis describes the work of creating an automatic system for identifying synonyms and semantically related words in medical texts. Before this work, as a part of the project E-care@home, medical texts have been classified as either lay or specialized by both a lay annotator and an expert annotator. The lay annotator, in this case, is a person without any medical knowledge, whereas the expert annotator has professional knowledge in medicine. Using these texts made it possible to create co-occurrences matrices from which the related words could be identified. Fifteen medical terms were chosen as system input. The Dice similarity of these words in a context window of ten words around them was calculated. As output, five candidate related terms for each medical term was returned. Only unigrams were considered. The candidate related terms were evaluated using a questionnaire, where 223 healthcare professionals rated the similarity using a scale from one to five. A Fleiss kappa test showed that the agreement among these raters was 0.28, which is a fair agreement. The evaluation further showed that there was a significant correlation between the human ratings and the relatedness score (Dice similarity). That is, words with higher Dice similarity tended to get a higher human rating. However, the Dice similarity interval in which the words got the highest average human rating was 0.35-0.39. This result means that there is much room for improving the system. Further developments of the system should remove the unigram limitation and expand the corpus the provide a more accurate and reliable result.

Place, publisher, year, edition, pages
2018. , p. 33
Keywords [en]
eHealth, distributional semantics, medical synonyms, semantic relations, word similarity
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-149643ISRN: LIU-IDA/KOGVET-G--18/011--SEOAI: oai:DiVA.org:liu-149643DiVA, id: diva2:1232948
Subject / course
Cognitive science
Supervisors
Examiners
Available from: 2018-07-17 Created: 2018-07-13 Last updated: 2018-07-17Bibliographically approved

Open Access in DiVA

fulltext(1615 kB)8 downloads
File information
File name FULLTEXT01.pdfFile size 1615 kBChecksum SHA-512
5cb19e855c5c5c6ac37460b2e9c398930c4093aca99e49d6e7dfd4879925552cb54a6a6fc55124215837047508b83ee09b9d58ac4bacd0c2068763eb088987f5
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Cederblad, Gustav
By organisation
Department of Computer and Information Science
Language Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar
Total: 8 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 101 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf