Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Implementation and evaluation of a text extraction tool for adverse drug reaction information
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
2010 (English)Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE creditsStudent thesis
Abstract [sv]

Inom ramen för Världshälsoorganisationens (WHO:s) internationella biverkningsprogram rapporterar sjukvårdspersonal och patienter misstänkta läkemedelsbiverkningar i form av spontana biverkningsrapporter som via nationella myndigheter skickas till Uppsala Monitoring Centre (UMC). Hos UMC lagras rapporterna i VigiBase, WHO:s biverkningsdatabas. Rapporterna i VigiBase analyseras med hjälp av statistiska metoder för att hitta potentiella samband mellan läkemedel och biverkningar. Funna samband utvärderas i flera steg där ett tidigt steg i utvärderingen är att studera den medicinska litteraturen för att se om sambandet redan är känt sedan tidigare (tidigare kända samband filtreras bort från fortsatt analys). Att manuellt leta efter samband mellan ett visst läkemedel och en viss biverkan är tidskrävande.

I den här studien har vi utvecklat ett verktyg för att automatiskt leta efter medicinska biverkningstermer i medicinsk litteratur och spara funna samband i ett strukturerat format. I verktyget har vi implementerat och integrerat funktionalitet för att söka efter medicinska biverkningar på olika sätt (utnyttja synonymer,ta bort ändelser på ord, ta bort ord som saknar betydelse, godtycklig ordföljd och stavfel). Verktygets prestanda har utvärderats på manuellt extraherade medicinska termer från SPC-texter (texter från läkemedels bipacksedlar) och på biverkningstexter från Martindale (medicinsk referenslitteratur för information om läkemedel och substanser) där WHO-ART- och MedDRA-terminologierna har använts som källa för biverkningstermer. Studien visar att sofistikerad textextraktion avsevärt kan förbättra identifieringen av biverkningstermer i biverkningstexter jämfört med en ordagrann extraktion.

Abstract [en]

Background: Initial review of potential safety issues related to the use of medicines involves reading and searching existing medical literature sources for known associations of drug and adverse drug reactions (ADRs), so that they can be excluded from further analysis. The task is labor demanding and time consuming.

Objective: To develop a text extraction tool to automatically identify ADR information from medical adverse effects texts. Evaluate the performance of the tool’s underlying text extraction algorithm and identify what parts of the algorithm contributed to the performance.

Method: A text extraction tool was implemented on the .NET platform with functionality for preprocessing text (removal of stop words, Porter stemming and use of synonyms) and matching medical terms using permutations of words and spelling variations (Soundex, Levenshtein distance and Longest common subsequence distance). Its performance was evaluated on both manually extracted medical terms (semi-structuredtexts) from summary of product characteristics (SPC) texts and unstructured adverse effects texts from Martindale (i.e. a medical reference for information about drugs andmedicines) using the WHO-ART and MedDRA medical term dictionaries.

Results: For the SPC data set, a verbatim match identified 72% of the SPC terms. The text extraction tool correctly matched 87% of the SPC terms while producing one false positive match using removal of stop words, Porter stemming, synonyms and permutations. The use of the full MedDRA hierarchy contributed the most to performance. Sophisticated text algorithms together contributed roughly equally to the performance. Phonetic codes (i.e. Soundex) is evidently inferior to string distance measures (i.e. Levenshtein distance and Longest common subsequence distance) for fuzzy matching in our implementation. The string distance measures increased the number of matched SPC terms, but at the expense of generating false positive matches. Results from Martindaleshow that 90% of the identified medical terms were correct. The majority of false positive matches were caused by extracting medical terms not describing ADRs.

Conclusion: Sophisticated text extraction can considerably improve the identification of ADR information from adverse effects texts compared to a verbatim extraction.

Place, publisher, year, edition, pages
2010. , p. 66
Series
UPTEC X ; 10 022
Keyword [en]
text extraction, adverse drug reactions, permutation, soundex, porter stemming, levenshtein distance, longest common subsequence distance
Identifiers
URN: urn:nbn:se:uu:diva-134063OAI: oai:DiVA.org:uu-134063DiVA, id: diva2:371530
Uppsok
Technology
Supervisors
Available from: 2010-12-06 Created: 2010-11-21 Last updated: 2010-12-06Bibliographically approved

Open Access in DiVA

fulltext(2113 kB)1073 downloads
File information
File name FULLTEXT01.pdfFile size 2113 kBChecksum SHA-512
0d198a94d9ce330abbcab5f5c87eb242bdcad7225dfe6da8c534351cf87906ca580729a7ca44cbb682f6d3e6b7af1e67e8d263ec19909070c671228e674153d5
Type fulltextMimetype application/pdf

By organisation
Biology Education Centre

Search outside of DiVA

GoogleGoogle Scholar
Total: 1073 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 703 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf