Protein names and how to find them
Number of Authors: 6
2002 (English). In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 67, 13 p., p. 49-61. Article in journal (Refereed), Published
Abstract [en]

A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition might be regarded as a solved problem in some domains, it still poses a significant challenge in others. In this work we focus on one of the more difficult tasks, the identification of protein names in text. This task presents several interesting difficulties because of the named entities' variant structural characteristics, their sometimes unclear status as names, the lack of common standards and fixed nomenclatures, and the specifics of the texts in the molecular biology domain in which they appear. We describe how we approached these and other difficulties in the implementation of Yapex, a system for the automatic identification of protein names in text. We also evaluate Yapex under four different notions of correctness and compare its performance to that of another publicly available system for protein name recognition.

Place, publisher, year, edition, pages
Elsevier, 2002, 1. Vol. 67, 13 p., p. 49-61.
Keyword [en]
Knowledge, Linguistics, Natural Language Processing, Medical Information Science, Computational Molecular Biology, Information Extraction, Protein Names
National Category
Computer and Information Science
URN: urn:nbn:se:ri:diva-21302
DOI: 10.1016/S1386-5056(02)00052-7
OAI: diva2:1041336
Available from: 2016-10-31 Created: 2016-10-31

