Automated subject classification of textual Web pages, based on a controlled vocabulary: challenges and recommendations
Lunds universitet (Library and Information Science). ORCID iD: 0000-0003-4169-4777
2006 (English). In: New Review of Hypermedia and Multimedia, ISSN 1361-4568, E-ISSN 1740-7842, Vol. 12, no. 1, pp. 11-27. Article in journal (Refereed), Published.
Abstract [en]

The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
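The string-to-string matching described in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the term list, the class codes, the tokenizer, and the hit-count scoring below are all assumptions made for illustration, and none of the entries are taken from the Ei thesaurus or classification scheme.

```python
import re
from collections import Counter

# Hypothetical term list mapping a term to a class code.
# Both the terms and the codes are invented for this example.
TERM_LIST = {
    "heat transfer": "CLASS_A",
    "thermodynamics": "CLASS_A",
    "bridge": "CLASS_B",
    "concrete": "CLASS_B",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(text, term_list=TERM_LIST):
    """Score each class by the number of exact term matches in the text.

    A term matches when its tokens appear in the text as a contiguous,
    identical sequence. Returns (class, score) pairs, highest score first.
    """
    tokens = tokenize(text)
    scores = Counter()
    for term, cls in term_list.items():
        term_tokens = tokenize(term)
        n = len(term_tokens)
        hits = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == term_tokens
        )
        if hits:
            scores[cls] += hits
    return scores.most_common()


if __name__ == "__main__":
    page = "Heat transfer in concrete bridge decks: thermodynamics of heat transfer."
    print(classify(page))
```

Because the sketch matches exact strings only, an inflected form such as "bridges" does not match the term "bridge". That is a property of this simplified example and not a finding reported in the record.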

Place, publisher, year, edition, pages
2006. Vol. 12, no. 1, pp. 11-27.
Keyword [en]
Automated subject classification, Controlled vocabulary, Engineering Information thesaurus and classification scheme
National Category
Information Studies
Research subject
Humanities, Library and Information Science
URN: urn:nbn:se:lnu:diva-37067
DOI: 10.1080/13614560600774313
OAI: diva2:747743
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2015-09-30. Bibliographically approved.

Author/editor: Golub, Koraljka