Automated subject classification of textual web documents
Golub, Koraljka. Lund University, Sweden. (Library and Information Science) ORCID iD: 0000-0003-4169-4777
2006 (English). In: Journal of Documentation, ISSN 0022-0418, E-ISSN 1758-7379, Vol. 62, no. 3, pp. 350-371. Article in journal (Refereed), Published
Abstract [en]

Purpose: To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the approaches and with automated classification as such.

Design/methodology/approach: A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.

Findings: Provides major similarities and differences between the three approaches: document pre‐processing and utilization of web‐specific document characteristics are common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized.

Research limitations/implications: The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications: As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.

Originality/value: To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Place, publisher, year, edition, pages
Emerald Group Publishing Limited, 2006. Vol. 62, no 3, 350-371 p.
Keyword [en]
Automation, Classification, Internet, Document management, Controlled languages
National Category
Information Studies
Research subject
Humanities, Library and Information Science
URN: urn:nbn:se:lnu:diva-37069
DOI: 10.1108/00220410610666501
OAI: diva2:747753
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2015-10-01. Bibliographically approved

Open Access in DiVA

fulltext (255 kB), 39 downloads
File information
File name: FULLTEXT01.pdf. File size: 255 kB. Checksum: SHA-512
Type: fulltext. Mimetype: application/pdf

By author/editor
Golub, Koraljka