Automated subject classification of textual web documents
Lund University, Sweden. (Library and Information Science) ORCID iD: 0000-0003-4169-4777
2006 (English). In: Journal of Documentation, ISSN 0022-0418, E-ISSN 1758-7379, Vol. 62, no 3, 350-371 p. Article in journal (Refereed), Published
Abstract [en]

Purpose: To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point out problems with the approaches and with automated classification as such.

Design/methodology/approach: A range of works dealing with automated classification of full-text web documents is discussed. Individual approaches are explored in the following sections: special features (description, differences, evaluation), application, and characteristics of web pages.

Findings: The paper identifies major similarities and differences between the three approaches: document pre-processing and the use of web-specific document characteristics are common to all of them; the major differences lie in the algorithms applied, in whether the vector space model is employed, and in whether controlled vocabularies are used. Problems of automated classification are recognized.

Research limitations/implications: The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications: As an integrated overview of approaches from different research communities, with application examples, the paper is useful for students in library and information science and in computer science, as well as for practitioners. Researchers from one community can see how similar tasks are carried out in the others.

Originality/value: To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Place, publisher, year, edition, pages
Emerald Group Publishing Limited, 2006. Vol. 62, no 3, 350-371 p.
Keyword [en]
Automation, Classification, Internet, Document management, Controlled languages
National Category
Information Studies
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-37069
DOI: 10.1108/00220410610666501
OAI: oai:DiVA.org:lnu-37069
DiVA: diva2:747753
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2017-12-05. Bibliographically approved

Open Access in DiVA

fulltext (255 kB), 99 downloads
File information
File name: FULLTEXT01.pdf
File size: 255 kB
Checksum (SHA-512): 7490c49e8934f7c0860786df186498f94722045104d486eb0afa542ec949c4df19e6d1203e0f9bb668dad68af78acaaaa5ef24d11b9dbc633611e763c3360f55
Type: fulltext
Mimetype: application/pdf

By author/editor
Golub, Koraljka