Automated subject classification of textual documents in the context of Web-based hierarchical browsing
Golub, Koraljka
University of Bath, UK (Library and Information Science). ORCID iD: 0000-0003-4169-4777
2011 (English). In: Knowledge organization, ISSN 0943-7444, Vol. 38, no 3, 230-244 p. Article in journal (Refereed), Published
Abstract [en]

Automated methods for information organization have existed for several decades, but the exponential growth of the World Wide Web has brought them to the forefront of research in different communities. Several approaches can be identified within these communities: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within a larger text). The aim here was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. If the same controlled vocabulary also had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
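
The string-matching idea in the abstract can be illustrated with a minimal sketch. This is not the implementation evaluated in the article: the vocabulary, the dotted class notations, and the function names `classify` and `browsing_path` are all hypothetical and chosen only for illustration. The sketch assumes that each entry term maps to exactly one class and that the class notation itself encodes the hierarchy.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary (an assumption, not from the article):
# each entry term designates exactly one class. The dotted notation encodes
# the hierarchy, e.g. "6.2.2" is a child of "6.2", which is a child of "6".
VOCABULARY = {
    "information retrieval": "6.2",
    "document clustering": "6.2.1",
    "string matching": "6.2.2",
    "machine learning": "6.1",
}


def classify(text, vocabulary=VOCABULARY):
    """Count whole-word matches of each entry term in the text and
    return (class notation, match count) pairs, most frequent first."""
    lowered = text.lower()
    scores = Counter()
    for term, notation in vocabulary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            scores[notation] += hits
    return scores.most_common()


def browsing_path(notation):
    """Expand a class notation into its chain of ancestors, broadest first,
    which is the path a user would follow when browsing the hierarchy."""
    parts = notation.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
```

For example, `classify("Machine learning and string matching; string matching is simple.")` ranks the hypothetical class `6.2.2` first, and `browsing_path("6.2.2")` gives the levels under which the document would appear when browsing. A real system would likely also need term weighting, stemming, and handling of ambiguous terms, none of which is shown here.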

Place, publisher, year, edition, pages
Ergon-Verlag, 2011. Vol. 38, no 3, 230-244 p.
National Category
Information Studies
Research subject
Humanities, Library and Information Science
URN: urn:nbn:se:lnu:diva-37057
OAI: diva2:747709
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2016-05-25. Bibliographically approved.

By author/editor
Golub, Koraljka