Hierarchical text classification of fiction books: With Thema subject categories
Linköping University, Department of Computer and Information Science, Human-Centered systems.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Categorizing books and literature of any genre and subject area is a vital task for publishers that seek to distribute their books to the appropriate audiences. Different countries commonly use different subject categorization schemes, which makes international book trading more difficult because books must be categorized from scratch once they reach another country. A solution to this problem has been proposed in the form of an international standard called Thema, which encompasses thousands of hierarchical subject categories. However, because this scheme is quite recent, many books published before its creation have yet to be assigned subject categories. Often, even recent books are not categorized. In this work, methods for automatic categorization of books are investigated, based on multinomial Naive Bayes and Facebook's classifier fastText. The results show some promise for both classifiers, but overall, due to data imbalance and a very long training time that made it difficult to use more data, it is not possible to determine with certainty which classifier is actually best.

Place, publisher, year, edition, pages
2019, p. 55
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-154469
ISRN: LIU-IDA/LITH-EX-A--2019/002--SE
OAI: oai:DiVA.org:liu-154469
DiVA, id: diva2:1288590
External cooperation
Storytel AB
Subject / course
Computer science
Supervisors
Examiners
Available from: 2019-02-21. Created: 2019-02-13. Last updated: 2019-09-09. Bibliographically approved.

Open Access in DiVA

fulltext (680 kB), 67 downloads
File information
File name: FULLTEXT01.pdf
File size: 680 kB
Checksum (SHA-512): 66a1d553d8809d9229facb2fcdf4cf66530156d7e37330f2ee2b2ce823f34d0f7e40888221f5aaae7e4f370e20084dd58b92aa6d20297d168839f47f47981512
Type: fulltext
Mimetype: application/pdf