Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Document and Image Classification withTopic Ngram Model
KTH, School of Computer Science and Communication (CSC).
2014 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Latent Dirichlet Allocation (LDA) is a popular probabilistic model for information retrieval. Many extended models based on LDA have been introduced during the past 10 years. In LDA, a data point is represented as a bag (multiset)of words. In the text case, a word is a regular text word, but other types of data can also be represented as words (e.g. visual words). Due to the bag-of-words assumption, the original LDA neglects the structure of thedata, i.e., all the relationships between words, which leads to information loss. As a matter of fact, the spatial relationship is important and useful. In order to explore the importance of the relationship, we focus on an extensionof LDA called Topic Ngram Model, which models the relationship among adjacent words. In this thesis, we first implement the model and use it in for text classification. Furthermore, we propose a 2D extension, which enables us to model spatial relationships of features in images.

Place, publisher, year, edition, pages
2014.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-155771OAI: oai:DiVA.org:kth-155771DiVA: diva2:762932
Examiners
Available from: 2014-11-20 Created: 2014-11-13 Last updated: 2014-11-20Bibliographically approved

Open Access in DiVA

fulltext(1523 kB)11834 downloads
File information
File name FULLTEXT01.pdfFile size 1523 kBChecksum SHA-512
f5cfd6c6068553d8936cc13fa5cb4231640218e33389e8e037b4c909d13503feef46af85f33907d057d8fad53001c3f7b0f03161cfeeb097736c05c048a44a08
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 11834 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 173 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf