The Use of Distributional Semantics in Text Classification Models: Comparative performance analysis of popular word embeddings
Linköping University, Department of Electrical Engineering, Computer Vision.
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

In the field of Natural Language Processing, supervised machine learning is commonly used to solve classification tasks such as sentiment analysis and text categorization. The classical way of representing text has been the well-known Bag-Of-Words representation. Lately, however, low-dimensional dense word vectors have come to dominate the input to state-of-the-art models. Since few studies have made a fair comparison of how sensitive these models are to the text representation, this thesis tries to fill that gap. In particular, we seek insight into the impact that various unsupervised pre-trained vectors have on performance. In addition, we take a closer look at the Random Indexing representation and try to optimize it jointly with the classification task. The results show that although low-dimensional pre-trained representations often have computational benefits and have been reported to achieve state-of-the-art performance, they do not necessarily outperform the classical representations in all cases.
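The abstract refers to the Random Indexing representation without describing it. Below is a minimal sketch of textbook Random Indexing, assuming sparse ternary index vectors, a symmetric context window and pre-tokenized sentences. The function names and the parameters `dim`, `nonzeros` and `window` are illustrative assumptions, not the thesis's implementation, and the sketch does not reproduce the joint optimization with the classification task mentioned above. `document_vector` is a hypothetical way of turning word vectors into a classifier input.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    # Half of the randomly chosen positions get +1, the rest get -1.
    return {pos: (1 if i < nonzeros // 2 else -1) for i, pos in enumerate(positions)}


def random_indexing(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Return (index_vectors, context_vectors) for every token in `sentences`."""
    rng = random.Random(seed)
    index = {}
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(dim, nonzeros, rng)

    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Accumulate the neighbour's index vector into this word's context vector.
                for pos, sign in index[tokens[j]].items():
                    context[word][pos] += sign
    return index, dict(context)


def document_vector(tokens, context, dim):
    """Hypothetical document representation: sum of the tokens' context vectors."""
    doc = [0] * dim
    for tok in tokens:
        for k, v in enumerate(context.get(tok, [0] * dim)):
            doc[k] += v
    return doc


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    idx, ctx = random_indexing(corpus, dim=64, nonzeros=4, window=1)
    print(cosine(ctx["cat"], ctx["dog"]))
```

In this toy corpus, `cat` and `dog` occur in exactly the same contexts, so they end up with identical context vectors. A Bag-Of-Words representation, by contrast, would treat the two words as unrelated dimensions.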

Place, publisher, year, edition, pages
2016, 44 p.
Keyword [en]
distributional semantics, text classification, cnn
National Category
Signal Processing
URN: urn:nbn:se:liu:diva-127991
ISRN: LiTH-ISY-EX--16/4926--SE
OAI: diva2:928411
Subject / course
Computer Vision Laboratory
Available from: 2016-05-30. Created: 2016-05-15. Last updated: 2016-05-30. Bibliographically approved

Open Access in DiVA
Full text: FULLTEXT01.pdf (9724 kB, application/pdf)