Automatic Lexicon Extraction on Random Indexing Word Spaces using Small Seed Lexica
KTH, School of Computer Science and Communication (CSC).
2014 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Automatic bilingual lexicon extraction has many applications in Natural Language Processing, but it often requires highly structured parallel data or extensive bilingual seed lexica to perform reasonably well. Random Indexing models with a small bilingual seed lexicon could be used to perform (semi-)automatic lexicon extraction using only separate monolingual data. This thesis explores, explains, and evaluates such a method of (semi-)automatic lexicon extraction on Random Indexing models using a small lexicon. The main idea is to construct a linear transformation that aligns the vector representations of the words in the lexicon. Necessitated by the kind of transformation used, a slight modification of the cosine similarity measure is presented. Evaluating the method against a bilingual sentiment lexicon showed that, while the method worked well in a same-language setting between comparable corpora, performance was greatly reduced in an interlanguage setting. In conclusion, the proposed method is altogether inadequate as a method of (semi-)automatic lexicon extraction, but might be improved upon by further dimension reduction techniques or larger seed lexica.
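
The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an ordinary least-squares fit (with a tiny ridge term for numerical stability) for the linear transformation, uses the standard cosine similarity rather than the modified measure the thesis presents, and replaces real Random Indexing vectors with hand-made 2-dimensional toy vectors. All function names and data below are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    a is n x n, b is n x m (lists of lists); returns x as n x m."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: add seed pairs or raise ridge")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_linear_map(src, tgt, ridge=1e-9):
    """Least-squares matrix W such that src_i @ W is close to tgt_i.
    Solves the normal equations (X^T X + ridge * I) W = X^T Y."""
    d, k = len(src[0]), len(tgt[0])
    xtx = [[sum(x[i] * x[j] for x in src) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] * y[j] for x, y in zip(src, tgt))
            for j in range(k)] for i in range(d)]
    return solve(xtx, xty)

def apply_map(vec, w):
    """Row vector times matrix: vec @ w."""
    return [sum(vec[i] * w[i][j] for i in range(len(vec)))
            for j in range(len(w[0]))]

def cosine(u, v):
    """Standard cosine similarity (not the thesis's modified variant)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def translate(word, src_space, tgt_space, w):
    """Map a source word into the target space and return its nearest neighbour."""
    mapped = apply_map(src_space[word], w)
    return max(tgt_space, key=lambda t: cosine(mapped, tgt_space[t]))

# Toy same-language setting: space_b is space_a rotated by 90 degrees.
space_a = {"dog": [1.0, 0.0], "cat": [0.0, 1.0],
           "good": [1.0, 1.0], "bad": [1.0, -1.0]}
space_b = {"dog": [0.0, 1.0], "cat": [-1.0, 0.0],
           "good": [-1.0, 1.0], "bad": [1.0, 1.0]}

seed = ["dog", "cat"]  # the small seed lexicon
w = fit_linear_map([space_a[s] for s in seed], [space_b[s] for s in seed])

print(translate("good", space_a, space_b, w))  # good
print(translate("bad", space_a, space_b, w))   # bad
```

In this toy case two seed pairs are enough to recover the rotation exactly, so the non-seed words map onto their counterparts. With real word spaces the seed lexicon is typically much smaller than the vector dimensionality, which leaves the fit underdetermined; that is one plausible reading of why the abstract points to dimension reduction or larger seed lexica as possible improvements.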

National Category
Computer Science
URN: urn:nbn:se:kth:diva-155894
OAI: diva2:763248
Educational program
Master of Science - Computer Science
Available from: 2014-11-19
Created: 2014-11-14
Last updated: 2014-11-19
Bibliographically approved

Open Access in DiVA

fulltext (1193 kB), 94 downloads
File information
File name: FULLTEXT01.pdf
File size: 1193 kB
Checksum SHA-512
Type: fulltext
Mimetype: application/pdf

Total: 94 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 206 hits