Distributionella representationer av ord för effektiv informationssökning: Algoritmer för sökning i kundsupportforum
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2017 (Swedish). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Distributional Representations of Words for Effective Information Retrieval: Information Retrieval in Customer Support Forums (English)
Abstract [sv] (English translation)

As the amount of information in society grows, more refined methods for searching and managing information are needed. Extracting relevant data from companies' internal systems becomes more complex as larger amounts of information must be handled and more communication moves to digital platforms. Methods for vector-based word embedding have made great progress in recent years. In particular, Google presented groundbreaking results in 2013 with the Word2vec model, which outperformed older methods. We implement a search engine that uses word embeddings based on Word2vec and similar models, intended for use at the IT company Kundo and in its product Kundo Forum. The results show potential for information retrieval with markedly better recall and no loss of precision. In connection with the main subject of information retrieval, we also analyze the implications of an improved search engine from a market and product development perspective.

Abstract [en]

As the amount of information in society grows, so does the need for more sophisticated methods of information retrieval. Extracting information from internal systems becomes more complex as larger volumes of information must be handled and more communication moves to digital platforms. In recent years, methods for embedding words in vector space have gained traction. In 2013, Google had a major impact on the field of Natural Language Processing with a new method called Word2vec, which significantly outperformed earlier approaches. Of the established methods for information retrieval, we implement one that uses Word2vec and related word-embedding methods in the search engine of the IT company Kundo and its product Kundo Forum. We demonstrate the potential to improve recall by a significant margin without reducing precision. In connection with the primary subject of information retrieval, we also investigate the market and product development implications of a different kind of search engine.
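The abstract does not specify how the embeddings are used for ranking. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It shows one common way word embeddings can raise recall: average the word vectors of the query and of each forum thread, then rank threads by cosine similarity. The vectors are hypothetical 3-dimensional toy values. A real system would load vectors trained with a model such as Word2vec, fastText or GloVe. All names (`TOY_VECTORS`, `embed`, `rank`, `keyword_search`, `precision_recall`) are illustrative.

```python
import math

# Hypothetical toy word vectors (3 dimensions for readability).
# In practice these would come from a model such as Word2vec trained on forum text.
TOY_VECTORS = {
    "refund":        [0.9, 0.1, 0.0],
    "reimbursement": [0.85, 0.15, 0.05],
    "money":         [0.7, 0.3, 0.1],
    "password":      [0.0, 0.2, 0.95],
    "login":         [0.1, 0.1, 0.9],
    "reset":         [0.2, 0.6, 0.6],
}


def embed(text, vectors):
    """Average the vectors of all known words in text; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, docs, vectors):
    """Return (doc_id, score) pairs sorted by descending cosine similarity to the query."""
    q = embed(query, vectors)
    if q is None:
        return []
    scored = []
    for doc_id, text in docs.items():
        d = embed(text, vectors)
        if d is not None:
            scored.append((doc_id, cosine(q, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def keyword_search(query, docs):
    """Baseline: return ids of documents sharing at least one exact word with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items() if terms & set(text.lower().split())]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (0.0 for empty sets)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical forum threads used only for illustration.
docs = {
    "thread-1": "how do I get a reimbursement",
    "thread-2": "reset my password",
    "thread-3": "login fails after password reset",
}

if __name__ == "__main__":
    ranked = rank("refund", docs, TOY_VECTORS)
    print(ranked)
    retrieved = [doc_id for doc_id, score in ranked if score >= 0.5]
    print("embedding:", precision_recall(retrieved, ["thread-1"]))
    print("keyword:  ", precision_recall(keyword_search("refund", docs), ["thread-1"]))
```

The query "refund" shares no word with thread-1, so the exact-match baseline retrieves nothing and has zero recall. The embedding-based ranking places thread-1 first because "refund" and "reimbursement" have nearby vectors. The 0.5 score threshold is an arbitrary choice for this toy example.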

Place, publisher, year, edition, pages
2017.
Keywords [en]
word2vec, fasttext, glove, LSI, LSA, word embeddings, information retrieval, search engine, machine learning, neural networks, natural language processing, NLP, distributional representations
Keywords [sv]
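word2vec, fasttext, glove, LSI, LSA, ordinbäddning (word embedding), informationssökning (information retrieval), sökmotor (search engine), maskininlärning (machine learning), språkteknologi (language technology), neurala nätverk (neural networks), distributionella representationer (distributional representations)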
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-209695
OAI: oai:DiVA.org:kth-209695
DiVA, id: diva2:1113895
External cooperation
Kundo AB
Educational program
Master of Science in Engineering - Industrial Engineering and Management
Supervisors
Examiners
Available from: 2017-10-13. Created: 2017-06-22. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

fulltext (1739 kB), 99 downloads
File information
File name: FULLTEXT01.pdf
File size: 1739 kB
Checksum (SHA-512): 5b8bd5e550e7e038ce2ff97564a0dccf1aa73a41e3d64892a34bbe52e0f3dfa53657726ff8776c689d784ff416e5ea87aa2b08a592564a22a34eccccbb66d242
Type: fulltext
Mimetype: application/pdf
