Experimental Study on Classifier Design and Text Feature Extraction for Short Text Classification
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Text classification is a broad research field with existing ready-to-use solutions for supervised training of text classifiers. Classifying short texts places demands on the learning system that general text classification does not. This thesis explores this challenge by experimenting with how to design the classification system and which text features yield the best results. In the experimental study, a hierarchical design was compared with a flat design, along with different aspects of text features. The method consisted of training and testing on a dataset of 3.2 million samples in total. The test results were evaluated with the quality measures precision, recall, F1-score and ROC analysis, the last modified to target multi-class classification. The results of the experimental study were: a 2-level hierarchical classifier gave better results than a flat classifier on 11 out of 13 occasions; integer-represented terms outperformed TF-IDF-weighted terms as bag-of-words (BOW) features; lowercase conversion improved the classification results; and bigram and trigram BOW features achieved better results than unigram BOW features. The results of the experimental study were used in a case study together with Thingmap, which maps natural language queries to users. The case study showed an improvement over Thingmap's earlier solutions.
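
The abstract compares integer-valued and TF-IDF-weighted BOW features, with and without lowercase conversion, over unigrams, bigrams and trigrams. The following is a minimal Python sketch of those feature variants, assuming whitespace tokenization and the plain log(N / df) IDF formula; the function names are hypothetical and this is not the thesis's own implementation.

```python
import math
from collections import Counter


def tokenize(text, lowercase=True):
    """Split on whitespace, optionally lowercasing first (assumed tokenizer)."""
    if lowercase:
        text = text.lower()
    return text.split()


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_counts(text, n=1, lowercase=True):
    """Integer-valued BOW features: how many times each n-gram occurs."""
    return Counter(ngrams(tokenize(text, lowercase), n))


def tfidf(docs, n=1, lowercase=True):
    """TF-IDF-weighted BOW features: raw count times log(N / document frequency)."""
    counts = [bow_counts(d, n, lowercase) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document contributes at most 1 per term
    num_docs = len(docs)
    return [
        {term: tf * math.log(num_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    docs = ["Book a table", "book a flight", "Cancel my flight"]
    print(bow_counts("Book a table book"))                   # lowercased unigram counts
    print(bow_counts("Book a table book", lowercase=False))  # "Book" and "book" kept apart
    print(ngrams(tokenize("book a flight"), 2))               # bigrams
    print(tfidf(docs)[2])                                     # TF-IDF weights for the third text
```

Lowercasing merges surface variants such as "Book" and "book" into one feature, which matters for short texts where each occurrence carries a large share of the signal. Note that the TF-IDF weight of a term present in every document is zero under this IDF formula.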

Place, publisher, year, edition, pages
2017, p. 44
Series
UPTEC IT, ISSN 1401-5749 ; 17005
Keywords [en]
Natural Language Processing
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-323214
OAI: oai:DiVA.org:uu-323214
DiVA, id: diva2:1105415
External cooperation
Thingmap AB
Subject / course
Computer Systems Sciences
Educational program
Master of Science Programme in Information Technology Engineering
Presentation
2017-02-17, 1213, Ångströmslaboratoriet, Lägerhyddsvägen 1, Uppsala, 11:00 (English)
Supervisors
Examiners
Available from: 2017-06-15. Created: 2017-06-03. Last updated: 2017-06-15.
Bibliographically approved

Open Access in DiVA

Sernheim_fulltext.pdf (3577 kB)
File name: FULLTEXT01.pdf
File size: 3577 kB
Checksum (SHA-512): 8d6ec0c374f57ca6290bd5fd69ccca350578e5b42cb318b01455045871b253c6460a759b0748ccbec896e9f673d6215bd308de07b13c80eb9d6c90837c485a12
Type: fulltext
Mimetype: application/pdf