Word Segmentation for Classification of Text
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Compounding is a highly productive word-formation process in some languages and is often problematic for natural language processing applications. Word segmentation is the problem of splitting a string of written language into its component words. The purpose of this research is to compare different word segmentation techniques and to identify the technique that best aids the extraction of keywords from text. English was chosen as the language. Dictionary-based and machine learning approaches were used to split compound words. This research also aims to evaluate the quality of a word segmentation by comparing it with a reference segmentation. The results indicate that dictionary-based word segmentation performed better than machine learning segmentation at splitting compound words when technical words were involved. In addition, improving the quality of the text alone is not the key to improving text classification results.
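
To make the two ideas in the abstract concrete, the following is a minimal sketch of dictionary-based splitting and of scoring a segmentation against a reference. It is not the method used in the thesis: the fewest-words dynamic programme, the boundary-level precision and recall, the function names, and the toy vocabulary are all assumptions chosen for illustration.

```python
from typing import List, Optional, Set, Tuple


def segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split text into the fewest dictionary words; None if no split exists."""
    n = len(text)
    # best[i] holds the shortest segmentation found so far for text[:i].
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prev = best[start]
            piece = text[start:end]
            if prev is None or piece not in vocab:
                continue
            candidate = prev + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]


def boundaries(words: List[str]) -> Set[int]:
    """Character offsets of the internal split points of a segmentation."""
    cuts: Set[int] = set()
    pos = 0
    for word in words[:-1]:
        pos += len(word)
        cuts.add(pos)
    return cuts


def boundary_scores(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Precision and recall of predicted split points against a reference."""
    pred, ref = boundaries(predicted), boundaries(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(ref) if ref else 1.0
    return precision, recall


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"key", "word", "keyword", "ex", "traction", "extraction"}

if __name__ == "__main__":
    print(segment("keywordextraction", VOCAB))
    print(boundary_scores(["key", "word", "extraction"], ["keyword", "extraction"]))
```

Preferring the fewest words is one simple way to avoid over-splitting a term such as "keyword" into "key" and "word"; a real system would likely need frequency information or a domain lexicon, particularly for technical vocabulary.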

Place, publisher, year, edition, pages
2019. , p. 50
Series
IT ; 19059
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-396969
OAI: oai:DiVA.org:uu-396969
DiVA, id: diva2:1369551
Educational program
Master Programme in Computer Science
Supervisors
Examiners
Available from: 2019-11-12 Created: 2019-11-12 Last updated: 2019-11-12 Bibliographically approved

Open Access in DiVA

fulltext (704 kB), 7 downloads
File information
File name: FULLTEXT01.pdf
File size: 704 kB
Checksum (SHA-512): 050ef53e16387a3883a026d5e25380268bdcb1184d64755e176374f5a0109553d66d3bf4281da0927ba336fcdca71abc11789c57539789a9af11c225080fd620
Type: fulltext
Mimetype: application/pdf

