Toward an on-line preprocessor for Swedish
Linköping University, Department of Computer and Information Science.
2017 (English)
Independent thesis Basic level (degree of Bachelor), 12 credits / 18 HE credits
Student thesis
Alternative title
Mot en on-line preprocessor för svenska (Swedish)
Abstract [en]

This bachelor thesis presents OPT (Open Parse Tool), a Java program that allows independent parsers and taggers to be run in sequence. For this thesis, the existing Java versions of Stagger and Maltparser have been adapted for use as modules in this program, and OPT's performance has then been compared to an existing alternative already in use (Språkbanken's Korp Corpus Pipeline, henceforth KCP). Execution speed has been compared, and OPT's accuracy has been coarsely tested as either comparable or divergent to that of KCP. The same collection of documents containing natural text was fed through OPT and KCP in turn, and execution time was recorded. The tagged output of OPT and KCP was then run through SCREAM (Sjöholm, 2012); if SCREAM produced comparable results for the two, the accuracy of OPT was considered comparable to that of KCP. The results show that OPT completes its tagging and parsing of the documents in around 35 minutes, while KCP took over four hours. SCREAM performed almost identically on the outputs of either program, except for one case in which OPT's output gave better results than KCP's. The accuracy of OPT was thus considered comparable to that of KCP. The one divergent case cannot be fully understood or explained in this thesis, since the thesis treats SCREAM's internals largely as a black box.
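The abstract describes OPT as a program in which independent taggers and parsers run as modules, one after another. The following is only a minimal sketch of that general idea, not OPT's actual code: the `Module` interface, the `Token` class, `Pipeline`, `tokenize` and the two toy stages are all hypothetical names introduced here, and the real Stagger and Maltparser APIs are not used.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    /** Hypothetical token: a word form plus annotations filled in by later stages. */
    static final class Token {
        final String form;
        String pos = "_";   // part-of-speech tag, "_" until a tagger sets it
        int head = -1;      // dependency head (1-based, 0 = root), -1 until a parser sets it

        Token(String form) { this.form = form; }
    }

    /** Hypothetical module contract: every tagger or parser maps tokens to tokens. */
    interface Module {
        List<Token> process(List<Token> tokens);
    }

    /** Runs its modules strictly in the order they were added. */
    static final class Pipeline {
        private final List<Module> modules = new ArrayList<>();

        Pipeline add(Module module) {
            modules.add(module);
            return this;
        }

        List<Token> run(List<Token> tokens) {
            List<Token> current = tokens;
            for (Module module : modules) {
                current = module.process(current); // output of one stage feeds the next
            }
            return current;
        }
    }

    /** Toy stand-in for a POS tagger such as Stagger: tags every token as a noun. */
    static List<Token> toyTagger(List<Token> tokens) {
        for (Token t : tokens) {
            t.pos = "NN";
        }
        return tokens;
    }

    /** Toy stand-in for a dependency parser: first token is the root, the rest attach to it. */
    static List<Token> toyParser(List<Token> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            tokens.get(i).head = (i == 0) ? 0 : 1;
        }
        return tokens;
    }

    /** Naive whitespace tokenizer, used only to produce input for the sketch. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String form : text.trim().split("\\s+")) {
            tokens.add(new Token(form));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline()
                .add(PipelineSketch::toyTagger)
                .add(PipelineSketch::toyParser);
        for (Token t : pipeline.run(tokenize("Katten sover gott"))) {
            System.out.println(t.form + "\t" + t.pos + "\t" + t.head);
        }
    }
}
```

Because every stage shares the same input and output type, stages can be added, removed or reordered without changing the others; how OPT itself passes data between Stagger and Maltparser may well differ from this.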

Place, publisher, year, edition, pages
2017.
Keywords [en]
Natural Language, Preprocessing, Part-of-Speech-Tagging, Dependency Parsing, Readability
National Category
Language Technology (Computational Linguistics)
Human Computer Interaction
Identifiers
URN: urn:nbn:se:liu:diva-143012
ISRN: LIU-IDA/KOGVET-G--17/001--SE
OAI: oai:DiVA.org:liu-143012
DiVA, id: diva2:1156641
Subject / course
Cognitive science
Available from: 2017-11-15
Created: 2017-11-13
Last updated: 2018-01-13
Bibliographically approved

Open Access in DiVA

Full text (539 kB)
File information
File name: FULLTEXT01.pdf
File size: 539 kB
Checksum (SHA-512):
800da283f940377aa2afe60e18d640f330c41c1b52e296f2dd10ba0088be9b9a78dc732852c0620834180bc6fcce4f797ae0755f5c1b88e015e13c5a51f38899
Type: fulltext
MIME type: application/pdf

Author
Wemmert, Oscar