Digitala Vetenskapliga Arkivet

Spelling Correction To Improve Classification Of Technical Error Reports
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This master’s thesis investigated whether spelling correction improves the performance of a classifier for technical error reports. Three spelling correction approaches were tested to determine which one suits this particular dataset. The first two approaches considered only the erroneous word when correcting it. The third approach also considered the context, that is, the words surrounding the erroneous word. The corrected reports were then evaluated with a classifier. No significant improvement in classifier performance was observed compared to the baseline. A likely reason is that most of the reports contain no more than a few spelling errors, and the majority of words detected as spelling errors are not in English. However, the second approach performed better than the baseline on this dataset. It is language independent, which matters because most of the non-words were non-English words, which are dynamically updated based on the input.
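The abstract does not name the algorithms behind the three approaches, so the following is only a minimal sketch of the two general ideas it contrasts: isolated-word correction and context-aware correction. It assumes an edit-distance-1 candidate generator and a vocabulary built from the reports themselves, which is one simple way to stay language independent and to update dynamically from input. The class `Corrector`, the `min_count` threshold, and the sample reports are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter

# Assumption: lowercase Latin letters plus Swedish å, ä, ö cover the reports.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-zåäö]+", text.lower())


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class Corrector:
    """Hypothetical corrector whose vocabulary is learned from the input."""

    def __init__(self, min_count=2):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.min_count = min_count  # words seen this often count as valid

    def update(self, text):
        """Add a report to the vocabulary (the dynamic update step)."""
        tokens = tokenize(text)
        self.unigrams.update(tokens)
        self.bigrams.update(zip(tokens, tokens[1:]))

    def known(self, word):
        return self.unigrams[word] >= self.min_count

    def candidates(self, word):
        return [c for c in edits1(word) if self.known(c)]

    def correct_isolated(self, word):
        """Use only the erroneous word: pick the most frequent candidate."""
        if self.known(word):
            return word
        cands = self.candidates(word)
        return max(cands, key=lambda c: self.unigrams[c]) if cands else word

    def correct_in_context(self, prev, word):
        """Also use the preceding word: prefer candidates seen after it."""
        if self.known(word):
            return word
        cands = self.candidates(word)
        if not cands:
            return word
        return max(cands, key=lambda c: (self.bigrams[(prev, c)], self.unigrams[c]))


# Made-up sample reports, including a Swedish term.
corrector = Corrector()
for report in [
    "engine brake failure",
    "engine brake failure",
    "broke the lamp",
    "broke the lamp",
    "broke the lamp",
    "motorbroms fel",
    "motorbroms fel",
]:
    corrector.update(report)

print(corrector.correct_isolated("brke"))              # broke
print(corrector.correct_in_context("engine", "brke"))  # brake
print(corrector.correct_isolated("motorbrms"))         # motorbroms
```

In this toy data the isolated-word variant picks "broke" because it is the more frequent candidate, while the context-aware variant picks "brake" because it follows "engine" in the sample reports. The Swedish term is treated as a valid word only because it recurs in the input, not because of any English dictionary.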

Abstract [sv] (English translation)

This degree project investigated whether spelling correction can improve the classification of reports. The idea is to use different approaches to spelling correction in order to find the one that works best on this specific dataset. Three different approaches to spelling correction were investigated. The first two considered only individual misspelled words. The third also considered the context of the misspelled word. The output of the spelling correction was tested on a classifier. The classifier showed no significant improvement compared with a baseline. The reason may be that most of the reports contain no more than a few spelling errors and that most words detected as spelling errors are not in English. However, the second approach performed better than the baseline on the dataset because it was language independent, since most of the non-words were non-English words that were dynamically updated based on input.

Place, publisher, year, edition, pages
2019. p. 27
Series
TRITA-EECS-EX ; 2019:545
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-263112
OAI: oai:DiVA.org:kth-263112
DiVA, id: diva2:1366510
External cooperation
Scania
Available from: 2019-11-18. Created: 2019-10-29. Last updated: 2022-06-26. Bibliographically approved.

Open Access in DiVA

fulltext (636 kB), 3234 downloads
File information
File name: FULLTEXT01.pdf
File size: 636 kB
Checksum (SHA-512): 3c2710159ae93ad50aa55e34024c3aadfda58fc17f4632efbeaa0265c5d85edc1873dc9f281618da8bc70d94d9ebeb449f92d2cd6677757a3ad5e4bf98abc20e
Type: fulltext
Mimetype: application/pdf

Total: 3234 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 613 hits