Test av OCR-verktyg för Linux
Linnaeus University, Faculty of Science and Engineering, School of Computer Science, Physics and Mathematics.
2010 (Swedish)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
OCR software tests for Linux (English)
Abstract [sv]

Denna rapport handlar om att ta fram ett OCR-verktyg för digitalisering av pappersdokument. Krav på detta verktyg är att bland annat det ska vara kompatibelt med Linux, det ska kunna ta kommandon via kommandoprompt och dessutom ska det kunna hantera skandinaviska tecken.

Tolv OCR-verktyg granskades, sedan valdes tre verktyg ut; Ocrad, Tesseract och OCR Shop XTR. För att testa dessa scannades två dokument in och digitaliserades i varje verktyg.

Resultatet av testerna är att Tesseract är det verktyg som är mest precist och Ocrad är det verktyg som är snabbast. OCR Shop XTR visar sämst resultat både i tidtagning och i antal korrekta ord.

Abstract [en]

This report is about finding OCR software for digitizing paper documents. The requirements included that the software be compatible with Linux, accept commands via the command line, and handle Scandinavian characters.

Twelve OCR tools were reviewed, and three were chosen: Ocrad, Tesseract and OCR Shop XTR. To test these, two documents were scanned and digitized with each tool.

The tests show that Tesseract is the most accurate tool and Ocrad is the fastest. OCR Shop XTR showed the worst results in both processing time and number of correct words.

Place, publisher, year, edition, pages
2010. p. 41
Keywords [en]
OCR, Linux, digitizing, Tesseract, Ocrad, OCR Shop XTR
Keywords [sv]
OCR, Linux, digitalisering, Tesseract, Ocrad, OCR Shop XTR
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-5906
OAI: oai:DiVA.org:lnu-5906
DiVA, id: diva2:322640
Presentation
2010-06-03, Kalmar, 09:00 (Swedish)
Uppsok
Technology
Available from: 2010-06-08 Created: 2010-06-07 Last updated: 2018-01-12
Bibliographically approved

Open Access in DiVA

fulltext (2262 kB), 240 downloads
File information
File name: FULLTEXT01.pdf
File size: 2262 kB
Checksum (SHA-512):
0d2e100bba1b1243c62c9e4123b2d7a513927d872cd956c859628a2331d47eb1f3c66caf3668823ca4ea611a79d7d3aa531ea7100bafa732b41c77d817d401fa
Type: fulltext
Mimetype: application/pdf

Author
Nilsson, Elin