Digitala Vetenskapliga Arkivet

Separation and Extraction of Valuable Information From Digital Receipts Using Google Cloud Vision OCR.
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Automation is desirable in many business areas. Manually extracting information from a physical object such as a receipt is a task that can be automated to save resources for a company or a private person. This paper describes the process of combining an existing OCR engine with a purpose-built Python script to extract valuable information from a digital image of a receipt. The values considered valuable are VAT, VAT%, date, and total, gross, and net cost. This feature has already been implemented in existing applications; however, the company for which I carried out this project is interested in creating its own version. The project is an experiment to see whether such an application, a program that can extract the information mentioned above, can be implemented using restricted resources. The paper guides the reader through the development of the program, including the reasoning, the findings, and the steps taken to overcome the problems encountered along the way. The program achieved a success rate of 86.6% in extracting the most valuable information (total cost, VAT%, and date) from a set of 53 receipts originating from 34 separate establishments.
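As an illustration of the kind of string analysis the abstract describes, the following is a minimal sketch of pulling a date, a VAT rate, and a total out of text that an OCR engine has already produced. It is not the thesis's implementation: the function name `extract_fields`, the regular expressions, and the sample receipt text are all hypothetical, and the sketch assumes an ISO-style date, a percent sign after the VAT rate, and a line beginning with the word "Total".

```python
import re

# Hypothetical patterns for illustration only; real receipts vary widely
# in layout, language, and number formatting.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
VAT_RATE_RE = re.compile(r"\b(\d{1,2}(?:[.,]\d+)?)\s?%")
TOTAL_RE = re.compile(
    r"^\s*total\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_amount(text: str) -> float:
    """Convert '37,90' or '37.90' to a float."""
    return float(text.replace(",", "."))


def extract_fields(ocr_text: str) -> dict:
    """Return the first date, VAT rate, and total found, or None for each miss."""
    date = DATE_RE.search(ocr_text)
    vat = VAT_RATE_RE.search(ocr_text)
    total = TOTAL_RE.search(ocr_text)
    return {
        "date": date.group(1) if date else None,
        "vat_percent": parse_amount(vat.group(1)) if vat else None,
        "total": parse_amount(total.group(1)) if total else None,
    }


# Made-up OCR output standing in for the text returned by an OCR engine.
sample = """EXAMPLE STORE
2019-05-14 12:31
Milk 1l       12.90
Bread         25.00
VAT 12%        4.06
Total         37.90
"""

print(extract_fields(sample))
# → {'date': '2019-05-14', 'vat_percent': 12.0, 'total': 37.9}
```

Returning `None` for a field that is not found keeps a partial result usable, which matters when a success rate is measured per field rather than per receipt.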

Place, publisher, year, edition, pages
2019. p. 40
Keywords [en]
optical character recognition, automatic text extraction, python, google cloud vision, string analysis, receipt
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:lnu:diva-88602
OAI: oai:DiVA.org:lnu-88602
DiVA, id: diva2:1349283
External cooperation
HRM Software AB
Subject / course
Computer Engineering
Educational program
Computer Engineering Programme, 180 credits
Supervisors
Examiners
Available from: 2019-09-09
Created: 2019-09-08
Last updated: 2019-09-09
Bibliographically approved

Open Access in DiVA

fulltext (851 kB), 546 downloads
File information
File name: FULLTEXT01.pdf
File size: 851 kB
Checksum (SHA-512): fba0fd737559d4254a3269c2d9a8a3fea8b6a861321b5beb9673d5ec39535e65c7e411052fda9185f1cb140f0b827cb44b2444bdb6e7d219d1a2b6e30f1a3f5b
Type: fulltext
Mimetype: application/pdf

Author
Johansson, Elias
