Analysis of Tabula: A PDF-Table extraction tool
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

PDF is a widely used document format in both the private and the public sector. It is designed to lay out text and figures on a virtual page. Research groups often publish reports in this format, including raw data in tables. The content of PDF tables can be difficult to extract, an issue the National Food Agency often runs into. Building a PDF interpreter from scratch is a complex and overwhelming task, but there are plenty of PDF table extractors available. While none meet the specific requirements of the National Food Agency, the most effective tool, Tabula, is open source. By analyzing the source code, the feasibility of extending Tabula to meet the requirements in the future can be evaluated. However, the lack of documentation and poor class definitions make the source code arduous to understand. Building a new application using the same library as Tabula appears to be a more promising approach.

Place, publisher, year, edition, pages
2019. p. 34
Series
IT ; 19012
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-395656
OAI: oai:DiVA.org:uu-395656
DiVA, id: diva2:1363917
Educational program
Bachelor Programme in Computer Science
Supervisors
Examiners
Available from: 2019-10-22. Created: 2019-10-22. Last updated: 2019-10-22. Bibliographically approved.

Open Access in DiVA

fulltext (2983 kB), 14 downloads
File information
File name: FULLTEXT01.pdf
File size: 2983 kB
Checksum (SHA-512): dee29f6748ed7f7945c6dfd05fe1e8f036bb862b16ddbd53c57ed793388382bb851483bfba00acce306327daeb6c817b3abdc706bbf20edf326191aa44172dc0
Type: fulltext
Mimetype: application/pdf
