Big Data Validation
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Information Systems.
2018 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

With the explosion in usage of big data, stakes are high for companies to develop workflows that translate the data into business value. Those data transformations are continuously updated and refined in order to meet the evolving business needs, and it is imperative to ensure that a new version of a workflow still produces the correct output. This study focuses on the validation of big data in a real-world scenario, and implements a validation tool that compares two databases that hold the results produced by different versions of a workflow in order to detect and prevent potential unwanted alterations, with row-based and column-based statistics being used to validate the two versions. The tool was shown to provide accurate results in test scenarios, providing leverage to companies that need to validate the outputs of the workflows. In addition, by automating this process, the risk of human error is eliminated, and it has the added benefit of improved speed compared to the more labour-intensive manual alternative. All this allows for a more agile way of performing updates on the data transformation workflows by improving on the turnaround time of the validation process.
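The abstract does not describe how the tool is implemented. As a rough illustration of what row-based and column-based comparison between two result databases could look like, here is a minimal Python sketch using the standard-library `sqlite3` module. The `orders` table, the `id` key column, and the helper names (`make_db`, `column_stats`, `compare_columns`, `compare_rows`) are hypothetical and not taken from the thesis. The sketch also assumes both databases share the same schema.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database holding one workflow version's output (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def column_names(conn, table):
    """Read the column names of a table from the cursor description."""
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    return [d[0] for d in cur.description]


def column_stats(conn, table, column):
    """Column-based statistics: row count, null count, distinct count, min, max."""
    query = (
        f'SELECT COUNT(*), SUM("{column}" IS NULL), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    )
    rows, nulls, distinct, lo, hi = conn.execute(query).fetchone()
    # SUM over an empty table yields NULL, so fall back to 0.
    return {"rows": rows, "nulls": nulls or 0, "distinct": distinct, "min": lo, "max": hi}


def compare_columns(old, new, table):
    """Return {column: (old_stats, new_stats)} for columns whose statistics differ."""
    diffs = {}
    for col in column_names(old, table):
        a, b = column_stats(old, table, col), column_stats(new, table, col)
        if a != b:
            diffs[col] = (a, b)
    return diffs


def compare_rows(old, new, table, key):
    """Row-based check: keys missing from, added to, or changed in the new version."""
    k = column_names(old, table).index(key)
    a = {r[k]: r for r in old.execute(f'SELECT * FROM "{table}"')}
    b = {r[k]: r for r in new.execute(f'SELECT * FROM "{table}"')}
    return {
        "missing": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(i for i in a.keys() & b.keys() if a[i] != b[i]),
    }


if __name__ == "__main__":
    # Output of the old and the new workflow version (made-up sample data).
    old = make_db([(1, 10.0, "SE"), (2, 20.0, "NO"), (3, None, "FI")])
    new = make_db([(1, 10.0, "SE"), (2, 25.0, "NO"), (4, 5.0, "FI")])
    print(compare_rows(old, new, "orders", "id"))
    for col, (before, after) in compare_columns(old, new, "orders").items():
        print(col, before, after)
```

Loading whole tables into memory, as `compare_rows` does here, would not scale to big data. At that size the same comparisons would more plausibly be pushed down to the database or a distributed engine, with column statistics serving as the cheap first check.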

Place, publisher, year, edition, pages
2018.
Keywords [en]
big data, data testing, data validation, data quality, big data validation process, big data validation tool
National Category
Information Systems
Identifiers
URN: urn:nbn:se:uu:diva-353850
OAI: oai:DiVA.org:uu-353850
DiVA, id: diva2:1219691
External cooperation
Klarna Bank AB
Subject / course
Information Systems
Educational program
Master programme in Information Systems
Presentation
2018-05-25, Ekonomikum, Kyrkogårdsgatan 10, Uppsala, 11:00 (English)
Supervisors
Examiners
Available from: 2018-06-19
Created: 2018-06-17
Last updated: 2018-06-19
Bibliographically approved

Open Access in DiVA

BigDataValidation_RayaRizk (1028 kB), 74 downloads
File information
File name: FULLTEXT01.pdf
File size: 1028 kB
Checksum (SHA-512): f9164355c4bd0308fa9bcebfa8b2cb90c85aee940cc212348cbfd04abf764f41d84b17a91b39dffb257f17becf8a84583c22f835ada122fa720b38fcb1d3d011
Type: fulltext
Mimetype: application/pdf
