Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Scheduling workflows to optimize for execution time
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media.
2018 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Many functions in today’s society are immensely dependent on data. Data drives everything from business decisions to self-driving cars to intelligent home assistants like Amazon Echo and Google Home. To make good decisions based on data, of which exabytes are generated every day, somehow that data has to be processed. Data processing can be complex and time-consuming. One way of reducing the complexity is to create workflows that consist of several steps that together produce the right result.

Klarna is an example of a company that relies on workflows for transforming and analyzing data. As a company whose core business involves analyzing customer data, being able to do those analyses faster will lead to direct business value in the form of more well-informed decisions. The workflows Klarna use are currently all written in a sequential form. However, workflows, where independent tasks are executed in parallel, are more performant than workflows where only one task is executed at any point in time. Due to limitations in human attention span, parallelized workflows are harder for humans to write, compared to sequential workflows.

In this work, a computer application was created that automates the parallelization of a workflow to let humans write sequential workflows while still getting the performance of parallelized workflows. The application does this by taking a simple sequential workflow, identifies dependencies in the workflow and then schedules it in a way that is as parallel as possible given the identified dependencies. Such a solution has not been created before. However, experimental evaluation shows that parallelization of a sequential workflow used in daily production at Klarna can reduce execution time by up to 80%, showing that the application can bring value to Klarna and other organizations that use workflows to analyze big data.

Place, publisher, year, edition, pages
2018.
Keywords [en]
Hadoop, Hive, big data, workflows, scheduling, parallelization, automatic dependency identification, dependency graph, SQL, HiveQL
National Category
Information Systems
Identifiers
URN: urn:nbn:se:uu:diva-354464OAI: oai:DiVA.org:uu-354464DiVA, id: diva2:1221353
Subject / course
Information Systems
Educational program
Master programme in Information Systems
Supervisors
Examiners
Available from: 2018-06-20 Created: 2018-06-19 Last updated: 2018-06-20Bibliographically approved

Open Access in DiVA

fulltext(689 kB)10 downloads
File information
File name FULLTEXT01.pdfFile size 689 kBChecksum SHA-512
c768deff9aa662c1b0a0e2df01b6dad6d50db27b26f42d70db91c863a943bf67fedd41d7481c75dde345f5141a36f0a2817039a847597cac2bbf81e0f57319e1
Type fulltextMimetype application/pdf

By organisation
Department of Informatics and Media
Information Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 10 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 2 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf