Digitala Vetenskapliga Arkivet

Data Pipelines: Integrating Open Source Tools and Microservices: An Integrated Approach for Scalable and Reliable Data Processing on Google Cloud Platform
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Datapipelines: Integrera öppen källkodsverktyg och mikrotjänster : En integrerad metod för skalbar och tillförlitlig databehandling på Google Cloud Platform (Swedish)
Abstract [en]

This thesis explores the design and implementation of modern data pipelines within the Google Cloud Platform, with a focus on identifying essential components, methodologies, and best practices to enhance their efficiency, reliability, and scalability. Conducted in collaboration with Astrafy, this research provides critical insights into the construction of automated data pipelines. The study specifically addresses the complexities of integrating various open source tools and microservices, aiming to simplify the development process by proposing a structured methodology for selecting and combining these technologies. A key contribution of this work is the creation of an automated one-click deployment system, which streamlines the setup and management of data pipelines for processing Google Cloud billing data. The system is evaluated through a series of experiments, comparing various technologies based on performance metrics such as cost, deployment time, and ease of use. The results offer valuable guidelines and replicable models for organizations seeking to implement scalable and reliable data pipelines on cloud platforms.

Abstract [sv]

Denna avhandling utforskar design och implementering av moderna datapipelines inom Google Cloud Platform, med fokus på att identifiera väsentliga komponenter, metoder och bästa praxis för att förbättra deras effektivitet, tillförlitlighet och skalbarhet. Genomförd i samarbete med Astrafy ger denna forskning viktiga insikter i konstruktionen av automatiserade datapipelines. Studien adresserar specifikt komplexiteten med att integrera olika öppen källkod-verktyg och mikrotjänster, med målet att förenkla utvecklingsprocessen genom att föreslå en strukturerad metod för att välja och kombinera dessa teknologier. Ett viktigt bidrag i detta arbete är skapandet av ett automatiserat system för enkel installation med ett klick, som effektiviserar uppsättningen och hanteringen av datapipelines för bearbetning av Google Clouds faktureringsdata. Systemet utvärderas genom en serie experiment där olika teknologier jämförs baserat på prestationsmått som kostnad, distributionstid och användarvänlighet. Resultaten ger värdefulla riktlinjer och replikabla modeller för organisationer som söker att implementera skalbara och tillförlitliga datapipelines på molnplattformar.

Place, publisher, year, edition, pages
2024. p. 38
Series
TRITA-EECS-EX ; 2024:855
Keywords [en]
Data Pipelines, Google Cloud Platform, Microservices Integration, Open-Source Tools, Infrastructure as code
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-360662
OAI: oai:DiVA.org:kth-360662
DiVA, id: diva2:1941447
External cooperation
Astrafy
Supervisors
Examiners
Available from: 2025-04-08 Created: 2025-02-28 Last updated: 2025-04-08 Bibliographically approved

Open Access in DiVA

fulltext (3801 kB), 51 downloads
File information
File name: FULLTEXT01.pdf
File size: 3801 kB
Checksum (SHA-512): 9b0da7c5bd0a73d06e2f487e35dff8a232a445eac6d294f1a15e48f24bf2904819d3a0d8b39959075671cb536d96c248b15798ed2e6e85ed59c9f075a198abd4
Type: fulltext
Mimetype: application/pdf
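
The SHA-512 checksum listed in the file information can be used to check that a downloaded copy of the full text is intact. The following is a minimal sketch, assuming the PDF has been saved locally under the file name given in the record (FULLTEXT01.pdf, in the current directory); the helper names `sha512_of_file` and `matches_expected` are illustrative and not part of DiVA.

```python
import hashlib
from pathlib import Path

# SHA-512 checksum as listed in the record's file information.
EXPECTED_SHA512 = "9b0da7c5bd0a73d06e2f487e35dff8a232a445eac6d294f1a15e48f24bf2904819d3a0d8b39959075671cb536d96c248b15798ed2e6e85ed59c9f075a198abd4"


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA512):
    """Compare a file's SHA-512 digest with the expected hex string."""
    return sha512_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the file name from the record, in the current directory.
    pdf_path = Path("FULLTEXT01.pdf")
    if not pdf_path.exists():
        print(f"{pdf_path} not found; download the full text first.")
    elif matches_expected(pdf_path):
        print("Checksum matches the value listed in the record.")
    else:
        print("Checksum mismatch: the file differs from the one described in the record.")
```

A mismatch may simply mean that a newer version of the file has been uploaded since the record was captured, so it is worth comparing against the checksum currently shown in the record before concluding that the download is corrupt.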

Total: 51 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 389 hits