Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0002-4851-759x
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, substantial technical skills are needed to work with Big Data; second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of requiring users to buy computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable effort went into enabling the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes five papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using a microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 71
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1846
Keywords [en]
cloud computing, bioinformatics, Big Data, microservices, containers, MapReduce
National Category
Computational Mathematics
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-390666
ISBN: 978-91-513-0730-5 (print)
OAI: oai:DiVA.org:uu-390666
DiVA, id: diva2:1344948
Public defence
2019-10-10, B42, Uppsala Biomedicinska Centrum, Husargatan 3, Uppsala, 13:15 (English)
Available from: 2019-09-17 Created: 2019-08-22 Last updated: 2019-10-03
List of papers
1. Conformal prediction in Spark: Large-scale machine learning with confidence
2015 (English). In: Proc. 2nd International Symposium on Big Data Computing, Los Alamitos, CA: IEEE Computer Society, 2015, p. 61-67. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2015
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-283636 (URN)
10.1109/BDC.2015.35 (DOI)
000380459200007 ()
978-0-7695-5696-3 (ISBN)
Conference
BDC 2015, December 7–10, Limassol, Cyprus
Projects
eSSENCE
Available from: 2015-12-10 Created: 2016-04-13 Last updated: 2019-08-22
Bibliographically approved
2. Large-scale virtual screening on public cloud resources with Apache Spark
2017 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 15. Article in journal (Refereed) Published
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-318693 (URN)
10.1186/s13321-017-0204-4 (DOI)
000396830300001 ()
28316653 (PubMedID)
Projects
eSSENCE
Available from: 2017-03-06 Created: 2017-03-27 Last updated: 2019-09-30
Bibliographically approved
3. Interoperable and scalable data analysis with microservices: Applications in metabolomics
2019 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 19, p. 3752-3760. Article in journal (Refereed) Published
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-390670 (URN)
10.1093/bioinformatics/btz160 (DOI)
Available from: 2019-03-09 Created: 2019-08-13 Last updated: 2019-10-14
Bibliographically approved
4. MaRe: a MapReduce-Oriented Framework for Processing Big Data with Application Containers
2018 (English). Manuscript (preprint) (Other academic)
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-390664 (URN)
Available from: 2019-08-13 Created: 2019-08-13 Last updated: 2019-08-22
5. On-Demand Virtual Research Environments using Microservices
2018 (English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-390665 (URN)
Available from: 2019-08-13 Created: 2019-08-13 Last updated: 2019-08-22

Open Access in DiVA

fulltext (810 kB), 96 downloads
File information
File name: FULLTEXT01.pdf
File size: 810 kB
Checksum (SHA-512): 90d6059edbbec63b007b733117fe710039a575340291d7d331e769be02dfa6c702b10e3557eca9d772c17bc6677eb810a76840248fc535f1d84a6f955f9764fa
Type: fulltext
Mimetype: application/pdf

Author
Capuccini, Marco
