Digitala Vetenskapliga Arkivet

Conformal prediction in Spark: Large-scale machine learning with confidence
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap. ORCID iD: 0000-0002-4851-759X
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab. (Spjuth) ORCID iD: 0000-0002-8083-2864
2015 (English). In: Proc. 2nd International Symposium on Big Data Computing, Los Alamitos, CA: IEEE Computer Society, 2015, pp. 61-67. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2015. pp. 61-67
Identifiers
URN: urn:nbn:se:uu:diva-283636; DOI: 10.1109/BDC.2015.35; ISI: 000380459200007; ISBN: 978-0-7695-5696-3 (digital); OAI: oai:DiVA.org:uu-283636; DiVA, id: diva2:919450
Conference
BDC 2015, December 7–10, Limassol, Cyprus
Projects
eSSENCE
Available from: 2015-12-10. Created: 2016-04-13. Last updated: 2020-01-24. Bibliographically approved.
Part of thesis
1. Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, substantial technical skills are needed for dealing with Big Data; and second, infrastructure procurement becomes difficult. In connection with this second challenge, the computing model and business trend originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of requiring computing infrastructure to be bought upfront, cloud providers enable the allocation and release of virtual resources on demand. These resources are then billed with a pay-per-use pricing model, and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable efforts were made in enabling the application of trusted bioinformatics software to Big Data scenarios, as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes 5 papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using the microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well, and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 71
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1846
Keywords
cloud computing, bioinformatics, Big Data, microservices, containers, MapReduce
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-390666; ISBN: 978-91-513-0730-5
Public defence
2019-10-10, B42, Uppsala Biomedicinska Centrum, Husargatan 3, Uppsala, 13:15 (English)
Available from: 2019-09-17. Created: 2019-08-22. Last updated: 2019-10-15.

Open Access in DiVA

No full text in DiVA


By author/editor
Capuccini, Marco; Spjuth, Ola