Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. ORCID iD: 0000-0001-6740-9212
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-8083-2864
2016 (English)
In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 67
Article in journal (Refereed) Published
Abstract [en]

Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.

Place, publisher, year, edition, pages
2016. Vol. 8, article id 67
Keywords [en]
Predictive modelling, Machine learning, Workflows, Drug discovery, Flow-based programming
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-315089
DOI: 10.1186/s13321-016-0179-6
ISI: 000391703900001
OAI: oai:DiVA.org:uu-315089
DiVA, id: diva2:1073204
Funder
eSSENCE - An eScience Collaboration
Swedish e‐Science Research Center
Swedish National Infrastructure for Computing (SNIC), b2013262
Available from: 2017-02-09 Created: 2017-02-09 Last updated: 2018-08-28
Bibliographically approved
In thesis
1. Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web
2018 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The pharmaceutical industry is facing a research and development productivity crisis. At the same time we have access to more biological data than ever from recent advancements in high-throughput experimental methods. One suggested explanation for this apparent paradox has been that a crisis in reproducibility has affected also the reliability of datasets providing the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim to make writing complex workflows particularly in predictive modelling more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools on predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied on demonstrator use cases, and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 256
Keywords
Reproducibility, Scientific Workflow Management Systems, Workflows, Pipelines, Flow-based programming, Predictive modelling, Semantic Web, Linked Data, Semantic MediaWiki, MediaWiki, RDF, SPARQL, Golang, Reproducerbarhet, Arbetsflödeshanteringssystem, Flödesbaserad programmering, Prediktiv modellering, Semantiska webben, Länkade data, Go
National Category
Pharmacology and Toxicology; Bioinformatics (Computational Biology)
Research subject
Bioinformatics; Pharmacology
Identifiers
urn:nbn:se:uu:diva-358353 (URN)
978-91-513-0427-4 (ISBN)
Public defence
2018-09-28, Room B22, Biomedicinskt Centrum, Husargatan 3, Uppsala, 13:00 (English)
Funder
EU, Horizon 2020, 654241
Swedish e‐Science Research Center
eSSENCE - An eScience Collaboration
Available from: 2018-09-04 Created: 2018-08-28 Last updated: 2018-09-10

By author/editor
Lampa, Samuel; Alvarsson, Jonathan; Spjuth, Ola