Digitala Vetenskapliga Arkivet

A confidence predictor for logD using conformal regression and a support-vector machine
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. (Spjuth) ORCID iD: 0000-0002-0122-6680
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. (Spjuth) ORCID iD: 0000-0001-6709-7116
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab. (Spjuth) ORCID iD: 0000-0001-6740-9212
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. (Spjuth)
2018 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, no. 1, article id 17. Article in journal (Refereed). Published
Abstract [en]

Lipophilicity is a major determinant of ADMET properties and of the overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) of chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated with a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text], and the best-performing nonconformity measure gives a median prediction interval of [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and via a web page with a molecular editor. We also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format, both for download and through a URI resolver service.
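
The idea of turning point predictions into prediction intervals at a chosen confidence level can be illustrated with a minimal sketch of inductive (split) conformal regression. This is not the authors' implementation: the abstract does not say which nonconformity measures were compared, so the sketch assumes the plain absolute residual; an ordinary least-squares line stands in for the linear support-vector machine; the data are synthetic; and all function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line (slope, intercept); a stand-in for the linear SVM."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def conformal_half_width(residuals, confidence):
    """Interval half-width from calibration residuals (absolute errors)."""
    n = len(residuals)
    # Rank of the calibration score that gives the requested coverage.
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return float("inf")  # too few calibration points for this confidence
    return sorted(residuals)[k - 1]


def predict_interval(model, half_width, x):
    """Symmetric prediction interval around the point prediction."""
    slope, intercept = model
    y_hat = slope * x + intercept
    return y_hat - half_width, y_hat + half_width


# Synthetic data (assumption): one descriptor, noisy linear response.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [0.5 * x - 1.0 + rng.gauss(0, 0.4) for x in xs]

# Split into a proper training set and a calibration set.
train_x, train_y = xs[:200], ys[:200]
cal_x, cal_y = xs[200:], ys[200:]

model = fit_line(train_x, train_y)

# Nonconformity score: absolute residual on the calibration set.
residuals = [abs(y - (model[0] * x + model[1])) for x, y in zip(cal_x, cal_y)]

w80 = conformal_half_width(residuals, 0.80)
w90 = conformal_half_width(residuals, 0.90)

print("interval width at 80% confidence:", 2 * w80)
print("interval width at 90% confidence:", 2 * w90)
print("90% interval for x = 5.0:", predict_interval(model, w90, 5.0))
```

With the absolute residual as the nonconformity score, every compound gets an interval of the same width at a given confidence level. Measures that scale the residual by an estimate of prediction difficulty give compound-specific widths, which is one reason a median interval width is a natural summary statistic.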

Place, publisher, year, edition, pages
2018. Vol. 10, no. 1, article id 17
Keywords [en]
Conformal prediction, LogD, Machine learning, QSAR, RDF, Support-vector machine
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-347779
DOI: 10.1186/s13321-018-0271-1
ISI: 000429065900001
PubMedID: 29616425
OAI: oai:DiVA.org:uu-347779
DiVA, id: diva2:1195839
Funder
EU, Horizon 2020, 731075
Available from: 2018-04-06 Created: 2018-04-06 Last updated: 2018-08-28. Bibliographically approved
In thesis
1. Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web
2018 (English). Doctoral thesis with articles (Other academic)
Abstract [en]

The pharmaceutical industry is facing a research and development productivity crisis. At the same time, we have access to more biological data than ever, owing to recent advances in high-throughput experimental methods. One suggested explanation for this apparent paradox is that a crisis in reproducibility has also affected the reliability of the datasets that provide the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation, but they also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve the understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including models for predicting off-target binding. We have also developed a set of practical tools for working with linked data in a collaborative way, and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 256
Keywords
Reproducibility, Scientific Workflow Management Systems, Workflows, Pipelines, Flow-based programming, Predictive modelling, Semantic Web, Linked Data, Semantic MediaWiki, MediaWiki, RDF, SPARQL, Golang, Reproducerbarhet, Arbetsflödeshanteringssystem, Flödesbaserad programmering, Prediktiv modellering, Semantiska webben, Länkade data, Go
Research subject
Bioinformatics; Pharmacology
Identifiers
URN: urn:nbn:se:uu:diva-358353
ISBN: 978-91-513-0427-4
Public defence
2018-09-28, Room B22, Biomedicinskt Centrum, Husargatan 3, Uppsala, 13:00 (English)
Funder
EU, Horizon 2020, 654241
Swedish e-Science Research Center
eSSENCE - An eScience Collaboration
Available from: 2018-09-04 Created: 2018-08-28 Last updated: 2018-09-10

Open Access in DiVA

fulltext (1633 kB), 60 downloads
File information
File: FULLTEXT01.pdf. File size: 1633 kB. Checksum (SHA-512):
e3070991e7daae33485ccee36ef111df08735167ce9e1a14500538eb07aa59605174591bd03b928bca452bb444a2a677f589f847fc210d0aeab39b5e8bfbf57c
Type: fulltext. Mimetype: application/pdf

Authors/editors
Lapins, Maris; Arvidsson, Staffan; Lampa, Samuel; Berg, Arvid; Schaal, Wesley; Alvarsson, Jonathan; Spjuth, Ola
