Digitala Vetenskapliga Arkivet

Large-scale ligand-based predictive modelling using support vector machines
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. ORCID iD: 0000-0001-6740-9212
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0001-6770-0878
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
2016 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 39. Journal article (peer-reviewed). Published
Abstract [en]

The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
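The scaling difference the abstract reports can be illustrated with a minimal sketch of the two prediction rules. This is not the authors' code and does not call LIBLINEAR or libsvm; the function names and the signature-like feature keys are made up for illustration. It assumes each molecule is a sparse mapping from signature string to occurrence count. A linear model touches only the non-zero features of one molecule, while an RBF model must evaluate a kernel against every support vector.

```python
import math

def predict_linear(weights, bias, x):
    """Linear SVM prediction: w . x + b over the non-zero features of x."""
    return sum(weights.get(key, 0.0) * value for key, value in x.items()) + bias

def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for sparse dict vectors."""
    keys = set(x) | set(y)
    squared_distance = sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-gamma * squared_distance)

def predict_rbf(support_vectors, coefficients, bias, x, gamma):
    """Kernel SVM prediction: one kernel evaluation per support vector."""
    return sum(
        coeff * rbf_kernel(sv, x, gamma)
        for sv, coeff in zip(support_vectors, coefficients)
    ) + bias

# Hypothetical sparse descriptor: signature string -> count (illustrative keys only).
molecule = {"[C]([C][O])": 2, "[O]([C])": 1}
weights = {"[C]([C][O])": 0.5, "[N]([C])": -1.0}

linear_prediction = predict_linear(weights, 0.1, molecule)
rbf_prediction = predict_rbf([molecule], [2.0], 0.5, molecule, gamma=0.1)
```

The cost of `predict_rbf` grows with the number of support vectors, which itself tends to grow with the training set; the cost of `predict_linear` depends only on the number of non-zero features in the query molecule.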

Place, publisher, year, edition, pages
2016. Vol. 8, article id 39
Keywords [en]
Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-248959
DOI: 10.1186/s13321-016-0151-5
ISI: 000381186100001
PubMedID: 27516811
OAI: oai:DiVA.org:uu-248959
DiVA, id: diva2:801460
Funder
Swedish National Infrastructure for Computing (SNIC), b2013262 b2015001
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
eSSENCE - An eScience Collaboration
Available from: 2015-04-09 Created: 2015-04-09 Last updated: 2018-08-28. Bibliographically approved
In thesis
1. Ligand-based Methods for Data Management and Modelling
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Drug discovery is a complicated and expensive process in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed.

An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system for supporting work with dose-response studies on microtiter plates. Paper III presents the creation of a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and finds reasonable default values. Paper V studies the performance of SVM-based QSAR using large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015. p. 73
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 200
Keywords
QSAR, ligand-based drug discovery, bioclipse, information system, cheminformatics, bioinformatics
Research subject
Pharmaceutical Pharmacology; Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-248964
ISBN: 978-91-554-9237-3
Public defence
2015-06-05, B22 BMC, Husargatan 3, Uppsala, 09:15 (English)
Available from: 2015-05-12 Created: 2015-04-09 Last updated: 2018-01-11
2. Reproducible Data Analysis in Drug Discovery with Scientific Workflows and the Semantic Web
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The pharmaceutical industry is facing a research and development productivity crisis. At the same time, we have access to more biological data than ever from recent advancements in high-throughput experimental methods. One suggested explanation for this apparent paradox has been that a crisis in reproducibility has also affected the reliability of the datasets that provide the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve the understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases, and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 256
Keywords
Reproducibility, Scientific Workflow Management Systems, Workflows, Pipelines, Flow-based programming, Predictive modelling, Semantic Web, Linked Data, Semantic MediaWiki, MediaWiki, RDF, SPARQL, Golang, Go
Research subject
Bioinformatics; Pharmacology
Identifiers
URN: urn:nbn:se:uu:diva-358353
ISBN: 978-91-513-0427-4
Public defence
2018-09-28, Room B22, Biomedicinskt Centrum, Husargatan 3, Uppsala, 13:00 (English)
Funder
EU, Horizon 2020, 654241
Swedish e-Science Research Center
eSSENCE - An eScience Collaboration
Available from: 2018-09-04 Created: 2018-08-28 Last updated: 2018-09-10

Authors: Alvarsson, Jonathan; Lampa, Samuel; Schaal, Wesley; Andersson, Claes; Wikberg, Jarl E. S.; Spjuth, Ola