Large-scale ligand-based predictive modelling using support vector machines
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
2016 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 39. Article in journal (Refereed). Published.
Abstract [en]

The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
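
To illustrate the kind of input such models consume, the following is a minimal sketch of a signature-like descriptor. It is a simplification and not the canonical signature algorithm used in the paper: each atom is described by its element and the sorted descriptions of its neighbours, grown outward for a given number of bonds, and the molecule becomes a sparse count vector. The function names, the graph encoding (element list plus bond index pairs) and the string format are assumptions made for this example.

```python
from collections import Counter


def atom_environments(elements, bonds, height):
    """Return one environment string per atom, grown `height` bonds outward.

    Simplified stand-in for atom signatures (hypothetical format):
    height 0 is the element symbol; each further step wraps the sorted
    neighbour strings of the previous step in parentheses.
    """
    neighbours = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    current = list(elements)
    for _ in range(height):
        current = [
            elements[i]
            + "("
            + "".join(sorted(current[j] for j in neighbours[i]))
            + ")"
            for i in range(len(elements))
        ]
    return current


def descriptor(elements, bonds, max_height=1):
    """Count environment strings for heights 0..max_height (sparse vector)."""
    counts = Counter()
    for height in range(max_height + 1):
        counts.update(atom_environments(elements, bonds, height))
    return counts


if __name__ == "__main__":
    # Heavy atoms of ethanol: C-C-O
    print(sorted(descriptor(["C", "C", "O"], [(0, 1), (1, 2)]).items()))
```

Because each molecule contains only a small fraction of all observed substructures, such count vectors are very sparse, which is the kind of input that linear SVM solvers such as LIBLINEAR are designed to handle.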

Place, publisher, year, edition, pages
2016. Vol. 8, 39
Keyword [en]
Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
National Category
Pharmaceutical Sciences; Bioinformatics (Computational Biology)
URN: urn:nbn:se:uu:diva-248959; DOI: 10.1186/s13321-016-0151-5; ISI: 000381186100001; PubMedID: 27516811; OAI: diva2:801460
Swedish National Infrastructure for Computing (SNIC), b2013262, b2015001
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
eSSENCE - An eScience Collaboration
Available from: 2015-04-09. Created: 2015-04-09. Last updated: 2016-09-15. Bibliographically approved.
In thesis
1. Ligand-based Methods for Data Management and Modelling
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Drug discovery is a complicated and expensive process, with costs in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed.

An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system that supports work with dose-response studies on microtiter plates. Paper III presents a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and identifies reasonable default values. Paper V studies the performance of SVM-based QSAR on large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.
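
As a rough illustration of how a substructure-based fingerprint can be built and compared, the sketch below folds a set of substructure strings into a fixed number of bit positions by hashing and compares two fingerprints with the Tanimoto coefficient. This is a generic hashing scheme assumed for the example, not necessarily the construction used in Paper III; the function names, the default bit length and the use of CRC32 are all hypothetical choices.

```python
import zlib


def fold_to_fingerprint(features, n_bits=1024):
    """Map each feature string to one of `n_bits` positions (set of on-bits).

    CRC32 is used only because it is deterministic across runs; distinct
    features may collide on the same bit, which is inherent to folding.
    """
    return {zlib.crc32(feature.encode("utf-8")) % n_bits for feature in features}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


if __name__ == "__main__":
    ethanol = fold_to_fingerprint({"C", "O", "C(C)", "C(CO)", "O(C)"})
    propane = fold_to_fingerprint({"C", "C(C)", "C(CC)"})
    print(tanimoto(ethanol, propane))
```

A shorter fingerprint saves memory but increases the chance that unrelated substructures share a bit, so the bit length is a trade-off to be tuned for the dataset at hand.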

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015. 73 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 200
Keyword [en]
QSAR; Ligand-based drug discovery; Bioclipse; Information system; Cheminformatics; Bioinformatics
National Category
Pharmaceutical Sciences; Bioinformatics and Systems Biology
Research subject
Pharmaceutical Pharmacology; Bioinformatics
urn:nbn:se:uu:diva-248964 (URN); 978-91-554-9237-3 (ISBN)
Public defence
2015-06-05, B22 BMC, Husargatan 3, Uppsala, 09:15 (English)
Available from: 2015-05-12. Created: 2015-04-09. Last updated: 2015-07-07.

By author/editor
Alvarsson, Jonathan; Lampa, Samuel; Schaal, Wesley; Andersson, Claes; Wikberg, Jarl E. S.; Spjuth, Ola