Ligand-based Methods for Data Management and Modelling
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Drug discovery is a complicated and expensive process, with costs in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed.

An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system for supporting work with dose-response studies on microtiter plates. Paper III presents a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and finds reasonable default values. Paper V studies the performance of SVM-based QSAR on large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015, p. 73
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 200
Keywords [en]
QSAR, ligand-based drug discovery, bioclipse, information system, cheminformatics, bioinformatics
National Category
Pharmaceutical Sciences; Bioinformatics and Systems Biology
Research subject
Pharmaceutical Pharmacology; Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-248964
ISBN: 978-91-554-9237-3 (print)
OAI: oai:DiVA.org:uu-248964
DiVA, id: diva2:801538
Public defence
2015-06-05, B22 BMC, Husargatan 3, Uppsala, 09:15 (English)
Available from: 2015-05-12 Created: 2015-04-09 Last updated: 2018-01-11
List of papers
1. Bioclipse 2: A scriptable integration platform for the life sciences
2009 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 397-
Article in journal (Refereed) Published
Abstract [en]

Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. Because Bioclipse 2 is based on an advanced and widely used service platform, it is widely extensible, and new algorithms, visualizations and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

Keywords
Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
National Category
Bioinformatics and Systems Biology; Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-109304 (URN); 10.1186/1471-2105-10-397 (DOI); 000273329400001 ()
Available from: 2009-12-16 Created: 2009-10-13 Last updated: 2018-01-12. Bibliographically approved
2. Brunn: an open source laboratory information system for microplates with a graphical plate layout design process
2011 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no 1, article id 179
Article in journal (Refereed) Published
Abstract [en]

Background:

Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight, open system for handling plate design as well as validation and preparation of data.

Results:

A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

Conclusions:

Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
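
As a rough illustration of what calculating an IC50 value from a dose-response curve involves, here is a minimal Python sketch. It assumes a simple log-linear interpolation between the two measured doses that bracket a 50 % response. The abstract does not state which method Brunn uses, so the function name, the interpolation scheme and the example values are all hypothetical.

```python
import math

def ic50_by_interpolation(doses, responses):
    """Estimate IC50 by linear interpolation on a log10 dose axis.

    Assumption (not taken from Brunn): responses are percentages of the
    untreated control, and doses are positive and sorted in increasing order.
    Returns None if the response never crosses 50 % in the measured range.
    """
    points = list(zip(doses, responses))
    for (d_lo, r_lo), (d_hi, r_hi) in zip(points, points[1:]):
        # Find the first pair of adjacent doses whose responses bracket 50 %.
        if (r_lo - 50.0) * (r_hi - 50.0) <= 0 and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_lo, log_hi = math.log10(d_lo), math.log10(d_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    return None

# Hypothetical example data: four doses and percent survival at each dose.
doses = [0.1, 1.0, 10.0, 100.0]
responses = [95.0, 80.0, 20.0, 5.0]
ic50 = ic50_by_interpolation(doses, responses)
no_crossing = ic50_by_interpolation([0.1, 1.0], [95.0, 80.0])
```

In practice a sigmoidal model is often fitted to all data points instead; the interpolation above is only the simplest way to make the idea concrete.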

Keywords
brunn, microtiter, bioclipse, screening, information system, lis, lims
National Category
Pharmacology and Toxicology
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-153210 (URN); 10.1186/1471-2105-12-179 (DOI); 000292027200001 (); 21599898 (PubMedID)
Available from: 2011-05-09 Created: 2011-05-09 Last updated: 2018-01-12. Bibliographically approved
3. Ligand-Based Target Prediction with Signature Fingerprints
2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, p. 2647-2653
Article in journal (Refereed) Published
Abstract [en]

When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristic curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run, so its usage should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
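
The difference between a bit and a count fingerprint in Tanimoto-based similarity searching can be sketched in a few lines of Python. This is an illustration only: the abstract does not specify which generalisation of the Tanimoto coefficient was used for counts, so the min/max form below is an assumption, and the feature names (`sigA`, `sigB`, `sigC`) and compound names are made up.

```python
def tanimoto_bits(a, b):
    """Tanimoto coefficient for two collections of 'on' features."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Convention chosen here: two empty fingerprints have similarity 0.
    return len(a & b) / union if union else 0.0

def tanimoto_counts(a, b):
    """Min/max generalisation of Tanimoto for feature -> count mappings.

    Assumption: this is one common generalisation, not necessarily the
    one used in the paper.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0

def rank_by_similarity(query, library, similarity):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, similarity(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical count fingerprints: signature string -> number of occurrences.
query = {"sigA": 2, "sigB": 1}
library = {
    "compound_1": {"sigA": 1, "sigB": 1, "sigC": 1},
    "compound_2": {"sigC": 3},
}
count_score = tanimoto_counts(query, library["compound_1"])
bit_score = tanimoto_bits(query, library["compound_1"])  # dict keys act as bits
ranking = rank_by_similarity(query, library, tanimoto_counts)
```

The count version needs a mapping per molecule rather than a set, which is consistent with the abstract's remark that it costs more memory and computing time.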

National Category
Pharmaceutical Sciences; Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-237934 (URN); 10.1021/ci500361u (DOI); 000343849600004 (); 25230336 (PubMedID)
Funder
Swedish Research Council, VR-2011-6129; eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC)
Available from: 2014-12-08 Created: 2014-12-08 Last updated: 2018-01-11. Bibliographically approved
4. Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, p. 3211-3217
Article in journal (Refereed) Published
Abstract [en]

QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
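
To make the roles of γ and the recommended ranges concrete, the following Python sketch evaluates the radial basis function kernel and enumerates a small search grid. The kernel formula is the standard one; the choice of powers of ten as grid steps is an assumption, since the abstract gives only the ranges for C and γ, not the step size.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2) for equal-length numeric vectors."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

def log10_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

# Ranges from the abstract; the power-of-ten spacing is an assumption.
c_values = log10_grid(0, 2)        # C from 1 to 100
gamma_values = log10_grid(-3, -1)  # gamma from 0.001 to 0.1
parameter_grid = [(c, g) for c in c_values for g in gamma_values]

# A larger gamma makes the kernel fall off faster with distance,
# i.e. each training point influences a narrower neighbourhood.
wide = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.001)
narrow = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.1)
```

Each (C, γ) pair in `parameter_grid` would then be scored by cross-validation with an SVM implementation of choice; that step is omitted here.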

National Category
Medical Biotechnology; Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-240239 (URN); 10.1021/ci500344v (DOI); 000345551000017 (); 25318024 (PubMedID)
Funder
eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Available from: 2015-01-07 Created: 2015-01-06 Last updated: 2018-01-11. Bibliographically approved
5. Large-scale ligand-based predictive modelling using support vector machines
2016 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 39
Article in journal (Refereed) Published
Abstract [en]

The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
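
One reason a linear model is cheap to apply can be sketched as follows: prediction is a single weighted sum over the descriptor values that are present in the molecule, whereas a kernel model has to evaluate the kernel against every support vector. The Python below is a hypothetical illustration, not the LIBLINEAR or Bioclipse code; the weights, bias and signature names are invented.

```python
def linear_predict(weights, bias, features):
    """Linear model output f(x) = w . x + b for a sparse feature mapping.

    Only the features present in the molecule are visited, so the cost
    grows with the number of non-zero descriptors, not with the size of
    the training set.
    """
    return bias + sum(weights.get(name, 0.0) * count
                      for name, count in features.items())

# Hypothetical trained weights and a hypothetical molecule described by
# signature counts; "sigC" has no weight and therefore contributes nothing.
weights = {"sigA": 0.5, "sigB": -0.25}
molecule = {"sigA": 2, "sigC": 4}
prediction = linear_predict(weights, bias=1.0, features=molecule)
```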

Keywords
Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
National Category
Pharmaceutical Sciences; Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-248959 (URN); 10.1186/s13321-016-0151-5 (DOI); 000381186100001 (); 27516811 (PubMedID)
Funder
Swedish National Infrastructure for Computing (SNIC), b2013262 b2015001; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; eSSENCE - An eScience Collaboration
Available from: 2015-04-09 Created: 2015-04-09 Last updated: 2018-08-28. Bibliographically approved

Open Access in DiVA

fulltext (1275 kB), 275 downloads
File information
File name: FULLTEXT02.pdf
File size: 1275 kB
Checksum (SHA-512): 0948769c9e0d549467923157ecf56650a7d08a4a85232a6602994f7f335835202df9860a475c7ce5ad78370030f4e12b324ea90b64418f8fc6aba0e2b1b957cf
Type: fulltext
Mimetype: application/pdf

Author/editor
Alvarsson, Jonathan