Digitala Vetenskapliga Arkivet

  • 51.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Worachartcheewan, Apilak
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Georgiev, Valentin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Prachayasittikul, Virapong
    Nantasenamat, Chanin
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    A Unified Proteochemometric Model for Prediction of Inhibition of Cytochrome P450 Isoforms
    2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no. 6, p. e66566. Journal article (Peer-reviewed)
    Abstract [en]

    A unified proteochemometric (PCM) model for the prediction of the ability of drug-like chemicals to inhibit five major drug metabolizing CYP isoforms (i.e. CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) was created and made publicly available under the Bioclipse Decision Support open source system at www.cyp450model.org. For the proteochemometric modeling, we represented the chemical compounds by molecular signature descriptors and the CYP isoforms by an alignment-independent description of the composition and transition of amino acid properties of their protein primary sequences. The entire training dataset contained 63 391 interactions, and the best PCM model was obtained using signature descriptors of height 1, 2 and 3 and inducing the model with a support vector machine. The model showed excellent predictive ability, with an internal AUC = 0.923 and an external AUC = 0.940, as evaluated on a large external dataset. An advantage of PCM models is their extensibility, which makes it possible to extend our model to new CYP isoforms and polymorphic CYP forms. A key benefit of PCM is that all proteins are confined in one single model, which makes it generally more stable and predictive than single-target models. The inclusion of the model in Bioclipse Decision Support makes it possible to make virtually instantaneous predictions (∼100 ms per prediction) while interactively drawing or modifying chemical structures in the Bioclipse chemical structure editor.

  • 52.
    Novella, Jon Ander
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Emami Khoonsari, Payam
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi.
    Herman, Stephanie
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Whitenack, Daniel
    Capuccini, Marco
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Burman, Joachim
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för neurovetenskap, Neurologi.
    Kultima, Kim
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Container-based bioinformatics with Pachyderm
    2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, pp. 839-846. Journal article (Peer-reviewed)
  • 53.
    O'Boyle, Noel
    University College Cork.
    Guha, Rajarshi
    NIH Center for Translational Therapeutics.
    Willighagen, Egon
    Karolinska Institutet.
    Adams, Samuel
    University of Cambridge.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bradley, Jean-Claude
    Drexel University.
    Filippov, Igor
    NCI-Frederick.
    Hansson, Robert
    St. Olaf College.
    Hanwell, Marcus
    Kitware, Inc.
    Hutchison, Geoffrey
    University of Pittsburgh.
    James, Craig
    eMolecules Inc.
    Jeliazkova, Nina
    Ideaconsult Ltd.
    Lang, Andrew
    Oral Roberts University.
    Langner, Karol
    Leiden University.
    Lonie, David
    State University of New York at Buffalo.
    Lowe, Daniel
    University of Cambridge.
    Pansanel, Jerome
    Université de Strasbourg.
    Pavlov, Dmitry
    GGA Software Service.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Steinbeck, Christoph
    European Bioinformatics Institute.
    Tenderholt, Adam
    University of Washington.
    Thiesen, Kevin
    Chemlabs.
    Murray-Rust, Peter
    University of Cambridge.
    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
    2011. In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 3, p. 37. Journal article (Peer-reviewed)
    Abstract [en]

    Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.

    Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.

    Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.

  • 54.
    Oki, Noffisat
    Douglas Connect GmbH, Basel, Switzerland.
    Exner, Thomas
    Douglas Connect GmbH, Basel, Switzerland.
    Kramer, Stefan
    Johannes Gutenberg Univ Mainz, Mainz, Germany.
    Notredame, Cedric
    Fundacio Ctr Regulacio Genom, Barcelona, Spain.
    Jennen, Danyel
    Univ Maastricht, Maastricht, Netherlands.
    Gkoutos, Georgios
    Univ Birmingham, Birmingham, W Midlands, England.
    Sarimveis, Haralambos
    Natl Tech Univ Athens, Athens, Greece.
    Jacobs, Marc
    Fraunhofer Gesell Foerderung Angewandten Forsch E, Munich, Germany.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Dudgeon, Tim
    Informat Matters Ltd, Kidlington, England.
    Bois, Frederic
    Inst Natl Environm & Risques, Verneuil En Halatte, France.
    Jennings, Paul
    Vrije Univ Amsterdam, Amsterdam, Netherlands.
    Hardy, Barry
    Douglas Connect GmbH, Basel, Switzerland.
    OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration, in silico analysis and modelling in risk assessment
    2018. In: Abstracts of Papers of the American Chemical Society, ISSN 0065-7727, Vol. 255. Journal article (Other academic)
  • 55.
    Peters, Kristian
    Bradbury, James
    Bergmann, Sven
    Capuccini, Marco
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Cascante, Marta
    de Atauri, Pedro
    Ebbels, Timothy M. D.
    Foguet, Carles
    Glen, Robert
    Gonzalez-Beltran, Alejandra
    Günther, Ulrich L.
    Handakas, Evangelos
    Hankemeier, Thomas
    Haug, Kenneth
    Herman, Stephanie
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi.
    Holub, Petr
    Izzo, Massimiliano
    Jacob, Daniel
    Johnson, David
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Samhällsvetenskapliga fakulteten, Institutionen för informatik och media.
    Jourdan, Fabien
    Kale, Namrata
    Karaman, Ibrahim
    Khalili, Bita
    Emami Khoonsari, Payam
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi.
    Kultima, Kim
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk kemi.
    Lampa, Samuel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Larsson, Anders
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Ludwig, Christian
    Moreno, Pablo
    Neumann, Steffen
    Novella, Jon Ander
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    O'Donovan, Claire
    Pearce, Jake T. M.
    Peluso, Alina
    Piras, Marco Enrico
    Pireddu, Luca
    Reed, Michelle A. C.
    Rocca-Serra, Philippe
    Roger, Pierrick
    Rosato, Antonio
    Rueedi, Rico
    Ruttkies, Christoph
    Sadawi, Noureddin
    Salek, Reza M.
    Sansone, Susanna-Assunta
    Selivanov, Vitaly
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Schober, Daniel
    Thévenot, Etienne A.
    Tomasoni, Mattia
    van Rijswijk, Merlijn
    van Vliet, Michael
    Viant, Mark R.
    Weber, Ralf J. M.
    Zanetti, Gianluigi
    Steinbeck, Christoph
    PhenoMeNal: Processing and analysis of metabolomics data in the cloud
    2019. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 8, no. 2. Journal article (Peer-reviewed)
  • 56.
    Rostkowski, Michal
    University of Copenhagen.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Rydberg, Patrik
    University of Copenhagen.
    WhichCyp: Prediction of Cytochromes P450 Inhibition
    2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no. 16, pp. 2051-2052. Journal article (Peer-reviewed)
    Abstract [en]

    SUMMARY: In this work we present WhichCyp, a tool for prediction of which cytochromes P450 isoforms (among 1A2, 2C9, 2C19, 2D6 and 3A4) a given molecule is likely to inhibit. The models are built from experimental high-throughput data using support vector machines and molecular signatures.

    AVAILABILITY: The WhichCyp server is freely available for use on the web at http://drug.ku.dk/whichcyp, where the WhichCyp Java program and source code is also available for download.

    CONTACT: pry@sund.ku.dk

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • 57.
    Schaal, Wesley
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Hammerling, Ulf
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Gustafsson, Mats G
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Automated QuantMap for rapid quantitative molecular network topology analysis
    2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no. 18, pp. 2369-2370. Journal article (Peer-reviewed)
    Abstract [en]

    SUMMARY:

    The previously disclosed QuantMap method for grouping chemicals by biological activity used online services for much of the data gathering and some of the numerical analysis. The present work attempts to streamline this process by using local copies of the databases and in-house analysis. Using computational methods similar or identical to those used in the previous work, a qualitatively equivalent result was found in just a few seconds on the same dataset (collection of 18 drugs). We use the user-friendly Galaxy framework to enable users to analyze their own datasets. Hopefully, this will make the QuantMap method more practical and accessible and help achieve its goals to provide substantial assistance to drug repositioning, pharmacology evaluation and toxicology risk assessment.

    AVAILABILITY:

    http://galaxy.predpharmtox.org

    CONTACT:

    mats.gustafsson@medsci.uu.se or ola.spjuth@farmbio.uu.se

    SUPPLEMENTARY INFORMATION:

    Supplementary data are available at Bioinformatics online.

  • 58.
    Schaduangrat, Nalini
    Mahidol Univ, Ctr Data Min & Biomed Informat, Fac Med Technol, Bangkok 10700, Thailand.
    Lampa, Samuel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Simeon, Saw
    Kasetsart Univ, Interdisciplinary Grad Program Biosci, Fac Sci, Bangkok 10900, Thailand.
    Gleeson, Matthew Paul
    King Mongkuts Inst Technol Ladkrabang, Dept Biomed Engn, Fac Engn, Bangkok 10520, Thailand.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Nantasenamat, Chanin
    Mahidol Univ, Ctr Data Min & Biomed Informat, Fac Med Technol, Bangkok 10700, Thailand.
    Towards reproducible computational drug discovery
    2020. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 12, no. 1, article id 9. Article, review/survey (Peer-reviewed)
    Abstract [en]

    The reproducibility of experiments has been a long-standing impediment for further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted utilization for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state-of-the-art on reproducible research, (2) research documentation (e.g. electronic laboratory notebook, Jupyter notebook, etc.), (3) science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues on model development and deployment, (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share data and programming codes used for numerical calculations, not only to facilitate reproducibility but also to foster collaborations (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data/code.

  • 59.
    Shoombuatong, Watshara
    Prathipati, Philip
    Prachayasittikul, Veda
    Schaduangrat, Nalini
    Malik, Aijaz Ahmad
    Pratiwi, Reny
    Wanwimolruk, Sompon
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Gleeson, Matthew Paul
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Nantasenamat, Chanin
    Towards Predicting the Cytochrome P450 Modulation: From QSAR to proteochemometric modeling
    2017. In: Current Drug Metabolism, ISSN 1389-2002, E-ISSN 1875-5453, Vol. 18, no. 6, pp. 540-555. Journal article (Peer-reviewed)
    Abstract [en]

    Drug metabolism determines the fate of a drug when it enters the human body and is a critical factor in defining its absorption, distribution, metabolism, excretion and toxicity (ADMET) characteristics. Among the various drug metabolizing enzymes, cytochrome P450s (CYP450) constitute an important protein family that aside from functioning in xenobiotic metabolism is also responsible for a diverse array of other roles encompassing steroid and cholesterol biosynthesis, fatty acid metabolism, calcium homeostasis, neuroendocrine functions and growth regulation. Although CYP450 typically convert xenobiotics into safe metabolites, there are some situations whereby the metabolite is more toxic than its parent molecule. Computational modeling has been instrumental in CYP450 research by rationalizing the nature of the binding event (i.e. inhibit or induce CYP450s) or metabolic stability of query compounds of interest. A plethora of computational approaches encompassing ligand, structure and systems based approaches have been utilized to model CYP450-ligand interactions. This review provides a brief background on the CYP450 family (i.e. its roles, advantages and disadvantages as well as its modulators) and then discusses the various computational approaches that have been used to model CYP450-ligand interaction. Particular focus is given to the use of quantitative structure-activity relationship (QSAR) and more recent proteochemometric modeling studies. Finally, a perspective on the current state of the art and future trends of the field is provided.

  • 60.
    Simeon, Saw
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Nabu, Sunanta
    Anuwongcharoen, Nuttapat
    Prachayasittikul, Virapong
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Nantasenamat, Chanin
    Origin of aromatase inhibitory activity via proteochemometric modeling
    2016. In: PeerJ, ISSN 2167-8359, E-ISSN 2167-8359, Vol. 4, article id e1979. Journal article (Peer-reviewed)
    Abstract [en]

    Aromatase, the rate-limiting enzyme that catalyzes the conversion of androgen to estrogen, plays an essential role in the development of estrogen-dependent breast cancer. Side effects due to aromatase inhibitors (AIs) necessitate the pursuit of novel inhibitor candidates with high selectivity, lower toxicity and increased potency. Designing a novel therapeutic agent against aromatase could be achieved computationally by means of ligand-based and structure-based methods. For over a decade, we have utilized both approaches to design potential AIs for which quantitative structure-activity relationships and molecular docking were used to explore inhibitory mechanisms of AIs towards aromatase. However, such approaches do not consider the effects that aromatase variants have on different AIs. In this study, proteochemometrics modeling was applied to analyze the interaction space between AIs and aromatase variants as a function of their substructural and amino acid features. Good predictive performance was achieved, as rigorously verified by 10-fold cross-validation, external validation, leave-one-compound-out cross-validation, leave-one-protein-out cross-validation and Y-scrambling tests. The investigations presented herein provide important insights into the mechanisms of aromatase inhibitory activity that could aid in the design of novel potent AIs as breast cancer therapeutic agents.

  • 61.
    Siretskiy, Alexey
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis using Hadoop
    2014. In: Proc. 10th International Conference on e-Science, IEEE Computer Society, 2014, pp. 317-323. Conference paper (Peer-reviewed)
    Abstract [en]

    Hadoop is a convenient framework in e-Science enabling scalable distributed data analysis. In molecular biology, next-generation sequencing produces vast amounts of data and requires flexible frameworks for constructing analysis pipelines. We extend the popular HTSeq package into the Hadoop realm by introducing massively parallel versions of short read quality assessment as well as functionality to count genes mapped by the short reads. We use the Hadoop-streaming library which allows the components to run in both Hadoop and regular Linux systems and evaluate their performance in two different execution environments: A single node on a computational cluster and a Hadoop cluster in a private cloud. We compare the implementations with Apache Pig showing improved runtime performance of our developed methods. We also inject the components in the graphical platform Cloudgene to simplify user interaction.

  • 62.
    Siretskiy, Alexey
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Sundqvist, Tore
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Voznesenskiy, Mikhail
    St Petersburg State Univ, Inst Chem, Dept Phys Chem, St Petersburg 199034, Russia.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data
    2015. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 4, article id 26. Journal article (Peer-reviewed)
    Abstract [en]

    Background: New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology.

    Results: In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories.

    Conclusions: From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources; we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.

  • 63.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bioclipse: Integration of Data and Software in the Life Sciences
    2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    New high throughput experimental techniques have turned the life sciences into a data-intensive field. Scientists are faced with new types of problems, such as managing voluminous sources of information, integrating heterogeneous data, and applying the proper analysis algorithms; all to end up with reliable conclusions. These challenges call for an infrastructure of algorithms and technologies to supply researchers with the tools and methods necessary to maximize the usefulness of the data. eScience has emerged as a promising technology to take on these challenges, and denotes integrated science carried out in highly distributed network environments, or science that makes use of large data sets and requires high performance computing resources.

    In this thesis I present standards, exchange formats, algorithms, and software implementations for empowering researchers in the life sciences with the tools of eScience. The work is centered around Bioclipse - an extensible workbench developed in the frame of this thesis - which provides users with instruments for carrying out integrated research and where technical details are hidden under simple graphical interfaces. Bioclipse is a Rich Client that takes full advantage of the many offerings of eScience, such as networked databases and online services. The benefits of mixing local and remote software in a unifying platform are demonstrated with an integrated approach for predicting metabolic sites in chemical structures. To overcome the limitations of the commonly used technologies for interacting with networked services, I also present a new technology using the XMPP protocol. This enables service discovery and asynchronous communication between the client and server, which is ideal for long-running analyses.

    To maximize the usefulness of the available data there is a need for standards, ontologies, and exchange formats, in order to define what information should be captured and how it should be structured and exchanged. A novel format for exchanging QSAR data sets in a fully interoperable and reproducible form is presented, together with an implementation in Bioclipse that takes advantage of eScience components during the setup process.

    Bioclipse has been well received by the scientific community, attracted a large group of international users and developers, and has been awarded three international prizes for its innovative character. With continued development, the project has a good chance of becoming an important component in a sustainable infrastructure for the life sciences.

  • 64.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eine visuelle Open-Source-Platform für Chemo- und Bioinformatik
    2006. In: JAVAmagazin, ISSN 1619-795X, no. 8. Journal article (Other (popular science, discussion, etc.))
  • 65.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    NGS data management and analysis for hundreds of projects: Experiences from Sweden
    2013. In: NGS Data after the Gold Rush, 2013. Conference paper (Other academic)
    Abstract [en]

    UPPNEX is a national e-infrastructure for next-generation sequencing data storage and analysis in Sweden. This presentation features strategic decisions made regarding hardware, software, maintenance and support, and resource allocation, and illustrates challenges such as managing data growth in a shared system with over 400 research projects of varying types. Insights into bioinformatics usage patterns are also presented, together with the ongoing development to extend the e-infrastructure with redundant resources, a secure system for analyzing sensitive data, and a private cloud.

  • 66.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Novel applications of Machine Learning in cheminformatics. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article ID 46. Journal article (Other academic)
    Full text (pdf)
    fulltext
  • 67.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Pharmaceutical Bioinformatics Primer. Manuscript (preprint) (Other academic)
    Full text (pdf)
    fulltext
  • 68.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Using Bioclipse to integrate bioinformatics functionality. 2005. In: EMBnet.news, ISSN 1023-4144, Vol. 13, no. 1, pp. 5-11. Journal article (Other (popular science, debate, etc.))
  • 69.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Johan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Krachtus, Dieter
    University of Heidelberg, Germany.
    Bioclipse 2.0: Life Science setzt auf die Staerken von Eclipse. 2009. In: Eclipse Magazine, ISSN 1861-2296, no. 4. Journal article (Other (popular science, debate, etc.))
  • 70.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bioclipse 2: Towards integrated biocheminformatics. 2009. In: EMBnet.news, ISSN 1023-4144, Vol. 15, no. 3, pp. 25-27. Journal article (Other (popular science, debate, etc.))
  • 71.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Berg, Arvid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Kuhn, Stefan
    European Bioinformatics Institute, Hinxton, UK.
    Mäsak, Carl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Torrance, Gilleain
    European Bioinformatics Institute, Hinxton, UK.
    Wagener, Johannes
    Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Steinbeck, Christoph
    European Bioinformatics Institute, Hinxton, UK.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bioclipse 2: A scriptable integration platform for the life sciences. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 397. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with the advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. Because Bioclipse 2 is based on an advanced and widely used service platform, it is widely extensible, and new algorithms, visualizations, and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

    Full text (pdf)
    FULLTEXT01
  • 72.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Berg, Arvid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Willighagen, Egon
    Department of Bioinformatics - BiGCaT, Maastricht University.
    Applications of the InChI in cheminformatics with the CDK and Bioclipse. 2013. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 5, no. 14. Journal article (Peer-reviewed)
    Abstract [en]

    Background

    The InChI algorithms are written in C++ and not available as a Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology.

    Results

    We here describe how the InChI library is used in the Bioclipse workbench and the Chemistry Development Kit (CDK) cheminformatics library. To make this possible, a JNI bridge to the InChI library was developed, JNI-InChI, allowing Java software to access the InChI algorithms. By using this bridge, the CDK project packages the InChI binaries in a module and offers easy access from Java using the CDK API. The Bioclipse project packages and offers InChI as a dynamic OSGi bundle that can easily be used by any OSGi-compliant software, in addition to the regular Java Archive and Maven bundles. Bioclipse itself uses the InChI as a key component and calculates it on the fly when visualizing and editing chemical structures. We demonstrate the utility of InChI with various applications in CDK and Bioclipse, such as decision support for chemical liability assessment, tautomer generation, and for knowledge aggregation using a linked data approach.

    Conclusions

    These results show that the InChI library can be used in a variety of Java library dependency solutions, making the functionality easily accessible by Java software, such as in the CDK. The applications show various ways the InChI has been used in Bioclipse, to enrich its functionality.

    Full text (pdf)
    fulltext
  • 73.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Bongcam-Rudloff, Erik
    Carrasco Hernández, Guillermo
    Forer, Lukas
    Giovacchini, Mario
    Guimera, Roman Valls
    Kallio, Aleksi
    Korpelainen, Eija
    Kańduła, Maciej M.
    Krachunov, Milko
    Kreil, David P.
    Kulev, Ognyan
    Łabaj, Paweł P.
    Lampa, Samuel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Pireddu, Luca
    Schönherr, Sebastian
    Siretskiy, Alexey
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Vassilev, Dimitar
    Experiences with workflows for automating data-intensive bioinformatics. 2015. In: Biology Direct, ISSN 1745-6150, E-ISSN 1745-6150, Vol. 10, article ID 43. Article, review/survey (Peer-reviewed)
  • 74.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Bongcam-Rudloff, Erik
    Dahlberg, Johan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper.
    Dahlö, Martin
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Kallio, Aleksi
    Pireddu, Luca
    Vezzi, Francesco
    Korpelainen, Eija
    Recommendations on e-infrastructures for next-generation sequencing. 2016. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 5, article ID 26. Journal article (Peer-reviewed)
  • 75.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Brännström, Robin Carrión
    Statisticon AB.
    Carlsson, Lars
    Stena Line AB.
    Gauraha, Niharika
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets. 2019. In: Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, PMLR, 2019, Vol. 105, pp. 53-65. Conference paper (Peer-reviewed)
    Abstract [en]

    Conformal Prediction is a framework that produces prediction intervals based on the output from a machine learning algorithm. In this paper we explore the case when training data is made up of multiple parts available in different sources that cannot be pooled. We here consider the regression case and propose a method where a conformal predictor is trained on each data source independently, and where the prediction intervals are then combined into a single interval. We call the approach Non-Disclosed Conformal Prediction (NDCP), and we evaluate it on a regression dataset from the UCI machine learning repository using support vector regression as the underlying machine learning algorithm, with varying number of data sources and sizes. The results show that the proposed method produces conservatively valid prediction intervals, and while we cannot retain the same efficiency as when all data is used, efficiency is improved through the proposed approach as compared to predicting using a single arbitrarily chosen source.

  • 76.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Carlsson, Lars
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Georgiev, Valentin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Open source drug discovery with Bioclipse. 2012. In: Current Topics in Medicinal Chemistry, ISSN 1568-0266, E-ISSN 1873-4294, Vol. 12, no. 18, pp. 1980-1986. Article, review/survey (Peer-reviewed)
    Abstract [en]

    We present the open source components for drug discovery that have been developed and integrated into the graphical workbench Bioclipse. Building on a solid open source cheminformatics core, Bioclipse has advanced functionality for managing and visualizing chemical structures and related information. The features presented here include QSAR/QSPR modeling, various predictive solutions such as decision support for chemical liability assessment, site-of-metabolism prediction, virtual screening, and knowledge discovery and integration. We demonstrate the utility of the described tools with examples from computational pharmacology, toxicology, and ADME. Bioclipse is used in both academia and industry, and is a good example of open source leading to new solutions for drug discovery.

    Full text (pdf)
    fulltext
  • 77.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Helgee, Ernst Ahlberg
    Boyer, Scott
    Carlsson, Lars
    Integrated Decision Support for Assessing Chemical Liabilities. 2011. In: Journal of chemical information and modeling, ISSN 1549-9596, Vol. 51, no. 8, pp. 1840-1847. Journal article (Peer-reviewed)
    Abstract [en]

    Chemical liabilities, such as adverse effects and toxicity, have a major impact on today's drug discovery process. In silico prediction of chemical liabilities is an important approach which can reduce costs and animal testing by complementing or replacing in vitro and in vivo liability models. There is a lack of integrated, extensible decision support systems for chemical liability assessment which run quickly and have easily interpretable results. Here we present a method which integrates similarity searches, structural alerts, and QSAR models, all of which are available from the Bioclipse workbench. Emphasis has been placed on interpretation of results, and substructures which are important for predictions are highlighted in the original chemical structures. This allows for interactively changing chemical structures with instant visual feedback and can be used for hypothesis testing of single chemical structures as well as compound collections. The system has a clear separation between methods and data, and the extensible architecture enables straightforward extension via addition of more plugins (such as new data sets and computational models). We demonstrate our method on three important safety end points: mutagenicity, carcinogenicity, and aryl hydrocarbon receptor (AhR) activation. Bioclipse and the decision support implementation are free, open source, and available from http://www.bioclipse.net/decision-support.

  • 78.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Junaid, Muhammad
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Services for prediction of drug susceptibility for HIV proteases and reverse transcriptases at the HIV Drug Research Centre. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no. 12, pp. 1719-1720. Journal article (Peer-reviewed)
    Abstract [en]

    Summary: The HIV Drug Research Centre (HIVDRC) has established Web services for prediction of drug susceptibility for HIV proteases and reverse transcriptases. The services are based on two proteochemometric models which accept a protease or reverse transcriptase sequence in amino acid form and output the predicted drug susceptibility values. The predictions are based on a comprehensive analysis where all the relevant inhibitors are included, resulting in models with excellent predictive capabilities.

    Availability and Implementation: The services are implemented as interoperable Web services (REST and XMPP), with supporting web pages to allow for individual analyses. A set of plugins was also developed which makes the services available from the Bioclipse workbench for life science. Services are available at http://www.hivdrc.org/services.

  • 79.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    An Information System for Proteochemometrics. 2005. In: CDK News, ISSN 1614-7553, Vol. 2, no. 2, pp. 54-56. Journal article (Peer-reviewed)
    Full text (pdf)
    FULLTEXT01
  • 80.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Georgiev, Valentin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Carlsson, Lars
    Global Safety Assessment, AstraZeneca R&D.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Berg, Arvid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bioclipse-R: Integrating management and visualization of life science data with statistical analysis. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no. 2, pp. 286-289. Journal article (Peer-reviewed)
    Abstract [en]

    Bioclipse, a graphical workbench for the life sciences, provides functionality for managing and visualizing life science data. We introduce Bioclipse-R, which integrates Bioclipse and the statistical programming language R. The synergy between Bioclipse and R is demonstrated by the construction of a decision support system for anticancer drug screening and mutagenicity prediction, which shows how Bioclipse-R can be used to perform complex tasks from within a single software system.

  • 81.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab. Karolinska Institutet.
    Heikkinen, Jani
    Litton, Jan-Eric
    Karolinska Institutet.
    Palmgren, Juni
    Karolinska Institutet.
    Krestyaninova, Maria
    Uniquer Sarl.
    Data Integration between Swedish National Clinical Health Registries and Biobanks Using an Availability System. 2014. In: Data Integration in the Life Sciences / [ed] Galhardas, Helena; Rahm, Erhard, Springer International Publishing, 2014, Vol. 8574, pp. 32-40. Chapter in book, part of anthology (Peer-reviewed)
    Abstract [en]

    Linking biobank data, such as molecular profiles, with clinical phenotypes is of great importance in epidemiological and predictive studies. A comprehensive overview of various data sources that can be combined in order to power up a study is a key factor in the design. Clinical data stored in health registries and biobank data in research projects are commonly provisioned in different database systems and governed by separate organizations, making the integration process challenging and hampering biomedical investigations. We here describe the integration of data on prostate cancer from a clinical health registry with data from a biobank, and its provisioning in the SAIL availability system. We demonstrate the implications of using the actual raw data, data transformed to availability data, and availability data which has been subjected to anonymization techniques to reduce the risk of re-identification. Our results show that an availability system such as SAIL with integrated clinical and biobank data can be a valuable tool for planning new studies and finding interesting subsets to investigate further. We also show that an availability system can deliver useful insights even when the data has been subjected to anonymization techniques.

  • 82.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Helmus, Tobias
    Willighagen, Egon L
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Kuhn, Stefan
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Wagener, Johannes
    Murray-Rust, Peter
    Steinbeck, Christoph
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Bioclipse: an open source workbench for chemo- and bioinformatics. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 59. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework.

    RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction.

    CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

    Full text (pdf)
    fulltext
  • 83.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab. Department of Medical Epidemiology and Biostatistics and Swedish e-Science Research Centre, Karolinska Institutet, Stockholm, Sweden.
    Karlsson, Andreas
    Clements, Mark
    Humphreys, Keith
    Ivansson, Emma
    Dowling, Jim
    Eklund, Martin
    Jauhiainen, Alexandra
    Czene, Kamila
    Grönberg, Henrik
    Sparén, Pär
    Wiklund, Fredrik
    Cheddad, Abbas
    Pálsdóttir, Þorgerður
    Rantalainen, Mattias
    Abrahamsson, Linda
    Laure, Erwin
    Litton, Jan-Eric
    Palmgren, Juni
    E-Science technologies in a workflow for personalized medicine using cancer screening as a case study. 2017. In: JAMIA Journal of the American Medical Informatics Association, ISSN 1067-5027, E-ISSN 1527-974X, Vol. 24, no. 5, pp. 950-957. Journal article (Peer-reviewed)
    Abstract [en]

    Objective: We provide an e-Science perspective on the workflow from risk factor discovery and classification of disease to evaluation of personalized intervention programs. As case studies, we use personalized prostate and breast cancer screenings.

    Materials and Methods: We describe an e-Science initiative in Sweden, e-Science for Cancer Prevention and Control (eCPC), which supports biomarker discovery and offers decision support for personalized intervention strategies. The generic eCPC contribution is a workflow with 4 nodes applied iteratively, and the concept of e-Science signifies systematic use of tools from the mathematical, statistical, data, and computer sciences.

    Results: The eCPC workflow is illustrated through 2 case studies. For prostate cancer, an in-house personalized screening tool, the Stockholm-3 model (S3M), is presented as an alternative to prostate-specific antigen testing alone. S3M is evaluated in a trial setting and plans for rollout in the population are discussed. For breast cancer, new biomarkers based on breast density and molecular profiles are developed and the US multicenter Women Informed to Screen Depending on Measures (WISDOM) trial is referred to for evaluation. While current eCPC data management uses a traditional data warehouse model, we discuss eCPC-developed features of a coherent data integration platform.

    Discussion and Conclusion: E-Science tools are a key part of an evidence-based process for personalized medicine. This paper provides a structured workflow from data and models to evaluation of new personalized intervention strategies. The importance of multidisciplinary collaboration is emphasized. Importantly, the generic concepts of the suggested eCPC workflow are transferrable to other disease domains, although each disease will require tailored solutions.

    Full text (pdf)
    fulltext
  • 84.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Krestyaninova, Maria
    Hastings, Janna
    Shen, Huei-Yi
    Heikkinen, Jani
    Waldenberger, Melanie
    Langhammer, Arnulf
    Ladenvall, Claes
    Esko, Tõnu
    Persson, Mats-Åke
    Heggland, Jon
    Dietrich, Joern
    Ose, Sandra
    Gieger, Christian
    Ried, Janina S
    Peters, Annette
    Fortier, Isabel
    de Geus, Eco Jc
    Klovins, Janis
    Zaharenko, Linda
    Willemsen, Gonneke
    Hottenga, Jouke-Jan
    Litton, Jan-Eric
    Karvanen, Juha
    Boomsma, Dorret I
    Groop, Leif
    Rung, Johan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi.
    Palmgren, Juni
    Pedersen, Nancy L
    McCarthy, Mark I
    van Duijn, Cornelia M
    Hveem, Kristian
    Metspalu, Andres
    Ripatti, Samuli
    Prokopenko, Inga
    Harris, Jennifer R
    Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research. 2016. In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 24, no. 4, pp. 521-528. Journal article (Peer-reviewed)
    Abstract [en]

    A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access, in an integrated manner, phenotypic, clinical and other information about samples that are currently stored at biobanks. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: first, we created harmonised variables, and then we annotated and made searchable information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values, and we show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.

    Full text (pdf)
    fulltext
  • 85.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Rydberg, Patrik
    Department of Drug Design and Pharmacology, University of Copenhagen.
    Willighagen, Egon L.
    Department of Bioinformatics - BiGCaT, Maastricht University.
    Evelo, Chris T.
    Department of Bioinformatics - BiGCaT, Maastricht University.
    Jeliazkova, Nina
    IdeaConsult Ltd, 4 A Kanchev Str, Sofia 1000, Bulgaria.
    XMetDB: an open access database for xenobiotic metabolism. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article ID 47. Journal article (Peer-reviewed)
    Abstract [en]

    Xenobiotic metabolism is an active research topic, but the limited amount of openly available high-quality biotransformation data constrains predictive modeling. Current databases often default to commonly available information: which enzyme metabolizes a compound, but neither experimental conditions nor the atoms that undergo metabolization are captured. We present XMetDB, an open access database for drugs and other xenobiotics and their respective metabolites. The database contains chemical structures of xenobiotic biotransformations with substrate atoms annotated as reaction centra, the resulting product formed, and the catalyzing enzyme, type of experiment, and literature references. Associated with the database is a web interface for the submission and retrieval of experimental metabolite data for drugs and other xenobiotics in various formats, and a web API for programmatic access is also available. The database is open for data deposition, and a curation scheme is in place for quality control. An extensive guide on how to enter experimental data into the database is available from the XMetDB wiki. XMetDB formalizes how biotransformation data should be reported, and the openly available systematically labeled data is a big step forward towards better models for predictive metabolism.

  • 86.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Guha, Rajarshi
    NIH Chemical Genomics Center.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Towards interoperable and reproducible QSAR analyses: Exchange of data sets. 2010. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 2, article ID 5. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND: QSAR/QSPR is a widely used method to relate chemical structures and responses based on experimental observations. In QSAR, chemical structures are expressed as descriptors, which are mathematical representations like calculated properties or enumerated fragments. Many existing QSAR data sets are based on a combination of different software tools mixed with in-house developed solutions, with datasets manually assembled in spreadsheets. Currently there exists no agreed-upon definition of descriptors and no standard for exchanging data sets in QSAR, which together with numerous different descriptor implementations makes it a virtually impossible task to reproduce and validate analyses, and significantly hinders collaborations and re-use of data.

    RESULTS: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR/QSPR data sets, comprising an open XML format (QSAR-ML) and an open extensible descriptor ontology (Blue Obelisk Descriptor Ontology). The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a data set described by QSAR-ML makes its setup completely reproducible. We also provide an implementation as a set of plugins for Bioclipse that simplifies QSAR data set formation, and allows for exporting in QSAR-ML as well as traditional CSV formats. The implementation facilitates addition of new descriptor implementations, from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services.

    CONCLUSIONS: Standardized QSAR data sets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible dataset formation, solving the problems of defining which software components were used, their versions, and the case of multiple names for the same descriptor. This makes it easy to join, extend, and combine data sets and also to work collectively. The presented Bioclipse plugins equip scientists with intuitive tools that make QSAR-ML widely available for the community.

  • 87.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Maastricht University.
    Hammerling, Ulf
    National Food Administration, Sweden.
    Dencker, Lennart
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Grafström, Roland
    Karolinska Institutet.
    A novel infrastructure for chemical safety predictions with focus on human health. 2012. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, no. Supplm, p. S59. Journal article (Peer-reviewed)
    Abstract [en]

    A major objective of Computational Toxicology is to provide reliable and useful in silico estimates of (potentially) harmful actions of chemicals in humans. Predictive models are commonly based on in vitro and in vivo data, and aim at supporting risk assessment in various areas, including the environmental protection, food, and pharmaceutical sectors. The field is, however, hampered by the lack of standards, access to high quality data, validated predictive models, as well as means to connect toxicity data to genomics data.

    We present a framework and roadmap for a novel public infrastructure for predictive computational toxicology and chemical safety assessment, consisting of: (1) a repository capable of aggregating high quality toxicity data with gene expression data, (2) a repository where scientists can share and download predictive models for chemical safety, and (3) a user-friendly platform which makes the services and resources accessible for the scientific community. Databases under the framework will adhere to open standards and use standardized open exchange formats in order to interoperate with emerging international initiatives, such as the FP7-funded OpenTox and ToxBank projects.

    The infrastructure will strengthen and facilitate already ongoing activities within in silico toxicology, open up new possibilities for incorporating genomics data in chemical safety modeling (toxicogenomics), as well as deepen the exploitation of signal transduction networks. The initiative will lay the foundation needed to boost decision support in risk assessment in a wide range of fields, including drug discovery, food safety, as well as agricultural and ecological safety assessment.

  • 88.
    Svensson, Fredrik
    et al.
    Univ Cambridge, Ctr Mol Informat, Dept Chem, Lensfield Rd, Cambridge CB2 1EW, England; IOTA Pharmaceut, St Johns Innovat Ctr, Cowley Rd, Cambridge CB4 0WS, England.
    Aniceto, Natalia
    Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
    Norinder, Ulf
    Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden; Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden.
    Cortes-Ciriano, Isidro
    Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden; Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K.
    Bender, Andreas
    Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
    Conformal Regression for Quantitative Structure-Activity Relationship Modeling-Quantifying Prediction Uncertainty. 2018. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 58, no. 5, pp. 1132-1140. Journal article (Peer-reviewed)
    Abstract [en]

    Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the outputted prediction intervals to create as efficient (i.e. narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges and the different approaches were evaluated on 29 publicly available datasets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals. This approach afforded an average prediction range of 1.65 pIC50 units at the 80 % confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.
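
    The interval scaling described in this abstract can be illustrated with a minimal sketch of normalized inductive conformal regression. This is not the authors' code: the function name `conformal_interval`, the toy calibration numbers, and the use of precomputed predictions and ensemble standard deviations (instead of an actual random forest) are all assumptions made for illustration. Each calibration error is divided by exp(sigma), and the chosen score is multiplied by exp(sigma) of the new compound to obtain the interval half-width.

```python
import math


def conformal_interval(cal_true, cal_pred, cal_sigma, new_pred, new_sigma, confidence=0.8):
    """Return (lower, upper) for one new prediction.

    Hypothetical helper: sigma stands in for an uncertainty estimate such as
    the ensemble standard deviation of an underlying model.
    """
    # Nonconformity score: absolute calibration error scaled by exp(sigma).
    scores = sorted(
        abs(y - p) / math.exp(s) for y, p, s in zip(cal_true, cal_pred, cal_sigma)
    )
    n = len(scores)
    # Rank of the calibration score that yields the requested confidence level.
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        # Too few calibration examples for this confidence level.
        return (-math.inf, math.inf)
    # Rescale the selected score by the uncertainty of the new example.
    half_width = scores[rank - 1] * math.exp(new_sigma)
    return (new_pred - half_width, new_pred + half_width)


# Toy calibration data (made-up numbers, not taken from the paper).
cal_true = [1.0, 2.0, 3.0, 4.0]
cal_pred = [1.5, 2.0, 2.0, 4.25]
cal_sigma = [0.0, 0.0, 0.0, 0.0]

low, high = conformal_interval(
    cal_true, cal_pred, cal_sigma, new_pred=5.0, new_sigma=0.0, confidence=0.5
)
print(low, high)
```

    A larger sigma for the new compound widens its interval, while a compound the model is more certain about gets a narrower one. The coverage guarantee relies on the calibration and test examples being exchangeable.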

  • 89. Svensson, Fredrik
    et al.
    Aniceto, Natalia
    Norinder, Ulf
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Swetox, Karolinska Institutet, Sweden.
    Cortes-Ciriano, Isidro
    Spjuth, Ola
    Carlsson, Lars
    Bender, Andreas
    Conformal Regression for Quantitative Structure-Activity Relationship Modeling-Quantifying Prediction Uncertainty. 2018. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 58, no. 5, pp. 1132-1140. Journal article (Peer-reviewed)
    Abstract [en]

    Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.

  • 90.
    Toor, Salman
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Lindberg, Mathias
    Fällman, Ingemar
    Vallin, Andreas
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Mohill, Olof
    Freyhult, Pontus
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Nilsson, Linus
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Agback, Martin
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Viklund, Lars
    Zazzi, Henric
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Capuccini, Marco
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Möller, Joakim
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Murtagh, Donal
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Hellander, Andreas
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    SNIC Science Cloud (SSC): A national-scale cloud infrastructure for Swedish academia. 2017. In: Proc. 13th International Conference on e-Science, Los Alamitos, CA: IEEE Computer Society, 2017, pp. 219-227. Conference paper (Peer-reviewed)
  • 91.
    Toor, Salman
    et al.
    Uppsala universitet, Avdelningen för beräkningsvetenskap.
    Lindberg, Mathias
    Fällman, Ingemar
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N). Swedish National Infrastructure for Computing (SNIC), Uppsala, Sweden.
    Vallin, Andreas
    Uppsala universitet, Institutionen för informationsteknologi.
    Mohill, Olof
    Freyhult, Pontus
    Uppsala universitet, Institutionen för informationsteknologi.
    Nilsson, Linus
    Uppsala universitet, Institutionen för informationsteknologi.
    Agback, Martin
    Uppsala universitet, Institutionen för informationsteknologi.
    Viklund, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N). Swedish National Infrastructure for Computing (SNIC), Uppsala, Sweden.
    Zazzi, Henric
    Spjuth, Ola
    Uppsala universitet, Institutionen för farmaceutisk biovetenskap.
    Capuccini, Marco
    Uppsala universitet, Avdelningen för beräkningsvetenskap.
    Möller, Joakim
    Uppsala universitet, Institutionen för farmaceutisk biovetenskap.
    Murtagh, Donal
    Uppsala universitet, Institutionen för farmaceutisk biovetenskap.
    Hellander, Andreas
    Uppsala universitet, Avdelningen för beräkningsvetenskap.
    SNIC Science Cloud (SSC): A national-scale cloud infrastructure for Swedish academia. 2017. In: Proceedings 13th International Conference on e-Science: 24–27 October 2017 Auckland, New Zealand, Los Alamitos, CA: IEEE Computer Society, 2017, pp. 219-227. Conference paper (Peer-reviewed)
    Abstract [en]

    The cloud computing paradigm has fundamentally changed the way computational resources are being offered. Although the number of large-scale providers in academia is still relatively small, there is rapidly increasing interest in and adoption of cloud Infrastructure-as-a-Service in the scientific community. The added flexibility in how applications can be implemented compared to traditional batch computing systems is one of the key success factors for the paradigm, and scientific cloud computing promises to increase adoption of simulation and data analysis in scientific communities not traditionally users of large-scale e-Infrastructure, the so-called "long tail of science". In 2014, the Swedish National Infrastructure for Computing (SNIC) initiated a project to investigate the cost and constraints of offering cloud infrastructure for Swedish academia. The aim was to build a platform where academics could evaluate cloud computing for their use cases. SNIC Science Cloud (SSC) has since then evolved into a national-scale cloud infrastructure based on three geographically distributed regions. In this article we present the SSC vision, architectural details and user stories. We summarize the experiences gained from running a national-scale cloud facility into "ten simple rules" for starting up a science cloud project based on OpenStack. We also highlight some key areas that require careful attention in order to offer cloud infrastructure for ubiquitous academic needs and in particular scientific workloads.

  • 92.
    Torabi Moghadam, Behrooz
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Holm, Marcus
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Carlsson, Lars
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Scaling predictive modeling in drug development with cloud computing. 2015. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 55, pp. 19-25. Journal article (Peer-reviewed)
  • 93. van Rijswijk, Merlijn
    et al.
    Beirnaert, Charlie
    Caron, Christophe
    Cascante, Marta
    Dominguez, Victoria
    Dunn, Warwick B
    Ebbels, Timothy M D
    Giacomoni, Franck
    Gonzalez-Beltran, Alejandra
    Hankemeier, Thomas
    Haug, Kenneth
    Izquierdo-Garcia, Jose L
    Jimenez, Rafael C
    Jourdan, Fabien
    Kale, Namrata
    Klapa, Maria I
    Kohlbacher, Oliver
    Koort, Kairi
    Kultima, Kim
    Le Corguillé, Gildas
    Moschonas, Nicholas K
    Neumann, Steffen
    O'Donovan, Claire
    Reczko, Martin
    Rocca-Serra, Philippe
    Rosato, Antonio
    Salek, Reza M
    Sansone, Susanna-Assunta
    Satagopam, Venkata
    Schober, Daniel
    Shimmo, Ruth
    Spicer, Rachel A
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Thévenot, Etienne A
    Viant, Mark R
    Weber, Ralf J M
    Willighagen, Egon L
    Zanetti, Gianluigi
    Steinbeck, Christoph
    The future of metabolomics in ELIXIR. 2017. In: F1000 Research, E-ISSN 2046-1402, Vol. 6, article ID ELIXIR-1649. Journal article (Peer-reviewed)
    Abstract [en]

    Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area, where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the existing four ELIXIR Use Cases, where the metabolomics community - both industry and academia - would benefit most, and which could be exhaustively mapped onto the current five ELIXIR Platforms were discussed. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.

  • 94.
    Wagener, Johannes
    et al.
    Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 279. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

    RESULTS: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) that covers discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

    CONCLUSION: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
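
    The difference between polling and asynchronous invocation can be sketched in a few lines. The example below does not use XMPP or the IO Data extension; it only mimics the pattern with Python's standard `asyncio` module, and the names `run_service`, `client`, and `invoke` as well as the toy "reverse a sequence" job are hypothetical. The client hands the service a reply handle and is woken when the service pushes the result, rather than asking repeatedly for status.

```python
import asyncio


async def run_service(sequence: str, reply: asyncio.Future) -> None:
    """Stand-in for a remote service working on a long-running job."""
    await asyncio.sleep(0.01)          # simulated computation time
    reply.set_result(sequence[::-1])   # the service pushes the result when done


async def client(sequence: str) -> str:
    """Submit a job and wait for the pushed result without any polling loop."""
    loop = asyncio.get_running_loop()
    reply = loop.create_future()       # handle the service will complete
    task = asyncio.create_task(run_service(sequence, reply))
    result = await reply               # suspended until the service replies
    await task                         # let the service task finish cleanly
    return result


def invoke(sequence: str) -> str:
    """Synchronous convenience wrapper around the asynchronous client."""
    return asyncio.run(client(sequence))


print(invoke("ACGT"))
```

    The client contains no status-check loop: it is simply resumed once the result arrives, which is the behaviour the abstract contrasts with ad-hoc polling over HTTP.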

  • 95.
    Wikberg, Jarl
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Engkvist, Ola
    AstraZeneca R&D, Sweden.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Introduction to Pharmaceutical Bioinformatics. 2010 (2nd ed.). Book (Other academic)
  • 96.
    Wikberg, Jarl
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Chemoinformatics taking Biology into Account: Proteochemometrics. 2012. In: Computational Approaches in Cheminformatics and Bioinformatics / [ed] Rajarshi Guha and Andreas Bender, Hoboken, N.J.: John Wiley & Sons, 2012, Chapter 3. Chapter in book, part of anthology (Other academic)
  • 97. Williams, Antony J
    et al.
    Ekins, Sean
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon L
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Accessing, using, and creating chemical property databases for computational toxicology modeling. 2012. In: Methods in molecular biology (Clifton, N.J.), ISSN 1940-6029, Vol. 929, pp. 221-241. Journal article (Peer-reviewed)
    Abstract [en]

    Toxicity data is expensive to generate, is increasingly seen as precompetitive, and is frequently used for the generation of computational models in a discipline known as computational toxicology. Repositories of chemical property data are valuable for supporting computational toxicologists by providing access to data regarding potential toxicity issues with compounds as well as for the purpose of building structure-toxicity relationships and associated prediction models. These relationships use mathematical, statistical, and modeling computational approaches and can be used to understand the mechanisms by which chemicals cause harm and, ultimately, enable prediction of adverse effects of these chemicals to human health and/or the environment. Such approaches are of value as they offer an opportunity to prioritize chemicals for testing. An increasing amount of data used by computational toxicologists is being published into the public domain and, in parallel, there is a greater availability of Open Source software for the generation of computational models. This chapter provides an overview of the types of data and software available and how these may be used to produce predictive toxicology models for the community.

  • 98.
    Willighagen, Egon
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Andersson, Annsofie
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lampa, Samuel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lapins, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Linking the Resource Description Framework to cheminformatics and proteochemometrics. 2011. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 2, no. Suppl 1, p. 6. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND :

    Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods prove to be sufficiently versatile to change that situation.

    RESULTS :

    The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database.

    CONCLUSIONS :

    We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.

  • 99.
    Willighagen, Egon
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Jeliazkova, Nina
    Ideaconsult Ltd.
    Hardy, Barry
    Douglas Connect.
    Grafström, Roland
    Karolinska Institutet.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Computational toxicology using the OpenTox application programming interface and Bioclipse
    2011. In: BMC Research Notes, ISSN 1756-0500, E-ISSN 1756-0500, Vol. 4, no. 1, p. 487-
    Article in journal (Refereed)
    Abstract [en]

    Background: Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. Integrating biological and chemical information sources requires, however, a common language in which to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications.

    Findings: This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources.

    Conclusions: A novel computational toxicity assessment platform was generated by integrating two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets through the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers.

  • 100.
    Willighagen, Egon L.
    et al.
    Maastricht Univ, Dept Bioinformat, NUTRIM, BiGCaT, NL-6200 MD Maastricht, Netherlands.
    Mayfield, John W.
    NextMove Software Ltd, Cambridge CB4 0EY, England.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Berg, Arvid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Carlsson, Lars
    AstraZeneca, Innovat Med & Early Dev, Quantitat Biol, Molndal, Sweden.
    Jeliazkova, Nina
    Ideaconsult Ltd, A Kanchev 4, Sofia 1000, Bulgaria.
    Kuhn, Stefan
    Univ Leicester, Dept Informat, Leicester, Leics, England.
    Pluskal, Tomas
    Whitehead Inst Biomed Res, 455 Main St, Cambridge, MA 02142 USA.
    Rojas-Cherto, Miquel
    Quim Clin Aplicada, Amposta 43870, Spain.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Torrance, Gilleain
    4 Hanway Pl, London W1T 1HD, England.
    Evelo, Chris T.
    Maastricht Univ, Dept Bioinformat, NUTRIM, BiGCaT, NL-6200 MD Maastricht, Netherlands.
    Guha, Rajarshi
    Natl Ctr Adv Translat Sci, 9800 Med Ctr Dr, Rockville, MD 20850 USA.
    Steinbeck, Christoph
    Friedrich Schiller Univ, Inst Inorgan & Analyt Chem, Lessingstr 8, D-07743 Jena, Germany.
    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching
    2017. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 33
    Article in journal (Refereed)
    Abstract [en]

    Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, the code base has grown significantly, however, resulting in many complex interdependencies among components and poor performance of many algorithms.

    Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism.

    Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.
