  • 1.
    Abbaszadeh Shahri, Abbas
    KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering. Islamic Azad University.
    An Optimized Artificial Neural Network Structure to Predict Clay Sensitivity in a High Landslide Prone Area Using Piezocone Penetration Test (CPTu) Data: A Case Study in Southwest of Sweden. 2016. In: Geotechnical and Geological Engineering, ISSN 0960-3182, E-ISSN 1573-1529, 1-14 p. Article in journal (Refereed)
    Abstract [en]

    Artificial neural networks (ANN) have demonstrated some degree of success in various geotechnical engineering problems, such as site characterization, that are difficult to solve or interpret through conventional approaches. This paper introduces a developed and optimized five-layer feed-forward back-propagation neural network with a 4-4-4-3-1 topology, a network error of 0.00201 and R² = 0.941 under the conjugate gradient descent ANN training algorithm to predict the clay sensitivity parameter in a specified area in southwest Sweden. The study focuses on this parameter mainly because of its close relation to landslides that have occurred in Sweden. For this purpose, data from 70 piezocone penetration test (CPTu) points were used to model the variations of clay sensitivity, and the influences of parameters directly or indirectly related to CPTu were taken into account and discussed in detail. The main advantage of this paper is the process applied to find the optimized ANN model, which used various training algorithms as well as different activation functions. The performance and feasibility of the proposed optimized model were examined and evaluated using various statistical and analytical criteria as well as regression analyses, and then compared to in situ field tests and laboratory investigation results. The sensitivity analysis showed that depth and pore pressure are the two most effective factors, and cone tip resistance the least effective factor, in predicting clay sensitivity.
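
Illustrative sketch (not taken from the paper): the 4-4-4-3-1 topology named in the abstract can be read as four inputs, hidden layers of 4, 4 and 3 units, and one output. The code below only shows what such a layer layout looks like as a forward pass. The tanh activation, the linear output unit, the random weights and the helper names `init_network`, `forward` and `count_parameters` are all assumptions for illustration; the paper's trained weights, inputs and activation functions are not given here.

```python
import math
import random

# Layer widths as stated in the abstract: 4 inputs, hidden layers 4-4-3, 1 output.
LAYER_SIZES = [4, 4, 4, 3, 1]


def init_network(sizes, seed=0):
    """Create random weights and zero biases for each pair of adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector through the network.

    Hidden layers use tanh (an assumption); the output layer is left linear.
    """
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
prediction = forward(network, [0.1, 0.2, 0.3, 0.4])  # placeholder input values
```

With this layout the network has 47 weights and 12 biases, which gives a sense of how small the model is relative to the 70 CPTu points mentioned in the abstract.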

  • 2. Aberer, André
    et al.
    Stamatakis, Alexis
    Ronquist, Fredrik
    Swedish Museum of Natural History, Department of Bioinformatics and Genetics.
    An efficient independence sampler for updating branches in Bayesian Markov chain Monte Carlo sampling of phylogenetic trees. 2016. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 65, no 1, 161-176 p. Article in journal (Refereed)
  • 3.
    Abraham, Mark James
    et al.
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Murtola, T.
    Schulz, R.
    Páll, Szilárd
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Smith, J. C.
    Hess, Berk
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. 2015. In: SoftwareX, ISSN 2352-7110, Vol. 1-2, 19-25 p. Article in journal (Refereed)
    Abstract [en]

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  • 4.
    Aftab, Obaid
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Fryknäs, Mårten
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Hammerling, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Gustafsson, Mats
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Detection of cell aggregation and altered cell viability by automated label-free video microscopy: A promising alternative to endpoint viability assays in high throughput screening. 2015. In: Journal of Biomolecular Screening, ISSN 1087-0571, E-ISSN 1552-454X, Vol. 20, no 3, 372-381 p. Article in journal (Refereed)
    Abstract [en]

    Automated phase-contrast video microscopy now makes it feasible to monitor a high-throughput (HT) screening experiment in a 384-well microtiter plate format by collecting one time-lapse video per well. Being a very cost-effective and label-free monitoring method, its potential as an alternative to cell viability assays was evaluated. Three simple morphology feature extraction and comparison algorithms were developed and implemented for analysis of differentially time-evolving morphologies (DTEMs) monitored in phase-contrast microscopy videos. The most promising layout, pixel histogram hierarchy comparison (PHHC), was able to detect several compounds that did not induce any significant change in cell viability, but made the cell population appear as spheroidal cell aggregates. According to recent reports, all these compounds seem to be involved in inhibition of platelet-derived growth factor receptor (PDGFR) signaling. Thus, automated quantification of DTEM (AQDTEM) holds strong promise as an alternative or complement to viability assays in HT in vitro screening of chemical compounds.

  • 5.
    Agarwal, Prasoon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Hematology and Immunology.
    Regulation of Gene Expression in Multiple Myeloma Cells and Normal Fibroblasts: Integrative Bioinformatic and Experimental Approaches. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this thesis applies integrative genomic and experimental approaches to investigate mechanisms involved in regulation of gene expression in the context of disease and normal cell biology.

    In papers I and II, we have explored the role of epigenetic regulation of gene expression in multiple myeloma (MM). By using a bioinformatic approach we identified the Polycomb repressive complex 2 (PRC2) to be a common denominator for the underexpressed gene signature in MM. By using inhibitors of the PRC2 we showed an activation of the genes silenced by H3K27me3, a reduction in the tumor load and increased overall survival in the in vivo 5TMM model. Using ChIP-sequencing we defined the distribution of H3K27me3 and H3K4me3 marks in MM patient cells. In an integrated bioinformatic approach, the H3K27me3-associated genes significantly correlated to under-expression in patients with less favorable survival. Thus, our data indicate the presence of a common under-expressed gene profile and provide a rationale for implementing new therapies focusing on epigenetic alterations in MM.

    In paper III we address the existence of a small cell population in MM presenting with differential tumorigenic properties in the 5T33MM murine model. We report that the predominant population of CD138+ cells had higher engraftment potential and higher clonogenic growth, whereas the CD138- MM cells presented with a less mature phenotype and higher drug resistance. Our findings suggest that, when designing treatment regimes for MM, both cell populations must be targeted.

    In paper IV we have studied the general mechanism of differential gene expression regulation by CGGBP1 in response to growth signals in normal human fibroblasts. We found that CGGBP1 binding affects global gene expression by RNA Polymerase II. This is mediated by Alu RNA-dependent inhibition of RNA Polymerase II. In the presence of growth signals, CGGBP1 is retained in the nuclei and exhibits enhanced Alu binding, thus inhibiting RNA Polymerase III binding on Alus. Hence, we suggest a mechanism by which CGGBP1 orchestrates Alu RNA-mediated regulation of RNA Polymerase II. This thesis provides new insights for using integrative bioinformatic approaches to decipher gene expression regulation mechanisms in MM and in normal cells.

  • 6. Aidas, Kestutis
    et al.
    Angeli, Celestino
    Bak, Keld L.
    Bakken, Vebjorn
    Bast, Radovan
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology.
    Boman, Linus
    Christiansen, Ove
    Cimiraglia, Renzo
    Coriani, Sonia
    Dahle, Pal
    Dalskov, Erik K.
    Ekstrom, Ulf
    Enevoldsen, Thomas
    Eriksen, Janus J.
    Ettenhuber, Patrick
    Fernandez, Berta
    Ferrighi, Lara
    Fliegl, Heike
    Frediani, Luca
    Hald, Kasper
    Halkier, Asger
    Hattig, Christof
    Heiberg, Hanne
    Helgaker, Trygve
    Hennum, Alf Christian
    Hettema, Hinne
    Hjertenaes, Eirik
    Host, Stinne
    Hoyvik, Ida-Marie
    Iozzi, Maria Francesca
    Jansik, Branislav
    Jensen, Hans Jorgen Aa.
    Jonsson, Dan
    Jorgensen, Poul
    Kauczor, Joanna
    Kirpekar, Sheela
    Kjærgaard, Thomas
    Klopper, Wim
    Knecht, Stefan
    Kobayashi, Rika
    Koch, Henrik
    Kongsted, Jacob
    Krapp, Andreas
    Kristensen, Kasper
    Ligabue, Andrea
    Lutnaes, Ola B.
    Melo, Juan I.
    Mikkelsen, Kurt V.
    Myhre, Rolf H.
    Neiss, Christian
    Nielsen, Christian B.
    Norman, Patrick
    Olsen, Jeppe
    Olsen, Jogvan Magnus H.
    Osted, Anders
    Packer, Martin J.
    Pawlowski, Filip
    Pedersen, Thomas B.
    Provasi, Patricio F.
    Reine, Simen
    Rinkevicius, Zilvinas
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Ruden, Torgeir A.
    Ruud, Kenneth
    Rybkin, Vladimir V.
    Salek, Pawel
    Samson, Claire C. M.
    de Meras, Alfredo Sanchez
    Saue, Trond
    Sauer, Stephan P. A.
    Schimmelpfennig, Bernd
    Sneskov, Kristian
    Steindal, Arnfinn H.
    Sylvester-Hvid, Kristian O.
    Taylor, Peter R.
    Teale, Andrew M.
    Tellgren, Erik I.
    Tew, David P.
    Thorvaldsen, Andreas J.
    Thogersen, Lea
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology.
    Watson, Mark A.
    Wilson, David J. D.
    Ziolkowski, Marcin
    Ågren, Hans
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology.
    The Dalton quantum chemistry program system. 2014. In: Wiley Interdisciplinary Reviews. Computational Molecular Science, ISSN 1759-0876, Vol. 4, no 3, 269-284 p. Article in journal (Refereed)
    Abstract [en]

    Dalton is a powerful general-purpose program system for the study of molecular electronic structure at the Hartree-Fock, Kohn-Sham, multiconfigurational self-consistent-field, Møller-Plesset, configuration-interaction, and coupled-cluster levels of theory. Apart from the total energy, a wide variety of molecular properties may be calculated using these electronic-structure models. Molecular gradients and Hessians are available for geometry optimizations, molecular dynamics, and vibrational studies, whereas magnetic resonance and optical activity can be studied in a gauge-origin-invariant manner. Frequency-dependent molecular properties can be calculated using linear, quadratic, and cubic response theory. A large number of singlet and triplet perturbation operators are available for the study of one-, two-, and three-photon processes. Environmental effects may be included using various dielectric-medium and quantum-mechanics/molecular-mechanics models. Large molecules may be studied using linear-scaling and massively parallel algorithms. Dalton is distributed at no cost from for a number of UNIX platforms.

  • 7.
    Ajawatanawong, Pravech
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Atkinson, Gemma C.
    Watson-Haigh, Nathan S.
    MacKenzie, Bryony
    Baldauf, Sandra L.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, W340-W347 p. Article in journal (Refereed)
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.

  • 8. Alger, Ingela
    et al.
    Weibull, Jörgen W.
    KTH, School of Engineering Sciences (SCI), Mathematics (Dept.).
    A generalization of Hamilton's rule-Love others how much? 2012. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 299, 42-54 p. Article in journal (Refereed)
    Abstract [en]

    According to Hamilton's (1964a, b) rule, a costly action will be undertaken if its fitness cost to the actor falls short of the discounted benefit to the recipient, where the discount factor is Wright's index of relatedness between the two. We propose a generalization of this rule, and show that if evolution operates at the level of behavior rules, rather than directly at the level of actions, evolution will select behavior rules that induce a degree of cooperation that may differ from that predicted by Hamilton's rule as applied to actions. In social dilemmas there will be less (more) cooperation than under Hamilton's rule if the actions are strategic substitutes (complements). Our approach is based on natural selection, defined in terms of personal (direct) fitness, and applies to a wide range of pairwise interactions.
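
Illustrative sketch (not from the paper): the classic form of Hamilton's rule described in the first sentence of the abstract can be written as a one-line inequality, cost < relatedness × benefit. The function name and the example numbers below are hypothetical, and the code covers only the classic rule applied to actions, not the generalization to behavior rules that the paper proposes.

```python
def hamilton_favours_action(cost: float, benefit: float, relatedness: float) -> bool:
    """Classic Hamilton's rule: act if the actor's fitness cost is below the
    recipient's benefit discounted by the relatedness between the two.

    Relatedness is assumed here to lie in [0, 1].
    """
    if not 0.0 <= relatedness <= 1.0:
        raise ValueError("relatedness is assumed to lie in [0, 1]")
    return cost < relatedness * benefit


# Hypothetical numbers: relatedness 0.5, cost 1.
helps_when_benefit_is_3 = hamilton_favours_action(cost=1.0, benefit=3.0, relatedness=0.5)
helps_when_benefit_is_1_5 = hamilton_favours_action(cost=1.0, benefit=1.5, relatedness=0.5)
```

In the first call the discounted benefit (1.5) exceeds the cost, so the rule favours the action; in the second the discounted benefit (0.75) falls short of the cost, so it does not.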

  • 9.
    Ali, Raja Hashim
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    From genomes to post-processing of Bayesian inference of phylogeny. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature now, which ranges from single-celled prokaryotes to multi-cellular human beings. With the availability of molecular sequence data, algorithms inferring homology and gene families have emerged, and similarity in gene content between two genes has been the major signal utilized for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to investigate and infer homologs with greater accuracy and in a more informed way. Phylogeny analysis explains the relationship between member genes of a gene family in a simple, graphical and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and posteriors of other evolutionary parameters. The Markov chain Monte Carlo (MCMC) algorithm, in particular using the Metropolis-Hastings sampling scheme, is the most commonly employed algorithm to determine the evolutionary history of genes. Many software packages are available that process results from each MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and which has convergence assessment and burn-in estimation diagnostics specifically designed for Bayesian phylogenetic inference.

    In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature which distinguishes GFC from earlier homology inference methods is that local synteny has been combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset. Gene families were computed by applying clustering algorithms on homologs inferred from GFC, and compared for accuracy, dependence and similarity with gene families inferred from other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species are computed using this approach.

    Another topic addressed in this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce a new software tool, VMCMC, which simplifies post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burn-in estimators that are specifically applicable to Bayesian phylogenetic inference. These methods have been compared for similarity with some other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information for a biological case: the evolutionary history of the FERM domain.

  • 10.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Bark, Mikael
    KTH, School of Information and Communication Technology (ICT).
    Miró, Jorge
    KTH, School of Information and Communication Technology (ICT).
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Sjöstrand, J.
    Zubair, Syed M.
    KTH, School of Electrical Engineering (EES), Communication Networks. University of Balochistan, Pakistan.
    Abbas, R. M.
    Arvestad, L.
    VMCMC: A graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, no 1, 97. Article in journal (Refereed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence assessment, mixing assessment and interactive exploration of both continuous and tree parameters. Results: We have written a software tool called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines. Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 11.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Khan, Mehmood Alam
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    Stockholms universitet.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, S12- p. Article in journal (Refereed)
    Abstract [en]

    Background: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential. Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data. Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and GenFamClust improves on Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 12. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 14, no Suppl,15, S12- p. Article in journal (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and GenFamClust improves on Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 13.
    Al-Jaff, Mohammed
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Sandström, Eric
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Grabherr, Manfred
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology. Uppsala Univ, Bioinformat Infrastruct Life Sci, S-75123 Uppsala, Sweden.
    microTaboo: a general and practical solution to the k-disjoint problem. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, 228. Article in journal (Refereed)
    Abstract [en]

    Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

    Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

    Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
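
Illustrative sketch (not microTaboo's algorithm): the k-disjoint definition in the abstract, a window of length W whose Hamming distance to every other sub-sequence is greater than k, can be checked directly by brute force. The function names, the split into a `target` sequence and a list of `background` sequences, and the toy sequences are assumptions for illustration. This exhaustive comparison is quadratic in sequence length and is only meant to make the definition concrete.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def windows(sequence: str, w: int):
    """Yield (start, window) for every length-w sub-sequence."""
    for start in range(len(sequence) - w + 1):
        yield start, sequence[start:start + w]


def k_disjoint_windows(target: str, background: list, w: int, k: int) -> list:
    """Return (start, window) pairs from `target` whose Hamming distance to
    every length-w window of every background sequence is greater than k."""
    background_windows = [win for seq in background for _, win in windows(seq, w)]
    result = []
    for start, window in windows(target, w):
        if all(hamming(window, other) > k for other in background_windows):
            result.append((start, window))
    return result


# Toy example: W = 4, k = 1.
unique = k_disjoint_windows("AAAATTTT", ["AAAACCCC"], w=4, k=1)
```

In the toy example the windows AAAA and AAAT lie within one mismatch of the background window AAAA and are rejected, while AATT, ATTT and TTTT differ from every background window in at least two positions and are kept.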

  • 14.
    Alneberg, Johannes
    et al.
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    Bjarnason, Brynjar Smári
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    de Bruijn, Ino
    Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm, Sweden.
    Schirmer, Melanie
    School of Engineering, University of Glasgow, Glasgow, UK.
    Quick, Joshua
    Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK; National Institute for Health Research (NIHR) Surgical Reconstruction and Microbiology Research Centre, University of Birmingham, UK.
    Ijaz, Umer Z.
    School of Engineering, University of Glasgow, Glasgow, UK.
    Lahti, Leo
    Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland; Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands.
    Loman, Nicholas J
    Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK.
    Andersson, Anders F
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    Quince, Christopher
    School of Engineering, University of Glasgow, Glasgow, UK.
    Binning metagenomic contigs by coverage and composition. 2014. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 11, no 11, 1144-6 p. Article in journal (Refereed)
    Abstract [en]

    Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples, to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.

  • 15.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Engkvist, Ola
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Noeske, Tobias
    Ligand-Based Target Prediction with Signature Fingerprints. 2014. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, 2647-2653 p. Article in journal (Refereed)
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regards to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
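
Illustrative sketch (not the authors' implementation): Tanimoto-based similarity, as named in the abstract, can be computed on a bit fingerprint as the size of the intersection divided by the size of the union. For a count fingerprint, one common generalization (assumed here; the abstract does not specify which variant was used) divides the sum of per-feature minima by the sum of per-feature maxima. The feature labels and function names are hypothetical.

```python
from collections import Counter


def tanimoto_bits(a: set, b: set) -> float:
    """Tanimoto similarity of two bit fingerprints stored as sets of 'on' features."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Min/max generalization of Tanimoto similarity for count fingerprints."""
    features = set(a) | set(b)
    numerator = sum(min(a[f], b[f]) for f in features)    # missing features count as 0
    denominator = sum(max(a[f], b[f]) for f in features)
    return numerator / denominator if denominator else 0.0


# Hypothetical fingerprints.
bit_similarity = tanimoto_bits({1, 2, 3}, {2, 3, 4})
count_similarity = tanimoto_counts(Counter({"x": 2, "y": 1}), Counter({"x": 1, "z": 1}))
```

The bit version ignores how often a feature occurs, while the count version stores and compares an integer per feature, which is consistent with the abstract's remark that the count fingerprint needs more time and memory.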

  • 16.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale ligand-based predictive modelling using support vector machines. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 39. Article in journal (Refereed)
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

  • 17.
    Ameur, Adam
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    A Bioinformatics Study of Human Transcriptional Regulation. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Regulation of transcription is a central mechanism in all living cells that can now be investigated with high-throughput technologies. Data produced from such experiments give new insights into how transcription factors (TFs) coordinate gene transcription and thereby regulate the amounts of proteins produced. These studies are also important from a medical perspective since TF proteins are often involved in disease. To learn more about transcriptional regulation, we have developed strategies for analysis of data from microarray and massively parallel sequencing (MPS) experiments.

    Our computational results consist of methods to handle the steadily increasing amount of data from high-throughput technologies. Microarray data analysis tools have been assembled in the LCB-Data Warehouse (LCB-DWH) (paper I), and other analysis strategies have been developed for MPS data (paper V). We have also developed a de novo motif search algorithm called BCRANK (paper IV).

    The analysis has led to interesting biological findings in human liver cells (papers II-V). The investigated TFs appeared to bind at several thousand sites in the genome, which we have identified at base pair resolution. The investigated histone modifications are mainly found downstream of transcription start sites, and are correlated with transcriptional activity. These histone marks are frequently found for pairs of genes in a bidirectional conformation. Our results suggest that a TF can bind in the shared promoter of two genes and regulate both of them.

    From a medical perspective, the genes bound by the investigated TFs are candidates to be involved in metabolic disorders. Moreover, we have developed a new strategy to detect single nucleotide polymorphisms (SNPs) that disrupt the binding of a TF (paper IV). We further demonstrated that SNPs can affect transcription in the immediate vicinity. Ultimately, our method may prove helpful to find disease-causing regulatory SNPs.

  • 18.
    Amrein, Beat Anton
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology.
    Steffen-Munsberg, Fabian
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Szeler, Ireneusz
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology.
    Purg, Miha
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kulkarni, Yashraj
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kamerlin, Shina Caroline Lynn
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    CADEE: Computer-Aided Directed Evolution of Enzymes. 2017. In: IUCrJ, ISSN 0972-6918, E-ISSN 2052-2525, Vol. 4, no 1, 50-64 p. Article in journal (Refereed)
    Abstract [en]

    The tremendous interest in enzymes as biocatalysts has led to extensive work in enzyme engineering, as well as associated methodology development. Here, a new framework for computer-aided directed evolution of enzymes (CADEE) is presented which allows a drastic reduction in the time necessary to prepare and analyze in silico semi-automated directed evolution of enzymes. A pedagogical example of the application of CADEE to a real biological system is also presented in order to illustrate the CADEE workflow.

  • 19.
    Anders, Patrizia
    University of Skövde, School of Humanities and Informatics.
    A bioinformaticians view on the evolution of smell perception. 2006. Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Background:

    The origin of vertebrate sensory systems still contains many mysteries and thus challenges to bioinformatics. The evolution of the sense of smell in particular poses important puzzles, notably the question of whether the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study, to determine their relationships across several vertebrate species.

    Results:

    Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in the area of alignment tools allow for better results and high confidence. Different strategies and tools were employed and compared to derive a high quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting Maximum-Likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of tetrapods.

    Conclusions:

    The connections between the two systems of smell perception might be much more fundamental than the common architecture of receptors. A better understanding of these parallels is desirable, not only with respect to our view on evolution, but also in the context of the further exploration of the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.

  • 20.
    Andersson, Malin
    University of Skövde, Department of Computer Science.
    A method for identification of putatively co-regulated genes. 2002. Independent thesis Advanced level (degree of Master (One Year)). Student thesis
    Abstract [en]

    The genomes of several organisms have been sequenced and the need for methods to analyse the data is growing. In this project a method is described that tries to identify co-regulated genes. The method identifies transcription factor binding sites, documented in TRANSFAC, in the non-coding regions of genes. The algorithm counts the number of common binding sites and the number of unique binding sites for each pair of genes and decides if the genes are co-regulated. The result of the method is compared with the correlation between the gene expression patterns of the genes. The method is tested on 21 gene pairs from the genome of Saccharomyces cerevisiae. The algorithm first identified binding sites from all organisms. The accuracy of the program was very low in this case. When the algorithm was modified to only identify binding sites found in plants the accuracy was much improved, from 52% to 76% correct predictions.
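
    The pair-wise counting step described in this abstract can be sketched as follows. This is only an illustration: the abstract does not give the decision rule, so the threshold `min_shared_fraction` and the function names are hypothetical, and the binding-site labels in the usage note are placeholders rather than TRANSFAC entries.

```python
def site_overlap(sites_a, sites_b):
    """Count binding sites common to both genes and sites found in only one."""
    a, b = set(sites_a), set(sites_b)
    common = len(a & b)   # sites present upstream of both genes
    unique = len(a ^ b)   # sites present upstream of exactly one gene
    return common, unique


def putatively_coregulated(sites_a, sites_b, min_shared_fraction=0.5):
    """Hypothetical rule: call a pair co-regulated when the common sites
    make up at least `min_shared_fraction` of all sites seen for the pair."""
    common, unique = site_overlap(sites_a, sites_b)
    total = common + unique
    return total > 0 and common / total >= min_shared_fraction
```

    For example, `site_overlap(["S1", "S2", "S3"], ["S2", "S3", "S4"])` gives two common and two unique sites, which this illustrative rule would accept at the default threshold.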

  • 21.
    Andersson, Samuel A.
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Motif Yggdrasil: Sampling from a tree mixture model. 2006. In: Research In Computational Molecular Biology, Proceedings / [ed] Apostolico, A; Guerra, C; Istrail, S; Pevzner, P; Waterman, M, 2006, Vol. 3909, 458-472 p. Conference paper (Refereed)
    Abstract [en]

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes.

  • 22.
    Andrade, Jorge
    KTH, School of Biotechnology (BIO), Gene Technology.
    Grid and High-Performance Computing for Applied Bioinformatics. 2007. Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily and arises as a consequence of advances in the fields of molecular biology, genomics and proteomics. The challenge for today's biologists lies in decoding this huge and complex data, in order to achieve a better understanding of how our genes shape who we are, how our genome evolved, and how we function.

    Without annotation and data mining, the information provided by, for example, high-throughput genomic sequencing projects is not very useful. Bioinformatics is the application of computer science and technology to the management and analysis of biological data, in an effort to address biological questions. The work presented in this thesis has focused on the use of Grid and High Performance Computing for solving computationally expensive bioinformatics tasks, where, due to the very large amount of available data and the complexity of the tasks, new solutions are required for efficient data analysis and interpretation.

    Three major research topics are addressed: first, the use of grids for distributing the execution of sequence-based proteomic analysis, and its application in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome; second, the application of grid technology in genetic association studies, which enabled the analysis of thousands of simulated genotypes; and finally, the development and application of an economics-based model for grid-job scheduling and resource administration.

    The applications of the grid-based technology developed in the present investigation resulted in the successful tagging and linking of chromosome regions in Alzheimer disease, proteome-wide mapping of the linear epitopes, and the development of a Market-Based Resource Allocation in Grid for Scientific Applications.

  • 23.
    Andrade, Jorge
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Andersen, Malin
    KTH, School of Biotechnology (BIO), Gene Technology.
    Berglund, Lisa
    KTH, School of Biotechnology (BIO), Proteomics.
    Odeberg, Jacob
    KTH, School of Biotechnology (BIO), Gene Technology.
    Applications of grid computing in genetics and proteomics. 2007. In: Applied Parallel Computing: State Of The Art In Scientific Computing / [ed] Kagstrom, B; Elmroth, E; Dongarra, J; Wasniewski, J, 2007, Vol. 4699, 791-798 p. Conference paper (Refereed)
    Abstract [en]

    The potential for Grid technologies in applied bioinformatics is largely unexplored. We have developed a model for solving computationally demanding bioinformatics tasks in distributed Grid environments, designed to improve usability for scientists unfamiliar with Grid computing. With a script-based implementation that uses a strategy of temporary installations of databases and existing executables on remote nodes at submission, we propose a generic solution that does not rely on predefined Grid runtime environments and that can easily be adapted to other bioinformatics tasks suitable for parallelization. This implementation has been successfully applied to whole proteome sequence similarity analyses and to genome-wide genotype simulations, where computation time was reduced from years to weeks. We conclude that computational Grid technology is a useful resource for solving compute-intensive tasks in genetics and proteomics using existing algorithms.

  • 24.
    Arvestad, Lars
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Aligning coding DNA in the presence of frame-shift errors. 1997. In: Combinatorial Pattern Matching, 1997, 180-190 p. Conference paper (Other academic)
    Abstract [en]

    The problem of aligning two DNA sequences with respect to the fact that they are coding for proteins is discussed. Criteria for a good alignment of coding DNA, together with an algorithm that satisfies them, are presented. The algorithm is robust against frame-shifts and forgiving towards silent substitutions. The important choice of objective function is examined and several variants are proposed.

  • 25.
    Arvestad, Lars
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Berglund, Ann-Charlotte
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Sennblad, Bengt
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Bayesian gene/species tree reconciliation and orthology analysis using MCMC. 2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 19, i7-i15 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

    Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

  • 26.
    Arvestad, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Bruno, William
    Los Alamos National Laboratory.
    Estimation of Reversible Substitution Matrices from Multiple Pairs of Sequences. 1997. In: Journal of Molecular Evolution, ISSN 0022-2844, E-ISSN 1432-1432, Vol. 45, no 6, 696-703 p. Article in journal (Refereed)
    Abstract [en]

    We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.

  • 27.
    Auffarth, Benjamin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Kaplan, Bernhard
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Anders, Lansner
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Map formation in the olfactory bulb by axon guidance of olfactory neurons. 2011. In: Frontiers in Systems Neuroscience, ISSN 1662-5137, Vol. 5, no 0. Article in journal (Refereed)
    Abstract [en]

    The organization of representations in the brain has been observed to locally reflect subspaces of inputs that are relevant to behavioral or perceptual feature combinations, such as in areas receptive to lower and higher-order features in the visual system. The early olfactory system developed highly plastic mechanisms and convergent evidence indicates that projections from primary neurons converge onto the glomerular level of the olfactory bulb (OB) to form a code composed of continuous spatial zones that are differentially active for particular physico-chemical feature combinations, some of which are known to trigger behavioral responses. In a model study of the early human olfactory system, we derive a glomerular organization based on a set of real-world, biologically-relevant stimuli, a distribution of receptors that respond each to a set of odorants of similar ranges of molecular properties, and a mechanism of axon guidance based on activity. Apart from demonstrating activity-dependent glomeruli formation and reproducing the relationship of glomerular recruitment with concentration, it is shown that glomerular responses reflect similarities of human odor category perceptions and that further, a spatial code provides a better correlation than a distributed population code. These results are consistent with evidence of functional compartmentalization in the OB and could suggest a function for the bulb in encoding of perceptual dimensions.

  • 28.
    Aurell, Erik
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Aalto University, Finland.
    Innocenti, Nicolas
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). The Hebrew University of Jerusalem, Israel.
    Zhou, Hai-Jun
    State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.
    The bulk and the tail of minimal absent words in genome sequences. 2016. In: Physical Biology, ISSN 1478-3967, E-ISSN 1478-3975, Vol. 13, no 2, 026004. Article in journal (Refereed)
    Abstract [en]

    Minimal absent words (MAW) of a genomic sequence are sequences that are themselves absent but whose subwords are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the novel concept of the core of a minimal absent word: the sequences present in the genome that are closest to a given MAW. We show that in bacteria and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs (rRNAs). We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity.
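
    The definition of a minimal absent word given at the start of this abstract can be made concrete with a small brute-force sketch. It relies on the fact that a word is a MAW exactly when it is absent while both its longest proper prefix and its longest proper suffix are present. The function names are assumptions for this example, only words of length 2 and above are enumerated, and the approach is not meant to scale to whole genomes or to reproduce the authors' method.

```python
def substrings_of_length(seq: str, k: int) -> set:
    """All distinct length-k words occurring in seq (empty if k > len(seq))."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minimal_absent_words(seq: str, max_len: int, alphabet: str = "ACGT") -> set:
    """Brute-force minimal absent words of length 2..max_len.

    A word w is reported when w does not occur in seq, but both w[:-1]
    and w[1:] do occur; this implies every proper subword occurs.
    """
    maws = set()
    shorter = substrings_of_length(seq, 1)
    for k in range(2, max_len + 1):
        present = substrings_of_length(seq, k)
        for prefix in shorter:            # prefix w[:-1] is present by construction
            for letter in alphabet:
                word = prefix + letter
                if word not in present and word[1:] in shorter:
                    maws.add(word)
        shorter = present                 # becomes the (k)-word set for the next round
    return maws
```

    For the toy sequence "AACA" over the alphabet "AC", the sketch reports "CC", "AAA", "CAA" and "CAC" for lengths up to 4.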

  • 29.
    Austin, Peter C.
    et al.
    Inst Clin Evaluat Sci, G106, 2075 Bayview Ave, Toronto, ON M4N 3M5, Canada; Univ Toronto, Inst Hlth Management Policy & Evaluat, Toronto, ON, Canada; Sunnybrook Res Inst, Schulich Heart Res Program, Toronto, ON, Canada.
    Wagner, Philippe
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Medicinska och farmaceutiska vetenskapsområdet, centrumbildningar mm, Centre for Clinical Research, County of Västmanland. Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden.
    Merlo, Juan
    Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden; Region Skane, Ctr Primary Hlth Care Res, Malmo, Sweden.
    The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis. 2017. In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 36, no 6, 928-938 p. Article in journal (Refereed)
    Abstract [en]

    Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis.
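
    The abstract mentions R code for computing the MHR but does not reproduce it. As a hedged illustration in Python, the sketch below uses the usual closed form for this kind of measure, assuming cluster-specific random effects that are normally distributed on the log-hazard scale with variance σ²: MHR = exp(√(2σ²) · Φ⁻¹(0.75)). The function name and argument are assumptions for this example, and the formula applies only under that distributional assumption.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_hazard_ratio(cluster_variance: float) -> float:
    """MHR from the variance of normally distributed cluster random effects.

    Assumption: random effects are N(0, cluster_variance) on the log-hazard
    scale; other frailty distributions need a different calculation.
    """
    if cluster_variance < 0:
        raise ValueError("variance must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return exp(sqrt(2.0 * cluster_variance) * z75)
```

    A variance of zero gives an MHR of exactly 1 (no between-cluster heterogeneity), and larger variances give larger values, mirroring how the MOR is read in multilevel logistic regression.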

  • 30.
    Ballber Torres, Nuria
    et al.
    University of Politecn Cataluna, Spain.
    Altafini, Claudio
    Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Faculty of Science & Engineering.
    Drug combinatorics and side effect estimation on the signed human drug-target network. 2016. In: BMC Systems Biology, ISSN 1752-0509, E-ISSN 1752-0509, Vol. 10, no 74. Article in journal (Refereed)
    Abstract [en]

    Background: The mode of action of a drug on its targets can often be classified as being positive (activator, potentiator, agonist, etc.) or negative (inhibitor, blocker, antagonist, etc.). The signed edges of a drug-target network can be used to investigate the combined mechanisms of action of multiple drugs on the ensemble of common targets. Results: In this paper it is shown that for the signed human drug-target network the majority of drug pairs tend to have synergistic effects on the common targets, i.e., drug pairs tend to have modes of action with the same sign on most of the shared targets, especially for the principal pharmacological targets of a drug. Methods are proposed to compute this synergism, as well as to estimate the influence of the drugs on the side effect of another drug. Conclusions: Enriching a drug-target network with functional information such as the sign of the interactions makes it possible to explore systematically a series of network properties of key importance in the context of computational drug combinatorics.
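
    The notion of two drugs acting with the same sign on their shared targets can be sketched as below. This is an illustrative measure only: the abstract does not define how synergism is computed, so the function name, the +1/-1 encoding and the fraction-of-agreement score are assumptions for this example.

```python
from typing import Dict, Optional


def sign_agreement(drug_a: Dict[str, int], drug_b: Dict[str, int]) -> Optional[float]:
    """Fraction of shared targets on which two drugs act with the same sign.

    Each drug maps target name -> +1 (activating) or -1 (inhibiting).
    Returns None when the drugs share no target.
    """
    shared = drug_a.keys() & drug_b.keys()
    if not shared:
        return None
    same_sign = sum(1 for target in shared if drug_a[target] == drug_b[target])
    return same_sign / len(shared)
```

    A value near 1 would correspond to the same-sign behaviour the abstract calls synergistic, and a value near 0 to opposing modes of action on the shared targets.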

  • 31.
    Baltzer, Nicholas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Stockholm County, Sweden.
    Sundström, Karin
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm Count, Sweden.
    Nygård, Jan F.
    Canc Registry Norway, Dept Registry Informat, Oslo, Oslo County, Norway.
    Dillner, Joakim
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm Count, Sweden.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics. Polish Acad Sci, Inst Comp Sci, Warsaw, Warsaw County, Poland.
    Risk stratification in cervical cancer screening by complete screening history: Applying bioinformatics to a general screening population. 2017. In: International Journal of Cancer, ISSN 0020-7136, E-ISSN 1097-0215, Vol. 141, no 1, 200-209 p. Article in journal (Refereed)
    Abstract [en]

    Women screened for cervical cancer in Sweden are currently treated under a one-size-fits-all programme, which has been successful in reducing the incidence of cervical cancer but does not use all of the participants' available medical information. This study aimed to use women's complete cervical screening histories to identify diagnostic patterns that may indicate an increased risk of developing cervical cancer. A nationwide case-control study was performed where cervical cancer screening data from 125,476 women with a maximum follow-up of 10 years were evaluated for patterns of SNOMED diagnoses. The cancer development risk was estimated for a number of different screening history patterns and expressed as Odds Ratios (OR), with a history of 4 benign cervical tests as reference, using logistic regression. The overall performance of the model was moderate (64% accuracy, 71% area under curve) with 61-62% of the study population showing no specific patterns associated with risk. However, predictions for high-risk groups as defined by screening history patterns were highly discriminatory with ORs ranging from 8 to 36. The model for computing risk performed consistently across different screening history lengths, and several patterns predicted cancer outcomes. The results show the presence of risk-increasing and risk-decreasing factors in the screening history. Thus it is feasible to identify subgroups based on their complete screening histories. Several high-risk subgroups identified might benefit from an increased screening density. Some low-risk subgroups identified could likely have a moderately reduced screening density without additional risk.

  • 32.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    The Bootstrap and Other Methods of Testing Phylogenetic Trees. 2007. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2007, Vol. 12, 103-108 p. Conference paper (Refereed)
    Abstract [en]

    The final step of a phylogenetic analysis is the test of the generated tree. This is not an easy task, and there is no obvious methodology for it, because we do not know the full probabilistic model of evolution. A number of methods have been proposed, but there is a wide debate concerning the interpretation of the results they produce.

  • 33.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    A Graph – String Model of Gene Assembly in Ciliates. 2006. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2006, Vol. 10, 521-534 p. Conference paper (Refereed)
    Abstract [en]

    The ciliates are a family of unicellular organisms characterized by having two types of nuclei, micro- and macronuclei. During cell mating the genetic material must change from the micronuclear to the macronuclear form. The paper summarises a formal model for this change. The model, which is described in recent works, is based on strings and graphs. It shows that complex computational operations have to take place inside the cell.

  • 34.
    Bartoszek, Krzysztof
    et al.
    Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg.
    Liò, Pietro
    University of Cambridge.
    Sorathiya, Anil
    University of Cambridge.
    Influenza differentiation and evolution. 2010. In: Acta Physica Polonica B Proceedings Supplement, 2010, Vol. 3, 417-452 p. Conference paper (Refereed)
    Abstract [en]

    The aim of the study is to perform a very wide analysis of the HA, NA and M influenza gene segments to find short nucleotide regions which differentiate between strains (i.e. H1, H2, etc.), hosts, geographic regions, the time when the sequence was found, and the combination of time and region, using a simple methodology. Finding regions differentiating between strains has as its goal the construction of a Luminex microarray which will allow quick and efficient strain recognition. Discovery for the other splitting factors could shed light on structures significant for host specificity and on the history of influenza evolution. A large number of places in the HA, NA and M gene segments were found that can differentiate between hosts, regions, time and the combination of time and region. Very good differentiation between different Hx strains can also be seen. We link one of our findings to a proposed stochastic model of creation of viral phylogenetic trees.

  • 35.
    Bartoszek, Krzysztof
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Applied Mathematics and Statistics.
    Pietro, Lio'
    Cambridge University.
    A novel algorithm to reconstruct phylogenies using gene sequences and expression data. 2014. In: International Proceedings of Chemical, Biological & Environmental Engineering; Environment, Energy and Biotechnology III, 2014, 8-12 p. Conference paper (Refereed)
    Abstract [en]

    Phylogenies based on single loci should be viewed with caution, and the best approach for obtaining robust trees is to examine numerous loci across the genome. It often happens that, for the same set of species, trees derived from different genes conflict with each other. There are several methods that combine information from different genes in order to infer the species tree. One novel approach is to use information from different -omics. Here we describe a phylogenetic method based on an Ornstein–Uhlenbeck process that combines sequence and gene expression data. We test our method on genes belonging to the histidine biosynthetic operon. We found that the method provides interesting insights into selection pressures and adaptive hypotheses concerning gene expression levels.

  • 36.
    Basile, Walter
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sachenkova, Oxana
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Kungliga Tekniska Högskolan, Sweden.
    High GC content causes orphan proteins to be intrinsically disordered. 2017. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no 3, e1005375. Article in journal (Refereed)
    Abstract [en]

    De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.

  • 37.
    Basu, Sankar Chandra
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    Wallner, Björn
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    Finding correct protein-protein docking models using ProQDock. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 12, 262-270 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Protein-protein interactions are key in virtually all biological processes. For a detailed understanding of the biological processes, the structure of the protein complex is essential. Given the current experimental techniques for structure determination, the vast majority of all protein complexes will never be solved by experimental techniques. In the absence of experimental data, computational docking methods can be used to predict the structure of the protein complex. A common strategy is to generate many alternative docking solutions (atomic models) and then use a scoring function to select the best. The success of the computational docking technique is, to a large degree, dependent on the ability of the scoring function to accurately rank and score the many alternative docking models. Results: Here, we present ProQDock, a scoring function that predicts the absolute quality of a docking model as measured by a novel protein docking quality score (DockQ). ProQDock uses support vector machines trained to predict the quality of protein docking models using features that can be calculated from the docking model itself. By combining different types of features describing both the protein-protein interface and the overall physical chemistry, it was possible to improve the correlation with DockQ from 0.25 for the best individual feature (electrostatic complementarity) to 0.49 for the final version of ProQDock. ProQDock performed better than the state-of-the-art methods ZRANK and ZRANK2 in terms of correlations, ranking and finding correct models on an independent test set. Finally, we also demonstrate that it is possible to combine ProQDock with ZRANK and ZRANK2 to improve performance even further.

  • 38.
    Basu, Sankar Chandra
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    Wallner, Björn
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    DockQ: A Quality Measure for Protein-Protein Docking Models. 2016. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 11, no 8, e0161879- p. Article in journal (Refereed)
    Abstract [en]

    The state of the art in assessing the structural quality of docking models is currently based on three related yet independent quality measures: F-nat, LRMS, and iRMS, as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (> 10 Å) might still qualify as acceptable with a decent F-nat (> 0.50) and iRMS (< 3.0 Å). This is also the reason why the so-called CAPRI criteria for assessing the quality of docking models are defined by applying various ad-hoc cutoffs on these measures to classify a docking model into four classes: Incorrect, Acceptable, Medium, or High quality. This classification has been useful in CAPRI, but since models are grouped in only four bins it is also rather limiting, making it difficult to rank models, correlate with scoring functions or use it as a target function in machine learning algorithms. Here, we present DockQ, a continuous protein-protein docking model quality measure derived by combining F-nat, LRMS, and iRMS into a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium and High quality. An average PPV of 94% at 90% recall demonstrates that there is no need to apply predefined ad-hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher-resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or the sum of top-ranked models, which has been so valuable for the CASP community. The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for protein structure prediction, and DockQ should be useful in a similar development in the protein docking field.

  • 39.
    Bekkouche, Bo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH.
    Classification of Neuronal Subtypes in the Striatum and the Effect of Neuronal Heterogeneity on the Activity Dynamics. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Clustering of single-cell RNA sequencing data is often used to show what states and subtypes cells have. Using this technique, striatal cells were clustered into subtypes using different clustering algorithms. Previously known subtypes were confirmed and new subtypes were found. One of them is a third medium spiny neuron subtype. Using the observed heterogeneity, as a second task, this project questions whether or not differences in individual neurons have an impact on the network dynamics. Clustering spiking activity from a neural network model gave inconclusive results. Both algorithms indicated low heterogeneity, but when the quantity of a subtype was altered between a low and a high number and the network activity was clustered in each case, the results indicated an increase in heterogeneity. This project shows a list of potential striatal subtypes and gives reasons to keep giving attention to biologically observed heterogeneity.

  • 40.
    Bem, T.
    et al.
    Cabelguen, J. M.
    Ekeberg, Örjan
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Grillner, S.
    From swimming to walking: a single basic network for two different behaviors. 2003. In: Biological cybernetics, ISSN 0340-1200, E-ISSN 1432-0770, Vol. 88, no 2, 79-90 p. Article in journal (Refereed)
    Abstract [en]

    In this paper we consider the hypothesis that the spinal locomotor network controlling trunk movements has remained essentially unchanged during the evolutionary transition from aquatic to terrestrial locomotion. The wider repertoire of axial motor patterns expressed by amphibians would then be explained by the influence from separate limb pattern generators, added during this evolution. This study is based on EMG data recorded in vivo from epaxial musculature in the newt Pleurodeles waltl during unrestrained swimming and walking, and on a simplified model of the lamprey spinal pattern generator for swimming. Using computer simulations, we have examined the output generated by the lamprey model network for different input drives. Two distinct inputs were identified which reproduced the main features of the swimming and walking motor patterns in the newt. The swimming pattern is generated when the network receives tonic excitation with local intensity gradients near the neck and girdle regions. To produce the walking pattern, the network must receive (in addition to a tonic excitation at the girdles) a phasic drive which is out of phase in the neck and tail regions in relation to the middle part of the body. To fit the symmetry of the walking pattern, however, the intersegmental connectivity of the network had to be modified by reversing the direction of the crossed inhibitory pathways in the rostral part of the spinal cord. This study suggests that the input drive required for the generation of the distinct walking pattern could, at least partly, be attributed to mechanosensory feedback received by the network directly from the intraspinal stretch-receptor system. Indeed, the input drive required resembles the pattern of activity of stretch receptors sensing the lateral bending of the trunk, as expressed during walking in urodeles. Moreover, our results indicate that a nonuniform distribution of these stretch receptors along the trunk can explain the discontinuities exhibited in the swimming pattern of the newt. Thus, the limb pattern generators may influence the original network controlling axial movements not only through a direct coupling at the central level but also via a mechanical coupling between trunk and limbs, which in turn influences the sensory signals sent back to the network. Taken together, our findings support the hypothesis of a phylogenetic conservatism of the spinal locomotor networks generating axial motor patterns from agnathans to amphibians.

  • 41.
    Benjaminsson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Fransson, Peter
    Department of Clinical Neuroscience, Karolinska Institute.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    A Novel Model-Free Data Analysis Technique Based on Clustering in a Mutual Information Space: Application to Resting-State fMRI. 2010. In: Frontiers in Systems Neuroscience, ISSN 1662-5137, Vol. 4, 34:1-34:8 p. Article in journal (Refereed)
    Abstract [en]

    Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of independent component analysis (ICA) have been the most commonly used methods on fMRI data, e.g., in finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis on a voxel level. It has good scalability properties and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to put the voxels in a lower-dimensional space reflecting the dependency relations based on the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turn out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This allows rapid analysis and visualization of the data on different spatial levels, as well as automatic selection of a suitable number of decomposition components.

  • 42.
    Benjaminsson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Herman, Pawel
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Odour discrimination and mixture segmentation in a holistic model of the mammalian olfactory system. Manuscript (preprint) (Other academic)
  • 43.
    Benjaminsson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Nexa: A scalable neural simulator with integrated analysis. 2012. In: Network, ISSN 0954-898X, E-ISSN 1361-6536, Vol. 23, no 4, 254-271 p. Article in journal (Refereed)
    Abstract [en]

    Large-scale neural simulations encompass challenges in simulator design, data handling and understanding of simulation output. As the computational power of supercomputers and the size of network models increase, these challenges become even more pronounced. Here we introduce the experimental scalable neural simulator Nexa, for parallel simulation of large-scale neural network models at a high level of biological abstraction and for exploration of the simulation methods involved. It includes firing-rate models and capabilities to build networks using machine-learning-inspired methods for, e.g., self-organization of network architecture and structural plasticity. We show scalability up to the size of the largest machines currently available for a number of model scenarios. We further demonstrate simulator integration with online analysis and real-time visualization as scalable solutions for the data handling challenges.

  • 44.
    Benjaminsson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lundqvist, Mikael
    A model of categorization, learning of invariant representations and sequence prediction utilizing top-down activity. Manuscript (preprint) (Other academic)
  • 45.
    Bernsel, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sequence-based predictions of membrane-protein topology, homology and insertion. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Membrane proteins comprise around 20-30% of a typical proteome and play crucial roles in a wide variety of biochemical pathways. Apart from their general biological significance, membrane proteins are of particular interest to the pharmaceutical industry, being targets for more than half of all available drugs. This thesis focuses on prediction methods for membrane proteins that ultimately rely on their amino acid sequence only.

    By identifying soluble protein domains in membrane protein sequences, we were able to constrain and improve prediction of membrane protein topology, i.e. what parts of the sequence span the membrane and what parts are located on the cytoplasmic and extra-cytoplasmic sides. Using predicted topology as input to a profile-profile based alignment protocol, we managed to increase sensitivity to detect distant membrane protein homologs.

    Finally, experimental measurements of the level of membrane integration of systematically designed transmembrane helices in vitro were used to derive a scale of position-specific contributions to helix insertion efficiency for all 20 naturally occurring amino acids. Notably, position within the helix was found to be an important factor for the contribution to helix insertion efficiency for polar and charged amino acids, reflecting the highly anisotropic environment of the membrane. Using the scale to predict natural transmembrane helices in protein sequences revealed that, whereas helices in single-spanning proteins are typically hydrophobic enough to insert by themselves, a large part of the helices in multi-spanning proteins seem to require stabilizing helix-helix interactions for proper membrane integration. Implementing the scale to predict full transmembrane topologies yielded results comparable to the best statistics-based topology prediction methods.

  • 46.
    Berthet, Pierre
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Computational Modeling of the Basal Ganglia: Functional Pathways and Reinforcement Learning. 2015. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    We perceive the environment via sensor arrays and interact with it through motor outputs. The work of this thesis concerns how the brain selects actions given the information about the perceived state of the world and how it learns and adapts these selections to changes in this environment. Reinforcement learning theories suggest that an action will be more or less likely to be selected if the outcome has been better or worse than expected. A group of subcortical structures, the basal ganglia (BG), is critically involved in both the selection and the reward prediction.

    We developed and investigated a computational model of the BG. We implemented a Bayesian-Hebbian learning rule, which computes the weights between two units based on the probability of their activations. We were able to test how various configurations of the represented pathways impacted the performance in several reinforcement learning and conditioning tasks. Then, following the development of a more biologically plausible version with spiking neurons, we simulated lesions in the different pathways and assessed how they affected learning and selection.

    We observed that the evolution of the weights and the performance of the models qualitatively resembled experimental data. The absence of a unique best way to configure the model over all the learning paradigms tested indicates that an agent could dynamically configure its action selection mode, mainly by including or excluding the reward prediction values in the selection process. We present hypotheses on possible biological substrates for the reward prediction pathway. We base these on the functional requirements for successful learning and on an analysis of the experimental data. We further simulate a loss of dopaminergic neurons similar to that reported in Parkinson’s disease. We suggest that the associated motor symptoms are mostly caused by an impairment of the pathway promoting actions, while the pathway suppressing them seems to remain functional.

  • 47.
    Berthet, Pierre
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Hällgren Kotaleski, Jeanette
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Action selection performance of a reconfigurable Basal Ganglia inspired model with Hebbian-Bayesian Go-NoGo connectivity. 2012. In: Frontiers in Behavioral Neuroscience, ISSN 1662-5153, Vol. 6, 65- p. Article in journal (Refereed)
    Abstract [en]

    Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine-dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e. the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behaviour to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three-factor Hebbian-Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo and RP system were utilized, e.g. using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioural data. Our results, however, show that there is no unique best way to configure this BG model so that it handles all the learning paradigms tested well. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available.

  • 48.
    Berthet, Pierre
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Optogenetic Stimulation in a Computational Model of the Basal Ganglia Biases Action Selection and Reward Prediction Error. 2014. In: PLoS ONE, ISSN 1932-6203, Vol. 9, no 3, e90578- p. Article in journal (Refereed)
    Abstract [en]

    Optogenetic stimulation of specific types of medium spiny neurons (MSNs) in the striatum has been shown to bias the action selection of mice in a two-choice task. This shift is dependent on the localisation and on the intensity of the stimulation but also on the recent reward history. We have implemented a way to simulate this increased activity produced by the optical flash in our computational model of the basal ganglia (BG). This abstract model features the direct and indirect pathways commonly described in biology, and a reward prediction pathway (RP). The framework is similar to Actor-Critic methods and to the ventral/dorsal distinction in the striatum. We thus investigated the impact on the selection caused by an added stimulation in each of the three pathways. We were able to reproduce in our model the bias in action selection observed in mice. Our results also showed that biasing the reward prediction is sufficient to create a modification in the action selection. However, we had to increase the percentage of trials with stimulation relative to that in experiments in order to impact the selection. We found that increasing only the reward prediction had a different effect depending on whether or not the stimulation in RP was action-dependent (applied only for a specific action). We further looked at the evolution of the change in the weights depending on the stage of learning within a block. A bias in RP impacts the plasticity differently depending on that stage but also on the outcome. It remains to experimentally test how the dopaminergic neurons are affected by specific stimulations of neurons in the striatum and to relate data to predictions of our model.

  • 49.
    Berthet, Pierre
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Lindahl, Mikael
    Tully, Philip
    Hellgren-Kotaleski, Jeanette
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Functional relevance of different basal ganglia pathways investigated in a spiking model with reward dependent plasticity. Manuscript (preprint) (Other academic)
  • 50.
    Bhattacharyya, Dhananjay
    et al.
    Saha Institute Nucl Phys, India.
    Halder, Sukanya
    Saha Institute Nucl Phys, India.
    Basu, Sankar Chandra
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering. University of Calcutta, India.
    Mukherjee, Debasish
    Saha Institute Nucl Phys, India.
    Kumar, Prasun
    Indian Institute Science, India.
    Bansal, Manju
    Indian Institute Science, India.
    RNAHelix: computational modeling of nucleic acid structures with Watson-Crick and non-canonical base pairs. 2017. In: Journal of Computer-Aided Molecular Design, ISSN 0920-654X, E-ISSN 1573-4951, Vol. 31, no 2, 219-235 p. Article in journal (Refereed)
    Abstract [en]

    Comprehensive analyses of structural features of non-canonical base pairs within a nucleic acid double helix are limited by the small number of available three-dimensional structures. Therefore, a procedure for model building of double helices containing any given nucleotide sequence and base pairing information, either canonical or non-canonical, is seriously needed. Here we describe a program, RNAHelix, which is an updated version of our widely used software, NUCGEN. The program can regenerate duplexes using the dinucleotide step and base pair orientation parameters for a given double helical DNA or RNA sequence with defined Watson-Crick or non-Watson-Crick base pairs. The original structure and the corresponding regenerated structure of double helices were found to be very close, as indicated by the small RMSD values between positions of the corresponding atoms. Structures of several usual and unusual double helices have been regenerated and compared with their original structures in terms of base pair RMSD, torsion angles and electrostatic potentials, and very high agreement has been noted. RNAHelix can also be used to generate a structure with a sequence completely different from an experimentally determined one, or to introduce single to multiple mutations, but with the same set of parameters, and hence can also be an important tool in homology modeling and the study of mutation-induced structural changes.
