  • 1.
    Abbaszadeh Shahri, Abbas
    KTH, Skolan för arkitektur och samhällsbyggnad (ABE), Byggvetenskap. Islamic Azad University.
    An Optimized Artificial Neural Network Structure to Predict Clay Sensitivity in a High Landslide Prone Area Using Piezocone Penetration Test (CPTu) Data: A Case Study in Southwest of Sweden. 2016. In: Geotechnical and Geological Engineering, ISSN 0960-3182, E-ISSN 1573-1529, pp. 1-14. Journal article (Peer-reviewed)
    Abstract [en]

    Artificial neural networks (ANN) have shown some success on geotechnical engineering problems, such as site characterization, that are difficult to solve or interpret with conventional approaches. This paper introduces a developed and optimized five-layer feed-forward back-propagation neural network with a 4-4-4-3-1 topology, a network error of 0.00201 and R2 = 0.941 under the conjugate gradient descent training algorithm, to predict the clay sensitivity parameter in a specified area in southwest Sweden. The close relation of this parameter to landslides that have occurred in Sweden was the main motivation for the study. For this purpose, data from 70 piezocone penetration test (CPTu) points were used to model the variations of clay sensitivity, and the influences of parameters directly or indirectly related to CPTu are taken into account and discussed in detail. The main advantage of this paper is the process applied to find the optimized ANN model, which used various training algorithms as well as different activation functions. The performance and feasibility of the proposed optimized model were examined and evaluated using various statistical and analytical criteria as well as regression analyses, and then compared to in situ field tests and laboratory investigation results. The sensitivity analysis showed that depth and pore pressure are the two most effective factors, and cone tip resistance the least effective factor, in predicting clay sensitivity.

  • 2. Aberer, André
    et al.
    Stamatakis, Alexis
    Ronquist, Fredrik
    Naturhistoriska riksmuseet, Enheten för bioinformatik och genetik.
    An efficient independence sampler for updating branches in Bayesian Markov chain Monte Carlo sampling of phylogenetic trees. 2016. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 65, no. 1, pp. 161-176. Journal article (Peer-reviewed)
  • 3.
    Abraham, Mark James
    et al.
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Murtola, T.
    Schulz, R.
    Páll, Szilárd
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Smith, J. C.
    Hess, Berk
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lindahl, Erik
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. 2015. In: SoftwareX, ISSN 2352-7110, Vol. 1-2, pp. 19-25. Journal article (Peer-reviewed)
    Abstract [en]

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  • 4.
    Aftab, Obaid
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Fryknäs, Mårten
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Hammerling, Ulf
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Larsson, Rolf
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Gustafsson, Mats
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Detection of cell aggregation and altered cell viability by automated label-free video microscopy: A promising alternative to endpoint viability assays in high throughput screening. 2015. In: Journal of Biomolecular Screening, ISSN 1087-0571, E-ISSN 1552-454X, Vol. 20, no. 3, pp. 372-381. Journal article (Peer-reviewed)
    Abstract [en]

    Automated phase-contrast video microscopy now makes it feasible to monitor a high-throughput (HT) screening experiment in a 384-well microtiter plate format by collecting one time-lapse video per well. Being a very cost-effective and label-free monitoring method, its potential as an alternative to cell viability assays was evaluated. Three simple morphology feature extraction and comparison algorithms were developed and implemented for analysis of differentially time-evolving morphologies (DTEMs) monitored in phase-contrast microscopy videos. The most promising layout, pixel histogram hierarchy comparison (PHHC), was able to detect several compounds that did not induce any significant change in cell viability, but made the cell population appear as spheroidal cell aggregates. According to recent reports, all these compounds seem to be involved in inhibition of platelet-derived growth factor receptor (PDGFR) signaling. Thus, automated quantification of DTEM (AQDTEM) holds strong promise as an alternative or complement to viability assays in HT in vitro screening of chemical compounds.

  • 5.
    Agarwal, Prasoon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Hematologi och immunologi.
    Regulation of Gene Expression in Multiple Myeloma Cells and Normal Fibroblasts: Integrative Bioinformatic and Experimental Approaches. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this thesis applies integrative genomic and experimental approaches to investigate mechanisms involved in regulation of gene expression in the context of disease and normal cell biology.

    In papers I and II, we have explored the role of epigenetic regulation of gene expression in multiple myeloma (MM). By using a bioinformatic approach we identified the Polycomb repressive complex 2 (PRC2) to be a common denominator for the underexpressed gene signature in MM. By using inhibitors of the PRC2 we showed an activation of the genes silenced by H3K27me3 and a reduction in the tumor load and increased overall survival in the in vivo 5TMM model. Using ChIP-sequencing we defined the distribution of H3K27me3 and H3K4me3 marks in MM patients cells. In an integrated bioinformatic approach, the H3K27me3-associated genes significantly correlated to under-expression in patients with less favorable survival. Thus, our data indicates the presence of a common under-expressed gene profile and provides a rationale for implementing new therapies focusing on epigenetic alterations in MM.

    In paper III we address the existence of a small cell population in MM presenting with differential tumorigenic properties in the 5T33MM murine model. We report that the predominant population of CD138+ cells had higher engraftment potential and higher clonogenic growth, whereas the CD138- MM cells presented with a less mature phenotype and higher drug resistance. Our findings suggest that while designing treatment regimes for MM, both cell populations must be targeted.

    In paper IV we have studied the general mechanism of differential gene expression regulation by CGGBP1 in response to growth signals in normal human fibroblasts. We found that CGGBP1 binding affects global gene expression by RNA Polymerase II. This is mediated by Alu RNA-dependent inhibition of RNA Polymerase II. In the presence of growth signals CGGBP1 is retained in the nuclei and exhibits enhanced Alu binding, thus inhibiting RNA Polymerase III binding on Alus. Hence we suggest a mechanism by which CGGBP1 orchestrates Alu RNA-mediated regulation of RNA Polymerase II. This thesis provides new insights for using integrative bioinformatic approaches to decipher gene expression regulation mechanisms in MM and in normal cells.

  • 6. Aidas, Kestutis
    et al.
    Angeli, Celestino
    Bak, Keld L.
    Bakken, Vebjorn
    Bast, Radovan
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    Boman, Linus
    Christiansen, Ove
    Cimiraglia, Renzo
    Coriani, Sonia
    Dahle, Pal
    Dalskov, Erik K.
    Ekstrom, Ulf
    Enevoldsen, Thomas
    Eriksen, Janus J.
    Ettenhuber, Patrick
    Fernandez, Berta
    Ferrighi, Lara
    Fliegl, Heike
    Frediani, Luca
    Hald, Kasper
    Halkier, Asger
    Hattig, Christof
    Heiberg, Hanne
    Helgaker, Trygve
    Hennum, Alf Christian
    Hettema, Hinne
    Hjertenaes, Eirik
    Host, Stinne
    Hoyvik, Ida-Marie
    Iozzi, Maria Francesca
    Jansik, Branislav
    Jensen, Hans Jorgen Aa.
    Jonsson, Dan
    Jorgensen, Poul
    Kauczor, Joanna
    Kirpekar, Sheela
    Kjaergaard, Thomas
    Klopper, Wim
    Knecht, Stefan
    Kobayashi, Rika
    Koch, Henrik
    Kongsted, Jacob
    Krapp, Andreas
    Kristensen, Kasper
    Ligabue, Andrea
    Lutnaes, Ola B.
    Melo, Juan I.
    Mikkelsen, Kurt V.
    Myhre, Rolf H.
    Neiss, Christian
    Nielsen, Christian B.
    Norman, Patrick
    Olsen, Jeppe
    Olsen, Jogvan Magnus H.
    Osted, Anders
    Packer, Martin J.
    Pawlowski, Filip
    Pedersen, Thomas B.
    Provasi, Patricio F.
    Reine, Simen
    Rinkevicius, Zilvinas
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Ruden, Torgeir A.
    Ruud, Kenneth
    Rybkin, Vladimir V.
    Salek, Pawel
    Samson, Claire C. M.
    de Meras, Alfredo Sanchez
    Saue, Trond
    Sauer, Stephan P. A.
    Schimmelpfennig, Bernd
    Sneskov, Kristian
    Steindal, Arnfinn H.
    Sylvester-Hvid, Kristian O.
    Taylor, Peter R.
    Teale, Andrew M.
    Tellgren, Erik I.
    Tew, David P.
    Thorvaldsen, Andreas J.
    Thogersen, Lea
    Vahtras, Olav
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    Watson, Mark A.
    Wilson, David J. D.
    Ziolkowski, Marcin
    Ågren, Hans
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    The Dalton quantum chemistry program system. 2014. In: Wiley Interdisciplinary Reviews. Computational Molecular Science, ISSN 1759-0876, Vol. 4, no. 3, pp. 269-284. Journal article (Peer-reviewed)
    Abstract [en]

    Dalton is a powerful general-purpose program system for the study of molecular electronic structure at the Hartree-Fock, Kohn-Sham, multiconfigurational self-consistent-field, Møller-Plesset, configuration-interaction, and coupled-cluster levels of theory. Apart from the total energy, a wide variety of molecular properties may be calculated using these electronic-structure models. Molecular gradients and Hessians are available for geometry optimizations, molecular dynamics, and vibrational studies, whereas magnetic resonance and optical activity can be studied in a gauge-origin-invariant manner. Frequency-dependent molecular properties can be calculated using linear, quadratic, and cubic response theory. A large number of singlet and triplet perturbation operators are available for the study of one-, two-, and three-photon processes. Environmental effects may be included using various dielectric-medium and quantum-mechanics/molecular-mechanics models. Large molecules may be studied using linear-scaling and massively parallel algorithms. Dalton is distributed at no cost from for a number of UNIX platforms.

  • 7.
    Ajawatanawong, Pravech
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för organismbiologi, Systematisk biologi.
    Atkinson, Gemma C.
    Watson-Haigh, Nathan S.
    MacKenzie, Bryony
    Baldauf, Sandra L.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för organismbiologi, Systematisk biologi.
    SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no. W1, pp. W340-W347. Journal article (Peer-reviewed)
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.

  • 8. Alger, Ingela
    et al.
    Weibull, Jörgen W.
    KTH, Skolan för teknikvetenskap (SCI), Matematik (Inst.).
    A generalization of Hamilton's rule - Love others how much? 2012. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 299, pp. 42-54. Journal article (Peer-reviewed)
    Abstract [en]

    According to Hamilton's (1964a, b) rule, a costly action will be undertaken if its fitness cost to the actor falls short of the discounted benefit to the recipient, where the discount factor is Wright's index of relatedness between the two. We propose a generalization of this rule, and show that if evolution operates at the level of behavior rules, rather than directly at the level of actions, evolution will select behavior rules that induce a degree of cooperation that may differ from that predicted by Hamilton's rule as applied to actions. In social dilemmas there will be less (more) cooperation than under Hamilton's rule if the actions are strategic substitutes (complements). Our approach is based on natural selection, defined in terms of personal (direct) fitness, and applies to a wide range of pairwise interactions.

  • 9.
    Ali, Raja Hashim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    From genomes to post-processing of Bayesian inference of phylogeny. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature now, which ranges from single-celled prokaryotes to multi-cellular human beings. With the availability of molecular sequence data, algorithms inferring homology and gene families have emerged, and similarity in gene content between two genes has been the major signal utilized for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to investigate and infer homologs with greater accuracy and in a more informed way. Phylogeny analysis explains the relationship between member genes of a gene family in a simple, graphical and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and posteriors of other evolutionary parameters. The Markov chain Monte Carlo (MCMC) algorithm, in particular with the Metropolis-Hastings sampling scheme, is the most commonly employed algorithm to determine the evolutionary history of genes. Many software packages are available that process results from each MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and that has convergence assessment and burn-in estimation diagnostics specifically designed for Bayesian phylogenetic inference.

    In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature which distinguishes GFC from earlier homology inference methods is that local synteny has been combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset. Gene families were computed by applying clustering algorithms on homologs inferred from GFC, and compared for accuracy, dependence and similarity with gene families inferred from other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species are computed using this approach.

    Another topic of this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce new software, VMCMC, which simplifies post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burn-in estimators that are specifically applicable to Bayesian phylogenetic inference. These methods have been compared for similarity with some other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information for a biological case: the evolutionary history of the FERM domain.

  • 10.
    Ali, Raja Hashim
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Bark, Mikael
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Miró, Jorge
    KTH, Skolan för informations- och kommunikationsteknik (ICT).
    Muhammad, Sayyed Auwn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Sjöstrand, J.
    Zubair, Syed M.
    KTH, Skolan för elektro- och systemteknik (EES), Kommunikationsnät. University of Balochistan, Pakistan.
    Abbas, R. M.
    Arvestad, L.
    VMCMC: A graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, no. 1, 97. Journal article (Peer-reviewed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence, mixing and interactive exploration of both continuous and tree parameters. Results: We have written software called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can also be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines. Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 11.
    Ali, Raja Hashim
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Muhammad, Sayyed Auwn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Khan, Mehmood Alam
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    Stockholms universitet.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, S12. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential. Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data. Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and GenFamClust improves on Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 12. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 14, no. Suppl 15, S12. Journal article (Peer-reviewed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and GenFamClust improves on Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 13.
    Al-Jaff, Mohammed
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Sandström, Eric
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Grabherr, Manfred
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala Univ, Bioinformat Infrastruct Life Sci, S-75123 Uppsala, Sweden.
    microTaboo: a general and practical solution to the k-disjoint problem. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, 228. Journal article (Peer-reviewed)
    Abstract [en]

    Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

    Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

    Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
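
The k-disjoint definition in the abstract above can be illustrated with a brute-force sketch. This is not the microTaboo algorithm, whose internals the abstract does not describe; it only checks the stated condition directly. The function names and the simplification of comparing one target sequence against a separate list of background sequences are assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint_windows(target: str, others: list[str], w: int, k: int) -> list[tuple[int, str]]:
    """Return (start, window) for every length-w window of `target` whose
    Hamming distance to every length-w window of `others` is greater than k.

    Hypothetical helper: a quadratic brute-force check, suitable only for
    tiny inputs, shown to make the definition concrete.
    """
    # Collect every distinct length-w window from the background sequences.
    background = {seq[i:i + w] for seq in others for i in range(len(seq) - w + 1)}
    result = []
    for i in range(len(target) - w + 1):
        window = target[i:i + w]
        # Keep the window only if it is more than k mismatches from all others.
        if all(hamming(window, other) > k for other in background):
            result.append((i, window))
    return result


if __name__ == "__main__":
    # With k=0 only exact matches disqualify a window; with k=1 near matches do too.
    print(k_disjoint_windows("ACGTACGT", ["ACGAACGA"], w=4, k=0))
    print(k_disjoint_windows("ACGTACGT", ["ACGAACGA"], w=4, k=1))
```

Raising k tightens the notion of "unique": a window that merely differs from the background is no longer enough, it must differ in more than k positions.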

  • 14.
    Alneberg, Johannes
    et al.
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    Bjarnason, Brynjar Smári
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    de Bruijn, Ino
    Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm, Sweden.
    Schirmer, Melanie
    School of Engineering, University of Glasgow, Glasgow, UK.
    Quick, Joshua
    Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK; National Institute for Health Research (NIHR) Surgical Reconstruction and Microbiology Research Centre, University of Birmingham, UK.
    Ijaz, Umer Z.
    School of Engineering, University of Glasgow, Glasgow, UK.
    Lahti, Leo
    Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland; Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands.
    Loman, Nicholas J
    Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK.
    Andersson, Anders F
    KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
    Quince, Christopher
    School of Engineering, University of Glasgow, Glasgow, UK.
    Binning metagenomic contigs by coverage and composition. 2014. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 11, no. 11, pp. 1144-6. Journal article (Peer-reviewed)
    Abstract [en]

    Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples, to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.

  • 15.
    Alvarsson, Jonathan
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Engkvist, Ola
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Noeske, Tobias
    Ligand-Based Target Prediction with Signature Fingerprints. 2014. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no. 10, pp. 2647-2653. Journal article (Peer reviewed)
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run, so its usage should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
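
    The Tanimoto-based similarity searching mentioned in the abstract can be illustrated with a minimal sketch. It assumes a bit fingerprint is a set of feature identifiers and a count fingerprint is a mapping from feature to occurrence count; the min/max form used for counts is one common generalization and is an assumption here, not necessarily the variant used in the study. The function names are hypothetical.

```python
from collections import Counter


def tanimoto_bits(a: set, b: set) -> float:
    """Tanimoto coefficient for bit fingerprints stored as sets of 'on' features."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Min/max generalization of Tanimoto for count fingerprints (assumed variant)."""
    features = set(a) | set(b)
    numerator = sum(min(a[f], b[f]) for f in features)    # Counter returns 0 for missing keys
    denominator = sum(max(a[f], b[f]) for f in features)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(tanimoto_bits({1, 2, 3}, {2, 3, 4}))
    print(tanimoto_counts(Counter(a=2, b=1), Counter(a=1, c=1)))
```

    The count version keeps more information per feature, which is consistent with the abstract's remark that it costs more memory and computing time than the bit version.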

  • 16.
    Alvarsson, Jonathan
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lampa, Samuel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Schaal, Wesley
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Andersson, Claes
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Large-scale ligand-based predictive modelling using support vector machines. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 39. Journal article (Peer reviewed)
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

  • 17.
    Ameur, Adam
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    A Bioinformatics Study of Human Transcriptional Regulation. 2008. Doctoral thesis, with papers (Other academic)
    Abstract [en]

    Regulation of transcription is a central mechanism in all living cells that can now be investigated with high-throughput technologies. Data produced from such experiments give new insights into how transcription factors (TFs) coordinate gene transcription and thereby regulate the amounts of proteins produced. These studies are also important from a medical perspective since TF proteins are often involved in disease. To learn more about transcriptional regulation, we have developed strategies for analysis of data from microarray and massively parallel sequencing (MPS) experiments.

    Our computational results consist of methods to handle the steadily increasing amount of data from high-throughput technologies. Microarray data analysis tools have been assembled in the LCB-Data Warehouse (LCB-DWH) (paper I), and other analysis strategies have been developed for MPS data (paper V). We have also developed a de novo motif search algorithm called BCRANK (paper IV).

    The analysis has led to interesting biological findings in human liver cells (papers II-V). The investigated TFs appeared to bind at several thousand sites in the genome, which we have identified at base pair resolution. The investigated histone modifications are mainly found downstream of transcription start sites and are correlated with transcriptional activity. These histone marks are frequently found for pairs of genes in a bidirectional conformation. Our results suggest that a TF can bind in the shared promoter of two genes and regulate both of them.

    From a medical perspective, the genes bound by the investigated TFs are candidates to be involved in metabolic disorders. Moreover, we have developed a new strategy to detect single nucleotide polymorphisms (SNPs) that disrupt the binding of a TF (paper IV). We further demonstrated that SNPs can affect transcription in the immediate vicinity. Ultimately, our method may prove helpful to find disease-causing regulatory SNPs.

  • 18.
    Amrein, Beat Anton
    et al.
    Uppsala universitet, Science for Life Laboratory, SciLifeLab. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi.
    Steffen-Munsberg, Fabian
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Szeler, Ireneusz
    Uppsala universitet, Science for Life Laboratory, SciLifeLab. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi.
    Purg, Miha
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Kulkarni, Yashraj
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Kamerlin, Shina Caroline Lynn
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Struktur- och molekylärbiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    CADEE: Computer-Aided Directed Evolution of Enzymes. 2017. In: IUCrJ, ISSN 0972-6918, E-ISSN 2052-2525, Vol. 4, no. 1, pp. 50-64. Journal article (Peer reviewed)
    Abstract [en]

    The tremendous interest in enzymes as biocatalysts has led to extensive work in enzyme engineering, as well as associated methodology development. Here, a new framework for computer-aided directed evolution of enzymes (CADEE) is presented which allows a drastic reduction in the time necessary to prepare and analyze in silico semi-automated directed evolution of enzymes. A pedagogical example of the application of CADEE to a real biological system is also presented in order to illustrate the CADEE workflow.

  • 19.
    Anders, Patrizia
    Högskolan i Skövde, Institutionen för kommunikation och information.
    A bioinformaticians view on the evolution of smell perception. 2006. Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Background:

    The origin of vertebrate sensory systems still contains many mysteries and thus challenges to bioinformatics. Especially the evolution of the sense of smell maintains important puzzles, namely the question whether or not the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study, to determine their relationships among several different species of the vertebrates.

    Results:

    Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in the area of alignment tools allow for better results and high confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting Maximum-Likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of tetrapods.

    Conclusions:

    The connections between the two systems of smell perception might be much more fundamental than the common architecture of receptors. A better understanding of these parallels is desirable, not only with respect to our view on evolution, but also in the context of the further exploration of the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.

  • 20.
    Andersson, Malin
    Högskolan i Skövde, Institutionen för datavetenskap.
    A method for identification of putatively co-regulated genes. 2002. Independent thesis Advanced level (degree of Master (One Year)). Student thesis
    Abstract [en]

    The genomes of several organisms have been sequenced and the need for methods to analyse the data is growing. In this project a method is described that tries to identify co-regulated genes. The method identifies transcription factor binding sites, documented in TRANSFAC, in the non-coding regions of genes. The algorithm counts the number of common binding sites and the number of unique binding sites for each pair of genes and decides if the genes are co-regulated. The result of the method is compared with the correlation between the gene expression patterns of the genes. The method is tested on 21 gene pairs from the genome of Saccharomyces cerevisiae. The algorithm first identified binding sites from all organisms. The accuracy of the program was very low in this case. When the algorithm was modified to only identify binding sites found in plants the accuracy was much improved, from 52% to 76% correct predictions.
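
    The pair-wise counting step described in the abstract can be sketched as follows. The abstract does not state the decision rule, so the thresholds below (`min_common` and the requirement that common sites are at least as numerous as unique ones) are purely hypothetical, as are the function name and the example factor labels.

```python
def classify_pair(sites_a: set, sites_b: set, min_common: int = 2):
    """Count common and unique binding sites for a gene pair and apply a toy rule.

    sites_a / sites_b: names of transcription factor binding sites found in the
    non-coding region of each gene. The decision rule is an assumption made for
    illustration, not the rule used in the thesis.
    """
    n_common = len(sites_a & sites_b)   # sites present in both genes
    n_unique = len(sites_a ^ sites_b)   # sites present in exactly one gene
    co_regulated = n_common >= min_common and n_common >= n_unique
    return n_common, n_unique, co_regulated


if __name__ == "__main__":
    # Illustrative labels only; not data from the thesis.
    gene_1 = {"GAL4", "MIG1", "RAP1"}
    gene_2 = {"GAL4", "MIG1", "ABF1"}
    print(classify_pair(gene_1, gene_2))
```

    Predictions of this kind would then be compared with the correlation between the genes' expression patterns, as the abstract describes.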

  • 21.
    Andersson, Samuel A.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Motif Yggdrasil: Sampling from a tree mixture model. 2006. In: Research In Computational Molecular Biology, Proceedings / [ed] Apostolico, A; Guerra, C; Istrail, S; Pevzner, P; Waterman, M, 2006, Vol. 3909, pp. 458-472. Conference paper (Peer reviewed)
    Abstract [en]

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes.

  • 22.
    Andrade, Jorge
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Grid and High-Performance Computing for Applied Bioinformatics. 2007. Doctoral thesis, with papers (Other academic)
    Abstract [en]

    The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily and arises as a consequence of advances in the fields of molecular biology, genomics and proteomics. The challenge for today's biologists lies in decoding this huge and complex data, in order to achieve a better understanding of how our genes shape who we are, how our genome evolved, and how we function.

    Without annotation and data mining, the information provided by, for example, high-throughput genomic sequencing projects is not very useful. Bioinformatics is the application of computer science and technology to the management and analysis of biological data, in an effort to address biological questions. The work presented in this thesis has focused on the use of Grid and High Performance Computing for solving computationally expensive bioinformatics tasks, where, due to the very large amount of available data and the complexity of the tasks, new solutions are required for efficient data analysis and interpretation.

    Three major research topics are addressed: first, the use of grids for distributing the execution of sequence-based proteomic analysis, and its application in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome; second, the application of grid technology in genetic association studies, which enabled the analysis of thousands of simulated genotypes; and finally, the development and application of an economic-based model for grid-job scheduling and resource administration.

    The applications of the grid-based technology developed in the present investigation resulted in the successful tagging and linking of chromosome regions in Alzheimer disease, proteome-wide mapping of the linear epitopes, and the development of a Market-Based Resource Allocation in Grid for Scientific Applications.

  • 23.
    Andrade, Jorge
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Andersen, Malin
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Berglund, Lisa
    KTH, Skolan för bioteknologi (BIO), Proteomik.
    Odeberg, Jacob
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Applications of grid computing in genetics and proteomics. 2007. In: Applied Parallel Computing: State Of The Art In Scientific Computing / [ed] Kagstrom, B; Elmroth, E; Dongarra, J; Wasniewski, J, 2007, Vol. 4699, pp. 791-798. Conference paper (Peer reviewed)
    Abstract [en]

    The potential for Grid technologies in applied bioinformatics is largely unexplored. We have developed a model for solving computationally demanding bioinformatics tasks in distributed Grid environments, designed to ease the usability for scientists unfamiliar with Grid computing. With a script-based implementation that uses a strategy of temporary installations of databases and existing executables on remote nodes at submission, we propose a generic solution that does not rely on predefined Grid runtime environments and that can easily be adapted to other bioinformatics tasks suitable for parallelization. This implementation has been successfully applied to whole proteome sequence similarity analyses and to genome-wide genotype simulations, where computation time was reduced from years to weeks. We conclude that computational Grid technology is a useful resource for solving compute-intensive tasks in genetics and proteomics using existing algorithms.

  • 24.
    Ansotegui, Carlos
    et al.
    Luisa Bonet, Maria
    Giraldez-Cru, Jesus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS. Spanish National Research Council, Spain.
    Levy, Jordi
    Structure features for SAT instances classification. 2017. In: Journal of Applied Logic, ISSN 1570-8683, E-ISSN 1570-8691, Vol. 23, pp. 27-39. Journal article (Peer reviewed)
    Abstract [en]

    The success of portfolio approaches in SAT solving relies on the observation that different SAT solvers may dramatically change their performance depending on the class of SAT instances they are trying to solve. In these approaches, a set of features of the problem is used to build a prediction model, which classifies instances into classes, and computes the fastest algorithm to solve each of them. Therefore, the set of features used to build these classifiers plays a crucial role. Traditionally, portfolio SAT solvers include features about the structure of the problem and its hardness. Recently, there have been some attempts to better characterize the structure of industrial SAT instances. In this paper, we use some structure features of industrial SAT instances, namely the scale-free structure, the community structure and the self-similar structure, to build classifiers of industrial SAT families of instances. First, we measure the effectiveness of these classifiers by comparing them to other sets of SAT features commonly used in portfolio SAT solving approaches. Then, we evaluate the performance of this set of structure features when used in a real portfolio SAT solver. Finally, we analyze the relevance of these features on the analyzed classifiers.

  • 25.
    Arvestad, Lars
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Aligning coding DNA in the presence of frame-shift errors. 1997. In: Combinatorial Pattern Matching, 1997, pp. 180-190. Conference paper (Other academic)
    Abstract [en]

    The problem of aligning two DNA sequences with respect to the fact that they are coding for proteins is discussed. Criteria for a good alignment of coding DNA, together with an algorithm that satisfies them, are presented. The algorithm is robust against frame-shifts and forgiving towards silent substitutions. The important choice of objective function is examined and several variants are proposed.

  • 26.
    Arvestad, Lars
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Berglund, Ann-Charlotte
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lagergren, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Sennblad, Bengt
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Bayesian gene/species tree reconciliation and orthology analysis using MCMC. 2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 19, pp. i7-i15. Journal article (Peer reviewed)
    Abstract [en]

    Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation have been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and more powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

    Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic methods, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet being in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

  • 27.
    Arvestad, Lars
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Bruno, William
    Los Alamos National Laboratory.
    Estimation of Reversible Substitution Matrices from Multiple Pairs of Sequences. 1997. In: Journal of Molecular Evolution, ISSN 0022-2844, E-ISSN 1432-1432, Vol. 45, no. 6, pp. 696-703. Journal article (Peer reviewed)
    Abstract [en]

    We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.

  • 28.
    Arvidsson, Staffan
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Carlsson, Lars
    AstraZeneca R&D.
    Paulo, Toccaceli
    Royal Holloway University of London.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors. 2017. In: Conformal and Probabilistic Prediction with Applications (COPA) 2017 / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos, 2017, Vol. 60, pp. 118-131. Conference paper (Peer reviewed)
    Abstract [en]

    Prediction of drug metabolism is an important topic in the drug discovery process, and we here present a study using probabilistic predictions applying Cross Venn-ABERS Predictors (CVAPs) on data for site-of-metabolism. We used a dataset of 73599 biotransformations, applied SMIRKS to define biotransformations of interest and constructed five datasets where chemical structures were represented using signatures descriptors. The results show that CVAP produces well-calibrated predictions for all datasets with good predictive capability, making CVAP an interesting method for further exploration in drug discovery applications.

  • 29.
    Auffarth, Benjamin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Kaplan, Bernhard
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Anders, Lansner
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Map formation in the olfactory bulb by axon guidance of olfactory neurons. 2011. In: Frontiers in Systems Neuroscience, ISSN 1662-5137, Vol. 5, no. 0. Journal article (Peer reviewed)
    Abstract [en]

    The organization of representations in the brain has been observed to locally reflect subspaces of inputs that are relevant to behavioral or perceptual feature combinations, such as in areas receptive to lower and higher-order features in the visual system. The early olfactory system developed highly plastic mechanisms and convergent evidence indicates that projections from primary neurons converge onto the glomerular level of the olfactory bulb (OB) to form a code composed of continuous spatial zones that are differentially active for particular physico-chemical feature combinations, some of which are known to trigger behavioral responses. In a model study of the early human olfactory system, we derive a glomerular organization based on a set of real-world, biologically-relevant stimuli, a distribution of receptors that respond each to a set of odorants of similar ranges of molecular properties, and a mechanism of axon guidance based on activity. Apart from demonstrating activity-dependent glomeruli formation and reproducing the relationship of glomerular recruitment with concentration, it is shown that glomerular responses reflect similarities of human odor category perceptions and that further, a spatial code provides a better correlation than a distributed population code. These results are consistent with evidence of functional compartmentalization in the OB and could suggest a function for the bulb in encoding of perceptual dimensions.

  • 30.
    Aurell, Erik
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). Aalto University, Finland.
    Innocenti, Nicolas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). The Hebrew University of Jerusalem, Israel.
    Zhou, Hai-Jun
    State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.
    The bulk and the tail of minimal absent words in genome sequences. 2016. In: Physical Biology, ISSN 1478-3967, E-ISSN 1478-3975, Vol. 13, no. 2, 026004. Journal article (Peer reviewed)
    Abstract [en]

    Minimal absent words (MAW) of a genomic sequence are subsequences that are absent themselves but the subwords of which are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the novel concept of the core of a minimal absent word: the sequences present in the genome that are closest to a given MAW. We show that in bacteria and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs (rRNAs). We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity.
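
    The definition of a minimal absent word given in the abstract can be made concrete with a small brute-force sketch. It relies on the observation that a word is a MAW exactly when it is absent while both its longest proper prefix and longest proper suffix are present. The function name, the `max_len` cut-off and the single-strand treatment are assumptions for illustration; this is not the authors' implementation and it is not suited to genome-scale input.

```python
def minimal_absent_words(seq: str, alphabet: str = "ACGT", max_len: int = 6) -> set:
    """Return all minimal absent words of `seq` up to length `max_len` (brute force)."""
    # Every substring of length 1..max_len that occurs in the sequence.
    present = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    # A single letter that never occurs is a MAW (its only proper subword is empty).
    maws = {c for c in alphabet if c not in present}
    for k in range(2, max_len + 1):
        for prefix in [w for w in present if len(w) == k - 1]:
            for c in alphabet:
                candidate = prefix + c
                # Absent itself, but both maximal proper subwords are present.
                if candidate not in present and candidate[1:] in present:
                    maws.add(candidate)
    return maws


if __name__ == "__main__":
    print(sorted(minimal_absent_words("AAB", alphabet="AB", max_len=4)))
```

    In the toy call, "ABA" is absent but is not minimal because its subword "BA" is already absent.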

  • 31.
    Austin, Peter C.
    et al.
    Inst Clin Evaluat Sci, G106, 2075 Bayview Ave, Toronto, ON M4N 3M5, Canada; Univ Toronto, Inst Hlth Management Policy & Evaluat, Toronto, ON, Canada; Sunnybrook Res Inst, Schulich Heart Res Program, Toronto, ON, Canada.
    Wagner, Philippe
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska och farmaceutiska vetenskapsområdet, centrumbildningar mm, Centrum för klinisk forskning, Västerås. Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden.
    Merlo, Juan
    Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden; Region Skane, Ctr Primary Hlth Care Res, Malmo, Sweden.
    The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis. 2017. In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 36, no. 6, pp. 928-938. Journal article (Peer reviewed)
    Abstract [en]

    Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis.
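
    As a rough illustration of the measure described in the abstract, the sketch below computes a median hazard ratio from the variance of the cluster-level random effects, using the same form as the commonly used MOR formula, exp(sqrt(2 * variance) * z_0.75), where z_0.75 is the 75th percentile of the standard normal distribution. It assumes normally distributed random effects on the log-hazard scale; it is a Python stand-in, not the R code provided with the article, and the function name is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_hazard_ratio(random_effect_variance: float) -> float:
    """MHR under the assumption of normal random effects on the log-hazard scale."""
    if random_effect_variance < 0:
        raise ValueError("variance must be non-negative")
    z_75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal, about 0.6745
    return exp(sqrt(2.0 * random_effect_variance) * z_75)


if __name__ == "__main__":
    for variance in (0.0, 0.1, 0.5):
        print(variance, round(median_hazard_ratio(variance), 3))
```

    A variance of zero gives an MHR of exactly 1, meaning no between-cluster heterogeneity; larger variances give larger values.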

  • 32.
    Ballber Torres, Nuria
    et al.
    University of Politecn Cataluna, Spain.
    Altafini, Claudio
    Linköpings universitet, Institutionen för systemteknik, Reglerteknik. Linköpings universitet, Tekniska fakulteten.
    Drug combinatorics and side effect estimation on the signed human drug-target network. 2016. In: BMC Systems Biology, ISSN 1752-0509, E-ISSN 1752-0509, Vol. 10, no. 74. Journal article (Peer-reviewed)
    Abstract [en]

    Background: The mode of action of a drug on its targets can often be classified as being positive (activator, potentiator, agonist, etc.) or negative (inhibitor, blocker, antagonist, etc.). The signed edges of a drug-target network can be used to investigate the combined mechanisms of action of multiple drugs on the ensemble of common targets. Results: In this paper it is shown that for the signed human drug-target network the majority of drug pairs tend to have synergistic effects on the common targets, i.e., drug pairs tend to have modes of action with the same sign on most of the shared targets, especially for the principal pharmacological targets of a drug. Methods are proposed to compute this synergism, as well as to estimate the influence of the drugs on the side effect of another drug. Conclusions: Enriching a drug-target network with functional information, such as the sign of the interactions, makes it possible to explore systematically a series of network properties of key importance in the context of computational drug combinatorics.

  • 33.
    Baltzer, Nicholas
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräkningsbiologi och bioinformatik. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Stockholm County, Sweden.
    Sundström, Karin
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm County, Sweden.
    Nygård, Jan F.
    Canc Registry Norway, Dept Registry Informat, Oslo, Oslo County, Norway.
    Dillner, Joakim
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm County, Sweden.
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräkningsbiologi och bioinformatik. Polish Acad Sci, Inst Comp Sci, Warsaw, Warsaw County, Poland.
    Risk stratification in cervical cancer screening by complete screening history: Applying bioinformatics to a general screening population. 2017. In: International Journal of Cancer, ISSN 0020-7136, E-ISSN 1097-0215, Vol. 141, no. 1, pp. 200-209. Journal article (Peer-reviewed)
    Abstract [en]

    Women screened for cervical cancer in Sweden are currently treated under a one-size-fits-all programme, which has been successful in reducing the incidence of cervical cancer but does not use all of the participants' available medical information. This study aimed to use women's complete cervical screening histories to identify diagnostic patterns that may indicate an increased risk of developing cervical cancer. A nationwide case-control study was performed where cervical cancer screening data from 125,476 women with a maximum follow-up of 10 years were evaluated for patterns of SNOMED diagnoses. The cancer development risk was estimated for a number of different screening history patterns and expressed as Odds Ratios (OR), with a history of 4 benign cervical tests as reference, using logistic regression. The overall performance of the model was moderate (64% accuracy, 71% area under curve) with 61-62% of the study population showing no specific patterns associated with risk. However, predictions for high-risk groups as defined by screening history patterns were highly discriminatory with ORs ranging from 8 to 36. The model for computing risk performed consistently across different screening history lengths, and several patterns predicted cancer outcomes. The results show the presence of risk-increasing and risk-decreasing factors in the screening history. Thus it is feasible to identify subgroups based on their complete screening histories. Several high-risk subgroups identified might benefit from an increased screening density. Some low-risk subgroups identified could likely have a moderately reduced screening density without additional risk.

  • 34.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    A Graph – String Model of Gene Assembly in Ciliates. 2006. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2006, Vol. 10, pp. 521-534. Conference paper (Peer-reviewed)
    Abstract [en]

    The ciliates are a family of unicellular organisms characterized by having two types of nuclei, micro- and macronuclei. During cell mating the genetic material must change from the micronuclear to the macronuclear form. The paper summarises a formal model for this change. The model, which is described in recent works, is based on strings and graphs. It shows that complex computational operations have to take place inside the cell.

  • 35.
    Bartoszek, Krzysztof
    Gdansk University of Technology, Poland.
    A Graph – String Model of Gene Assembly in Ciliates [Grafowo-tekstowy model rekombinacji DNA u orzęsek]. 2006. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2006, pp. 521-534. Conference paper (Peer-reviewed)
    Abstract [en]

    The ciliates are a family of unicellular organisms characterized by having two types of nuclei, micro- and macronuclei. During cell mating the genetic material must change from the micronuclear to the macronuclear form. The paper summarises a formal model for this change. The model, which is described in recent works, is based on strings and graphs. It shows that complex computational operations have to take place inside the cell.

  • 36.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    The Bootstrap and Other Methods of Testing Phylogenetic Trees. 2007. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2007, Vol. 12, pp. 103-108. Conference paper (Peer-reviewed)
    Abstract [en]

    The final step of a phylogenetic analysis is the test of the generated tree. This is not an easy task, nor one for which there is an obvious methodology, because we do not know the full probabilistic model of evolution. A number of methods have been proposed, but there is wide debate concerning the interpretation of the results they produce.

  • 37.
    Bartoszek, Krzysztof
    Gdansk University of Technology, Poland.
    The Bootstrap and Other Methods of Testing Phylogenetic Trees. 2007. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2007, pp. 103-108. Conference paper (Peer-reviewed)
    Abstract [en]

    The final step of a phylogenetic analysis is the test of the generated tree. This is not an easy task, nor one for which there is an obvious methodology, because we do not know the full probabilistic model of evolution. A number of methods have been proposed, but there is wide debate concerning the interpretation of the results they produce.

  • 38.
    Bartoszek, Krzysztof
    Linköpings universitet, Institutionen för datavetenskap, Statistik och maskininlärning. Linköpings universitet, Filosofiska fakulteten.
    Trait evolution with jumps: illusionary normality. 2017. In: Proceedings of the XXIII National Conference on Applications of Mathematics in Biology and Medicine, 2017, pp. 23-28. Conference paper (Peer-reviewed)
    Abstract [en]

    Phylogenetic comparative methods for real-valued traits usually make use of stochastic processes whose trajectories are continuous. This is despite biological intuition that evolution is punctuated rather than gradual. On the other hand, there have been a number of recent proposals of evolutionary models with jump components. However, as we are only beginning to understand the behaviour of branching Ornstein-Uhlenbeck (OU) processes, the asymptotics of branching OU processes with jumps is an even greater unknown. In this work we build on a previous study concerning OU with jumps evolution on a pure birth tree. We introduce an extinction component and explore, via simulations, its effects on the weak convergence of such a process. Furthermore, we use this work to illustrate the simulation and graphic generation possibilities of the mvSLOUCH package.

  • 39.
    Bartoszek, Krzysztof
    et al.
    Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg.
    Liò, Pietro
    University of Cambridge.
    Sorathiya, Anil
    University of Cambridge.
    Influenza differentiation and evolution. 2010. In: Acta Physica Polonica B Proceedings Supplement, 2010, Vol. 3, pp. 417-452. Conference paper (Peer-reviewed)
    Abstract [en]

    The aim of the study is to do a very wide analysis of HA, NA and M influenza gene segments to find short nucleotide regions which differentiate between strains (i.e. H1, H2, etc.), hosts, geographic regions, time when the sequence was found, and combination of time and region, using a simple methodology. Finding regions differentiating between strains has as its goal the construction of a Luminex microarray which will allow quick and efficient strain recognition. Discovery for the other splitting factors could shed light on structures significant for host specificity and on the history of influenza evolution. A large number of places in the HA, NA and M gene segments were found that can differentiate between hosts, regions, time and combination of time and region. Also, very good differentiation between different Hx strains can be seen. We link one of our findings to a proposed stochastic model of creation of viral phylogenetic trees.

  • 40.
    Bartoszek, Krzysztof
    et al.
    Mathematical Statistics, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden.
    Liò, Pietro
    Computer Laboratory, University of Cambridge, Cambridge, United Kingdom.
    Sorathiya, Anil
    Computer Laboratory, University of Cambridge, Cambridge, United Kingdom.
    Influenza differentiation and evolution. 2010. In: Acta Physica Polonica B Proceedings Supplement, 2010, Vol. 3, pp. 417-452, 2. Conference paper (Peer-reviewed)
    Abstract [en]

    The aim of the study is to do a very wide analysis of HA, NA and M influenza gene segments to find short nucleotide regions which differentiate between strains (i.e. H1, H2, etc.), hosts, geographic regions, time when the sequence was found, and combination of time and region, using a simple methodology. Finding regions differentiating between strains has as its goal the construction of a Luminex microarray which will allow quick and efficient strain recognition. Discovery for the other splitting factors could shed light on structures significant for host specificity and on the history of influenza evolution. A large number of places in the HA, NA and M gene segments were found that can differentiate between hosts, regions, time and combination of time and region. Also, very good differentiation between different Hx strains can be seen. We link one of our findings to a proposed stochastic model of creation of viral phylogenetic trees.

  • 41.
    Bartoszek, Krzysztof
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Matematiska institutionen, Tillämpad matematik och statistik.
    Liò, Pietro
    Cambridge University.
    A novel algorithm to reconstruct phylogenies using gene sequences and expression data. 2014. In: International Proceedings of Chemical, Biological & Environmental Engineering; Environment, Energy and Biotechnology III, 2014, pp. 8-12. Conference paper (Peer-reviewed)
    Abstract [en]

    Phylogenies based on single loci should be viewed with caution, and the best approach for obtaining robust trees is to examine numerous loci across the genome. It often happens that, for the same set of species, trees derived from different genes are in conflict with each other. There are several methods that combine information from different genes in order to infer the species tree. One novel approach is to use information from different -omics. Here we describe a phylogenetic method based on an Ornstein–Uhlenbeck process that combines sequence and gene expression data. We test our method on genes belonging to the histidine biosynthetic operon. We found that the method provides interesting insights into selection pressures and adaptive hypotheses concerning gene expression levels.

  • 42.
    Bartoszek, Krzysztof
    et al.
    Department of Mathematics, Uppsala University, Uppsala, Sweden.
    Liò, Pietro
    Computer Laboratory, University of Cambridge, Cambridge, United Kingdom.
    A novel algorithm to reconstruct phylogenies using gene sequences and expression data. 2014. In: International Proceedings of Chemical, Biological & Environmental Engineering; Environment, Energy and Biotechnology III, 2014, Vol. 70, pp. 8-12. Conference paper (Peer-reviewed)
    Abstract [en]

    Phylogenies based on single loci should be viewed with caution, and the best approach for obtaining robust trees is to examine numerous loci across the genome. It often happens that, for the same set of species, trees derived from different genes are in conflict with each other. There are several methods that combine information from different genes in order to infer the species tree. One novel approach is to use information from different -omics. Here we describe a phylogenetic method based on an Ornstein–Uhlenbeck process that combines sequence and gene expression data. We test our method on genes belonging to the histidine biosynthetic operon. We found that the method provides interesting insights into selection pressures and adaptive hypotheses concerning gene expression levels.

  • 43.
    Basile, Walter
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sachenkova, Oxana
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Light, Sara
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
    Elofsson, Arne
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Kungliga Tekniska Högskolan, Sweden.
    High GC content causes orphan proteins to be intrinsically disordered. 2017. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no. 3, e1005375. Journal article (Peer-reviewed)
    Abstract [en]

    De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account, we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.

  • 44.
    Basu, Sankar Chandra
    et al.
    Linköpings universitet, Institutionen för fysik, kemi och biologi, Bioinformatik. Linköpings universitet, Tekniska fakulteten.
    Wallner, Björn
    Linköpings universitet, Institutionen för fysik, kemi och biologi, Bioinformatik. Linköpings universitet, Tekniska fakulteten.
    DockQ: A Quality Measure for Protein-Protein Docking Models. 2016. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 11, no. 8, e0161879. Journal article (Peer-reviewed)
    Abstract [en]

    The state of the art in assessing the structural quality of docking models is currently based on three related yet independent quality measures: F-nat, LRMS, and iRMS, as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (> 10 Å) might still qualify as acceptable with a decent F-nat (> 0.50) and iRMS (< 3.0 Å). This is also the reason why the so-called CAPRI criteria for assessing the quality of docking models are defined by applying various ad-hoc cutoffs on these measures to classify a docking model into four classes: Incorrect, Acceptable, Medium, or High quality. This classification has been useful in CAPRI, but since models are grouped in only four bins it is also rather limiting, making it difficult to rank models, correlate with scoring functions or use it as a target function in machine learning algorithms. Here, we present DockQ, a continuous protein-protein docking model quality measure derived by combining F-nat, LRMS, and iRMS into a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium and High quality. An average PPV of 94% at 90% recall demonstrates that there is no need to apply predefined ad-hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher-resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or the sum of top-ranked models, which has been so valuable for the CASP community.
The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for protein structure prediction, and DockQ should be useful in a similar development in the protein docking field.
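
    To make the idea of combining the three measures concrete, the sketch below averages F-nat with inverse-square-scaled versions of iRMS and LRMS so that the result lies in [0, 1]. The abstract does not give the formula: the scaling function and the constants 1.5 Å (iRMS) and 8.5 Å (LRMS) are recalled from the DockQ publication and should be treated as assumptions to verify against it. The function names `scaled_rms` and `dockq` are illustrative, not the authors' implementation.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMS deviation in angstrom to (0, 1]; 1 means a perfect match."""
    if rms < 0 or d <= 0:
        raise ValueError("rms must be >= 0 and d must be > 0")
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float,
          d_irms: float = 1.5, d_lrms: float = 8.5) -> float:
    """Continuous docking quality score in [0, 1].

    fnat: fraction of native contacts recovered (0..1)
    irms: interface RMS deviation in angstrom
    lrms: ligand RMS deviation in angstrom
    d_irms, d_lrms: scaling constants (assumed values, see lead-in)
    """
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0


if __name__ == "__main__":
    # The borderline case from the abstract: poor LRMS, decent F-nat and iRMS.
    print(round(dockq(fnat=0.5, irms=3.0, lrms=10.0), 3))
```

    Because the score is continuous, models can be ranked or correlated directly with a scoring function instead of being placed in one of four bins.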

  • 45.
    Basu, Sankar Chandra
    et al.
    Linköpings universitet, Institutionen för fysik, kemi och biologi, Bioinformatik. Linköpings universitet, Tekniska fakulteten.
    Wallner, Björn
    Linköpings universitet, Institutionen för fysik, kemi och biologi, Bioinformatik. Linköpings universitet, Tekniska fakulteten.
    Finding correct protein-protein docking models using ProQDock. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no. 12, pp. 262-270. Journal article (Peer-reviewed)
    Abstract [en]

    Motivation: Protein-protein interactions are key in virtually all biological processes. For a detailed understanding of the biological processes, the structure of the protein complex is essential. Given the current experimental techniques for structure determination, the vast majority of all protein complexes will never be solved by experimental techniques. In the absence of experimental data, computational docking methods can be used to predict the structure of the protein complex. A common strategy is to generate many alternative docking solutions (atomic models) and then use a scoring function to select the best. The success of the computational docking technique is, to a large degree, dependent on the ability of the scoring function to accurately rank and score the many alternative docking models. Results: Here, we present ProQDock, a scoring function that predicts the absolute quality of a docking model as measured by a novel protein docking quality score (DockQ). ProQDock uses support vector machines trained to predict the quality of protein docking models using features that can be calculated from the docking model itself. By combining different types of features describing both the protein-protein interface and the overall physical chemistry, it was possible to improve the correlation with DockQ from 0.25 for the best individual feature (electrostatic complementarity) to 0.49 for the final version of ProQDock. ProQDock performed better than the state-of-the-art methods ZRANK and ZRANK2 in terms of correlations, ranking and finding correct models on an independent test set. Finally, we also demonstrate that it is possible to combine ProQDock with ZRANK and ZRANK2 to improve performance even further.

  • 46.
    Bebris, Kristaps
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för ekologi och genetik, Zooekologi. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för biologisk grundutbildning.
    Local adaptation of Grauer's gorilla gut microbiome. 2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The availability of high-throughput sequencing technologies has enabled metagenomic investigations into complex bacterial communities with unprecedented resolution and throughput. The production of dedicated data sets for metagenomic analyses is, however, a costly process and, frequently, the first research questions focus on the study species itself. If the source material is represented by fecal samples, target capture of host-specific sequences is applied to enrich the complex DNA mixtures contained within a typical fecal DNA extract. Yet, even after this enrichment, the samples still contain a large amount of environmental DNA that is usually left unanalysed. In my study I investigate the possibility of using shotgun sequencing data that has been subjected to target enrichment for mtDNA from the host species, Grauer’s gorilla (Gorilla beringei graueri), for further analysis of the microbial community present in these samples. The purpose of these analyses is to study the differences in the bacterial communities present within a high-altitude Grauer’s gorilla, low-altitude Grauer’s gorilla, and a sympatric chimpanzee population. Additionally, I explore the adaptive potential of the gut microbiota within these great ape populations.
    I evaluated the impact that the enrichment process had on the microbial community by using pre- and post-capture museum preserved samples. In addition to this, I also analysed the effect of two different extraction methods on the bacterial communities.
    My results show that the relative abundances of the bacterial taxa remain relatively unaffected by the enrichment process and the extraction methods. The overall number of taxa is, however, reduced by each additional capture round and is not consistent between the extraction methods. This means that both the enrichment and extraction processes introduce biases that require the usage of abundance-based distance measures for biological inferences. Additionally, even if the data cannot be used to study the bacterial communities in an unbiased manner, it provides useful comparative insights for samples that were treated in the same fashion.
    With this background, I used museum and fecal samples to perform cluster analysis to explore the relationships between the gut microbiota of the three great ape populations. I found that populations cluster by species first, and only then group according to habitat. I further found that a bacterial taxon that degrades plant matter is enriched in the gut microbiota of all three great ape species, where it could help with the digestion of vegetative foods. Another bacterial taxon that consumes glucose is enriched in the gut microbiota of the low-altitude gorilla and chimpanzee populations, where it could help with the modulation of the host’s mucosal immune system, and could point to the availability of fruit in the animals' diet. In addition, I found a bacterial taxon that is linked with diarrhea in humans to be part of the gut microbiota of the habituated high-altitude gorilla population, which could indicate that this pathogen has been transmitted to the gorillas from their interaction with humans, or it could be indicative of the presence of a contaminated water source.

  • 47.
    Bekkouche, Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST). KTH.
    Classification of Neuronal Subtypes in the Striatum and the Effect of Neuronal Heterogeneity on the Activity Dynamics. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Clustering of single-cell RNA sequencing data is often used to show what states and subtypes cells have. Using this technique, striatal cells were clustered into subtypes using different clustering algorithms. Previously known subtypes were confirmed and new subtypes were found. One of them is a third medium spiny neuron subtype. Using the observed heterogeneity, as a second task, this project questions whether or not differences in individual neurons have an impact on the network dynamics. Clustering spiking activity from a neural network model gave inconclusive results. Both algorithms indicated low heterogeneity, but when the quantity of a subtype was altered between a low and a high number and the network activity was clustered in each case, the results indicated an increase in heterogeneity. This project shows a list of potential striatal subtypes and gives reasons to keep giving attention to biologically observed heterogeneity.

  • 48.
    Bem, T.
    et al.
    Cabelguen, J. M.
    Ekeberg, Örjan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Grillner, S.
    From swimming to walking: a single basic network for two different behaviors. 2003. In: Biological cybernetics, ISSN 0340-1200, E-ISSN 1432-0770, Vol. 88, no. 2, pp. 79-90. Journal article (Peer-reviewed)
    Abstract [en]

    In this paper we consider the hypothesis that the spinal locomotor network controlling trunk movements has remained essentially unchanged during the evolutionary transition from aquatic to terrestrial locomotion. The wider repertoire of axial motor patterns expressed by amphibians would then be explained by the influence from separate limb pattern generators, added during this evolution. This study is based on EMG data recorded in vivo from epaxial musculature in the newt Pleurodeles waltl during unrestrained swimming and walking, and on a simplified model of the lamprey spinal pattern generator for swimming. Using computer simulations, we have examined the output generated by the lamprey model network for different input drives. Two distinct inputs were identified which reproduced the main features of the swimming and walking motor patterns in the newt. The swimming pattern is generated when the network receives tonic excitation with local intensity gradients near the neck and girdle regions. To produce the walking pattern, the network must receive (in addition to a tonic excitation at the girdles) a phasic drive which is out of phase in the neck and tail regions in relation to the middle part of the body. To fit the symmetry of the walking pattern, however, the intersegmental connectivity of the network had to be modified by reversing the direction of the crossed inhibitory pathways in the rostral part of the spinal cord. This study suggests that the input drive required for the generation of the distinct walking pattern could, at least partly, be attributed to mechanosensory feedback received by the network directly from the intraspinal stretch-receptor system. Indeed, the input drive required resembles the pattern of activity of stretch receptors sensing the lateral bending of the trunk, as expressed during walking in urodeles.
Moreover, our results indicate that a nonuniform distribution of these stretch receptors along the trunk can explain the discontinuities exhibited in the swimming pattern of the newt. Thus, the limb pattern generators may influence the original network controlling axial movements not only through a direct coupling at the central level but also via a mechanical coupling between trunk and limbs, which in turn influences the sensory signals sent back to the network. Taken together, our findings support the hypothesis of a phylogenetic conservatism of the spinal locomotor networks generating axial motor patterns from agnathans to amphibians.

  • 49.
    Benjaminsson, Simon
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Fransson, Peter
    Department of Clinical Neuroscience, Karolinska Institute.
    Lansner, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    A Novel Model-Free Data Analysis Technique Based on Clustering in a Mutual Information Space: Application to Resting-State fMRI. 2010. In: Frontiers in Systems Neuroscience, ISSN 1662-5137, Vol. 4, pp. 34:1-34:8. Journal article (Peer-reviewed)
    Abstract [en]

    Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of independent component analysis (ICA) have been the methods mostly used on fMRI data, e.g., in finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis on a voxel level. It has good scalability properties and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to put the voxels in a lower-dimensional space reflecting the dependency relations based on the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turns out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This opens up for rapid analysis and visualization of the data on different spatial levels, as well as automatically finding a suitable number of decomposition components.

  • 50.
    Benjaminsson, Simon
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Herman, Pawel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lansner, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Odour discrimination and mixture segmentation in a holistic model of the mammalian olfactory system. Manuscript (preprint) (Other academic)