  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Qiu, Xuanbin
    KTH, School of Computer Science and Communication (CSC).
    The, Matthew
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 4, p. 508-513. Article in journal (Refereed)
    Abstract [en]

    Motivation: Liquid chromatography is frequently used to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time at which a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model, and we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides a new means of assessing the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
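The central idea of the abstract above, a regression model that returns a predictive standard deviation alongside each retention-time estimate, can be sketched with a plain Gaussian process. This is a minimal NumPy sketch under stated assumptions, not the authors' code: ELUDE's feature representation is replaced by a single stand-in feature, and the squared-exponential kernel, its hyperparameters, the noise variance, the median cut-off and the ±2σ window are illustrative choices. The helper names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=0.01, **kernel_args):
    """Posterior predictive mean and standard deviation of a zero-mean GP."""
    k = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test, **kernel_args)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    # Predictive variance of a new observation: latent variance plus noise variance.
    var = np.diag(rbf_kernel(x_test, x_test, **kernel_args)) - np.sum(v**2, axis=0) + noise_var
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical data: one stand-in feature per peptide instead of ELUDE's features.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 5.0, size=(40, 1))
y_train = np.sin(x_train[:, 0]) + rng.normal(0.0, 0.1, size=40)
x_test = np.linspace(0.0, 8.0, 50)[:, None]  # extends beyond the training range

mean, std = gp_predict(x_train, y_train, x_test)
keep = std <= np.median(std)                       # lower-uncertainty subset
lower, upper = mean - 2.0 * std, mean + 2.0 * std  # per-peptide adaptive window
```

Test points far from the training data (here x > 5) receive a larger predicted standard deviation, so the ±2σ window widens where the model is less certain. This illustrates the kind of adaptive windowing the abstract mentions; the paper's actual strategy may differ.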

  • 2.
    Ameur, Adam
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Yankovski, Vladimir
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    The LCB Data Warehouse. 2006. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 22, no 8, p. 1024-1026. Article in journal (Refereed)
    Abstract [en]

    The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.

  • 3.
    Andersson, Anders
    et al.
    KTH, School of Biotechnology (BIO).
    Bernander, R.
    Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University.
    Nilsson, Peter
    KTH, School of Biotechnology (BIO).
    Dual-genome primer design for construction of DNA microarrays. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 3, p. 325-332. Article in journal (Refereed)
    Abstract [en]

    Motivation: Microarray experiments using probes covering a whole transcriptome are expensive to initiate, and a major part of the costs derives from synthesizing gene-specific PCR primers or hybridization probes. The high costs may force researchers to limit their studies to a single organism, although comparing gene expression in different species would yield valuable information. Results: We have developed a method, implemented in the software DualPrime, that reduces the number of primers required to amplify the genes of two different genomes. The software identifies regions of high sequence similarity, and from these regions selects PCR primers shared between the genomes, such that either one or, preferentially, both primers in a given PCR can be used for amplification from both genomes. To assure high microarray probe specificity, the software selects primer pairs that generate products of low sequence similarity to other genes within the same genome. We used the software to design PCR primers for 2182 and 1960 genes from the hyperthermophilic archaea Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively. Primer pairs were shared among 705 pairs of genes, and single primers were shared among 1184 pairs of genes, resulting in a saving of 31% compared to using only unique primers. We also present an alternative primer design method, in which each gene shares primers with two different genes of the other genome, enabling further savings.
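The 31% saving reported above can be reproduced with simple arithmetic under one assumption that the abstract does not state explicitly: the baseline uses two unique primers per gene, a gene pair sharing both primers saves two primers, and a pair sharing a single primer saves one. The sketch below is only that back-of-the-envelope check, not DualPrime itself.

```python
# Figures taken from the abstract.
genes_s_solfataricus = 2182
genes_s_acidocaldarius = 1960
pairs_sharing_both = 705   # gene pairs sharing a full primer pair
pairs_sharing_one = 1184   # gene pairs sharing a single primer

# Assumed accounting: two unique primers per gene as the baseline; sharing both
# primers saves two primers, sharing one primer saves one.
baseline = 2 * (genes_s_solfataricus + genes_s_acidocaldarius)
saved = 2 * pairs_sharing_both + pairs_sharing_one
print(baseline, saved, round(100 * saved / baseline, 1))  # 8284 2594 31.3
```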

  • 4.
    Andersson, Anders
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Bernander, Rolf
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Nilsson, Peter
    Dual-genome primer design for construction of DNA microarrays. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 3, p. 325-332. Article in journal (Refereed)
    Abstract [en]

    Motivation: Microarray experiments using probes covering a whole transcriptome are expensive to initiate, and a major part of the costs derives from synthesizing gene-specific PCR primers or hybridization probes. The high costs may force researchers to limit their studies to a single organism, although comparing gene expression in different species would yield valuable information.

    Results: We have developed a method, implemented in the software DualPrime, that reduces the number of primers required to amplify the genes of two different genomes. The software identifies regions of high sequence similarity, and from these regions selects PCR primers shared between the genomes, such that either one or, preferentially, both primers in a given PCR can be used for amplification from both genomes. To assure high microarray probe specificity, the software selects primer pairs that generate products of low sequence similarity to other genes within the same genome. We used the software to design PCR primers for 2182 and 1960 genes from the hyperthermophilic archaea Sulfolobus solfataricus and Sulfolobus acidocaldarius, respectively. Primer pairs were shared among 705 pairs of genes, and single primers were shared among 1184 pairs of genes, resulting in a saving of 31% compared to using only unique primers. We also present an alternative primer design method, in which each gene shares primers with two different genes of the other genome, enabling further savings.

  • 5.
    Andersson, Robin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Bruder, Carl E G
    Piotrowski, Arkadiusz
    Menzel, Uwe
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Nord, Helena
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Sandgren, Johanna
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Surgical Sciences.
    Hvidsten, Torgeir R
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    de Ståhl, Teresita Diaz
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Dumanski, Jan P
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    A Segmental Maximum A Posteriori Approach to Genome-wide Copy Number Profiling. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 6, p. 751-758. Article in journal (Other academic)
    Abstract [en]

    MOTIVATION: Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis. RESULTS: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performance of SMAP on synthetic data when compared with two recent methods. When applied to our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the predictions of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.
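Discrete-index HMM approaches such as the one SMAP builds on assign each array probe a hidden copy-number state and decode the most probable state path. The sketch below is a generic Viterbi decoder over three states (loss, normal, gain) with Gaussian emissions on log2 ratios. It is an illustration only and omits what the abstract says distinguishes SMAP, namely genomic distance, clone overlap and genome-wide parameter inference. The state means, emission standard deviation, `stay_prob` and the function name `viterbi_copy_number` are assumptions.

```python
import numpy as np

def viterbi_copy_number(log2_ratios, state_means, sd=0.2, stay_prob=0.99):
    """Most probable copy-number state path under a simple discrete-state HMM."""
    obs = np.asarray(log2_ratios, dtype=float)
    means = np.asarray(state_means, dtype=float)
    n_states, n_obs = len(means), len(obs)
    # Staying in a state is strongly favoured; switching is uniform over the others.
    trans = np.full((n_states, n_states), (1.0 - stay_prob) / (n_states - 1))
    np.fill_diagonal(trans, stay_prob)
    log_trans = np.log(trans)
    # Gaussian log-likelihood of each observation under each state (constants dropped).
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / sd) ** 2
    score = np.log(1.0 / n_states) + log_emit[0]
    back = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        cand = score[:, None] + log_trans  # cand[i, j]: arrive in state j from state i
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(n_obs - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical log2 ratios: normal, a three-probe loss, normal, a two-probe gain.
ratios = [0.05, -0.02, 0.1, -0.95, -1.1, -0.9, 0.0, 0.03, 0.6, 0.55]
# Illustrative state means: one copy lost (log2 1/2), normal, one copy gained (log2 3/2).
print(viterbi_copy_number(ratios, [-1.0, 0.0, 0.58]))  # [1, 1, 1, 0, 0, 0, 1, 1, 2, 2]
```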

  • 6.
    Andersson, Siv G E
    et al.
    Alsmark, Cecilia
    Canbäck, Björn
    Davids, Wagied
    Frank, Carolin
    Karlberg, Olof
    Klasson, Lisa
    Antoine-Legault, Boris
    Mira, Alex
    Tamas, Ivica
    Comparative genomics of microbial pathogens and symbionts. 2002. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 18 Suppl 2, p. S17-. Article in journal (Refereed)
    Abstract [en]

    We are interested in quantifying the contribution of gene acquisition, loss, expansion and rearrangements to the evolution of microbial genomes. Here, we discuss factors influencing microbial genome divergence based on pairwise genome comparisons of closely related strains and species with different lifestyles. A particular focus is on intracellular pathogens and symbionts of the genera Rickettsia, Bartonella and Buchnera. Extensive gene loss and restricted access to phage and plasmid pools may provide an explanation for why single-host pathogens are normally less successful than multihost pathogens. We note that species-specific genes tend to be shorter than orthologous genes, suggesting that a fraction of these may represent fossil-orfs, as also supported by multiple sequence alignments among species. The results of our genome comparisons are placed in the context of phylogenomic analyses of alpha and gamma proteobacteria. We highlight artefacts caused by different rates and patterns of mutations, suggesting that atypical phylogenetic placements cannot a priori be taken as evidence for horizontal gene transfer events. The flexibility in genome structure among free-living microbes contrasts with the extreme stability observed for the small genomes of aphid endosymbionts, in which no rearrangements or inflow of genetic material have occurred during the past 50 million years (1). Taken together, the results suggest that genomic stability correlates with the content of repeated sequences and mobile genetic elements, and thereby indirectly with bacterial lifestyles.

  • 7.
    Anil, Anandashankar
    et al.
    KTH, School of Biotechnology (BIO). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Spalinskas, Rapolas
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Biotechnology (BIO).
    Åkerborg, Örjan
    KTH, School of Biotechnology (BIO). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sahlén, Pelin
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Biotechnology (BIO).
    HiCapTools: a software suite for probe design and proximity detection for targeted chromosome conformation capture applications. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 4, p. 675-677. Article in journal (Refereed)
    Abstract [en]

    Folding of eukaryotic genomes within nuclear space enables physical and functional contacts between regions that are otherwise kilobases away in sequence space. Targeted chromosome conformation capture methods (T2C, chi-C and HiCap) can report genomic contacts for a subset of regions targeted by probes. Here we present HiCapTools, a software package that can design sequence capture probes for targeted chromosome capture applications and analyse sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at the restriction-fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution. Targeted chromosome conformation capture applications are invaluable for locating target genes of disease-associated variants found by genome-wide association studies. Hence, we believe our software suite will prove useful for a wider user base within clinical and functional applications.
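The abstract states that contact significance is assessed against an empirically derived background distribution but does not say how. A common way to turn such a background into a p-value is the one-sided empirical p-value with an add-one correction, sketched below. The background counts and the function name `empirical_p_value` are hypothetical, and HiCapTools may construct or use its background differently.

```python
import numpy as np

def empirical_p_value(observed, background):
    """One-sided empirical p-value with add-one correction: (1 + #{b >= obs}) / (1 + n)."""
    background = np.asarray(background)
    return float((1 + np.sum(background >= observed)) / (1 + background.size))

# Hypothetical background: read-pair counts for comparable non-targeted fragment pairs.
background_counts = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 4, 0, 1, 2, 1]
print(empirical_p_value(6, background_counts))  # 1/16 = 0.0625
```

The add-one correction keeps the p-value strictly positive, which avoids reporting p = 0 merely because the background sample is finite.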

  • 8.
    Arvestad, Lars
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Berglund, Ann-Charlotte
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Sennblad, Bengt
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Bayesian gene/species tree reconciliation and orthology analysis using MCMC. 2003. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 19, p. i7-i15. Article in journal (Refereed)
    Abstract [en]

    Motivation: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation have been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas of bioinformatics, probabilistic models have proven to be both more realistic and more powerful than parsimony models. For instance, they allow solution reliability to be assessed and alternative solutions to be considered in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

    Results: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves ‘inside’ a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch’s original definition, and more generally to reconcile pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC that approximates the a posteriori distribution over reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes is orthologous. The main algorithmic contribution presented here is an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic method, which flourishes in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although not yet in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.
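The key property of the MCMC approach described above is that the fraction of samples spent in a reconciliation estimates its posterior probability. The toy Metropolis sampler below shows only that principle. The three labelled states stand in for reconciliations, their unnormalized weights stand in for prior times likelihood (which the paper computes from its birth-death model), and the uniform proposal is an assumption, not the authors' proposal moves. The function name `metropolis_discrete` is hypothetical.

```python
import random
from collections import Counter

def metropolis_discrete(weights, n_steps=100_000, seed=1):
    """Estimate normalized probabilities of discrete states from unnormalized weights."""
    rng = random.Random(seed)
    states = list(weights)
    current = rng.choice(states)
    counts = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(states)  # symmetric proposal: uniform over all states
        # Metropolis acceptance: always move uphill, move downhill with the weight ratio.
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        counts[current] += 1
    # Sample frequencies approximate the normalized posterior probabilities.
    return {s: counts[s] / n_steps for s in states}

# Hypothetical unnormalized posterior weights for three candidate reconciliations.
estimates = metropolis_discrete({"R1": 6.0, "R2": 3.0, "R3": 1.0})
print(estimates)  # close to {'R1': 0.6, 'R2': 0.3, 'R3': 0.1}
```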

  • 9.
    Basu, Sankar Chandra
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    Wallner, Björn
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
    Finding correct protein-protein docking models using ProQDock. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 12, p. 262-270. Article in journal (Refereed)
    Abstract [en]

    Motivation: Protein-protein interactions are key to virtually all biological processes. For a detailed understanding of these processes, the structure of the protein complex is essential. Given current experimental techniques for structure determination, the vast majority of protein complexes will never be solved experimentally. In the absence of experimental data, computational docking methods can be used to predict the structure of the protein complex. A common strategy is to generate many alternative docking solutions (atomic models) and then use a scoring function to select the best. The success of the computational docking technique is, to a large degree, dependent on the ability of the scoring function to accurately rank and score the many alternative docking models. Results: Here, we present ProQDock, a scoring function that predicts the absolute quality of a docking model, measured by a novel protein docking quality score (DockQ). ProQDock uses support vector machines trained to predict the quality of protein docking models using features that can be calculated from the docking model itself. By combining different types of features describing both the protein-protein interface and the overall physical chemistry, it was possible to improve the correlation with DockQ from 0.25 for the best individual feature (electrostatic complementarity) to 0.49 for the final version of ProQDock. ProQDock performed better than the state-of-the-art methods ZRANK and ZRANK2 in terms of correlations, ranking and finding correct models on an independent test set. Finally, we also demonstrate that it is possible to combine ProQDock with ZRANK and ZRANK2 to improve performance even further.
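The abstract quotes correlations between predicted quality and DockQ (0.25 for the best single feature, 0.49 for ProQDock) without naming the coefficient. Assuming Pearson's r, the computation is sketched below; the function name `pearson_r` and the score lists are hypothetical, not data from the paper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical predicted scores and DockQ values for five docking models.
predicted = [0.10, 0.40, 0.35, 0.80, 0.60]
dockq = [0.05, 0.30, 0.50, 0.70, 0.65]
print(round(pearson_r(predicted, dockq), 3))
```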

  • 10.
    Bernhem, Kristoffer
    et al.
    KTH, School of Engineering Sciences (SCI), Applied Physics. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Brismar, Hjalmar
    KTH, School of Engineering Sciences (SCI), Applied Physics. KTH, Centres, Science for Life Laboratory, SciLifeLab. Karolinska Institutet, Sweden.
    SMLocalizer, a GPU accelerated ImageJ plugin for single molecule localization microscopy. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 1, p. 137-. Article in journal (Refereed)
    Abstract [en]

    SMLocalizer combines the availability of ImageJ with the power of GPU processing for fast and accurate analysis of single molecule localization microscopy data. Analysis of 2D and 3D data in multiple channels is supported.

  • 11.
    Björkholm, Patrik
    et al.
    The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala.
    Daniluk, Pawel
    Department of Biophysics, Faculty of Physics, University of Warsaw, Warsaw, Poland.
    Kryshtafovych, Andriy
    UC Davis Genome Centre, UC Davis, USA.
    Fidelis, Krzysztof
    UC Davis Genome Centre, UC Davis, USA.
    Andersson, Robin
    The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala.
    Hvidsten, Torgeir
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1264-1270. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. RESULTS: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 × L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.
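The accuracy figures above are for the top 0.2 × L predicted contacts, where L is the sequence length. The sketch below shows one way to compute such a top-L/5 accuracy for long-range contacts. The minimum sequence separation of 24 residues for "long-range" is an assumed CASP-style convention that the abstract does not state, and the function name and inputs are hypothetical.

```python
def top_fraction_accuracy(scored_pairs, true_contacts, seq_len, fraction=0.2, min_sep=24):
    """Fraction of the top fraction*L long-range predictions that are true contacts.

    scored_pairs: iterable of (score, i, j); true_contacts: set of (i, j) with i < j.
    """
    long_range = [(s, i, j) for s, i, j in scored_pairs if abs(i - j) >= min_sep]
    long_range.sort(reverse=True)  # highest score first
    top = long_range[:max(1, int(fraction * seq_len))]
    hits = sum((min(i, j), max(i, j)) in true_contacts for _, i, j in top)
    return hits / len(top) if top else 0.0

# Hypothetical example: L = 50, so the top 10 long-range predictions are evaluated.
# The high-scoring pair (1, 3) is short-range and is therefore excluded.
pairs = [(1.0 - 0.01 * n, n, n + 30) for n in range(15)] + [(5.0, 1, 3)]
true = {(0, 30), (1, 31), (2, 32), (1, 3)}
print(top_fraction_accuracy(pairs, true, seq_len=50))  # 0.3
```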

  • 12.
    Björkholm, Patrik
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Daniluk, Pawel
    Kryshtafovych, Andriy
    Fidelis, Krzysztof
    Andersson, Robin
    Hvidsten, Torgeir R.
    Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1264-1270. Article in journal (Refereed)
    Abstract [en]

    Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 × L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.

  • 13.
    Björkholm, Patrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Daniluk, Pawel
    Kryshtafovych, Andriy
    Fidelis, Krzysztof
    Andersson, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Hvidsten, Torgeir R.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1264-1270. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. RESULTS: We propose a novel hidden Markov model based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 × L predictions (L = sequence length), our hidden Markov models obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.

  • 14.
    Brameier, Markus
    et al.
    Krings, Andrea
    Stockholm University.
    MacCallum, Robert M.
    NucPred - Predicting nuclear localization of proteins. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 9, p. 1159-1160. Article in journal (Refereed)
    Abstract [en]

    NucPred analyzes patterns in eukaryotic protein sequences and predicts whether a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results.
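NucPred's classifiers are built from regular expressions evolved by genetic programming, which the abstract does not list. As a hedged illustration of regular-expression matching on protein sequences only, the sketch below scans for the classical monopartite NLS consensus K-(K/R)-X-(K/R). The pattern and the function name `find_nls_motifs` are assumptions and are not NucPred's expressions. The test sequence PKKKRKV is the well-known SV40 large T antigen NLS.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); the lookahead lets
# overlapping matches be reported, since basic stretches often contain several.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")

def find_nls_motifs(sequence):
    """Return (0-based start, motif) for every consensus match in a protein sequence."""
    return [(m.start(), m.group(1)) for m in MONOPARTITE_NLS.finditer(sequence.upper())]

print(find_nls_motifs("PKKKRKV"))  # [(1, 'KKKR'), (2, 'KKRK')]
```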

  • 15.
    Bylesjö, Max
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjödin, Andreas
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Eriksson, Daniel
    Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Antti, Henrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Moritz, Thomas
    Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Jansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    MASQOT-GUI: spot quality assessment for the two-channel microarray platform. 2006. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 22, no 20, p. 2554-2555. Article in journal (Refereed)
    Abstract [en]

    MASQOT-GUI provides an open-source, platform-independent software pipeline for two-channel microarray spot quality control. This includes gridding, segmentation, quantification, quality assessment and data visualization. It hosts a set of independent applications, with interactions between the tools as well as import and export support for external software. The implementation of automated multivariate quality control assessment, which is a unique feature of MASQOT-GUI, is based on the previously documented and evaluated MASQOT methodology. Further abilities of the application are outlined and illustrated. AVAILABILITY: MASQOT-GUI is Java-based and licensed under the GNU LGPL. Source code and installation files are available for download at http://masqot-gui.sourceforge.net/

  • 16.
    Bystry, Vojtech
    et al.
    Masaryk Univ, CEITEC Cent European Inst Technol, Brno, Czech Republic.
    Agathangelidis, Andreas
    IRCCS San Raffaele Sci Inst, Div Mol Oncol, Milan, Italy. IRCCS San Raffaele Sci Inst, Dept Oncohematol, Milan, Italy. Univ Vita Salute San Raffaele, Milan, Italy.
    Bikos, Vasilis
    Masaryk Univ, CEITEC Cent European Inst Technol, Brno, Czech Republic.
    Sutton, Lesley Ann
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Baliakas, Panagiotis
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology.
    Hadzidimitriou, Anastasia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology. Ctr Res & Technol Hellas, Inst Appl Biosci, Thessaloniki, Greece.
    Stamatopoulos, Kostas
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology. Ctr Res & Technol Hellas, Inst Appl Biosci, Thessaloniki, Greece.
    Darzentas, Nikos
    Masaryk Univ, CEITEC Cent European Inst Technol, Brno, Czech Republic.
    ARResT/AssignSubsets: a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy. 2015. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 31, no 23, p. 3844-3846. Article in journal (Refereed)
    Abstract [en]

    Motivation: An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for approximately 30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, yet unmet, requirement. Results: We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution.

  • 17.
    Carlborg, Örjan
    et al.
    Roslin Institute.
    De Koning, D J
    Manly, K F
    Chesler, E
    Williams, R W
    Haley, C S
    Methodological aspects of the genetic dissection of gene expression. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no 10. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Dissection of the genetics underlying gene expression utilizes techniques from microarray analyses as well as quantitative trait loci (QTL) mapping. Available QTL mapping methods are not tailored for the highly automated analyses required to deal with the thousands of gene transcripts encountered in the mapping of QTL affecting gene expression (sometimes referred to as eQTL). This report focuses on the adaptation of QTL mapping methodology to perform automated mapping of QTL affecting gene expression.

    RESULTS: The analyses of expression data on > 12,000 gene transcripts in BXD recombinant inbred mice found, on average, 629 QTL exceeding the genome-wide 5% threshold. Using additional information on trait repeatabilities and QTL location, 168 of these were classified as 'high confidence' QTL. Current sample sizes of genetical genomics studies make it possible to detect a reasonable number of QTL using simple genetic models, but considerably larger studies are needed to evaluate more complex genetic models. After extensive analyses of real data and additional simulated data (altogether > 300,000 genome scans), we make the following recommendations for the detection of QTL for gene expression: (1) For populations with an unbalanced number of replicates for each genotype, weighted least squares should be preferred over ordinary least squares. Weights can be based on the repeatability of the trait and the number of replicates. (2) A genome scan based on multiple-marker information but analysing only at marker locations is a good approximation to a full interval mapping procedure. (3) Significance testing should be based on empirical genome-wide significance thresholds that are derived for each trait separately. (4) The significant QTL can be separated into high- and low-confidence QTL using a false discovery rate that incorporates prior information such as transcript repeatabilities and co-localization of gene transcripts and QTL. (5) Including observations on the founder lines in the QTL analysis should be avoided, as it inflates the test statistic and increases the Type I error. (6) Parallel computing is advised to increase computational efficiency. These recommendations are summarized in a possible strategy for mapping QTL in a least squares framework.

    AVAILABILITY: The software used for this study is available on request from the authors.
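    Recommendation (3) relies on empirical genome-wide thresholds computed per trait. One common way to obtain such a threshold is by permutation; the sketch below assumes that approach (the abstract does not say how the authors derived theirs). It uses a single-marker regression on genotypes coded 0/1, as in recombinant inbred lines, and ignores the weighting and interval-mapping refinements discussed above. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def marker_f(genotypes, trait):
    """F statistic of a single-marker regression: F = (n - 2) r^2 / (1 - r^2)."""
    r2 = pearson(genotypes, trait) ** 2
    if r2 >= 1.0:
        return math.inf
    return (len(trait) - 2) * r2 / (1.0 - r2)


def genome_scan(markers, trait):
    """Largest single-marker F statistic over all markers (one genome scan)."""
    return max(marker_f(g, trait) for g in markers)


def permutation_threshold(markers, trait, n_perm=1000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold for one trait, derived by permutation.

    Shuffling the trait values breaks any genotype-trait association while
    leaving the correlation between markers intact; the maximum statistic of
    each permuted scan approximates the genome-wide null distribution.
    """
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(genome_scan(markers, shuffled))
    maxima.sort()
    # Empirical (1 - alpha) quantile of the permutation maxima.
    return maxima[min(int((1 - alpha) * n_perm), n_perm - 1)]
```

    Because the threshold is recomputed from each trait's own values, a trait with a skewed distribution gets its own cut-off rather than a shared, possibly miscalibrated one. Markers with no variation would need to be filtered out first, since their correlation is undefined.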

  • 18. Caulfield, Emmet
    et al.
    Hellander, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    CellMC: a multiplatform model compiler for the Cell Broadband Engine and x86. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, p. 426-428. Article in journal (Refereed)
  • 19.
    Climer, Sharlee
    et al.
    School of Medicine, Washington University, United States.
    Jäger, Gerold
    Computer Science Institute, University of Halle-Wittenberg, Germany.
    Templeton, Alan R
    Department of Biology, Washington University, United States.
    Zhang, Weixiong
    Department of Computer Science/Department of Genetics, Washington University, United States.
    How frugal is mother nature with haplotypes? 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 1, p. 68-74. Article in journal (Refereed)
    Abstract [en]

    Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical, step is the ascertainment of a biologically sound model to be optimized. Many of the models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.

    Results: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that the datasets with known solutions contain relatively few unique haplotypes, though not always the smallest possible number. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and that this number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, and most of them are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties in haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
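    Under the Pure Parsimony model, each genotype is resolved into a pair of haplotypes so that the total number of distinct haplotypes is as small as possible. The brute-force sketch below, practical only for a handful of short genotypes, finds that minimum and counts how many resolutions reach it, which makes the existence of several equally parsimonious solutions concrete. The encoding ('0'/'1' homozygous, '2' heterozygous) and the function names are assumptions for illustration, not the authors' implementation.

```python
from itertools import product


def resolutions(genotype):
    """All unordered haplotype pairs consistent with one genotype string.

    Encoding (assumed): '0' or '1' = homozygous site, '2' = heterozygous site.
    """
    het = [i for i, c in enumerate(genotype) if c == "2"]
    if not het:
        return [(genotype, genotype)]
    pairs = []
    # Fix the first heterozygous site to '0' on haplotype a so that (a, b)
    # and (b, a) are not counted as different resolutions.
    for bits in product("01", repeat=len(het) - 1):
        a, b = list(genotype), list(genotype)
        for idx, bit in zip(het, ("0",) + bits):
            a[idx] = bit
            b[idx] = "1" if bit == "0" else "0"
        pairs.append(("".join(a), "".join(b)))
    return pairs


def pure_parsimony(genotypes):
    """Return (minimum number of distinct haplotypes, number of optimal solutions)."""
    best, count = None, 0
    # Exhaustively combine one resolution per genotype (exponential cost).
    for choice in product(*(resolutions(g) for g in genotypes)):
        n = len({h for pair in choice for h in pair})
        if best is None or n < best:
            best, count = n, 1
        elif n == best:
            count += 1
    return best, count
```

    For example, `pure_parsimony(["22"])` returns `(2, 2)`: the phasings {00, 11} and {01, 10} are equally parsimonious, and nothing in the model prefers one over the other.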

  • 20. Conley, Christopher J.
    et al.
    Smith, Rob
    Torgrip, Ralf
    Stockholm University, Faculty of Science, Department of Analytical Chemistry.
    Taylor, Ryan M.
    Tautenhahn, Ralf
    Prince, John T.
    Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2636-2643. Article in journal (Refereed)
    Abstract [en]

    Motivation: Isotope trace (IT) detection is a fundamental step in liquid or gas chromatography mass spectrometry (XC-MS) data analysis and faces a multitude of technical challenges on complex samples. Applying the Kalman filter (KF) to IT detection addresses some of these challenges: it discriminates closely eluting ITs in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet the behavior of this KF application has not been fully characterized, because no cost-free open-source implementation exists and evaluation standards for IT detection remain incomplete.

    Results: Massifquant is an open-source solution for KF IT detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation, with accompanying annotations and an optimization guide, sets a new standard for comparative IT detection. Compared with the alternative IT detection engines centWave, matchedFilter and MZMine2, Massifquant detected more true ITs in a real complex LC-MS sample, especially low-intensity ITs. It also offers competitive specificity and equally effective quantitation accuracy.
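    The core idea of KF-based trace following can be sketched with a scalar filter over the m/z centroids of successive scans. The state model (a random walk in m/z), the variances and the 3-sigma gate used to reject centroids that belong to another trace are assumptions for illustration; this is not Massifquant's code, and the function name is hypothetical.

```python
import math


def kalman_track(mz, meas_var, process_var, gate=3.0):
    """Follow one m/z trace across successive scans with a scalar Kalman filter.

    State model (assumed): the true m/z drifts as a random walk with variance
    process_var per scan; each centroid is observed with variance meas_var.
    Returns the final estimate, its variance and the indices of the scans
    whose centroid was assigned to this trace.
    """
    x = mz[0]          # initialise the state with the first centroid
    p = meas_var       # ... and its uncertainty with the measurement variance
    accepted = [0]
    for k, z in enumerate(mz[1:], start=1):
        p_pred = p + process_var        # predict: random walk adds process noise
        s = p_pred + meas_var           # innovation (residual) variance
        if abs(z - x) > gate * math.sqrt(s):
            # Gate: centroid too far from the prediction, treat it as another
            # trace; keep the prediction, whose uncertainty has grown.
            p = p_pred
            continue
        gain = p_pred / s               # Kalman gain
        x += gain * (z - x)             # update the m/z estimate
        p = (1.0 - gain) * p_pred       # update its variance
        accepted.append(k)
    return x, p, accepted
```

    Because the gate width follows the filter's own variance, it tightens as a trace accumulates evidence, which is one way closely eluting traces can be kept apart without binning the m/z axis.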

  • 21.
    Dalevi, Daniel
    et al.
    Department of Computing Science and Engineering, Chalmers University of Technology, Gothenburg.
    Eriksen, Niklas
    Department of Mathematical Sciences, Gothenburg University and Chalmers University of Technology, Gothenburg.
    Expected Gene Order Distances and Model Selection in Bacteria. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 11, p. 1332-1338. Article in journal (Refereed)
    Abstract [en]

    Motivation: The evolutionary distance inferred from gene order comparisons of related bacteria depends on the model. Therefore, it is highly important to establish reliable model assumptions before inferring its magnitude.

    Results: We investigate the patterns of dotplots between species of bacteria with the purpose of model selection in gene order problems. We find several categories of data that can be explained by carefully weighing the contributions of reversals, transpositions, symmetrical reversals, single gene transpositions and single gene reversals. We also derive method-of-moments distance estimates for some previously uncomputed cases, such as symmetrical reversals, single gene reversals and their combinations, as well as the single gene transposition edit distance.

  • 22. Das, Sarbashis
    et al.
    Vishnoi, Anchal
    Bhattacharya, Alok
    ABWGAT: anchor-based whole genome analysis tool. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 24, p. 3319-3320. Article in journal (Refereed)
    Abstract [en]

    SUMMARY: Large numbers of genomes are being sequenced regularly, and the rate will increase in the future owing to the availability of new genome sequencing techniques. In order to understand genotype to phenotype relationships, it is necessary to identify sequence variations at the genomic level. Aligning a pair of genomes and parsing the alignment data is an accepted approach for identifying variations. Although a number of tools are available for whole-genome alignment, none of them allows automatic parsing of the alignment and identification of different kinds of genomic variants with a high degree of sensitivity. Here we present ABWGAT (Anchor-Based Whole Genome Analysis Tool), a simple-to-use web-based interface for whole-genome comparison. The output is a list of variations such as SNVs, indels, repeat expansions and inversions.

    AVAILABILITY: The web server is freely available to non-commercial users at http://abwgc.jnu.ac.in/_sarba. Supplementary data are available at http://abwgc.jnu.ac.in/_sarba/cgi-bin/abwgc_retrival.cgi using job IDs 524, 526 and 528.

    CONTACT: dsarbashis@gmail.com; alok.bhattacharya@gmail.com
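    The "parsing the alignment" step mentioned in the summary can be illustrated generically. The sketch below walks two gapped, equal-length aligned sequences and reports SNVs, insertions and deletions in 1-based reference coordinates. It is not ABWGAT's code, which the abstract does not describe; the '-' gap character, the coordinate convention and the function name are assumptions, and repeat expansions or inversions would need additional logic.

```python
def call_variants(ref_aln, qry_aln):
    """List SNVs, insertions and deletions from a pairwise alignment.

    Both strings are aligned (equal length) with '-' as the gap character.
    Returns tuples (kind, ref_pos, ref_allele, alt_allele), where ref_pos is
    1-based; for an insertion it is the reference base it follows.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0          # reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r != "-" and q != "-":                 # aligned column
            ref_pos += 1
            if r != q:
                variants.append(("SNV", ref_pos, r, q))
            i += 1
        elif r == "-" and q != "-":               # run of inserted bases
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            variants.append(("INS", ref_pos, "", qry_aln[i:j]))
            i = j
        elif r != "-" and q == "-":               # run of deleted bases
            j = i
            while j < n and ref_aln[j] != "-" and qry_aln[j] == "-":
                j += 1
            variants.append(("DEL", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:                                     # gap in both: no information
            i += 1
    return variants
```

    Merging consecutive gap columns into one event, rather than reporting one indel per column, is what turns raw alignment columns into a variant list of the kind the tool outputs.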

  • 23. Delhomme, Nicolas
    et al.
    Padioleau, Ismaël
    Furlong, Eileen E
    Steinmetz, Lars M
    easyRNASeq: a bioconductor package for processing RNA-Seq data. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 19. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: RNA sequencing is becoming a standard for expression profiling experiments, and many tools have been developed in the past few years to analyze RNA-Seq data. Numerous 'Bioconductor' packages are available in R both for loading next-generation sequencing data (e.g. ShortRead and Rsamtools) and for performing differential gene expression analyses (e.g. DESeq and edgeR). However, the processing tasks lying in between these require either the precise interplay of many Bioconductor packages (e.g. Biostrings, IRanges) or external solutions.

    RESULTS: We developed 'easyRNASeq', an R package that simplifies the processing of RNA sequencing data, hiding the complex interplay of the required packages behind a single functionality.

    AVAILABILITY: The package is implemented in R (as of version 2.15) and is available from Bioconductor (as of version 2.10) at the URL: http://bioconductor.org/packages/release/bioc/html/easyRNASeq.html, where installation and usage instructions can be found.

    CONTACT: delhomme@embl.de.

  • 24.
    Demissie, Meaza
    et al.
    Örebro University, Swedish Business School at Örebro University.
    Mascialino, Barbara
    Calza, Stefano
    Pawitan, Yudi
    Unequal group variances in microarray data analyses. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 9, p. 1168-1174. Article in journal (Refereed)
    Abstract [en]

    Motivation: In searching for differentially expressed (DE) genes in microarray data, we often observe that a fraction of the genes have unequal variability between groups. This is not an issue in large samples, where a valid test exists that uses individual variances separately. The problem arises in the small-sample setting, where the approximately valid Welch test lacks sensitivity, while the more sensitive moderated t-test assumes equal variance. Methods: We introduce a moderated Welch test (MWT) that allows unequal variance between groups. It is based on (i) weighting of pooled and unpooled standard errors and (ii) improved estimation of the gene-level variance that exploits the information from across the genes. Results: When a non-trivial proportion of genes has unequal variability, false discovery rate (FDR) estimates based on the standard t and moderated t-tests are often too optimistic, while the standard Welch test has low sensitivity. The MWT is shown to (i) perform better than the standard t, the standard Welch and the moderated t-tests when the variances are unequal between groups and (ii) perform similarly to the moderated t, and better than the standard t and Welch tests, when the group variances are equal. These results mean that the MWT is more reliable than other existing tests over a wider range of data conditions. Availability: An R package to perform the MWT is available at http://www.meb.ki.se/~yudpaw Contact: yudi.pawitan@ki.se Supplementary information: Supplementary data are available at Bioinformatics online.
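    The MWT builds on ingredients that are standard on their own: the pooled-variance t statistic, the Welch statistic with Welch-Satterthwaite degrees of freedom, and a gene-level variance shrunk toward a value shared across genes. The sketch below computes these pieces. The shrinkage formula is the limma-style moderated variance, used here as an assumed stand-in; the exact weighting that defines the MWT is given in the paper and is not reproduced.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Classical two-sample t statistic with a pooled (equal) variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Welch t statistic (unpooled variances) and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def moderated_variance(s2, d, s0_2, d0):
    """Shrink a gene's sample variance s2 (d degrees of freedom) toward a prior
    variance s0_2 estimated across genes, weighted by prior df d0 (limma-style)."""
    return (d0 * s0_2 + d * s2) / (d0 + d)
```

    With equal group sizes the pooled and Welch t statistics coincide exactly; the tests then differ only in their degrees of freedom, which is why the Welch test loses sensitivity in small samples.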

  • 25. Dimou, Niki L.
    et al.
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Bagos, Pantelis G.
    GWAR: robust analysis and meta-analysis of genome-wide association studies. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 10, p. 1521-1527. Article in journal (Refereed)
    Abstract [en]

    Motivation: In the context of genome-wide association studies (GWAS), a variety of statistical techniques are available for conducting the analysis, but the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has not been addressed before. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. Results: The CATT under recessive, additive and dominant models of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2, were implemented in Stata. For MAX and MIN2, we calculated their asymptotic null distributions by numerical integration, which greatly reduced computation time without loss of accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata.
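    For reference, the CATT compares case and control genotype counts using scores that encode the inheritance model: (0, 0, 1) for recessive, (0, 1, 2) for additive and (0, 1, 1) for dominant, assuming genotypes are ordered from zero to two copies of the risk allele. The sketch below computes the CATT Z statistic from a 2-by-3 table and the MAX statistic as the largest absolute Z over the three models. It is written in Python for illustration rather than Stata, and the function names are hypothetical. The p-value of MAX is not a standard normal tail, which is why its null distribution has to be derived separately.

```python
import math

# Genotype scores (0, 1 or 2 copies of the risk allele) per inheritance model.
SCORES = {"recessive": (0, 0, 1), "additive": (0, 1, 2), "dominant": (0, 1, 1)}


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend test Z statistic for a 2-by-3 genotype table.

    cases, controls: genotype counts ordered as (0, 1, 2) risk-allele copies.
    """
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    n = [r + s for r, s in zip(cases, controls)]          # genotype totals
    t = sum(x * (s_tot * r - r_tot * s) for x, r, s in zip(scores, cases, controls))
    sx2 = sum(x * x * ni for x, ni in zip(scores, n))
    sx = sum(x * ni for x, ni in zip(scores, n))
    # Var(T) = (R S / N) * (N * sum x^2 n - (sum x n)^2) under the null.
    var = r_tot * s_tot / n_tot * (n_tot * sx2 - sx * sx)
    if var <= 0:
        raise ValueError("trend test undefined: scored genotypes do not vary")
    return t / math.sqrt(var)


def max3(cases, controls):
    """MAX statistic: largest absolute CATT Z over the three inheritance models."""
    return max(abs(catt_z(cases, controls, w)) for w in SCORES.values())
```

    The Z statistic is unchanged if the scores are rescaled, so (0, 0.5, 1) gives the same additive test as (0, 1, 2); only the relative spacing of the middle score matters.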

  • 26. Draminski, Michal
    et al.
    Rada-Iglesias, Alvaro
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Wadelius, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Koronacki, Jacek
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Monte Carlo feature selection for supervised classification. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 1, p. 110-117. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. Ideally, feature selection should provide the features that contribute most to the classification task per se, and which should therefore be used by any classifier subsequently employed to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on repeatedly constructing tree classifiers for many training sets randomly chosen from the original sample set, where the samples in each training set are described by only a fraction of all the observed features. RESULTS: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the leukemia data of Golub et al. and the lymphoma data of Alizadeh et al. Not surprisingly, we obtained a list of genes markedly different from those of other methods. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma, rather than being genes common to several forms of cancer, which is the case for the other methods.
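    The Monte Carlo loop described above can be sketched under strong simplifications: depth-1 decision stumps stand in for full tree classifiers, and a feature's score is simply the sum of test-set accuracies of the stumps that split on it, rather than the relative-importance measure defined in the paper. The parameter names s (random feature subsets), m (features per subset) and t (random train/test splits per subset), like the function names, are chosen for illustration.

```python
import random


def fit_stump(rows, labels, features):
    """Best one-feature threshold rule on the training rows (a depth-1 tree).

    Returns (feature, threshold, polarity); a row is predicted as class 1
    when polarity * (value - threshold) > 0.
    """
    best_acc, best = -1.0, (None, None, None)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for pol in (1, -1):
                correct = sum((pol * (r[f] - thr) > 0) == (lab == 1)
                              for r, lab in zip(rows, labels))
                acc = correct / len(rows)
                if acc > best_acc:
                    best_acc, best = acc, (f, thr, pol)
    return best


def mcfs_scores(X, y, s=40, m=3, t=5, train_frac=0.66, seed=0):
    """Simplified Monte Carlo feature selection score for each feature."""
    rng = random.Random(seed)
    n_features = len(X[0])
    scores = [0.0] * n_features
    order = list(range(len(X)))
    for _ in range(s):                                 # s random feature subsets
        subset = rng.sample(range(n_features), m)
        for _ in range(t):                             # t random splits per subset
            rng.shuffle(order)
            cut = int(train_frac * len(order))
            train, test = order[:cut], order[cut:]
            f, thr, pol = fit_stump([X[i] for i in train], [y[i] for i in train], subset)
            if f is None:                              # no usable split in this subset
                continue
            acc = sum((pol * (X[i][f] - thr) > 0) == (y[i] == 1) for i in test) / len(test)
            scores[f] += acc                           # credit the feature the stump used
    return scores
```

    Because each classifier sees only m randomly chosen features, a strong feature cannot mask all the others, and its score accumulates only over the subsets and splits in which it actually wins and generalizes.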

  • 27. Duchemin, Wandrille
    et al.
    Gence, Guillaume
    Chifolleau, Anne-Muriel Arigon
    Arvestad, Lars
    Stockholm University, Faculty of Science, Department of Mathematics. Swedish e-Science Research Centre (SeRC), Sweden.
    Bansal, Mukul S.
    Berry, Vincent
    Boussau, Bastien
    Chevenet, Francois
    Comte, Nicolas
    Davin, Adrian A.
    Dessimoz, Christophe
    Dylus, David
    Hasic, Damir
    Mallo, Diego
    Planel, Remi
    Posada, David
    Scornavacca, Celine
    Szollosi, Gergely
    Zhang, Louxin
    Tannier, Eric
    Daubin, Vincent
    RecPhyloXML: a format for reconciled gene trees. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 21, p. 3646-3652. Article in journal (Refereed)
    Abstract [en]

    Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer and loss), along with a mapping onto a species tree. Many algorithms and software tools produce or use reconciliations, but often in different reconciliation formats that differ in the types of events considered or in whether the species tree is dated. This complicates comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities.

  • 28.
    Elo, Laura L.
    et al.
    Department of Mathematics, University of Turku, Turku, Finland; Turku Centre for Biotechnology, Turku, Finland.
    Järvenpää, Henna
    Turku Centre for Biotechnology, Turku, Finland.
    Oresic, Matej
    Turku Centre for Biotechnology, Turku, Finland; VTT Biotechnology, Espoo, Finland.
    Lahesmaa, Riitta
    Turku Centre for Biotechnology, Turku, Finland.
    Aittokallio, Tero
    Department of Mathematics, University of Turku, Turku, Finland; Turku Centre for Biotechnology, Turku, Finland; Systems Biology Unit, Institut Pasteur, Paris, France.
    Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 16, p. 2096-2103. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Coexpression networks have recently emerged as a novel holistic approach to microarray data analysis and interpretation. Choosing an appropriate cutoff threshold, above which a gene-gene interaction is considered relevant, is a critical task in most network-centric applications, especially when two or more networks are being compared.

    RESULTS: We demonstrate that the performance of traditional approaches, which are based on a pre-defined cutoff or significance level, can vary drastically depending on the type of data and application. Therefore, we introduce a systematic procedure for estimating a cutoff threshold of coexpression networks directly from their topological properties. Both synthetic and real datasets show clear benefits of our data-driven approach under various practical circumstances. In particular, the procedure provides a robust estimate of individual degree distributions, even from multiple microarray studies performed with different array platforms or experimental designs, which can be used to discriminate the corresponding phenotypes. Application to human T helper cell differentiation process provides useful insights into the components and interactions controlling this process, many of which would have remained unidentified on the basis of expression change alone. Moreover, several human-mouse orthologs showed conserved topological changes in both systems, suggesting their potential importance in the differentiation process.

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • 29.
    Emami Khoonsari, Payam
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Moreno, Pablo
    Bergmann, Sven
    Burman, Joachim
    Capuccini, Marco
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Carone, Matteo
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Cascante, Marta
    de Atauri, Pedro
    Foguet, Carles
    Gonzalez-Beltran, Alejandra N.
    Hankemeier, Thomas
    Haug, Kenneth
    He, Sijin
    Herman, Stephanie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Johnson, David
    Kale, Namrata
    Larsson, Anders
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Neumann, Steffen
    Peters, Kristian
    Pireddu, Luca
    Rocca-Serra, Philippe
    Roger, Pierrick
    Rueedi, Rico
    Ruttkies, Christoph
    Sadawi, Noureddin
    Salek, Reza M.
    Sansone, Susanna-Assunta
    Schober, Daniel
    Selivanov, Vitaly
    Thévenot, Etienne A.
    van Vliet, Michael
    Zanetti, Gianluigi
    Steinbeck, Christoph
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Interoperable and scalable data analysis with microservices: Applications in metabolomics. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 19, p. 3752-3760. Article in journal (Refereed)
  • 30.
    Eriksson, Olivia
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Jauhiainen, Alexandra
    AstraZeneca, IMED Biotech Unit, Early Clin Dev, Biometr, Gothenburg, Sweden.
    Sasane, Sara Maad
    Lund Univ, Ctr Math Sci, Lund, Sweden.
    Kramer, Andrei
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Nair, Anu G.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sartorius, Carolina
    Lund Univ, Ctr Math Sci, Lund, Sweden.
    Hellgren Kotaleski, Jeanette
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Uncertainty quantification, propagation and characterization by Bayesian analysis combined with global sensitivity analysis applied to dynamical intracellular pathway models. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 2, p. 284-292. Article in journal (Refereed)
    Abstract [en]

    Motivation: Dynamical models describing intracellular phenomena are increasing in size and complexity as more information is obtained from experiments. These models are often over-parameterized with respect to the quantitative data used for parameter estimation, resulting in uncertainty in the individual parameter estimates as well as in the predictions made from the model. Here we combine Bayesian analysis with global sensitivity analysis (GSA) in order to give better-informed predictions, to point out weaker parts of the model that are important targets for further experiments, and to give guidance on parameters that are essential in distinguishing different qualitative output behaviours. Results: We used approximate Bayesian computation (ABC) to estimate the model parameters from experimental data, as well as to quantify the uncertainty in this estimation (inverse uncertainty quantification), resulting in a posterior distribution for the parameters. This parameter uncertainty was next propagated to a corresponding uncertainty in the predictions (forward uncertainty propagation), and a GSA was performed on the predictions using the posterior distribution as the possible values for the parameters. This methodology was applied to a relatively large model relevant for synaptic plasticity, using experimental data from several sources. We could thereby point out the parameters that by themselves contribute most to the uncertainty of the prediction, as well as identify parameters important for separating qualitatively different predictions. This approach is useful both for experimental design and for model building.
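    The inverse and forward steps can be sketched on a deliberately tiny, hypothetical model: a single decay rate k in y(t) = exp(-k t), observed with noise at a few time points. Rejection ABC keeps prior draws whose simulated output lies within a tolerance eps of the data; pushing the accepted draws through the model gives the spread of a new prediction. The model, uniform prior, RMS distance and tolerance are all assumptions for illustration, the function names are hypothetical, and the GSA step is not shown.

```python
import math
import random
import statistics


def simulate(k, times, y0=1.0):
    """Toy model (hypothetical): first-order decay y(t) = y0 * exp(-k t)."""
    return [y0 * math.exp(-k * t) for t in times]


def rms_distance(a, b):
    """Root-mean-square distance between simulated and observed series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def abc_rejection(observed, times, n_draws=20000, eps=0.05, prior=(0.0, 2.0), seed=0):
    """Inverse UQ by rejection ABC: keep prior draws that reproduce the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(*prior)                     # draw from the uniform prior
        if rms_distance(simulate(k, times), observed) < eps:
            accepted.append(k)                      # approximate posterior sample
    return accepted


def propagate(posterior, t_new):
    """Forward propagation: mean and spread of the prediction at t_new."""
    preds = [simulate(k, [t_new])[0] for k in posterior]
    return statistics.mean(preds), statistics.stdev(preds)
```

    Shrinking eps brings the accepted sample closer to the true posterior at the cost of more rejected simulations; for a large pathway model that simulation cost dominates the analysis.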

  • 31.
    Eriksson, Olivia
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). KTH Royal Institute of Technology, Sweden; Swedish e-Science Research Centre (SeRC), Sweden.
    Jauhiainen, Alexandra
    Sasane, Sara Maad
    Kramer, Andrei
    Nair, Anu G.
    Sartorius, Carolina
    Hellgren Kotaleski, Jeanette
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). KTH Royal Institute of Technology, Sweden; Swedish e-Science Research Centre (SeRC), Sweden.
    Uncertainty quantification, propagation and characterization by Bayesian analysis combined with global sensitivity analysis applied to dynamical intracellular pathway models. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 2, p. 284-292. Article in journal (Refereed)
    Abstract [en]

    Motivation: Dynamical models describing intracellular phenomena are increasing in size and complexity as more information is obtained from experiments. These models are often over-parameterized with respect to the quantitative data used for parameter estimation, resulting in uncertainty in the individual parameter estimates as well as in the predictions made from the model. Here we combine Bayesian analysis with global sensitivity analysis (GSA) in order to give better-informed predictions, to point out weaker parts of the model that are important targets for further experiments, and to give guidance on parameters that are essential in distinguishing different qualitative output behaviours.

    Results: We used approximate Bayesian computation (ABC) to estimate the model parameters from experimental data, as well as to quantify the uncertainty in this estimation (inverse uncertainty quantification), resulting in a posterior distribution for the parameters. This parameter uncertainty was next propagated to a corresponding uncertainty in the predictions (forward uncertainty propagation), and a GSA was performed on the predictions using the posterior distribution as the possible values for the parameters. This methodology was applied to a relatively large model relevant for synaptic plasticity, using experimental data from several sources. We could thereby point out the parameters that by themselves contribute most to the uncertainty of the prediction, as well as identify parameters important for separating qualitatively different predictions. This approach is useful both for experimental design and for model building.

  • 32. Ewels, P.
    et al.
    Magnusson, M.
    Lundin, S.
    Käller, Max
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    MultiQC: Summarize analysis results for multiple tools and samples in a single report. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 19, p. 3047-3048. Article in journal (Refereed)
    Abstract [en]

    Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis. Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. Availability and implementation: MultiQC is available under the GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info.
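A rough sketch of the aggregation idea, assuming each tool's output has already been parsed into `{sample: {metric: value}}` dictionaries. The tool names and metrics are hypothetical, and this is not MultiQC's own module API. Outlier samples are flagged with a median-absolute-deviation rule, an illustrative choice that the abstract does not specify.

```python
from statistics import median

def merge_tool_results(tool_results):
    """Combine {tool: {sample: {metric: value}}} into one row per sample: {sample: {'tool.metric': value}}."""
    table = {}
    for tool, per_sample in tool_results.items():
        for sample, metrics in per_sample.items():
            row = table.setdefault(sample, {})
            for metric, value in metrics.items():
                row[f"{tool}.{metric}"] = value
    return table

def flag_outliers(table, column, cutoff=3.5):
    """Samples whose value lies more than `cutoff` scaled MADs from the column median."""
    values = {s: row[column] for s, row in table.items() if column in row}
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread, so no robust outlier call is possible
    return sorted(s for s, v in values.items() if abs(v - med) / (1.4826 * mad) > cutoff)

tool_results = {
    "fastqc": {"s1": {"gc": 41}, "s2": {"gc": 42}, "s3": {"gc": 40}, "s4": {"gc": 41}, "s5": {"gc": 60}},
    "aligner": {"s1": {"mapped": 0.95}, "s2": {"mapped": 0.96}},
}
table = merge_tool_results(tool_results)
print(flag_outliers(table, "fastqc.gc"))
```

Here the hypothetical sample `s5`, with 60% GC against roughly 41% for the others, is the only one flagged.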

  • 33.
    Ewels, Philip
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Magnusson, Måns
    Lundin, Sverker
    Käller, Max
    MultiQC: summarize analysis results for multiple tools and samples in a single report2016In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 19, p. 3047-3048Article in journal (Refereed)
    Abstract [en]

    Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis.

    Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization.

  • 34.
    Fange, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    MesoRD 1.0: Stochastic reaction-diffusion simulations in the microscopic limit2012In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811Article in journal (Refereed)
  • 35.
    Fange, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Mahmutovic, Anel
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    MesoRD 1.0: Stochastic reaction-diffusion simulations in the microscopic limit2012In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 23, p. 3155-3157Article in journal (Refereed)
    Abstract [en]

    MesoRD is a tool for simulating stochastic reaction-diffusion systems as modeled by the reaction diffusion master equation. The simulated systems are defined in the Systems Biology Markup Language with additions to define compartment geometries. MesoRD 1.0 supports scale-dependent reaction rate constants and reactions between reactants in neighbouring subvolumes. These new features make it possible to construct physically consistent models of diffusion-controlled reactions also at fine spatial discretization.
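The reaction-diffusion master equation treats each subvolume as well mixed and models diffusion as first-order jumps between neighbouring subvolumes. A minimal sketch of that idea, assuming a 1D row of subvolumes with reflecting ends, a single species and only first-order degradation, so the scale-dependent bimolecular rate constants added in MesoRD 1.0 are not covered. It uses Gillespie's direct method and is not MesoRD's implementation.

```python
import random

def rdme_ssa(counts, d_rate, k_deg, t_end, seed=0):
    """Gillespie direct method for a 1D reaction-diffusion master equation.

    counts : molecules of species A per subvolume (reflecting boundaries)
    d_rate : jump rate per molecule to each neighbouring subvolume (D / h**2)
    k_deg  : first-order degradation rate constant for A -> 0
    """
    rng = random.Random(seed)
    x = list(counts)
    n = len(x)
    t = 0.0
    while True:
        # Enumerate events as (propensity, source subvolume, destination or None)
        events = []
        for i in range(n):
            if i > 0:
                events.append((d_rate * x[i], i, i - 1))
            if i < n - 1:
                events.append((d_rate * x[i], i, i + 1))
            events.append((k_deg * x[i], i, None))  # None marks degradation
        events = [e for e in events if e[0] > 0]
        total = sum(e[0] for e in events)
        if total == 0:
            break  # no event can fire any more
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick an event with probability proportional to its propensity
        r = rng.random() * total
        acc = 0.0
        chosen = events[-1]  # fallback guards against floating-point round-off
        for event in events:
            acc += event[0]
            if r < acc:
                chosen = event
                break
        _, src, dst = chosen
        x[src] -= 1
        if dst is not None:
            x[dst] += 1
    return x

print(rdme_ssa([200, 0, 0, 0, 0], d_rate=1.0, k_deg=0.0, t_end=50.0))
```

With degradation switched off the molecule count is conserved and the initial spike spreads over all subvolumes; a finer discretization corresponds to a larger `d_rate` for the same diffusion constant.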

  • 36.
    Fernandez Navarro, Jose
    et al.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lundeberg, Joakim
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ståhl, Patrik L.
    KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    ST viewer: a tool for analysis and visualization of spatial transcriptomics datasets2019In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 6, p. 1058-1060Article in journal (Refereed)
    Abstract [en]

    Motivation: Spatial Transcriptomics (ST) is a technique that combines high-resolution imaging with spatially resolved transcriptome-wide sequencing. This novel type of data opens up many possibilities for analysis and visualization, most of which are either not available with standard tools or too complex for normal users. Results: Here, we present a tool, ST Viewer, which allows real-time interaction, analysis and visualization of Spatial Transcriptomics datasets through a seamless and smooth user interface. Availability and implementation: The ST Viewer is open source under an MIT license and is available at https://github.com/SpatialTranscriptomicsResearch/st_viewer. Supplementary information: Supplementary data are available at Bioinformatics online.

  • 37. Forslund, Kristoffer
    et al.
    Pereira, Cecile
    Capella-Gutierrez, Salvador
    Sousa da Silva, Alan
    Altenhoff, Adrian
    Huerta-Cepas, Jaime
    Muffato, Matthieu
    Patricio, Mateus
    Vandepoele, Klaas
    Ebersberger, Ingo
    Blake, Judith
    Fernandez Breis, Jesualdo Tomas
    Boeckmann, Brigitte
    Gabaldon, Toni
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Lewis, Suzanna
    Gearing up to handle the mosaic nature of life in the quest for orthologs2018In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 2, p. 323-329Article in journal (Refereed)
    Abstract [en]

    The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing the sources and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

  • 38.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benchmarking homology detection procedures with low complexity filters2009In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 19, p. 2500-2505Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.

    RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.

    CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.

    AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz
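To make low-complexity filtering concrete, the sketch below masks every window whose residue composition has low Shannon entropy. It is a simplified SEG-like heuristic with an assumed window length and threshold; it is neither the BLAST filters nor MSPcrunch evaluated in the paper.

```python
import math
from collections import Counter

def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace residues covered by any low-entropy window with 'X' (SEG-like, not SEG itself)."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 20 + "PRSTVWYACDEF"))
```

A poly-Q stretch is masked together with a few flanking residues, while a window of twelve distinct residues (entropy of about 3.58 bits) is left untouched.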

  • 39.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content2008In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 15, p. 1681-1687Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar
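A minimal sketch of a rule-based, strict-implication model in the spirit of the first predictor: each observed domain combination is mapped to the GO terms shared by every training protein carrying exactly that combination, and a query inherits the terms of every rule whose domain set it contains. The domain and term identifiers are hypothetical placeholders, not Pfam2GO entries, and the probabilistic model is not shown.

```python
from collections import defaultdict

def learn_rules(training):
    """Map each domain combination to the terms shared by all training proteins with exactly that combination."""
    by_combo = defaultdict(list)
    for domains, terms in training:
        by_combo[frozenset(domains)].append(set(terms))
    return {combo: set.intersection(*term_sets) for combo, term_sets in by_combo.items()}

def predict(rules, domains):
    """Union of the terms of every rule whose domain set is contained in the query's domains."""
    query = frozenset(domains)
    predicted = set()
    for combo, terms in rules.items():
        if combo <= query:
            predicted |= terms
    return predicted

# Hypothetical training data: (domain set, annotated terms)
training = [
    ({"domA"}, {"term1", "term2"}),
    ({"domA"}, {"term1"}),
    ({"domA", "domB"}, {"term1", "term3"}),
    ({"domB"}, {"term4"}),
]
rules = learn_rules(training)
print(sorted(predict(rules, {"domA", "domB"})))
```

In this toy data, `term3` is predicted only for the `domA` + `domB` combination and follows from neither domain alone, mirroring the kind of case the abstract describes.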

  • 40.
    Freyhult, Eva
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Moulton, Vincent
    Clote, Peter
    Boltzmann probability of RNA structural neighbors and riboswitch detection2007In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 16, p. 2054-2062Article in journal (Refereed)
    Abstract [en]

    Motivation: We describe algorithms implemented in a new software package, RNAbor, to investigate structures in a neighborhood of an input secondary structure $S$ of an RNA sequence $s$. The input structure could be the minimum free energy structure, the secondary structure obtained by analysis of the X-ray structure or by comparative sequence analysis, or an arbitrary intermediate structure.

    Results: A secondary structure $T$ of $s$ is called a $k$-neighbor of $S$ if $T$ and $S$ differ by exactly $k$ base pairs. RNAbor computes the number ($N_k$), the Boltzmann partition function ($Z_k$) and the minimum free energy ($\mathrm{MFE}_k$) and corresponding structure over the collection of all $k$-neighbors of $S$. This computation is done simultaneously for all $k \le m$, in run time $O(mn^3)$ and memory $O(mn^2)$, where $n$ is the sequence length. We apply RNAbor for the detection of possible RNA conformational switches, and compare RNAbor with the switch detection method paRNAss. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction.
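The $k$-neighbor definition rests on the base-pair distance between two structures. A minimal sketch, assuming structures are given in dot-bracket notation, computes that distance as the size of the symmetric difference of the two base-pair sets. It only checks the definition for one pair of structures and does not reproduce RNAbor's dynamic programming over all neighbors.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) base pairs encoded by a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def bp_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s) ^ base_pairs(t))

def is_k_neighbor(s, t, k):
    """True if structure t differs from structure s by exactly k base pairs."""
    return bp_distance(s, t) == k

print(bp_distance("((..))", ".(..)."))
```

Removing the outer pair of `((..))` gives a 1-neighbor, whereas shifting a pair, as in `(....)` versus `.(..).`, costs two base pairs (one removed, one added).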

  • 41. Garber, Manuel
    et al.
    Guttman, Mitchell
    Clamp, Michele
    Zody, Michael C.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Friedman, Nir
    Xie, Xiaohui
    Identifying novel constrained elements by exploiting biased substitution patterns2009In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 12, p. I54-I62Article in journal (Refereed)
    Abstract [en]

    Motivation: Comparing the genomes of closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and ignore selection acting on the pattern of mutations. Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we identify as much as 10.2% of these regions as being under selection.

  • 42. Garg, Shilpa
    et al.
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Marschall, Tobias
    Read-based phasing of related individuals2016In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 12, p. 234-242Article in journal (Refereed)
    Abstract [en]

    Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver better results than either does individually. Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants, at higher accuracy, than phasing each individual separately, both in simulated and real data. Coverages as low as 2x for each member of a trio yield haplotypes that are as accurate as those obtained when each individual is analyzed separately at 15x coverage.
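Read-based phasing is commonly formalised as a minimum error correction (MEC) problem. The sketch below assumes all variants are heterozygous and each read is a dictionary from variant index to allele (0 or 1). It scores a haplotype by its MEC cost and finds the best one by exhaustive search for a single individual. It does not implement the pedigree-aware, fixed-parameter algorithm described in the paper, and brute force is only feasible for a handful of variants.

```python
from itertools import product

def mec_cost(haplotype, reads):
    """MEC cost: each read is assigned to the closer of the two complementary haplotypes."""
    cost = 0
    for read in reads:
        to_h = sum(1 for pos, allele in read.items() if haplotype[pos] != allele)
        to_comp = len(read) - to_h  # the other haplotype is the bitwise complement
        cost += min(to_h, to_comp)
    return cost

def phase_brute_force(reads, n_variants):
    """Exhaustive search; fixing the first allele to 0 removes the mirror-image duplicate solutions."""
    best = None
    for rest in product((0, 1), repeat=n_variants - 1):
        h = (0,) + rest
        c = mec_cost(h, reads)
        if best is None or c < best[1]:
            best = (h, c)
    return best

# Hypothetical reads covering three heterozygous variants
reads = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {1: 1, 2: 0}, {0: 0, 2: 1}]
print(phase_brute_force(reads, 3))
```

These error-free toy reads are explained perfectly by the haplotype pair 001/110, so its MEC cost is zero. In the paper's setting, reads from related individuals would additionally be tied together by Mendelian constraints.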

  • 43.
    Gennemark, Peter
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics.
    Wedelin, Dag
    Benchmarks for identification of ordinary differential equations from time series data2009In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 6, p. 780-786Article in journal (Refereed)
    Abstract [en]

    Motivation: In recent years, the biological literature has seen a significant increase of reported methods for identifying both structure and parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and the lack of commonly accepted benchmark problems make it difficult to evaluate and compare different methods. Results: To enable better evaluation and comparisons between different methods, we propose how to specify identification problems as optimization problems with a model space of allowed reactions (e.g. reaction kinetics like Michaelis-Menten or S-systems), ranges for the parameters, time series data and an error function. We also define a file format for such problems. We then present a collection of more than 40 benchmark problems for ODE model identification of cellular systems. The collection includes realistic problems of different levels of difficulty with respect to size and quality of data. We consider both problems with simulated data from known systems, and problems with real data. Finally, we present results based on our identification algorithm for all benchmark problems. In comparison with publications on which we have based some of the benchmark problems, our approach allows all problems to be solved without the use of supercomputing.
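A minimal sketch of an identification problem specified in the way described: a model (here Michaelis-Menten substrate depletion, dS/dt = -vmax*S/(km + S)), allowed parameter ranges, time-series data and an error function (sum of squared errors). The forward-Euler integrator and the grid search are illustrative stand-ins chosen for brevity. They are not the authors' identification algorithm, and no benchmark file format is reproduced.

```python
def simulate_mm(vmax, km, s0, times, dt=0.001):
    """Forward-Euler integration of dS/dt = -vmax*S/(km+S), sampled at the requested times."""
    s, t, out = s0, 0.0, []
    for target in times:
        while t < target - 1e-12:
            step = min(dt, target - t)
            s += -vmax * s / (km + s) * step
            t += step
        out.append(s)
    return out

def sse(pred, data):
    """Error function: sum of squared residuals."""
    return sum((p - d) ** 2 for p, d in zip(pred, data))

def identify(data, times, s0, vmax_range, km_range, n_grid=21):
    """Grid search over the allowed parameter box, returning (error, vmax, km) with minimal error."""
    def grid(lo, hi):
        return [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    best = None
    for vmax in grid(*vmax_range):
        for km in grid(*km_range):
            err = sse(simulate_mm(vmax, km, s0, times), data)
            if best is None or err < best[0]:
                best = (err, vmax, km)
    return best

times = [0.5, 1.0, 2.0, 3.0, 4.0]
data = simulate_mm(1.0, 0.5, 2.0, times)  # simulated data from a known system
print(identify(data, times, 2.0, (0.0, 2.0), (0.25, 1.25), n_grid=5))
```

Because the data are simulated from a known system whose parameters lie on the grid, the search recovers vmax = 1.0 and km = 0.5 with zero error. With real data, the error function and parameter ranges carry most of the specification burden.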

  • 44. Ghahremanpour, Mohammad Mehdi
    et al.
    Arab, Seyed Shahriar
    Aghazadeh, Saman Biook
    Zhang, Jin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    van der Spoel, David
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    MemBuilder: a web-based graphical interface to build heterogeneously mixed membrane bilayers for the GROMACS biomolecular simulation program2014In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 3, p. 439-441Article in journal (Refereed)
    Abstract [en]

    Motivation: Molecular dynamics (MD) simulations have had a profound impact on studies of membrane proteins during the past two decades, but the accuracy of MD simulations of membranes is limited by the quality of membrane models and the applied force fields. Membrane models used in MD simulations mostly contain one kind of lipid molecule. This is far from reality, as biological membranes always contain more than one kind of lipid molecule. Moreover, the lipid composition and distribution are functionally important. As a result, there is a need to prepare more realistic lipid membranes containing different types of lipids at physiological concentrations. Results: To automate and simplify the building of heterogeneous lipid bilayers, and to provide molecular topologies for the included lipids based on both united-atom and all-atom force fields, we developed MemBuilder, a web-based graphical user interface.

  • 45.
    Gopalacharyulu, Peddinti V.
    et al.
    VTT Biotechnology, Espoo, Finland.
    Lindfors, Erno
    VTT Biotechnology, Espoo, Finland.
    Bounsaythip, Catherine
    VTT Biotechnology, Espoo, Finland.
    Kivioja, Teemu
    VTT Biotechnology, Espoo, Finland.
    Yetukuri, Laxman
    VTT Biotechnology, Espoo, Finland.
    Hollmén, Jaakko
    Helsinki University of Technology, Laboratory of Computer and Information Science, Espoo, Finland.
    Oresic, Matej
    VTT Biotechnology, Espoo, Finland.
    Data integration and visualization system for enabling conceptual biology2005In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21 Suppl 1, p. i177-i185Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Integration of heterogeneous data in life sciences is a growing and recognized challenge. The problem is not only to enable the study of such data within the context of a biological question but also more fundamentally, how to represent the available knowledge and make it accessible for mining.

    RESULTS: Our integration approach is based on the premise that relationships between biological entities can be represented as a complex network. Context dependency is achieved by a judicious use of distance measures on these networks. For visualization, the biological entities and the distances between them are mapped into a lower-dimensional space using Sammon's mapping. The system implementation is based on a multi-tier architecture using a native XML database and a software tool for querying and visualizing complex biological networks. The functionality of our system is demonstrated with two examples: (1) a multiple pathway retrieval, in which, given a pathway name, the system finds all relationships related to the query by checking available metabolic pathway, transcriptional, signaling, protein-protein interaction and ontology annotation resources; and (2) a protein neighborhood search, in which, given a protein name, the system finds all its connected entities within a specified depth. These two examples show that our system is able to conceptually traverse different databases to produce testable hypotheses and lead towards answers to complex biological questions.
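The protein neighborhood search in the second example amounts to a depth-limited graph traversal. A minimal sketch, assuming the integrated network is an adjacency dictionary with hypothetical entity names and that distance is simply the number of edges (the system's own distance measures are not specified here):

```python
from collections import deque

def neighborhood(graph, start, max_depth):
    """Breadth-first search: every entity reachable from `start` within `max_depth` edges, with its depth."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

# Hypothetical integrated network mixing proteins, metabolites and ontology terms
graph = {
    "proteinA": ["proteinB", "termX"],
    "proteinB": ["proteinA", "metabolite1"],
    "metabolite1": ["proteinB", "proteinC"],
    "termX": ["proteinA"],
    "proteinC": ["metabolite1"],
}
print(neighborhood(graph, "proteinA", 2))
```

Because breadth-first search visits entities in order of hop count, each entity is recorded at its shortest depth from the query protein.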

  • 46. Grabherr, Manfred G
    et al.
    Russell, Pamela
    Meyer, Miriah
    Mauceli, Evan
    Alföldi, Jessica
    Di Palma, Federica
    Lindblad-Toh, Kerstin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Genome-wide synteny through highly sensitive sequence alignment: Satsuma2010In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 9, p. 1145-1151Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). RESULTS: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous 'battleship'-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. AVAILABILITY: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/.
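The first of Satsuma's strategies, cross-correlation via FFT, can be illustrated on short sequences. One-hot encode both sequences, cross-correlate each nucleotide channel with the FFT and sum, which gives the number of identical bases at every relative offset. This is a minimal NumPy sketch of that step only; Satsuma's match scoring and search strategy are not reproduced.

```python
import numpy as np

def one_hot(seq):
    """4 x L indicator matrix with one row per nucleotide (A, C, G, T)."""
    return np.array([[1.0 if c == b else 0.0 for c in seq] for b in "ACGT"])

def match_counts(query, target):
    """Identical-base counts for every placement of `query` against `target`.

    Entry k (0 <= k < len(target)) places the query start at target position k;
    the tail of the array holds negative offsets (query starting before the target).
    """
    n = len(query) + len(target) - 1  # zero-pad so the circular correlation does not wrap
    q, t = one_hot(query), one_hot(target)
    total = np.zeros(n)
    for b in range(4):
        # Cross-correlation theorem: corr = IFFT(FFT(target) * conj(FFT(query)))
        total += np.fft.irfft(np.fft.rfft(t[b], n) * np.conj(np.fft.rfft(q[b], n)), n)
    return np.rint(total).astype(int)

def best_offset(query, target):
    """Offset of the query start relative to the target that maximises identities."""
    counts = match_counts(query, target)
    lag = int(np.argmax(counts))
    return lag if lag < len(target) else lag - len(counts)

target = "GATTACAGGCATCCGATTTGCA"
print(best_offset(target[5:12], target))
```

Every offset is scored at once in O(L log L) time instead of O(L^2), which is what makes this step attractive for genome-scale sequences.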

  • 47.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set2014In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2689-2690Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to areas other than cancer. The algorithm has also been made faster by re-implementation in C++, and the web site uses FunCoup 3.0 as the underlying network.
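A guilt-by-association ranking can be sketched as follows: count each candidate gene's links into the query set and score that count with a hypergeometric tail probability given the gene's degree. The abstract does not say which statistic MaxLink uses, so the hypergeometric test here is an assumption for illustration, and the edge list is hypothetical rather than drawn from FunCoup.

```python
from math import comb

def hypergeom_sf(k, total, n_query, degree):
    """P(X >= k) for X ~ Hypergeometric(population=total, successes=n_query, draws=degree)."""
    return sum(comb(n_query, i) * comb(total - n_query, degree - i)
               for i in range(k, min(n_query, degree) + 1)) / comb(total, degree)

def rank_candidates(edges, query):
    """Rank non-query genes by the significance of their links into the query set."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    query = set(query) & neighbors.keys()
    total = len(neighbors) - 1  # possible partners of any single gene
    scored = []
    for gene, nbs in neighbors.items():
        if gene in query:
            continue
        k = len(nbs & query)
        if k == 0:
            continue  # no link to the query set at all
        scored.append((hypergeom_sf(k, total, len(query), len(nbs)), -k, gene))
    scored.sort()
    return [(gene, -neg_k, p) for p, neg_k, gene in scored]

edges = [("X", "Q1"), ("X", "Q2"), ("X", "Q3"), ("Y", "Q1"), ("Y", "Z1"),
         ("Y", "Z2"), ("Z1", "Z2"), ("Z3", "Z4")]
print(rank_candidates(edges, ["Q1", "Q2", "Q3"]))
```

Gene `X`, linked only to query genes, ranks first. Gene `Y` shares one link with the query set but also has unrelated partners, so its tail probability is much larger.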

  • 48.
    Guy, Lionel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    phyloSkeleton: taxon selection, data retrieval and marker identification for phylogenomics2017In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 8, p. 1230-1232Article in journal (Refereed)
    Abstract [en]

    With the wealth of available genome sequences, a difficult and tedious part of inferring phylogenomic trees is now to select genomes with an appropriate taxon density in the different parts of the tree. The package described here offers tools to easily select the most representative organisms, following a set of simple rules based on taxonomy and assembly quality, to retrieve the genomes from public databases (NCBI, JGI), to annotate them if necessary, to identify given markers in these, and to prepare files for multiple sequence alignment.

    AVAILABILITY AND IMPLEMENTATION: phyloSkeleton is a Perl module and is freely available under GPLv3 at https://bitbucket.org/lionelguy/phyloskeleton/ CONTACT: lionel.guy@imbim.uu.se.
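Selection rules based on taxonomy and assembly quality can be sketched as "one representative per genus, preferring the most complete assembly and then the largest N50". The genus-level rule, the ranking of NCBI-style assembly levels and the example accessions are all assumptions for illustration; they are not phyloSkeleton's configuration format.

```python
from dataclasses import dataclass

# Assumed ranking of NCBI-style assembly levels; lower is better
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}

@dataclass
class Genome:
    accession: str
    genus: str
    assembly_level: str
    n50: int

def quality_key(g):
    """Sort key: best assembly level first, then largest N50."""
    return (LEVEL_RANK[g.assembly_level], -g.n50)

def select_representatives(genomes):
    """Keep one genome per genus, choosing the best by assembly quality."""
    best = {}
    for g in genomes:
        current = best.get(g.genus)
        if current is None or quality_key(g) < quality_key(current):
            best[g.genus] = g
    return sorted(best.values(), key=lambda g: g.genus)

genomes = [
    Genome("acc1", "Alpha", "Contig", 500_000),
    Genome("acc2", "Alpha", "Complete Genome", 100_000),
    Genome("acc3", "Beta", "Scaffold", 20_000),
    Genome("acc4", "Beta", "Scaffold", 80_000),
]
print([g.accession for g in select_representatives(genomes)])
```

Changing the grouping key from genus to a higher or lower rank is one simple way to adjust taxon density in different parts of the tree.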

  • 49.
    Guy, Lionel
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organism Biology, Molecular Evolution.
    Roat Kultima, Jens
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organism Biology, Molecular Evolution.
    Andersson, Siv G.E.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organism Biology, Molecular Evolution.
    genoPlotR: comparative gene and genome visualization in R2010In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 18, p. 2334-2335Article in journal (Refereed)
    Abstract [en]

    The amount of gene and genome data obtained by next-generation sequencing technologies generates a need for comparative visualization tools. Complementing existing software for comparison and exploration of genomics data, genoPlotR creates publication-grade linear maps of genes and genomes in a highly automatic, flexible and reproducible way.

    Availability: genoPlotR is a platform-independent R package, available with full source code under a GPL2 license at R-Forge: http://genoplotr.r-forge.r-project.org/

    Contact: lionel.guy@ebc.uu.se

  • 50.
    Haider, Christian
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Kavic, Marina
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    TreeDom: a graphical web tool for analysing domain architecture evolution2016In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 15, p. 2384-2385Article in journal (Refereed)
    Abstract [en]

    We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to the N sequences most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.
