51 - 100 of 123
  • 51.
    Johansson, Lennart F.
    et al.
    University Medical Center Groningen, The Netherlands.
    de Weerd, Hendrik A.
    University of Skövde, School of Bioscience. University of Skövde, The Systems Biology Research Centre. University Medical Center Groningen, The Netherlands.
    de Boer, Eddy N.
    University Medical Center Groningen, The Netherlands.
    van Dijk, Freerk
    University Medical Center Groningen, The Netherlands.
    Te Meerman, Gerard J.
    University Medical Center Groningen, The Netherlands.
    Sijmons, Rolf H.
    University Medical Center Groningen, The Netherlands.
    Sikkema-Raddatz, Birgit
    University Medical Center Groningen, The Netherlands.
    Swertz, Morris A.
    University Medical Center Groningen, The Netherlands.
    NIPTeR: an R package for fast and accurate trisomy prediction in non-invasive prenatal testing (2018). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, no 1, article id 531. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Various algorithms have been developed to predict fetal trisomies using cell-free DNA in non-invasive prenatal testing (NIPT). As basis for prediction, a control group of non-trisomy samples is needed. Prediction accuracy is dependent on the characteristics of this group and can be improved by reducing variability between samples and by ensuring the control group is representative of the sample analyzed.

    RESULTS: NIPTeR is an open-source R Package that enables fast NIPT analysis and simple but flexible workflow creation, including variation reduction, trisomy prediction algorithms and quality control. This broad range of functions allows users to account for variability in NIPT data, calculate control group statistics and predict the presence of trisomies.

    CONCLUSION: NIPTeR supports laboratories processing next-generation sequencing data for NIPT in assessing data quality and determining whether a fetal trisomy is present. NIPTeR is available under the GNU LGPL v3 license and can be freely downloaded from https://github.com/molgenis/NIPTeR or CRAN.
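For orientation, the core of this kind of trisomy prediction is a z-score comparing a sample's chromosomal read fraction against the control group. The sketch below shows the standard calculation only (not NIPTeR's actual code; the read counts and control fractions are hypothetical):

```python
from statistics import mean, stdev

def chromosomal_fraction(read_counts, chromosome):
    """Fraction of sequencing reads mapping to one chromosome."""
    return read_counts[chromosome] / sum(read_counts.values())

def trisomy_z_score(sample_counts, control_fractions, chromosome):
    """Standard NIPT z-score: how many control-group standard
    deviations the sample's chromosomal fraction lies from the
    control-group mean."""
    f = chromosomal_fraction(sample_counts, chromosome)
    mu, sigma = mean(control_fractions), stdev(control_fractions)
    return (f - mu) / sigma

# Hypothetical data: chr21 read fractions from 5 non-trisomy controls
controls = [0.0130, 0.0131, 0.0129, 0.0132, 0.0130]
sample = {"chr21": 150, "rest": 9850}   # 1.5% of reads on chr21
z = trisomy_z_score(sample, controls, "chr21")
```

A z-score above a chosen cut-off (3 is a commonly used threshold) flags a potential trisomy; the quality of the call hinges on how well the control group matches the sample, as the abstract stresses.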

  • 52.
    Kashani, Zahra RM
    et al.
    Institute of Biochemistry and Biophysics, University of Tehran.
    Ahrabian, Hayedeh
    School of Mathematics and Computer Science, University of Tehran.
    Elahi, Elahe
    School of Biology, University of Tehran.
    Nowzari-Dalini, Abbas
    School of Mathematics and Computer Science, University of Tehran.
    Ansari, Elnaz S
    School of Mathematics and Computer Science, University of Tehran.
    Asadi, Sahar
    Örebro University, School of Science and Technology.
    Mohammadi, Shahin
    School of Mathematics and Computer Science, University of Tehran.
    Schreiber, Falk
    Institute for Computer Science, Martin-Luther-University Halle-Wittenberg.
    Masoudi-Nejad, Ali
    Institute of Biochemistry and Biophysics, University of Tehran.
    Kavosh: a new algorithm for finding network motifs (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, no 318. Article in journal (Refereed)
    Abstract [en]

    Background

    Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur at significantly higher frequencies than in random networks. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks. Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and have practical restrictions on the size of motifs.

    Results

    We present a new algorithm (Kavosh) for finding k-size network motifs with less memory and CPU time in comparison to other existing algorithms. Our algorithm is based on counting all k-size sub-graphs of a given graph (directed or undirected). We evaluated our algorithm on biological networks of E. coli and S. cerevisiae, and also on non-biological networks: a social and an electronic network.

    Conclusion

    The efficiency of our algorithm is demonstrated by comparing the obtained results with three well-known motif finding tools. For comparison, the CPU time, memory usage and the similarities of obtained motifs are considered. Moreover, Kavosh can be employed for finding motifs of size greater than eight, while most of the other algorithms are restricted to motifs of size at most eight. The Kavosh source code and help files are freely available at: http://Lbb.ut.ac.ir/Download/LBBsoft/Kavosh/.
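The counting step the abstract describes can be illustrated by brute force for k = 3, where an undirected connected sub-graph's edge count alone distinguishes the two isomorphism classes (path vs. triangle). Kavosh's actual tree-based enumeration is far more efficient; the graph below is a toy example:

```python
from itertools import combinations

def count_3node_subgraphs(edges):
    """Count connected 3-node sub-graphs of an undirected graph,
    grouped by isomorphism class: 2 edges = path, 3 edges = triangle.
    Brute-force illustration of motif counting only."""
    edge_set = {frozenset(e) for e in edges}
    nodes = {v for e in edges for v in e}
    counts = {"path": 0, "triangle": 0}
    for trio in combinations(sorted(nodes), 3):
        k = sum(frozenset(p) in edge_set for p in combinations(trio, 2))
        if k == 2:
            counts["path"] += 1
        elif k == 3:
            counts["triangle"] += 1
    return counts

# Toy graph: a triangle (1,2,3) with a pendant node 4 attached to 3
demo = count_3node_subgraphs([(1, 2), (2, 3), (1, 3), (3, 4)])
# → {'path': 2, 'triangle': 1}
```

Motif detection then asks whether such counts are significantly higher than in comparable random networks, which is where the real computational cost lies.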

  • 53.
    Katajamaa, Mikko
    et al.
    Turku Centre for Biotechnology, Turku, Finland.
    Oresic, Matej
    VTT Biotechnology, Espoo, Finland.
    Processing methods for differential analysis of LC/MS profile data (2005). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, article id 179. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Liquid chromatography coupled to mass spectrometry (LC/MS) has been widely used in proteomics and metabolomics research. In this context, the technology has been increasingly used for differential profiling, i.e. broad screening of biomolecular components across multiple samples in order to elucidate the observed phenotypes and discover biomarkers. One of the major challenges in this domain remains development of better solutions for processing of LC/MS data.

    RESULTS: We present MZmine, a software package that enables differential LC/MS analysis of metabolomics data. This software is a toolbox containing methods for all data processing stages preceding differential analysis: spectral filtering, peak detection, alignment and normalization. Specifically, we developed and implemented a new recursive peak search algorithm and a secondary peak picking method for improving already aligned results, as well as a normalization tool that uses multiple internal standards. Visualization tools enable comparative viewing of data across multiple samples. Peak lists can be exported into other data analysis programs. The toolbox has already been utilized in a wide range of applications. We demonstrate its utility on an example of metabolic profiling of Catharanthus roseus cell cultures.

    CONCLUSION: The software is freely available under the GNU General Public License and it can be obtained from the project web page at: http://mzmine.sourceforge.net/.

  • 54.
    Kavakiotis, Ioannis
    et al.
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece.
    Xochelli, Aliki
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. CERTH, Inst Appl Biosci, Thessaloniki, Greece.
    Agathangelidis, Andreas
    Ist Sci San Raffaele, Div Mol Oncol, Milan, Italy; Ist Sci San Raffaele, Dept Oncohematol, Milan, Italy.
    Tsoumakas, Grigorios
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece.
    Maglaveras, Nicos
    CERTH, Inst Appl Biosci, Thessaloniki, Greece; Aristotle Univ Thessaloniki, Sch Med, Lab Comp & Med Informat, Thessaloniki, Greece.
    Stamatopoulos, Kostas
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. CERTH, Inst Appl Biosci, Thessaloniki, Greece.
    Hadzidimitriou, Anastasia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. CERTH, Inst Appl Biosci, Thessaloniki, Greece.
    Vlahavas, Ioannis
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece.
    Chouvarda, Ioanna
    CERTH, Inst Appl Biosci, Thessaloniki, Greece; Aristotle Univ Thessaloniki, Sch Med, Lab Comp & Med Informat, Thessaloniki, Greece.
    Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia (2016). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, article id 173. Article in journal (Refereed)
    Abstract [en]

    Background: Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM.

    Results: The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis performed on the integrated dataset, applying voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets.

    Conclusions: This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.

  • 55. Khan, Mehmood Alam
    et al.
    Elias, Isaac
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nylander, Kristina
    Guimera, Roman Valls
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schobesberger, Richard
    Schmitzberger, Peter
    Lagergren, Jens
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Fastphylo: Fast tools for phylogenetics (2013). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. 334. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.

    RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency.

    CONCLUSIONS: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
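The neighbor joining that fastphylo accelerates chooses, at each iteration, the pair of taxa minimizing the standard Q-criterion before joining them and shrinking the matrix. A minimal sketch of that selection step (generic textbook formula, not fastphylo's implementation):

```python
def nj_pair_to_join(D):
    """One step of neighbor joining: given a symmetric distance
    matrix D (list of lists), return the pair (i, j) minimizing
    Q(i, j) = (n - 2) * D[i][j] - r_i - r_j, where r_i is row i's sum.
    A full NJ run repeats this, joining the pair and updating D."""
    n = len(D)
    r = [sum(row) for row in D]
    best, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if q < best_q:
                best, best_q = (i, j), q
    return best

# Classic 5-taxon example distance matrix
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
pair = nj_pair_to_join(D)   # → (0, 1)
```

This naive scan is O(n²) per join; the efficiency gains in tools like fastphylo come from avoiding repeated full scans on large matrices.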

  • 56.
    Khan, Mehmood Alam
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Elias, Isaac
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sjölund, Erik
    Stockholms universitet.
    Nylander, Kristina
    KTH, School of Computer Science and Communication (CSC).
    Guimera, Roman Valls
    Stockholms universitet.
    Schobesberger, Richard
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. University of Applied Sciences Upper Austria.
    Schmitzberger, Peter
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. University of Applied Sciences Upper Austria.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    fastphylo: Fast tools for phylogenetics (2013). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no 1, p. 334. Article in journal (Refereed)
    Abstract [en]

    Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.

    Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency.

    Conclusions: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

  • 57.
    Khan, Mehmood Alam
    et al.
    KTH, School of Computer Science and Communication (CSC). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Mahmudi, Owais
    KTH, School of Computer Science and Communication (CSC). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ulah, Ikram
    KTH, School of Computer Science and Communication (CSC). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre. Stockholm Univ, Sweden.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Probabilistic inference of lateral gene transfer events (2016). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, article id 431. Article in journal (Refereed)
    Abstract [en]

    Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.

    Results: In this paper, we propose probabilistic methods that sample reconciliations of the gene tree with a dated species tree and compute maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify "highways" of LGT.

    Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges of the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

  • 58. Khan, Mehmood Alam
    et al.
    Mahmudi, Owais
    Ullah, Ikram
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Lagergren, Jens
    Probabilistic inference of lateral gene transfer events (2016). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, no Suppl 14, article id 431. Article in journal (Refereed)
    Abstract [en]

    Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.

    Results: In this paper, we propose probabilistic methods that sample reconciliations of the gene tree with a dated species tree and compute maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify highways of LGT.

    Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges of the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

  • 59.
    Klammer, Martin
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MetaTM - a consensus method for transmembrane protein topology prediction (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 314. Article in journal (Refereed)
    Abstract [en]

    Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e. which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result.

    RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences it correctly predicts 86.7% of all topologies.

    CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as a DAS server at http://MetaTM.sbc.su.se.
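To make the consensus idea concrete: MetaTM itself trains SVM models on the predictors' outputs, but the simplest consensus baseline is a per-residue majority vote, sketched here with hypothetical topology strings (i = inside, M = membrane, o = outside):

```python
from collections import Counter

def majority_consensus(predictions):
    """Per-residue majority vote over several topology predictors.
    Each prediction is an equal-length string over {'i','M','o'}.
    This plain vote is a baseline, not MetaTM's SVM-based combiner."""
    return "".join(
        Counter(chars).most_common(1)[0][0]
        for chars in zip(*predictions)
    )

# Three hypothetical predictor outputs for a 7-residue stretch
preds = ["iiMMMoo", "iiMMMoo", "iiMMooo"]
consensus = majority_consensus(preds)   # → "iiMMMoo"
```

The abstract's point is precisely that a learned combiner (SVMs over predictor outputs) outperforms both the individual predictors and simpler consensus schemes like this one.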

  • 60. Kokocinski, Felix
    et al.
    Delhomme, Nicolas
    Wrobel, Gunnar
    Hummerich, Lars
    Toedt, Grischa
    Lichter, Peter
    FACT - a framework for the functional interpretation of high-throughput experiments (2005). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Interpreting the results of high-throughput experiments, such as those obtained from DNA-microarrays, is an often time-consuming task due to the high number of data-points that need to be analyzed in parallel. Which of the possible approaches to the functional analysis will be the most informative is usually unknown beforehand and a matter of extensive testing.

    RESULTS: To address this problem, we have developed the Flexible Annotation and Correlation Tool (FACT). FACT allows for detection of important patterns in large data sets by simplifying the integration of heterogeneous data sources and the subsequent application of different algorithms for statistical evaluation or visualization of the annotated data. The system is constantly extended to include additional annotation data and comparison methods.

    CONCLUSION: FACT serves as a highly flexible framework for the explorative analysis of large genomic and proteomic result sets. The program can be used online; open source code and supplementary information are available at http://www.factweb.de.

  • 61.
    Kruczyk, Marcin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Umer, Husen Muhammad
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Genomics.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data (2013). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. 280. Article in journal (Refereed)
    Abstract [en]

    Background: Finding peaks in ChIP-seq is an important process in biological inference. In some cases, such as positioning nucleosomes with specific histone modifications or finding transcription factor binding specificities, the precision of the detected peak plays a significant role. There are several applications for finding peaks (called peak finders) based on different algorithms (e.g. MACS, Erange and HPeak). Benchmark studies have shown that the existing peak finders identify different peaks for the same dataset and it is not known which one is the most accurate. We present the first meta-server, called Peak Finder MetaServer (PFMS), that collects results from several peak finders and produces consensus peaks. Our application accepts three standard ChIP-seq data formats: BED, BAM, and SAM.

    Results: Sensitivity and specificity of seven widely used peak finders were examined. For the experiments we used three previously studied transcription factor (TF) ChIP-seq datasets and identified three of the selected peak finders that returned results with high specificity and very good sensitivity compared to the remaining four. We also ran PFMS using the three selected peak finders on the same TF datasets and achieved higher specificity and sensitivity than the peak finders individually.

    Conclusions: We show that combining outputs from up to seven peak finders yields better results than individual peak finders. In addition, three of the seven peak finders outperform the remaining four, and running PFMS with these three returns even more accurate results. Another added value of PFMS is a separate report of the peaks returned by each of the included peak finders.
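The consensus-peak idea can be sketched as a boundary sweep that keeps the genomic regions supported by at least a chosen number of peak finders. This is a simplification of PFMS's actual scoring; the intervals below are hypothetical:

```python
def consensus_peaks(peak_sets, min_votes=2):
    """Merge peak calls into consensus regions. `peak_sets` holds one
    list of (start, end) intervals per peak finder; a region is kept
    while it is covered by at least `min_votes` finders. Implemented
    as a sweep over sorted interval boundaries."""
    events = []
    for peaks in peak_sets:
        for start, end in peaks:
            events.append((start, 1))    # coverage rises at a start
            events.append((end, -1))     # coverage drops at an end
    events.sort()
    consensus, depth, region_start = [], 0, None
    for pos, delta in events:
        prev = depth
        depth += delta
        if prev < min_votes <= depth:         # crossed the threshold up
            region_start = pos
        elif prev >= min_votes > depth:       # crossed it back down
            consensus.append((region_start, pos))
    return consensus

# Hypothetical calls from three peak finders on one chromosome
calls = [[(100, 200)], [(150, 250)], [(180, 300)]]
regions = consensus_peaks(calls, min_votes=2)   # → [(150, 250)]
```

Raising `min_votes` trades sensitivity for specificity, which mirrors the abstract's finding that voting over a well-chosen subset of peak finders beats any single one.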

  • 62.
    Kuhn, Thomas
    et al.
    Fachhochschule Gelsenkirchen.
    Willighagen, Egon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Zielesny, Achim
    Fachhochschule Gelsenkirchen.
    Steinbeck, Christoph
    European Bioinformatics Institute, Cambridge, UK.
    CDK-Taverna: an open workflow environment for cheminformatics (2010). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 159. Article in journal (Refereed)
    Abstract [en]

    Background

    Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open access chemistry databases generates a demand for flexible tools to process them and discover new knowledge. To freely support open science based on these data resources, it is desirable for the processing tools to be open-source and available for everyone.

    Results

    Here we describe a novel combination of the workflow engine Taverna and the cheminformatics library Chemistry Development Kit (CDK), resulting in an open source workflow solution for cheminformatics. We have implemented more than 160 different workers to handle specific cheminformatics tasks. We describe the applications of CDK-Taverna in various usage scenarios.

    Conclusions

    The combination of the workflow engine Taverna and the Chemistry Development Kit provides the first open source cheminformatics workflow solution for the biosciences. With the Taverna-community working towards a more powerful workflow engine and a more user-friendly user interface, CDK-Taverna has the potential to become a free alternative to existing proprietary workflow tools.

  • 63.
    Kultima, Kim
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Scholz, Birger
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alm, Henrik
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Sköld, Karl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Svensson, Marcus
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Crossman, Alan
    Bezard, Erwan
    Andrén, Per E.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lönnstedt, Ingrid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics.
    Normalization and expression changes in predefined sets of proteins using 2D gel electrophoresis: A proteomic study of L-DOPA induced dyskinesia in an animal model of Parkinson's disease using DIGE (2006). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 475. Article in journal (Refereed)
    Abstract [en]

    Background: Two-Dimensional Difference In Gel Electrophoresis (2D-DIGE) is a powerful tool for measuring differences in protein expression between samples or conditions. However, to remove systematic variability within and between gels the data has to be normalized.

    In this study we examined the ability of four existing and four novel normalization methods to remove systematic bias in data produced with 2D-DIGE. We also propose a modification of an existing method where the statistical framework determines whether a set of proteins shows an association with the predefined phenotypes of interest. This method was applied to our data generated from a monkey model (Macaca fascicularis) of Parkinson's disease.

    Results: Using 2D-DIGE we analysed the protein content of the striatum from 6 control and 21 MPTP-treated monkeys, with or without de novo or long-term L-DOPA administration.

    There was an intensity and spatial bias in the data of all the gels examined in this study. Only two of the eight normalization methods evaluated ('2D loess+scale' and 'SC-2D+quantile') successfully removed both the intensity and spatial bias. In 'SC-2D+quantile' we extended the commonly used loess normalization method against dye bias in two-channel microarray systems to suit systems with three or more channels. Further, by using the proposed method, Differential Expression in Predefined Protein Sets (DEPPS), several sets of proteins associated with the priming effects of L-DOPA in the striatum in parkinsonian animals were identified. Three of these sets are proteins involved in energy metabolism and one set involved proteins which are part of the microtubule cytoskeleton.

    Conclusion: Comparison of the different methods leads to a series of methodological recommendations for the normalization and the analysis of data, depending on the experimental design. Due to the nature of 2D-DIGE data we recommend that the p-values obtained in significance tests should be used as rankings only. Individual proteins may be interesting as such, but by studying sets of proteins the interpretation of the results is probably more accurate and biologically informative.
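Of the normalization steps named above, the quantile part can be sketched compactly. This is generic quantile normalization, not the authors' 'SC-2D+quantile' code, and the spatial/loess correction that precedes it is omitted; the intensities are hypothetical:

```python
from statistics import mean

def quantile_normalize(channels):
    """Quantile normalization across channels (e.g. gel dyes): force
    every channel onto a common intensity distribution by replacing
    each value with the mean of the equally-ranked values across
    channels. Works for three or more channels, as the abstract
    requires for DIGE."""
    n = len(channels[0])
    # per-channel index order, smallest value first
    orders = [sorted(range(n), key=ch.__getitem__) for ch in channels]
    # reference distribution: mean of the k-th smallest values
    ref = [mean(ch[o[k]] for ch, o in zip(channels, orders))
           for k in range(n)]
    out = [[0.0] * n for _ in channels]
    for c, order in enumerate(orders):
        for rank, i in enumerate(order):
            out[c][i] = ref[rank]
    return out

# Two hypothetical spot-intensity channels on different scales
norm = quantile_normalize([[5.0, 2.0, 8.0], [4.0, 9.0, 1.0]])
```

After normalization every channel shares the same sorted values while each spot keeps its within-channel rank, which is exactly what removes between-gel intensity bias.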

  • 64.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Prusis, Peteris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Proteochemometric modeling of HIV protease susceptibility2008In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 181-Article in journal (Refereed)
    Abstract [en]

    BACKGROUND

    A major obstacle in treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations.

    RESULTS

    The model provided excellent predictability (R2 = 0.92, Q2 = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each one of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q2(inhibitors) = 0.72.

    CONCLUSION

    Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use, and is located at HIV Drug Research Centre.
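The external validation described in the Results can be sketched as a leave-one-inhibitor-out loop computing Q2 = 1 - PRESS/SS over the held-out records. The sketch below uses synthetic stand-in data and a single descriptor fitted by ordinary least squares, not the paper's proteochemometric descriptor blocks:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

random.seed(1)
# hypothetical stand-in data: each record is (inhibitor id, descriptor, susceptibility)
data = [(k, x, 0.5 + 2.0 * x + random.gauss(0, 0.1))
        for k in range(7) for x in [random.random() for _ in range(30)]]

# leave-one-inhibitor-out: refit without inhibitor k, predict its records
press = ss = 0.0
for k in range(7):
    train = [d for d in data if d[0] != k]
    test = [d for d in data if d[0] == k]
    a, b = fit_line([x for _, x, _ in train], [y for _, _, y in train])
    ybar = sum(y for _, _, y in train) / len(train)
    press += sum((y - (a + b * x)) ** 2 for _, x, y in test)
    ss += sum((y - ybar) ** 2 for _, x, y in test)

q2_ext = 1.0 - press / ss
```

Because every prediction is made by a model that never saw that inhibitor, Q2 here measures generalization to genuinely new compounds, which is the point of the paper's external-prediction design.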

  • 65.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques2010In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 339-Article in journal (Refereed)
    Abstract [en]

    Background

    Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity.

    Results

    We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) to the respective combination's interaction dissociation constant (Kd). We compared six approaches for description of protein kinases and several linear and non-linear correlation methods. The best performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability; the squared correlation coefficient ranged P2 = 0.67-0.73 for new kinase-inhibitor pairs and P2(kin) = 0.65-0.70 for new kinases. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity; the areas under the ROC curves ranged AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Even when using only 10% of all data, a valid model was still obtained, with P2 = 0.47, P2(kin) = 0.42 and AUC = 0.83.

    Conclusions

    Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed-up identification and optimization of protein kinase targeted and multi-targeted inhibitors.
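The AUC values reported above have a simple rank interpretation: the probability that a randomly chosen interacting pair is scored higher than a randomly chosen non-interacting one. A minimal stand-alone computation via the Mann-Whitney statistic (not tied to the authors' models; inputs are hypothetical scores and labels):

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a random positive outscores a random negative
    (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect separator gives 1.0, a random scorer 0.5, and a perfectly inverted scorer 0.0.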

  • 66.
    Lysholm, Fredrik
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Highly improved homopolymer aware nucleotide-protein alignments with 454 data2012In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, no 230Article in journal (Refereed)
    Abstract [en]

    Background

    Roche 454 sequencing is the leading sequencing technology for producing long read high throughput sequence data. Unlike most methods where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis.

    Results

    To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that easily fits into other tools. Experiments using HAXAT demonstrate, through the introduction of 454-specific frame-shift penalties, significantly increased accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data.
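The Matthews Correlation Coefficient used as the evaluation metric above is computed from a 2x2 confusion matrix; a minimal sketch (the counts passed in are hypothetical inputs, not the paper's results):

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from 2x2 classification counts.

    Ranges from -1 (total disagreement) through 0 (random) to +1 (perfect);
    returns 0.0 when any marginal is empty, by convention.
    """
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```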

    Conclusions

    The increased accuracy provided by HAXAT not only results in improved homologue estimation, but also provides uninterrupted reading frames, which greatly facilitates further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.

  • 67.
    Lysholm, Fredrik
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics . Linköping University, The Institute of Technology.
    Andersson, Bjorn
    Karolinska Institute.
    Persson, Bengt
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics . Linköping University, The Institute of Technology.
    FAAST: Flow-space Assisted Alignment Search Tool2011In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no 293Article in journal (Refereed)
    Abstract [en]

    Background: High throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long read high throughput data. While most other sequencing techniques produce reading errors mainly comparable with substitutions, pyrosequencing produces errors mainly comparable with gaps. These errors are less efficiently detected by most conventional alignment programs and may produce inaccurate alignments.

    Results: We suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool).

    Conclusions: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/
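FAAST builds on Smith-Waterman-Gotoh alignment. For orientation, here is the classical Smith-Waterman local alignment score with a linear gap penalty, the simpler relative of the affine-gap Gotoh variant; this is not FAAST itself, and the scoring parameters are illustrative:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Classical Smith-Waterman local alignment score (linear gap penalty).

    H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1];
    the 0 in the max lets poor prefixes be discarded, making it local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

FAAST's contribution, per the abstract, is to let gap penalties be informed by flowpeak values rather than being constant as here.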

  • 68. Mahmudi, Owais
    et al.
    Sjöstrand, Joel
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sennblad, Bengt
    Lagergren, Jens
    Genome-wide probabilistic reconciliation analysis across vertebrates2013In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no Suppl 15, p. S10-Article in journal (Refereed)
    Abstract [en]

    Gene duplication is considered to be a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. We employ a Bayesian analysis tool, based on the probabilistic model DLRS that integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates, to obtain this posterior.

    By applying these methods, we perform a genome-wide analysis of a nine species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations are different from MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to MPR. We also generate heatmaps showing where in the species trees duplications have been most frequent during the evolution of these species.
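The most parsimonious reconciliation (MPR) that the sampled reconciliations are compared against can be computed with the classical LCA mapping: a gene vertex is a duplication exactly when its mapping equals the mapping of one of its children. A toy sketch over nested-tuple trees (illustrative only; this is not the DLRS sampler, and the trees and leaf-to-species map are invented):

```python
def annotate(tree, parent=None, parents=None, depths=None, d=0):
    """Record parent and depth for every vertex of a nested-tuple tree."""
    if parents is None:
        parents, depths = {}, {}
    parents[tree], depths[tree] = parent, d
    if isinstance(tree, tuple):
        for child in tree:
            annotate(child, tree, parents, depths, d + 1)
    return parents, depths

def lca(u, v, parents, depths):
    """Lowest common ancestor: walk up to equal depth, then in lockstep."""
    while depths[u] > depths[v]:
        u = parents[u]
    while depths[v] > depths[u]:
        v = parents[v]
    while u != v:
        u, v = parents[u], parents[v]
    return u

def mpr_duplications(gene_tree, species_of, parents, depths):
    """Count duplications in the most parsimonious reconciliation."""
    dups = 0
    def walk(g):
        nonlocal dups
        if not isinstance(g, tuple):
            return species_of[g]
        maps = [walk(c) for c in g]
        m = maps[0]
        for x in maps[1:]:
            m = lca(m, x, parents, depths)
        if m in maps:          # maps onto a child's vertex => duplication
            dups += 1
        return m
    walk(gene_tree)
    return dups

# toy species tree ((A,B),C) and a gene family duplicated before the A/B split
species_tree = (("A", "B"), "C")
parents, depths = annotate(species_tree)
species_of = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
gene_tree = (("a1", "b1"), ("a2", "b2"))
```

MPR gives the minimum duplication count; the point of the abstract is that the posterior over reconciliations often disagrees with this minimum.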

  • 69.
    Mahmudi, Owais
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sjöstrand, Joel
    Sennblad, Bengt
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Genome-wide probabilistic reconciliation analysis across vertebrates2013In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. S10-Article in journal (Refereed)
    Abstract [en]

    Gene duplication is considered to be a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. We employ a Bayesian analysis tool, based on the probabilistic model DLRS that integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates, to obtain this posterior. By applying these methods, we perform a genome-wide analysis of a nine species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations are different from MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to MPR. We also generate heatmaps showing where in the species trees duplications have been most frequent during the evolution of these species.

  • 70.
    Malm, Erik
    et al.
    KTH, School of Biotechnology (BIO), Glycoscience.
    Srivastava, Vaibhav
    KTH, School of Biotechnology (BIO), Glycoscience.
    Sundqvist, Gustav
    KTH, School of Biotechnology (BIO), Glycoscience.
    Bulone, Vincent
    KTH, School of Biotechnology (BIO), Glycoscience.
    APP: An Automated Proteomics Pipeline for the analysis of mass spectrometry data based on multiple open access tools2014In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, no 1, article id 345Article in journal (Refereed)
    Abstract [en]

    Background: Mass spectrometry analyses of complex protein samples yield large amounts of data and specific expertise is needed for data analysis, in addition to a dedicated computer infrastructure. Furthermore, the identification of proteins and their specific properties require the use of multiple independent bioinformatics tools and several database search algorithms to process the same datasets. In order to facilitate and increase the speed of data analysis, there is a need for an integrated platform that would allow a comprehensive profiling of thousands of peptides and proteins in a single process through the simultaneous exploitation of multiple complementary algorithms. Results: We have established a new proteomics pipeline designated as APP that fulfills these objectives using a complete series of tools freely available from open sources. APP automates the processing of proteomics tasks such as peptide identification, validation and quantitation from LC-MS/MS data and allows easy integration of many separate proteomics tools. Distributed processing is at the core of APP, allowing the processing of very large datasets using any combination of Windows/Linux physical or virtual computing resources. Conclusions: APP provides distributed computing nodes that are simple to set up, greatly relieving the need for separate IT competence when handling large datasets. The modular nature of APP allows complex workflows to be managed and distributed, speeding up throughput and setup. Additionally, APP logs execution information on all executed tasks and generated results, simplifying information management and validation.
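The fan-out pattern described above, in which the same dataset is processed by several independent search tools and the results are collected for combined validation, can be sketched with Python's standard concurrent.futures. The engine names and task function here are hypothetical placeholders, not APP's actual components:

```python
from concurrent.futures import ThreadPoolExecutor

def run_search(engine, batch):
    # hypothetical stand-in for one database-search task on one spectrum batch
    return engine, len(batch)

engines = ["engine_a", "engine_b", "engine_c"]   # hypothetical engine names
batch = list(range(1000))                        # hypothetical spectrum batch

# fan the same batch out to several independent search engines in parallel,
# then collect all results (in submission order) for combined validation
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda e: run_search(e, batch), engines))
```

APP additionally distributes such tasks across physical or virtual nodes; this single-process sketch only shows the simultaneous-engines idea.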

  • 71. Masseroli, Marco
    et al.
    Mons, Barend
    Bongcam-Rudloff, Erik
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Ceri, Stefano
    Kel, Alexander
    Rechenmann, Francois
    Lisacek, Frederique
    Romano, Paolo
    Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information2014In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, no S1, p. S2-Article, review/survey (Refereed)
    Abstract [en]

    Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of the recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. Especially the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. 
We also point out how the bioinformatics market is growing at an unprecedented speed, due to the impact that new powerful in silico analyses promise to have on better diagnosis, prognosis, drug discovery and treatment, moving towards personalized medicine. An open business model for bioinformatics, which appears able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.

  • 72.
    Merid, Simon Kebede
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Goranskaya, Daria
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis2014In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, p. 308-Article in journal (Refereed)
    Abstract [en]

    Background: In somatic cancer genomes, delineating genuine driver mutations against a background of multiple passenger events is a challenging task. The difficulty of determining function from sequence data and the low frequency of mutations are increasingly hindering the search for novel, less common cancer drivers. The accumulation of extensive amounts of data on somatic point and copy number alterations necessitates the development of systematic methods for driver mutation analysis. Results: We introduce a framework for detecting driver mutations via functional network analysis, which is applied to individual genomes and does not require pooling multiple samples. It probabilistically evaluates 1) functional network links between different mutations in the same genome and 2) links between individual mutations and known cancer pathways. In addition, it can employ correlations of mutation patterns in pairs of genes. The method was used to analyze genomic alterations in two TCGA datasets, one for glioblastoma multiforme and another for ovarian carcinoma, which were generated using different approaches to mutation profiling. The proportions of drivers among the reported de novo point mutations in these cancers were estimated to be 57.8% and 16.8%, respectively. Both datasets also included extended chromosomal regions with synchronous duplications or losses of multiple genes. We identified putative copy number driver events within many such segments. Finally, we summarized seemingly disparate mutations and discovered a functional network of collagen modifications in glioblastoma. In order to select the most efficient network for use with this method, we used a novel, ROC curve-based procedure for benchmarking different network versions by their ability to recover pathway membership.
Conclusions: The results of our network-based procedure were in good agreement with published gold standard sets of cancer genes and were shown to complement and expand frequency-based driver analyses. On the other hand, three sequence-based methods applied to the same data yielded poor agreement with each other and with our results. We review the difference in driver proportions discovered by different sequencing approaches and discuss the functional roles of novel driver mutations. The software used in this work and the global network of functional couplings are publicly available at http://research.scilifelab.se/andrej_alexeyenko/downloads.html.
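The pathway-link evaluation described above is, in its simplest form, an enrichment test. A common minimal stand-in (not the authors' probabilistic network scoring) is a hypergeometric tail probability for the overlap between a mutated gene's network neighbours and a pathway:

```python
from math import comb

def enrichment_pvalue(k, pathway_size, neighbours, genome_size):
    """P(overlap >= k) under a hypergeometric null: draw `neighbours` genes
    at random from `genome_size` genes, `pathway_size` of which are in the
    pathway, and ask how often at least k land in the pathway."""
    total = comb(genome_size, neighbours)
    upper = min(pathway_size, neighbours)
    return sum(comb(pathway_size, i) * comb(genome_size - pathway_size, neighbours - i)
               for i in range(k, upper + 1)) / total
```

A small p-value suggests the gene's network neighbourhood overlaps the pathway more than chance would allow, the intuition behind network enrichment analysis.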

  • 73. Moeller, Steffen
    et al.
    Afgan, Enis
    Banck, Michael
    Bonnal, Raoul J. P.
    Booth, Timothy
    Chilton, John
    Cock, Peter J. A.
    Gumbel, Markus
    Harris, Nomi
    Holland, Richard
    Kalas, Matus
    Kajan, Laszlo
    Kibukawa, Eri
    Powel, David R.
    Prins, Pjotr
    Quinn, Jacqueline
    Sallou, Olivier
    Strozzi, Francesco
    Seemann, Torsten
    Sloggett, Clare
    Soiland-Reyes, Stian
    Spooner, William
    Steinbiss, Sascha
    Tille, Andreas
    Travis, Anthony J.
    Valls Guimera, Roman
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Katayama, Toshiaki
    Chapman, Brad A.
    Community-driven development for computational biology at Sprints, Hackathons and Codefests2014In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, p. S7-Article in journal (Refereed)
    Abstract [en]

    Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled unconferences (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

  • 74. Nadalin, Francesca
    et al.
    Vezzi, Francesco
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Policriti, Alberto
    GapFiller: a de novo assembly approach to fill the gap within paired reads2012In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, p. S8-Article in journal (Refereed)
    Abstract [en]

    Background: Next Generation Sequencing technologies are able to provide high genome coverages at a relatively low cost. However, due to limited read lengths (from 30 bp up to 200 bp), specific bioinformatics problems have become even more difficult to solve. De novo assembly with short reads, for example, is more complicated for at least two reasons: first, the overall amount of "noisy" data to cope with has increased and, second, as read length decreases the number of unsolvable repeats grows. Our work aims to get at the root of the problem by providing a pre-processing tool capable of producing (in silico) longer and highly accurate sequences from a collection of Next Generation Sequencing reads. Results: In this paper a seed-and-extend local assembler is presented. The kernel algorithm is a loop that, starting from a read used as seed, keeps extending it using heuristics whose main goal is to produce a collection of error-free and longer sequences. In particular, GapFiller carefully detects reliable overlaps and clusters similar reads in order to reconstruct the missing part between the two ends of the same insert. Our tool's output has been validated on 24 experiments using both simulated and real paired-read datasets. The output sequences are declared correct when the seed-mate is found. In the experiments performed, GapFiller was able to extend high percentages of the processed seeds and find their mates, with a false positive rate that turned out to be nearly negligible. Conclusions: GapFiller, starting from a sufficiently high short-read coverage, is able to produce high coverages of accurate longer sequences (from 300 bp up to 3500 bp). The procedure to perform safe extensions, together with the mate-found check, turned out to be a powerful criterion to guarantee contig correctness.
GapFiller has further potential, as it could be applied in a number of different scenarios, including the post-processing validation of insertions/deletions detection pipelines, pre-processing routines on datasets for de novo assembly pipelines, or in any hierarchical approach designed to assemble, analyse or validate pools of sequences.
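The seed-and-extend kernel described in the Results can be caricatured as a greedy overlap-extension loop. A toy sketch (illustrative only: no error handling, read clustering, or reliability checks, unlike GapFiller itself; sequences are invented):

```python
def extend_seed(seed, reads, min_overlap=3):
    """Greedy seed-and-extend: repeatedly append a read whose prefix
    overlaps the growing contig's suffix by at least min_overlap bases."""
    contig, used = seed, set()
    grown = True
    while grown:
        grown = False
        for i, read in enumerate(reads):
            if i in used:
                continue
            # try the longest proper overlap first, so the read extends the contig
            for k in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    contig += read[k:]
                    used.add(i)
                    grown = True
                    break
            if grown:
                break
    return contig
```

GapFiller stops extension when the seed's mate read is found, which is the correctness check the abstract highlights; this sketch simply runs until no read overlaps.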

  • 75.
    Nilsson, Roland
    et al.
    Linköping University, The Institute of Technology. Linköping University, Department of Physics, Chemistry and Biology, Computational Biology.
    Peña, Jose M.
    Linköping University, Department of Computer and Information Science, Database and information techniques.
    Björkegren, Johan
    Computional Medicin group KI.
    Tegnér, Jesper
    Linköping University, The Institute of Technology. Linköping University, Department of Physics, Chemistry and Biology, Computational Biology.
    Detecting Multivariate Differentially Expressed Genes2007In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8:150Article in journal (Refereed)
  • 76. Oja, Merja
    et al.
    Peltonen, Jaakko
    Blomberg, Jonas
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Kaski, Samuel
    Methods for estimating human endogenous retrovirus activities from EST databases2007In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. S11-Article in journal (Refereed)
    Abstract [en]

    Background: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently, HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. Results: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown. Conclusion: (i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly across HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.

    Rydén, Patrik
    Evaluation of microarray data normalization procedures using spike-in experiments2006In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, no 1, p. 300-Article in journal (Refereed)
  • 78. Pemberton, Trevor J.
    et al.
    Sandefur, Conner I.
    Jakobsson, Mattias
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Evolutionary Biology.
    Rosenberg, Noah A.
    Sequence determinants of human microsatellite variability2009In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. e612-Article, review/survey (Refereed)
    Abstract [en]

    Background

    Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability, we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel, together with the DNA sequences of these microsatellites in the human RefSeq database.

    Results

    Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity.
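The calibration step described above reduces to simple arithmetic once a reference fragment length and repeat count are known. A sketch under the stated assumption that all fragment-length differences lie in the repeat tract (the function name and example values are hypothetical):

```python
def repeats_from_fragment(fragment_len, ref_len, ref_repeats, unit_len):
    """Infer the repeat number of a genotyped allele from its PCR fragment
    length, assuming every base of length difference from the reference
    allele comes from the repeat tract (the calibration assumption above)."""
    return ref_repeats + (fragment_len - ref_len) // unit_len

# hypothetical tetranucleotide locus: the RefSeq allele is 100 bp with 10 repeats
allele_repeats = repeats_from_fragment(108, 100, 10, 4)   # a 108 bp allele
```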

    Conclusions

    These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.

  • 79.
    Persson, Emma
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Forslund, Sofia K.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Domainoid: domain-oriented orthology inference2019In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, no 1, article id 523Article in journal (Refereed)
    Abstract [en]

    Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions called domains. The domain architecture of a protein is vital for its function, and recombination events mean that individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool that aims to overcome these challenges by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately to infer groups of orthologous domains.

    Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain-level analysis, protein-level orthology can be inferred from the fraction of domains that are orthologous. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

    Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.
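The protein-level inference from domain-level orthologs might look like the following sketch. The threshold, domain names, and ortholog pairs are illustrative assumptions, not Domainoid's actual data structures or defaults.

```python
# Sketch: call a protein pair orthologous when the fraction of its domains
# with orthologous counterparts (e.g. from per-domain InParanoid runs)
# exceeds a threshold. Names and threshold are invented.

def protein_orthology(domains_a, domains_b, domain_orthologs, min_fraction=0.5):
    """domains_a/b: domain lists of two proteins; domain_orthologs: set of
    (domain_a, domain_b) pairs inferred orthologous at the domain level.
    Returns the orthologous-domain fraction and an orthology call."""
    matched = sum(1 for da in domains_a
                  if any((da, db) in domain_orthologs for db in domains_b))
    fraction = matched / len(domains_a) if domains_a else 0.0
    return fraction, fraction >= min_fraction

# A discordant case: two of three domains agree, the third has a different
# evolutionary history (no ortholog pair recorded for it).
orthologs = {("kinase", "kinase'"), ("SH2", "SH2'")}
frac, call = protein_orthology(["kinase", "SH2", "PDZ"], ["kinase'", "SH2'"], orthologs)
print(round(frac, 2), call)  # 0.67 True
```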

  • 80.
    Pluskal, Tomás
    et al.
    G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna Okinawa, Japan.
    Castillo, Sandra
    Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland.
    Villar-Briones, Alejandro
    G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna Okinawa, Japan.
    Oresic, Matej
    Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland.
    MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, article id 395. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2.

    RESULTS: A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MSn data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms.

    CONCLUSIONS: MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/. The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.
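The RANSAC idea behind the alignment step can be illustrated with a toy retention-time mapping between two runs. This is a generic RANSAC line fit, not MZmine 2's implementation; all peak times and the outlier pairs are invented.

```python
# Sketch: fit a linear retention-time mapping between two runs while
# ignoring spurious peak matches, RANSAC-style.
import random

def ransac_line(pairs, n_iter=200, tol=0.5, seed=0):
    """pairs: (rt_run1, rt_run2) candidate matches. Returns (slope, intercept)
    of the sampled model with the most inliers."""
    rng = random.Random(seed)
    best, best_inliers = (1.0, 0.0), -1
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in pairs if abs(a * x + b - y) < tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best

# True mapping rt2 = 1.02*rt1 + 0.3, plus two spurious matches.
pairs = [(t, 1.02 * t + 0.3) for t in range(5, 50, 5)] + [(12, 40.0), (30, 5.0)]
a, b = ransac_line(pairs)
print(round(a, 2), round(b, 2))  # 1.02 0.3
```

The outlier pairs never enter the winning model because any line through them collects far fewer inliers than the true mapping.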

  • 81.
    Polychronidou, Eleftheria
    et al.
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Kalamaras, Ilias
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Agathangelidis, Andreas
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Sutton, Lesley Ann
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. Tech Univ Denmark, Dept Immunol, Copenhagen, Denmark.
    Yan, Xiao-Jie
    Feinstein Inst Med Res, Karches Ctr Chron Lymphocyt Leukemia Res, Manhasset, NY USA.
    Bikos, Vasilis
    Masaryk Univ, Cent European Inst Technol, Brno, Czech Republic.
    Vardi, Anna
    G Papanikolaou Hosp, Hematol Dept, Thessaloniki, Greece;G Papanikolaou Hosp, HCT Unit, Thessaloniki, Greece.
    Mochament, Konstantinos
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Chiorazzi, Nicholas
    Feinstein Inst Med Res, Karches Ctr Chron Lymphocyt Leukemia Res, Manhasset, NY USA.
    Belessi, Chrysoula
    Nikea Gen Hosp, Dept Hematol, Piraeus, Greece.
    Rosenquist, Richard
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology. Tech Univ Denmark, Dept Immunol, Copenhagen, Denmark.
    Ghia, Paolo
    IRCCS San Raffaele Sci Inst, Milan, Italy;Univ Milan, VitaSalute, San Raffaele, Div Expt Oncol, Milan, Italy.
    Stamatopoulos, Kostas
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Vlamos, Panayiotis
    Ionian Univ, Dept Informat, Corfu, Greece.
    Chailyan, Anna
    Carlsberg Res Lab, Copenhagen, Denmark.
    Overby, Nanna
    Tech Univ Denmark, Ctr Biol Sequence Anal, Copenhagen, Denmark.
    Marcatili, Paolo
    Tech Univ Denmark, Ctr Biol Sequence Anal, Copenhagen, Denmark.
    Hatzidimitriou, Anastasia
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Tzovaras, Dimitrios
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Automated shape-based clustering of 3D immunoglobulin protein structures in chronic lymphocytic leukemia. 2018. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, article id 414. Article in journal (Refereed)
    Abstract [en]

    Background: Although the etiology of chronic lymphocytic leukemia (CLL), the most common type of adult leukemia, is still unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Primary and 3D structure analysis has been utilised in order to discover indications of antigenic pressure. The latter has been mostly based on the 3D models of the clonotypic B cell receptor immunoglobulin (BcR IG) amino acid sequences. Therefore, their accuracy is directly dependent on the quality of the model construction algorithms and the specific methods used to compare the ensuing models. Thus far, reliable and robust methods that can group the IG 3D models based on their structural characteristics are missing.

    Results: Here we propose a novel method for clustering a set of proteins based on their 3D structure, focusing on 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from the areas of bioinformatics, 3D object recognition and machine learning. The clustering procedure is based on the extraction of 3D descriptors, encoding various properties of the local and global geometrical structure of the proteins. The descriptors are extracted from aligned pairs of proteins. A combination of individual 3D descriptors is also used as an additional method. The comparison of the automatically generated clusters to manual annotation by experts shows an increased accuracy when using the 3D descriptors compared to plain bioinformatics-based comparison. The accuracy is increased even more when using the combination of 3D descriptors.

    Conclusions: The experimental results verify that 3D descriptors commonly used for 3D object recognition can be effectively applied to distinguishing structural differences of proteins. The proposed approach can be applied to provide hints for the existence of structural groups in a large set of unannotated BcR IG protein files, both in CLL and, by logical extension, in other contexts where it is relevant to characterize BcR IG structural similarity. The method does not present any limitations in application and can be extended to other types of proteins.
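The descriptor-based grouping can be caricatured as a single-linkage pass over per-structure descriptor vectors. The two-dimensional descriptors and cutoff below are invented; the real pipeline uses rich 3D shape descriptors extracted from aligned protein pairs.

```python
# Sketch: cluster structures by distance between per-structure descriptor
# vectors, merging any pair below a cutoff (single linkage via union-find).
import math

def single_linkage(descriptors, cutoff):
    """descriptors: {name: vector}. Returns clusters as sets of names."""
    parent = {n: n for n in descriptors}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    names = list(descriptors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(descriptors[a], descriptors[b]) < cutoff:
                parent[find(a)] = find(b)
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())

# Invented 2-D descriptors: IG1 and IG2 are shape-similar, IG3 is not.
descr = {"IG1": (0.1, 0.9), "IG2": (0.2, 0.8), "IG3": (0.9, 0.1)}
print(single_linkage(descr, cutoff=0.3))
```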

  • 82.
    Prusis, Peteris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Uhlén, Staffan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Petrovska, Ramona
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Lapinsh, Maris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Prediction of indirect interactions in proteins. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 167. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Both direct and indirect interactions determine molecular recognition of ligands by proteins. Indirect interactions can be defined as effects on recognition controlled from distant sites in the protein, e.g. by changes in protein conformation and mobility, whereas direct interactions occur in close proximity of the protein's amino acids and the ligand. Molecular recognition is traditionally studied using three-dimensional methods, but with such techniques it is difficult to predict the effects caused by mutational changes of amino acids located far away from the ligand-binding site. We recently developed proteochemometrics, an approach to the study of molecular recognition that models the chemical effects involved in the recognition of ligands by proteins using statistical sampling and mathematical modelling.

    RESULTS: A proteochemometric model was built, based on a statistically designed protein library's (melanocortin receptors') interaction with three peptides, and used to predict which amino acids and sequence fragments are involved in direct and indirect ligand interactions. The model predictions were confirmed by directed mutagenesis. The predicted direct interactions were in good agreement with previous three-dimensional studies of ligand recognition. In addition, however, the model could also correctly predict the location of indirect effects on ligand recognition arising from distant sites in the receptors, something that three-dimensional modelling could not provide.

    CONCLUSION: We demonstrate experimentally that proteochemometric modelling can be used with high accuracy to predict the sites of origin of direct and indirect effects on ligand recognition by proteins.
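The cross-term construction that lets a proteochemometric model attribute effects jointly to protein and ligand descriptors can be sketched as follows. The descriptor values are invented; the actual study used designed receptor libraries and PLS-type regression on such feature vectors.

```python
# Sketch: a proteochemometric feature vector concatenates protein
# descriptors, ligand descriptors, and all protein x ligand cross-terms,
# so the fitted model can localize direct and indirect effects.
from itertools import product

def pcm_features(protein_desc, ligand_desc):
    """Concatenate protein terms, ligand terms, and all cross-terms."""
    cross = [p * l for p, l in product(protein_desc, ligand_desc)]
    return list(protein_desc) + list(ligand_desc) + cross

# Invented descriptors: two protein-position terms, two ligand terms.
x = pcm_features([1.0, -1.0], [0.5, 2.0])
print(x)  # [1.0, -1.0, 0.5, 2.0, 0.5, 2.0, -0.5, -2.0]
```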

  • 83. Rantalainen, Mattias
    et al.
    Cloarec, Olivier
    Ebbels, Timothy
    Lundstedt, Torbjörn
    Nicholson, Jeremy
    Holmes, Elaine
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Piecewise multivariate modelling of sequential metabolic profiling data. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, article id 105. Article in journal (Refereed)
    Abstract [en]

    Background: Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints.

    Results: A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted.

    Conclusion: The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.
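The piecewise strategy can be illustrated with scalar data: one linear model per pair of successive time points, whose chained predictions follow a non-linear trajectory. OPLS itself is not reimplemented here; each segment below is an ordinary linear fit, and the data are invented.

```python
# Sketch: approximate a non-linear time trajectory by a chain of linear
# models, one per successive time-point pair, as in the piecewise strategy.

def fit_piecewise(times, values):
    """One linear segment (t0, t1, slope, intercept) per successive pair."""
    segs = []
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        a = (v1 - v0) / (t1 - t0)
        segs.append((t0, t1, a, v0 - a * t0))
    return segs

def predict(segs, t):
    """Evaluate the segment covering time t."""
    for t0, t1, a, b in segs:
        if t0 <= t <= t1:
            return a * t + b
    raise ValueError("t outside modelled range")

times = [0, 1, 2, 4]
values = [0.0, 1.0, 1.5, 1.4]   # non-linear overall, linear per piece
segs = fit_piecewise(times, values)
print(round(predict(segs, 3), 2))  # 1.45
```

Each segment stays individually transparent, which mirrors the interpretability advantage the abstract highlights.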

  • 84. Rantalainen, Mattias
    et al.
    Cloarec, Olivier
    Ebbels, Timothy M. D.
    Lundstedt, Torbjörn
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Medicinal Chemistry, Organic Pharmaceutical Chemistry.
    Nicholson, Jeremy K.
    Holmes, Elaine
    Trygg, Johan
    Piecewise multivariate modelling of sequential metabolic profiling data. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 105. Article in journal (Refereed)
    Abstract [en]

    Background: Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints.

    Results: A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted.

    Conclusion: The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.

  • 85.
    Ray, Arjun
    et al.
    KTH, School of Engineering Sciences (SCI), Theoretical Physics. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Wallner, B.
    Improved model quality assessment using ProQ2. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, no 1, p. 224. Article in journal (Refereed)
    Abstract [en]

    Background: Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement.

    Results: Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model, even though the prediction is still local.

    Conclusions: ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson's correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80, again compared to the second-best single methods in CASP8 and CASP9, respectively. ProQ2 is available at http://proq2.wallnerlab.org.
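The per-target Z-score underlying the "sum of Z-scores" comparison above can be sketched as follows; the candidate GDT_TS values are invented for illustration.

```python
# Sketch: the Z-score of a selected model's quality relative to all
# candidate models for one target. Summing these over targets gives a
# sum-of-Z-scores benchmark figure of the kind quoted above.
import statistics

def zscore(selected, all_scores):
    """Standard score of the selected model among all candidates."""
    mu = statistics.fmean(all_scores)
    sd = statistics.pstdev(all_scores)
    return (selected - mu) / sd

gdt_ts = [0.40, 0.55, 0.70, 0.35]   # invented candidate models for one target
print(round(zscore(max(gdt_ts), gdt_ts), 3))  # 1.461
```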

  • 86.
    Ray, Arjun
    et al.
    Department of Theoretical Physics & Swedish eScience Research Center, Royal Institute of Technology, Stockholm, Sweden.
    Lindahl, Erik
    Department of Theoretical Physics & Swedish eScience Research Center, Royal Institute of Technology, Stockholm, Sweden and Center for Biomembrane Research, Department of Biochemistry & Biophysics, Stockholm University, Stockholm, Sweden.
    Wallner, Björn
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Improved model quality assessment using ProQ2. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13. Article in journal (Refereed)
    Abstract [en]

    Background

    Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement.

    Results

    Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model, even though the prediction is still local.

    Conclusions

    ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson’s correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80 again compared to the second-best single methods in CASP8 and CASP9, respectively. ProQ2 is available at http://proq2.wallnerlab.org.

  • 87.
    Repsilber, Dirk
    et al.
    Department of Genetics and Biometry, Research Institute for the Biology of Farm Animals, Dummerstorf, Germany.
    Kern, Sabine
    Bioinformatics Chair, Institute for Biochemistry and Biology at the University of Potsdam, Potsdam-Golm, Germany.
    Telaar, Anna
    Department of Genetics and Biometry, Research Institute for the Biology of Farm Animals, Dummerstorf, Germany .
    Walzl, Gerhard
    Molecular Biology and Human Genetics, University of Stellenbosch, Tygerberg, Cape Town, South Africa .
    Black, Gillian F
    Molecular Biology and Human Genetics, University of Stellenbosch, Tygerberg, Cape Town, South Africa .
    Selbig, Joachim
    Bioinformatics Chair, Institute for Biochemistry and Biology at the University of Potsdam, Potsdam-Golm, Germany .
    Parida, Shreemanta K
    Department of Immunology, Max-Planck-Institute for Infection Biology, Berlin, Germany.
    Kaufmann, Stefan H E
    Department of Immunology, Max-Planck-Institute for Infection Biology, Berlin, Germany.
    Jacobsen, Marc
    Department of Immunology, Max-Planck-Institute for Infection Biology, Berlin, Germany; Department of Immunology, Bernhard-Nocht-Institute for Tropical Medicine, Hamburg, Germany.
    Biomarker discovery in heterogeneous tissue samples: taking the in-silico deconfounding approach. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, article id 27. Article in journal (Refereed)
    Abstract [en]

    Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.

    Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported, as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.

    Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
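In the two-cell-type case, the matrix-decomposition idea reduces to a one-parameter least-squares fit, which is enough to show the principle: mixed expression is modelled as a proportion-weighted sum of cell-type-specific profiles. The profiles below are invented, and the authors' actual algorithm is a full non-negative matrix decomposition, not this closed form.

```python
# Sketch: mixed expression x is modelled as x ~ p1*s1 + (1-p1)*s2, where
# s1, s2 are cell-type-specific profiles; estimate the proportion p1 by
# 1-D least squares:  p1 = <x - s2, s1 - s2> / ||s1 - s2||^2.

def estimate_proportion(x, s1, s2):
    num = sum((xi - b) * (a - b) for xi, a, b in zip(x, s1, s2))
    den = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return min(1.0, max(0.0, num / den))  # clamp to a valid proportion

s1 = [10.0, 2.0, 5.0]   # invented profile of cell type 1 (e.g. sorted cells)
s2 = [1.0, 8.0, 5.0]    # invented profile of cell type 2
x = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]  # a 70/30 mixture
print(round(estimate_proportion(x, s1, s2), 2))  # 0.7
```

Recovering the mixing proportion from a single sample is exactly the prerequisite for deconfounding-based classification that the Results paragraph mentions.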

  • 88.
    Rios, Javier
    et al.
    Department of Computer Architecture, Malaga University.
    Karlsson, Johan
    Department of Computer Architecture, Malaga University.
    Trelles, Oswaldo
    Department of Computer Architecture, Malaga University.
    Magallanes: a web services discovery and automatic workflow composition tool. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, no 334. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: To aid in bioinformatics data processing and analysis, an increasing number of web-based applications are being deployed. Although this is a positive circumstance in general, the proliferation of tools makes it difficult to find the right tool, or more importantly, the right set of tools that can work together to solve real complex problems.

    RESULTS: Magallanes (Magellan) is a versatile, platform-independent Java library of algorithms aimed at discovering bioinformatics web services and associated data types. A second important feature of Magallanes is its ability to connect available and compatible web services into workflows that can process data sequentially to reach a desired output given a particular input. Magallanes’ capabilities can be exploited either through an API or directly through a graphical user interface. The Magallanes API is freely available for academic use and, together with the Magallanes application, has been tested on MS-Windows™ XP and Unix-like operating systems. Detailed implementation information, including user manuals and tutorials, is available at http://www.bitlab-es.com/magallanes.

    CONCLUSION: Different implementations of the same client (web page, desktop applications, web services, etc.) have been deployed and are currently in use in real installations such as the National Institute of Bioinformatics (Spain) and the ACGT-EU project. This demonstrates the utility and versatility of the software library, including the integration of novel tools in the domain, and provides strong evidence of its ability to facilitate the automatic discovery and composition of workflows.
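Automatic workflow composition of the kind described can be sketched as a breadth-first search over service input/output types, chaining any service whose input type matches the current data type until the desired output type is reached. The service names and data types below are invented, not Magallanes' registry.

```python
# Sketch: compose a workflow by BFS over (service, in_type, out_type)
# triples, returning the shortest chain from start_type to goal_type.
from collections import deque

def compose(services, start_type, goal_type):
    """services: (name, in_type, out_type) triples. Returns a chain or None."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        dtype, path = queue.popleft()
        if dtype == goal_type:
            return path
        for name, t_in, t_out in services:
            if t_in == dtype and t_out not in seen:
                seen.add(t_out)
                queue.append((t_out, path + [name]))
    return None  # no compatible chain exists

services = [("blast", "fasta", "hits"), ("fetch", "id", "fasta"),
            ("parse", "hits", "table")]
print(compose(services, "id", "table"))  # ['fetch', 'blast', 'parse']
```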

  • 89.
    Rush, Stephen
    et al.
    Örebro University, School of Medical Sciences.
    Repsilber, Dirk
    Örebro University, School of Medical Sciences.
    Capturing context-specific regulation in molecular interaction networks. 2018. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, no 1, article id 539. Article in journal (Refereed)
    Abstract [en]

    Background: Molecular profiles change in response to perturbations. These changes are coordinated into functional modules via regulatory interactions. The genes and their products within a functional module are expected to be differentially expressed in a manner coherent with their regulatory network. This perspective offers a promising approach both for increasing precision in detecting differential signals and for describing differential regulatory signals within the framework of a priori knowledge about the underlying network, i.e. from a mechanistic point of view.

    Results: We present Coherent Network Expression (CoNE), an effective procedure for identifying differentially activated functional modules in molecular interaction networks. Differential gene expression is chosen as an example, and differential signals coherent with the regulatory nature of the network are identified. We apply our procedure to systematically simulated data, comparing its performance to alternative methods. We then take the example case of a transcription regulatory network in the context of particle-induced pulmonary inflammation, recapitulating previously obtained results and proposing additional candidates. CoNE is conveniently implemented in an R package along with simulation utilities.

    Conclusion: Combining coherent interactions with error control on differential gene expression results in uniformly greater specificity in inference than error control alone, ensuring that captured functional modules constitute real findings.
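The coherence criterion can be sketched as a sign check on regulatory edges: an activation edge is coherent when source and target change in the same direction, a repression edge when they change in opposite directions. Gene names, fold changes, and edge signs are invented; CoNE's actual procedure additionally applies error control on differential expression.

```python
# Sketch: keep only edges whose differential signal is coherent with the
# regulatory sign (+1 activation, -1 repression).

def coherent_edges(fold_changes, edges):
    """fold_changes: {gene: log fold change}; edges: (src, tgt, sign).
    Returns the edges whose endpoint changes agree with the edge sign."""
    return [(s, t) for s, t, sign in edges
            if fold_changes[s] * fold_changes[t] * sign > 0]

fc = {"TF1": 1.2, "G1": 0.8, "G2": -0.5, "G3": 0.3}
edges = [("TF1", "G1", +1),   # activation, both up: coherent
         ("TF1", "G2", -1),   # repression, target down: coherent
         ("TF1", "G3", -1)]   # repression, target up: incoherent
print(coherent_edges(fc, edges))  # [('TF1', 'G1'), ('TF1', 'G2')]
```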

  • 90.
    Rydén, Patrik
    et al.
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
    Andersson, Henrik
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Landfors, Mattias
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Näslund, Linda
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Hartmanová, Blanka
    Noppa, Laila
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Sjöstedt, Anders
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Evaluation of microarray data normalization procedures using spike-in experiments. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, no 300, p. 17. Article in journal (Refereed)
    Abstract [en]

    Background: Recently, a large number of methods for the analysis of microarray data have been proposed but there are few comparisons of their relative performances. By using so-called spike-in experiments, it is possible to characterize the analyzed data and thereby enable comparisons of different analysis methods.

    Results: A spike-in experiment using eight in-house produced arrays was used to evaluate established and novel methods for filtration, background adjustment, scanning, channel adjustment, and censoring. The S-plus package EDMA, a stand-alone tool providing characterization of analyzed cDNA-microarray data obtained from spike-in experiments, was developed and used to evaluate 252 normalization methods. For all analyses, the sensitivities at low false positive rates were observed together with estimates of the overall bias and the standard deviation. In general, there was a trade-off between the ability of the analyses to identify differentially expressed genes (i.e. the analyses' sensitivities) and their ability to provide unbiased estimators of the desired ratios. Virtually all analyses underestimated the magnitude of the regulations; often less than 50% of the true regulations were observed. Moreover, the bias depended on the underlying mRNA concentration; low concentration resulted in high bias. Many of the analyses had relatively low sensitivities, but analyses that used either the constrained model (i.e. a procedure that combines data from several scans) or partial filtration (a novel method for treating data from so-called not-found spots) had, with few exceptions, high sensitivities. These methods gave considerably higher sensitivities than some commonly used analysis methods.

    Conclusion: The use of spike-in experiments is a powerful approach for evaluating microarray preprocessing procedures. Analyzed data are characterized by properties of the observed log-ratios and by the analysis' ability to detect differentially expressed genes. If bias is not a major problem, we recommend the use of either the CM-procedure or partial filtration.
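Two of the quantities discussed above, the observed fraction of the true log-ratio magnitude (underestimation) and the sensitivity at a fixed cutoff, can be computed as in this sketch; all log-ratios are invented.

```python
# Sketch: with spike-ins the true log-ratios are known, so an analysis can
# be scored by how much of the true magnitude it recovers and by how many
# truly regulated genes it detects at a cutoff.

def shrinkage(observed, true):
    """Mean fraction of the true log-ratio magnitude that is observed."""
    pairs = [(abs(o), abs(t)) for o, t in zip(observed, true) if t != 0]
    return sum(o / t for o, t in pairs) / len(pairs)

def sensitivity(observed, true, cutoff=1.0):
    """Fraction of truly regulated genes called regulated at the cutoff."""
    regulated = [o for o, t in zip(observed, true) if abs(t) >= cutoff]
    return sum(1 for o in regulated if abs(o) >= cutoff) / len(regulated)

true_lr = [2.0, 2.0, -2.0, 0.0, 0.0]   # known spike-in log-ratios
obs_lr = [1.1, 1.6, -0.9, 0.1, -0.2]   # magnitudes underestimated
print(round(shrinkage(obs_lr, true_lr), 2))    # 0.6
print(round(sensitivity(obs_lr, true_lr), 2))  # 0.67
```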


  • 91. Rögnvaldsson, Thorsteinn
    et al.
    Etchells, A
    You, Liwen
    Garwicz, Daniel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Jarman, Ian
    Lisboa, Paulo J. G.
    How to find simple and accurate rules for viral protease cleavage specificities. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 149. Article in journal (Refereed)
    Abstract [en]

    Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.
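    As a rough illustration of the kind of compact rule OSRE produces, a cleavage rule can be expressed as a conjunction of low-order, per-position predicates over the substrate octamer. The positions and residue sets below are invented for illustration and are not the rules reported in the paper:

    ```python
    # Hypothetical rule: an octamer is predicted cleaved if position 0 holds a
    # bulky hydrophobic residue and position 5 is in a small allowed set.
    # (Both the positions and the residue sets are made up for this sketch.)
    RULE = {0: set("FLMY"), 5: set("EQ")}  # position index -> allowed residues

    def predict_cleaved(octamer: str, rule=RULE) -> bool:
        """Conjunction of per-position predicates, in the spirit of OSRE rules."""
        return all(octamer[i] in allowed for i, allowed in rule.items())

    print(predict_cleaved("FQVTEQAL"))  # position 0 = 'F', position 5 = 'Q' -> True
    print(predict_cleaved("AQVTEQAL"))  # position 0 = 'A' fails -> False
    ```

    The appeal of such rules, as the abstract argues, is that each predicate is directly readable as a consensus-sequence constraint, unlike the weights of a black-box classifier.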

  • 92.
    Rögnvaldsson, Thorsteinn
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent systems (IS-lab).
    Etchells, Terence A
    School of Computing and Mathematical Sciences, Liverpool John Moores University.
    You, Liwen
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent systems (IS-lab).
    Garwicz, Daniel
    Department of Molecular Medicine and Surgery, Karolinska Institutet.
    Jarman, Ian
    School of Computing and Mathematical Sciences, Liverpool John Moores University.
    Lisboa, Paulo J G
    School of Computing and Mathematical Sciences, Liverpool John Moores University.
    How to find simple and accurate rules for viral protease cleavage specificities (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 149-156. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND:

    Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.

    RESULTS:

    A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods.

    CONCLUSION:

    A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

  • 93.
    Rögnvaldsson, Thorsteinn
    et al.
    Örebro University, School of Science and Technology.
    Etchells, Terence A.
    You, Liwen
    Garwicz, Daniel
    Jarman, Ian
    Lisboa, Paulo J. G.
    How to find simple and accurate rules for viral protease cleavage specificities (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 149. Article in journal (Refereed)
    Abstract [en]

    Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

  • 94. Sahlin, Kristoffer
    et al.
    Vezzi, Francesco
    Nystedt, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    BESST - Efficient scaffolding of large fragmented assemblies (2014). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, article id 281. Article in journal (Refereed)
    Abstract [en]

    Background

    The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features.

    We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. 

    Results

    We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST fares favorably to the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide.

    Conclusion

    We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding. 
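    The point about link counts varying with contig features can be made concrete with a small model: at fixed sequencing depth, the expected number of read pairs spanning a contig junction depends on the contig lengths, the gap, and the insert-size distribution. The Monte Carlo sketch below illustrates that dependence; it is a simplified illustrative model, not BESST's actual scoring.

    ```python
    import numpy as np

    def expected_links(len_a, len_b, gap, mean_insert, sd_insert,
                       pairs_per_bp, n=100_000, seed=1):
        """Monte Carlo estimate of the expected number of read pairs spanning
        the gap between two adjacent contigs (toy model, invented parameters)."""
        rng = np.random.default_rng(seed)
        genome = len_a + gap + len_b
        starts = rng.uniform(0, genome, n)             # left-read positions
        ends = starts + rng.normal(mean_insert, sd_insert, n)
        # A pair links the contigs if it starts in contig A and ends in contig B.
        spanning = (starts < len_a) & (ends > len_a + gap) & (ends < genome)
        total_pairs = pairs_per_bp * genome
        return total_pairs * spanning.mean()

    # Long contigs accumulate more spanning pairs than short ones at the same
    # depth, gap, and insert-size distribution:
    print(expected_links(50_000, 50_000, 500, 3000, 300, 0.05))
    print(expected_links(2_000, 2_000, 500, 3000, 300, 0.05))
    ```

    Because the expected count shifts with contig length, a fixed link-count threshold systematically penalizes junctions involving short contigs, which is the kind of variation the abstract says raw link counts fail to account for.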

  • 95.
    Sahlin, Kristoffer
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Vezzi, Francesco
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Nystedt, Björn
    Lundeberg, Joakim
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    BESST - Efficient scaffolding of large fragmented assemblies (2014). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, no 1, p. 281. Article in journal (Refereed)
    Abstract [en]

    Background: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software's general performance. Results: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST fares favorably to the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide. Conclusion: We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding.

  • 96.
    Scheubert, Lena
    et al.
    Institute of Computer Science, University of Osnabrück, Osnabrück, Germany; Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany.
    Luštrek, Mitja
    Department of Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia.
    Schmidt, Rainer
    Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany.
    Repsilber, Dirk
    Leibniz Institute for Farm Animal Biology (FBN Dummersdorf), Dummerstorf, Germany.
    Fuellen, Georg
    Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany; German Center for Neurodegenerative Disorders (DZNE), Rostock, Germany.
    Tissue-based Alzheimer gene expression markers - comparison of multiple machine learning approaches and investigation of redundancy in small biomarker sets (2012). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, no 1, article id 266. Article in journal (Refereed)
    Abstract [en]

    Background: Alzheimer's disease has been known for more than 100 years and the underlying molecular mechanisms are not yet completely understood. The identification of genes involved in the processes in Alzheimer affected brain is an important step towards such an understanding. Genes differentially expressed in diseased and healthy brains are promising candidates.

    Results: Based on microarray data we identify potential biomarkers as well as biomarker combinations using three feature selection methods: information gain, mean decrease accuracy of random forest and a wrapper of genetic algorithm and support vector machine (GA/SVM). Information gain and random forest are two commonly used methods. We compare their output to the results obtained from GA/SVM. GA/SVM is rarely used for the analysis of microarray data, but it is able to identify genes capable of classifying tissues into different classes at least as well as the two reference methods.

    Conclusion: Compared to the other methods, GA/SVM has the advantage of finding small, less redundant sets of genes that, in combination, show superior classification characteristics. The biological significance of the genes and gene pairs is discussed.
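    A wrapper of the GA/SVM kind can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM, the expression data are synthetic, and the genetic algorithm is reduced to elitist mutation; the structure is the point: binary feature masks scored by a classifier, with a penalty favoring small gene sets.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Toy expression matrix: 40 samples x 30 genes; genes 0-2 carry the class signal.
    y = np.array([0] * 20 + [1] * 20)
    X = rng.normal(0, 1, (40, 30))
    X[y == 1, :3] += 2.0

    def accuracy(mask):
        """Score a feature subset with a nearest-centroid classifier
        (resubstitution accuracy; a real wrapper would cross-validate)."""
        if not mask.any():
            return 0.0
        Xs = X[:, mask]
        c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
        pred = (np.linalg.norm(Xs - c1, axis=1) <
                np.linalg.norm(Xs - c0, axis=1)).astype(int)
        return (pred == y).mean()

    # Minimal genetic algorithm: mutation-only, elitist.
    pop = rng.random((20, 30)) < 0.2
    for _ in range(30):
        scores = np.array([accuracy(m) - 0.01 * m.sum() for m in pop])
        best = pop[scores.argmax()]
        pop = np.array([best ^ (rng.random(30) < 0.05) for _ in range(19)] + [best])

    print(accuracy(best), best.sum())
    ```

    The size penalty in the fitness function is what drives the wrapper toward the small, less redundant gene sets the abstract highlights as GA/SVM's advantage.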

  • 97. Sennblad, Bengt
    et al.
    Schreil, Eva
    Berglund Sonnhammer, Ann-Charlotte
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC).
    Arvestad, Lars
    KTH, School of Computer Science and Communication (CSC).
    primetv: a viewer for reconciled trees (2007). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8. Article in journal (Refereed)
    Abstract [en]

    Background: Evolutionary processes, such as gene family evolution or parasite-host cospeciation, can often be viewed as a tree evolving inside another tree. Relating two given trees under such a constraint is known as reconciling them. Adequate software tools for generating illustrations of tree reconciliations are instrumental for presenting and communicating results and ideas regarding these phenomena. Available visualization tools have been limited to illustrations of the most parsimonious reconciliation. However, there exists a plethora of biologically relevant non-parsimonious reconciliations. Illustrations of these general reconciliations may not be achieved without manual editing. Results: We have developed a new reconciliation viewer, primetv. It is a simple and compact visualization program that is the first automatic tool for illustrating general tree reconciliations. It reads reconciled trees in an extended Newick format and outputs them as tree-within-tree illustrations in a range of graphic formats. Output attributes, such as colors and layout, can easily be adjusted by the user. To enhance the construction of input to primetv, two helper programs, readReconciliation and reconcile, accompany primetv. Detailed examples of all programs' usage are provided in the text. For the casual user a web service provides a simple user interface to all programs. Conclusion: With primetv, the first visualization tool for general reconciliations, illustrations of trees-within-trees are easy to produce. Because it clarifies and accentuates an underlying structure in a reconciled tree, e.g., the impact of a species tree on a gene-family phylogeny, it will enhance scientific presentations as well as pedagogic illustrations in an educational setting. primetv is available at http://prime.sbc.su.se/primetv, both as a standalone command-line tool and as a web service. The software is distributed under the GNU General Public License.
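    primetv reads trees in an extended Newick format. As background, plain Newick can be parsed with a short recursive-descent routine; the sketch below handles only names and parentheses and ignores the reconciliation annotations of primetv's extended format:

    ```python
    def parse_newick(s: str):
        """Minimal Newick parser returning nested (name, children) tuples.
        Branch lengths and primetv's reconciliation tags are not handled."""
        pos = 0

        def node():
            nonlocal pos
            children = []
            if s[pos] == '(':
                pos += 1
                children.append(node())
                while s[pos] == ',':
                    pos += 1
                    children.append(node())
                pos += 1  # skip ')'
            start = pos
            while pos < len(s) and s[pos] not in '(),;':
                pos += 1  # consume the (possibly empty) node label
            return (s[start:pos], children)

        return node()

    tree = parse_newick("((A,B)ab,C)root;")
    print(tree)  # ('root', [('ab', [('A', []), ('B', [])]), ('C', [])])
    ```

    A reconciled-tree reader like primetv's extends this grammar with per-node annotations mapping gene-tree vertices to species-tree edges.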

  • 98. Sennblad, Bengt
    et al.
    Schreil, Eva
    Berglund Sonnhammer, Ann-Charlotte
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Lagergren, Jens
    Arvestad, Lars
    primetv: a viewer for reconciled trees (2007). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 148. Article in journal (Refereed)
    Abstract [en]

    Background: Evolutionary processes, such as gene family evolution or parasite-host cospeciation, can often be viewed as a tree evolving inside another tree. Relating two given trees under such a constraint is known as reconciling them. Adequate software tools for generating illustrations of tree reconciliations are instrumental for presenting and communicating results and ideas regarding these phenomena. Available visualization tools have been limited to illustrations of the most parsimonious reconciliation. However, there exists a plethora of biologically relevant non-parsimonious reconciliations. Illustrations of these general reconciliations may not be achieved without manual editing. Results: We have developed a new reconciliation viewer, primetv. It is a simple and compact visualization program that is the first automatic tool for illustrating general tree reconciliations. It reads reconciled trees in an extended Newick format and outputs them as tree-within-tree illustrations in a range of graphic formats. Output attributes, such as colors and layout, can easily be adjusted by the user. To enhance the construction of input to primetv, two helper programs, readReconciliation and reconcile, accompany primetv. Detailed examples of all programs' usage are provided in the text. For the casual user a web service provides a simple user interface to all programs. Conclusion: With primetv, the first visualization tool for general reconciliations, illustrations of trees-within-trees are easy to produce. Because it clarifies and accentuates an underlying structure in a reconciled tree, e.g., the impact of a species tree on a gene-family phylogeny, it will enhance scientific presentations as well as pedagogic illustrations in an educational setting. primetv is available at http://prime.sbc.su.se/primetv, both as a standalone command-line tool and as a web service. The software is distributed under the GNU General Public License.

  • 99.
    Sjöstrand, Joel
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Lagergren, Jens
    Sennblad, Bengt
    GenPhyloData: realistic simulation of gene family evolution (2013). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, article id 209. Article in journal (Refereed)
    Abstract [en]

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock. Results: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.
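    The birth-death process underlying the species-tree generation can be illustrated with a minimal Gillespie simulation of the lineage count; GenPhyloData's own implementation builds full trees with branch lengths and event annotations, which this sketch omits:

    ```python
    import random

    def simulate_birth_death(birth, death, t_max, seed=3):
        """Gillespie simulation of a linear birth-death process, returning the
        lineage count through time as a list of (time, count) pairs.
        Rates and horizon below are illustrative choices."""
        rng = random.Random(seed)
        t, n = 0.0, 1
        history = [(t, n)]
        while n > 0:
            rate = n * (birth + death)            # total event rate
            t += rng.expovariate(rate)            # waiting time to next event
            if t > t_max:
                break
            # Each event is a birth with probability birth / (birth + death).
            n += 1 if rng.random() < birth / (birth + death) else -1
            history.append((t, n))
        return history

    hist = simulate_birth_death(birth=1.0, death=0.5, t_max=5.0)
    print(hist[-1])
    ```

    Guiding a gene tree by a species tree, as the suite does, amounts to running such a process inside each species-tree branch, with extra event types for duplication, loss, and LGT.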

  • 100.
    Sjöstrand, Joel
    et al.
    Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden .
    Arvestad, Lars
    Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden .
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sennblad, Bengt
    Department of Medicine, Karolinska Institutet, Atherosclerosis Research Unit, Stockholm, Sweden .
    GenPhyloData: realistic simulation of gene family evolution (2013). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no 1, p. 209. Article in journal (Refereed)
    Abstract [en]

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock. Results: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.
