Digitala Vetenskapliga Arkivet

Results 1-50 of 123
  • 1. Abouelhoda, Mohamed
    et al.
    Issa, Shady
    Center for Informatics Sciences, Nile University, Giza, Egypt.
    Ghanem, Moustafa
    Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts.

    RESULTS: In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure.

    CONCLUSIONS: Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.

  • 2.
    Alexeyenko, Andrey
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lee, Woojoo
    Pernemalm, Maria
    Guegan, Justin
    Dessen, Philippe
    Lazar, Vladimir
    Lehtio, Janne
    Pawitan, Yudi
    Network enrichment analysis: extension of gene-set enrichment analysis to gene networks. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, p. 226. Article in journal (Refereed)
    Abstract [en]

    Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in the form of gene interaction networks is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis.

    Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study.

    Conclusions: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity than by simple overlaps.
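    The contrast between GEA's overlap statistic and the network-link count that NEA builds on can be shown in a minimal Python sketch. It assumes gene sets are Python sets and the network is a list of undirected edges. The one-sided hypergeometric test is one common choice of overlap statistic and is not necessarily the paper's. `overlap_pvalue` and `count_links` are hypothetical helper names, and the gene network is toy data. NEA's fast network randomization, which supplies the null distribution for the link statistic, is not reproduced here.

```python
from math import comb


def overlap_pvalue(n_universe, experimental, category):
    """One-sided hypergeometric p-value P(X >= observed overlap).

    X is the overlap obtained when len(experimental) genes are drawn at
    random from n_universe genes, len(category) of which are in the category.
    """
    observed = len(experimental & category)
    n, big_k, big_n = len(experimental), len(category), n_universe
    tail = sum(
        comb(big_k, i) * comb(big_n - big_k, n - i)
        for i in range(observed, min(n, big_k) + 1)
    )
    return tail / comb(big_n, n)


def count_links(edges, experimental, category):
    """Count undirected edges joining a gene in `experimental` to one in `category`."""
    return sum(
        1
        for u, v in edges
        if (u in experimental and v in category) or (v in experimental and u in category)
    )


# Toy data: the two sets share no genes but are linked in the (invented) network.
exp_set = {"TP53", "MDM2"}
cat_set = {"CDKN1A", "BAX"}
network = [("TP53", "CDKN1A"), ("TP53", "BAX"), ("MDM2", "EGFR")]
print(overlap_pvalue(100, exp_set, cat_set))  # 1.0: no overlap at all
print(count_links(network, exp_set, cat_set))  # 2 links despite zero overlap
```

    The toy example shows the situation NEA targets: an overlap test finds nothing, but the network still connects the two sets.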

  • 3. Ali, Raja H.
    et al.
    Bark, Mikael
    Miró, Jorge
    Muhammad, Sayyed A.
    Sjöstrand, Joel
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Zubair, Syed M.
    Abbas, Raja M.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, article id 97. Article in journal (Refereed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to assess, e.g., convergence and mixing, and to interactively explore both continuous and tree parameters.

    Results: We have written software called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

    Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.
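    Automatic burn-in estimation can be done in several ways, and the abstract does not say which one VMCMC uses. As an illustration only, the Python sketch below applies a naive Geweke-style check. It compares the mean of the first 10% of a trace with the mean of its last 50%, then discards progressively longer prefixes until the two means agree. The function names, fractions and threshold are hypothetical. The z-score also ignores autocorrelation, which a real convergence diagnostic must account for.

```python
from statistics import mean, variance


def geweke_z(trace, first=0.1, last=0.5):
    """Naive Geweke-style z-score comparing early and late segment means.

    Uses plain sample variances, so autocorrelation within the chain is
    ignored; proper diagnostics use spectral-density variance estimates.
    """
    n = len(trace)
    early = trace[: int(first * n)]
    late = trace[n - int(last * n):]
    se = (variance(early) / len(early) + variance(late) / len(late)) ** 0.5
    return (mean(early) - mean(late)) / se


def estimate_burn_in(trace, step_fraction=0.05, threshold=2.0):
    """Return the shortest tested prefix whose removal gives |z| < threshold."""
    n = len(trace)
    step = max(1, int(step_fraction * n))
    for burn in range(0, n // 2, step):
        if abs(geweke_z(trace[burn:])) < threshold:
            return burn
    return None  # no acceptable burn-in within the first half of the trace


# Toy trace: 100 samples stuck near 10, then 900 samples alternating around 0.
toy = [10.0 + (i % 2) for i in range(100)] + [(-1.0) ** i for i in range(900)]
print(estimate_burn_in(toy))  # 100
```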

  • 4.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Bark, Mikael
    KTH, School of Information and Communication Technology (ICT).
    Miró, Jorge
    KTH, School of Information and Communication Technology (ICT).
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Sjöstrand, J.
    Zubair, Syed M.
    KTH, School of Electrical Engineering (EES), Communication Networks. University of Balochistan, Pakistan.
    Abbas, R. M.
    Arvestad, L.
    VMCMC: A graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, no 1, article id 97. Article in journal (Refereed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to assess, e.g., convergence and mixing, and to interactively explore both continuous and tree parameters.

    Results: We have written software called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

    Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 5.
    Ali, Raja Hashim
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Muhammad, Sayyed Auwn
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Khan, Mehmood Alam
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    Stockholm University.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. S12. Article in journal (Refereed)
    Abstract [en]

    Background: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology, and it has not been utilized to its fullest potential.

    Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner compared to similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 6. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, Suppl. 15, p. S12. Article in journal (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner compared to similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 7.
    Al-Jaff, Mohammed
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Sandström, Eric
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Grabherr, Manfred
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology. Uppsala Univ, Bioinformat Infrastruct Life Sci, S-75123 Uppsala, Sweden.
    microTaboo: a general and practical solution to the k-disjoint problem. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, article id 228. Article in journal (Refereed)
    Abstract [en]

    Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

    Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

    Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
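    The k-disjoint problem defined above can be stated directly in code. The Python sketch below is a naive brute-force reference that compares every length-W window with every other window, so its cost grows quadratically with total sequence length. It is not the microTaboo algorithm, which is built to be efficient on genome-scale inputs. It assumes plain uppercase strings and counts every other window occurrence as a competitor, including overlapping windows in the same sequence. Reverse complements are ignored. `hamming` and `k_disjoint` are hypothetical names.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint(sequences, w, k):
    """Return (sequence index, offset, window) for every length-w window whose
    Hamming distance to every other window occurrence is greater than k."""
    windows = [
        (s_idx, i, seq[i:i + w])
        for s_idx, seq in enumerate(sequences)
        for i in range(len(seq) - w + 1)
    ]
    result = []
    for a_idx, (s_a, i_a, a) in enumerate(windows):
        if all(
            hamming(a, b) > k
            for b_idx, (_, _, b) in enumerate(windows)
            if b_idx != a_idx
        ):
            result.append((s_a, i_a, a))
    return result


# "GT" is only one mismatch away from "TT", so it is not 1-disjoint.
print(k_disjoint(["ACGT", "TTTT"], w=2, k=1))  # [(0, 0, 'AC'), (0, 1, 'CG')]
```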

  • 8.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Pharmacology.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Brunn: an open source laboratory information system for microplates with a graphical plate layout design process. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no 1, article id 179. Article in journal (Refereed)
    Abstract [en]

    Background:

    Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible lightweight open system for handling plate design, and validation and preparation of data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
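    Brunn is described as calculating IC50 values from dose-response curves, but the abstract does not specify the fitting model. As a simple stand-in, the Python sketch below estimates IC50 by linear interpolation on log10(dose). It uses the two neighbouring measured doses whose responses, in percent of control, bracket 50%. In practice IC50 is more often obtained by fitting a curve such as a four-parameter logistic model. `ic50_by_interpolation` is a hypothetical name, and the example data are invented.

```python
from math import log10


def ic50_by_interpolation(doses, responses, half=50.0):
    """Estimate IC50 by interpolating on log10(dose).

    `responses` are percent of control. The first pair of neighbouring doses
    (in increasing dose order) whose responses bracket `half` is used.
    Returns None when no measured pair brackets `half`.
    """
    points = sorted(zip(doses, responses))
    for (d1, r1), (d2, r2) in zip(points, points[1:]):
        if r1 != r2 and (r1 - half) * (r2 - half) <= 0:
            x1, x2 = log10(d1), log10(d2)
            x = x1 + (half - r1) * (x2 - x1) / (r2 - r1)
            return 10 ** x
    return None


# Invented example: the response crosses 50% between doses 10 and 100.
print(ic50_by_interpolation([1, 10, 100, 1000], [95, 70, 30, 5]))  # about 31.6
```

    Interpolating on the log scale matches the usual practice of plotting dose-response curves against log dose. A fitted model would use all points rather than only the bracketing pair.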

  • 9.
    Andersson, Claes R.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Isaksson, Anders
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
    Gustafsson, Mats G.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology. Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Bayesian detection of periodic mRNA time profiles without use of training examples. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 63. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell-cycle in poorly characterised organisms. Commonly, the investigator is only interested in genes expressed at a particular frequency that characterizes the process under study, but this frequency is seldom exactly known. Previously proposed detector designs require access to labelled training examples and do not allow systematic incorporation of diffuse prior knowledge available about the period time.

    RESULTS: A learning-free Bayesian detector that does not rely on labelled training examples and allows incorporation of prior knowledge about the period time is introduced. It is shown to outperform two recently proposed alternative learning-free detectors on simulated data generated with models that are different from the one used for detector design. Results from applying the detector to mRNA expression time profiles from S. cerevisiae show that the genes detected as periodically expressed only contain a small fraction of the cell-cycle genes inferred from mutant phenotype. For example, when the probability of false alarm was equal to 7%, only 12% of the cell-cycle genes were detected. The genes detected as periodically expressed were found to have a statistically significant overrepresentation of known cell-cycle regulated sequence motifs. One known sequence motif and 18 putative motifs, previously not associated with periodic expression, were also overrepresented.

    CONCLUSION: In comparison with recently proposed alternative learning-free detectors for periodic gene expression, Bayesian inference allows systematic incorporation of diffuse a priori knowledge about, e.g., the period time. This results in relative performance improvements due to increased robustness against errors in the underlying assumptions. Results from applying the detector to mRNA expression time profiles from S. cerevisiae include several new findings that deserve further experimental study.

  • 10.
    Ausmees, Kristiina
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    John, Aji
    Toor, Salman Z.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Hellander, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Nettelblad, Carl
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data. 2018. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, p. 240:1-11, article id 240. Article in journal (Refereed)
  • 11.
    Barrio, Alvaro Martínez
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Lagercrantz, Erik
    Sperber, Göran O.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Physiology.
    Blomberg, Jonas
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Virology.
    Bongcam-Rudloff, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Annotation and visualization of endogenous retroviral sequences using the Distributed Annotation System (DAS) and eBioX. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, Suppl. 6, p. S18. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: The Distributed Annotation System (DAS) is a widely used network protocol for sharing biological information. The distributed aspects of the protocol enable the use of various reference and annotation servers for connecting biological sequence data to pertinent annotations in order to depict an integrated view of the data for the final user.

    RESULTS: An annotation server has been devised to provide information about the endogenous retroviruses detected and annotated by a specialized in silico tool called RetroTector. We describe the procedure to implement the DAS 1.5 protocol commands necessary for constructing the DAS annotation server. We use our server to exemplify those steps. Data distribution is kept separate from visualization, which is carried out by eBioX, an easy-to-use open source program incorporating multiple bioinformatics utilities. Some well characterized endogenous retroviruses are shown in two different DAS clients. A rapid analysis of areas free from retroviral insertions could be facilitated by our annotations.

    CONCLUSION: The DAS protocol has been shown to be advantageous in the distribution of endogenous retrovirus data. The distributed nature of the protocol is also found to aid in combining annotation and visualization along a genome in order to enhance the understanding of ERV contribution to its evolution. Reference and annotation servers are conjointly used by eBioX to provide visualization of ERV annotations as well as other data sources. Our DAS data source can be found in the central public DAS service repository, http://www.dasregistry.org, or at http://loka.bmc.uu.se/das/sources.

  • 12.
    Besnier, Francois
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Carlborg, Örjan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Swedish University of Agricultural Sciences, Uppsala, Sweden.
    A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, article id 440. Article in journal (Refereed)
    Abstract [en]

    Background: Identity by descent (IBD) matrix estimation is a central component in mapping of Quantitative Trait Loci (QTL) using variance component models. A large number of algorithms have been developed for estimation of IBD between individuals in populations at discrete locations in the genome for use in genome scans to detect QTL affecting various traits of interest in experimental animal, human and agricultural pedigrees. Here, we propose a new approach to estimate IBD as continuous functions rather than as discrete values.

    Results: Estimation of IBD functions improved the computational efficiency and memory usage in genome scanning for QTL. We have explored two approaches to obtain continuous marker-bracket IBD functions. By re-implementing an existing and fast deterministic IBD-estimation method, we show that this approach results in IBD functions that produce exactly the same IBD as the original algorithm, but with a greater than 2-fold improvement in computational efficiency and a considerably lower memory requirement for storing the resulting genome-wide IBD. By developing a general IBD function approximation algorithm, we show that it is possible to estimate marker-bracket IBD functions from IBD matrices estimated at marker locations by any existing IBD estimation algorithm. The general algorithm provides approximations that lead to QTL variance component estimates that, even in worst-case scenarios, are very similar to the true values. Storing IBD as polynomial IBD functions was also shown to reduce the amount of memory required in genome scans for QTL.

    Conclusion: In addition to direct improvements in computational and memory efficiency, estimation of IBD functions is a fundamental step needed to develop and implement new efficient optimization algorithms for high precision localization of QTL. Here, we discuss and test two approaches for estimating IBD functions based on existing IBD estimation algorithms. Our approaches provide immediately useful techniques for single-QTL analyses in the variance component QTL mapping framework. They will, however, be particularly useful in genome scans for multiple interacting QTL, where the improvements in both computational and memory efficiency are key to the successful development of efficient optimization algorithms that allow widespread use of this methodology.

  • 13.
    Bilke, S
    et al.
    Lund University.
    Breslin, T
    Lund University.
    Sigvardsson, Mikael
    The Laboratory for Cell Differentiation Studies, Department for Stem Cell Biology, BMC B12, Lund.
    Probabilistic estimation of microarray data reliability and underlying gene expression. 2003. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 4, no 40. Article in journal (Refereed)
    Abstract [en]

    Background: The availability of high throughput methods for measurement of mRNA concentrations makes the reliability of conclusions drawn from the data and global quality control of samples and hybridization important issues. We address these issues by an information theoretic approach, applied to discretized expression values in replicated gene expression data.

    Results: Our approach yields a quantitative measure of two important parameter classes: First, the probability P(sigma|S) that a gene is in the biological state sigma in a certain variety, given its observed expression S in the samples of that variety. Second, sample specific error probabilities which serve as consistency indicators of the measured samples of each variety. The method and its limitations are tested on gene expression data for developing murine B-cells and a t-test is used as reference. On a set of known genes it performs better than the t-test despite the crude discretization into only two expression levels. The consistency indicators, i.e. the error probabilities, correlate well with variations in the biological material and thus prove efficient.

    Conclusions: The proposed method is effective in determining differential gene expression and sample reliability in replicated microarray data. Already at two discrete expression levels in each sample, it gives a good explanation of the data and is comparable to standard techniques.
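    The quantity P(sigma|S) above is a posterior probability. The Python sketch below computes it with Bayes' rule under simplifying assumptions that are ours, not the paper's. There are two states, "on" and "off". Each replicate's discretized value reports the true state, except with a sample-specific error probability. Replicates are independent given the state. The function name and the uniform default prior are hypothetical. The paper's information-theoretic estimation of the error probabilities themselves is not shown.

```python
from math import prod


def posterior_on(observed, error_probs, prior_on=0.5):
    """P(state = on | observed discretized values) for one gene.

    observed[i] is 1 (high) or 0 (low) in replicate sample i, and
    error_probs[i] is the probability that sample i reports the wrong level.
    """
    like_on = prod(1 - e if s == 1 else e for s, e in zip(observed, error_probs))
    like_off = prod(e if s == 1 else 1 - e for s, e in zip(observed, error_probs))
    evidence = prior_on * like_on + (1 - prior_on) * like_off
    return prior_on * like_on / evidence


# Two of three replicates read "high"; the third sample is the least reliable.
print(posterior_on([1, 1, 0], [0.1, 0.1, 0.3]))  # approximately 0.972
```

    Here a less reliable sample (higher error probability) weakens its own contribution to the posterior. This is how sample-specific error probabilities can act as consistency weights.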

  • 14.
    Borgmästars, Emmy
    et al.
    Umeå Univ, Sweden.
    de Weerd, Hendrik Arnold
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering. Univ Skövde, Sweden.
    Lubovac-Pilav, Zelmina
    Univ Skövde, Sweden.
    Sund, Malin
    Umeå Univ, Sweden.
    miRFA: an automated pipeline for microRNA functional analysis with correlation support from TCGA and TCPA expression data in pancreatic cancer. 2019. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 393. Article in journal (Refereed)
    Abstract [en]

    Background: MicroRNAs (miRNAs) are small RNAs that regulate gene expression at the post-transcriptional level and are emerging as potentially important biomarkers for various disease states, including pancreatic cancer. In silico functional analysis of miRNAs usually consists of miRNA target prediction followed by functional enrichment analysis of the predicted targets. Because miRNA target prediction methods generate a large number of false-positive target genes, further validation is needed to narrow down the candidate targets of interest. One commonly used method correlates miRNA and mRNA expression to assess the regulatory effect of a particular miRNA.

    The aim of this study was to build a bioinformatics pipeline in R for miRNA functional analysis, including correlation analyses between miRNA expression levels and the mRNA and protein expression levels of their targets, available from The Cancer Genome Atlas (TCGA) and The Cancer Proteome Atlas (TCPA). TCGA-derived expression data for specific mature miRNA isoforms from pancreatic cancer tissue were used.

    Results: Fifteen circulating miRNAs with significantly altered expression levels in pancreatic cancer patients were queried separately in the pipeline. The pipeline generated predicted miRNA target genes, enriched Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Predicted miRNA targets were evaluated by correlation analyses between each miRNA and its predicted targets. MiRNA functional analysis in combination with Kaplan-Meier survival analysis suggests that hsa-miR-885-5p could act as a tumor suppressor and should be validated as a potential prognostic biomarker in pancreatic cancer.

    Conclusions: Our miRNA functional analysis (miRFA) pipeline can serve as a valuable tool in biomarker discovery involving mature miRNAs associated with pancreatic cancer and could be extended to cover additional cancer types. Results for all mature miRNAs in the TCGA pancreatic adenocarcinoma dataset can be browsed and downloaded through a Shiny web application at https://emmbor.shinyapps.io/mirfa/.
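    A minimal sketch of the correlation step described above, assuming Pearson correlation over matched tumour samples and a hypothetical cut-off of r <= -0.3: because a miRNA is expected to repress its targets, a negative miRNA-target correlation is read as supporting evidence for a predicted target. The function names, gene labels and data are hypothetical; the abstract does not say which correlation measure or threshold miRFA itself uses, and miRFA is written in R, so Python is used here only for illustration.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant vector: correlation undefined")
    return sxy / sqrt(sxx * syy)


def supported_targets(mirna: list[float],
                      targets: dict[str, list[float]],
                      max_r: float = -0.3) -> list[tuple[str, float]]:
    """Return predicted targets whose expression correlates negatively with
    the miRNA (r <= max_r), most strongly anti-correlated first."""
    scored = [(gene, pearson(mirna, expr)) for gene, expr in targets.items()]
    kept = [(gene, r) for gene, r in scored if r <= max_r]
    return sorted(kept, key=lambda item: item[1])


# Hypothetical matched samples: one miRNA and two of its predicted targets.
mir = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = {
    "GENE_A": [5.0, 4.1, 3.2, 1.9, 1.0],  # falls as the miRNA rises
    "GENE_B": [1.0, 2.0, 2.9, 4.2, 5.1],  # rises with the miRNA
}
print(supported_targets(mir, predicted))  # only GENE_A is kept
```

    Filtering on the sign of the correlation keeps the example close to the biological rationale; a real pipeline would also need multiple-testing correction across many targets, which is not shown.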

  • 15.
    Borgmästars, Emmy
    et al.
    Umeå University, Faculty of Medicine, Department of Surgical and Perioperative Sciences.
    de Weerd, Hendrik Arnold
    Lubovac-Pilav, Zelmina
    Sund, Malin
    Umeå University, Faculty of Medicine, Department of Surgical and Perioperative Sciences.
    miRFA: an automated pipeline for microRNA functional analysis with correlation support from TCGA and TCPA expression data in pancreatic cancer (2019)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 393
    Article in journal (Refereed)
    Abstract [en]

    Background: MicroRNAs (miRNAs) are small RNAs that regulate gene expression at the post-transcriptional level and are emerging as potentially important biomarkers for various disease states, including pancreatic cancer. In silico functional analysis of miRNAs usually consists of miRNA target prediction followed by functional enrichment analysis of the predicted targets. Because miRNA target prediction methods generate a large number of false-positive target genes, further validation is needed to narrow down the candidate targets of interest. One commonly used method correlates miRNA and mRNA expression to assess the regulatory effect of a particular miRNA.

    The aim of this study was to build a bioinformatics pipeline in R for miRNA functional analysis, including correlation analyses between miRNA expression levels and the mRNA and protein expression levels of their targets, available from The Cancer Genome Atlas (TCGA) and The Cancer Proteome Atlas (TCPA). TCGA-derived expression data for specific mature miRNA isoforms from pancreatic cancer tissue were used.

    Results: Fifteen circulating miRNAs with significantly altered expression levels in pancreatic cancer patients were queried separately in the pipeline. The pipeline generated predicted miRNA target genes, enriched Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Predicted miRNA targets were evaluated by correlation analyses between each miRNA and its predicted targets. MiRNA functional analysis in combination with Kaplan-Meier survival analysis suggests that hsa-miR-885-5p could act as a tumor suppressor and should be validated as a potential prognostic biomarker in pancreatic cancer.

    Conclusions: Our miRNA functional analysis (miRFA) pipeline can serve as a valuable tool in biomarker discovery involving mature miRNAs associated with pancreatic cancer and could be extended to cover additional cancer types. Results for all mature miRNAs in the TCGA pancreatic adenocarcinoma dataset can be browsed and downloaded through a Shiny web application at https://emmbor.shinyapps.io/mirfa/.

  • 16.
    Borgmästars, Emmy
    et al.
    Department of Surgical and Perioperative Sciences, Umeå University, Umeå, Sweden.
    de Weerd, Hendrik Arnold
    University of Skövde, School of Bioscience. University of Skövde, The Systems Biology Research Centre. Department of Physics, Chemistry and Biology, Bioinformatics, Linköping University, Linköping, Sweden.
    Lubovac-Pilav, Zelmina
    University of Skövde, School of Bioscience. University of Skövde, The Systems Biology Research Centre.
    Sund, Malin
    Department of Surgical and Perioperative Sciences, Umeå University, Umeå, Sweden.
    miRFA: an automated pipeline for microRNA functional analysis with correlation support from TCGA and TCPA expression data in pancreatic cancer (2019)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, no 1, p. 1-17, article id 393
    Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: MicroRNAs (miRNAs) are small RNAs that regulate gene expression at the post-transcriptional level and are emerging as potentially important biomarkers for various disease states, including pancreatic cancer. In silico functional analysis of miRNAs usually consists of miRNA target prediction followed by functional enrichment analysis of the predicted targets. Because miRNA target prediction methods generate a large number of false-positive target genes, further validation is needed to narrow down the candidate targets of interest. One commonly used method correlates miRNA and mRNA expression to assess the regulatory effect of a particular miRNA. The aim of this study was to build a bioinformatics pipeline in R for miRNA functional analysis, including correlation analyses between miRNA expression levels and the mRNA and protein expression levels of their targets, available from The Cancer Genome Atlas (TCGA) and The Cancer Proteome Atlas (TCPA). TCGA-derived expression data for specific mature miRNA isoforms from pancreatic cancer tissue were used.

    RESULTS: Fifteen circulating miRNAs with significantly altered expression levels in pancreatic cancer patients were queried separately in the pipeline. The pipeline generated predicted miRNA target genes, enriched Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Predicted miRNA targets were evaluated by correlation analyses between each miRNA and its predicted targets. MiRNA functional analysis in combination with Kaplan-Meier survival analysis suggests that hsa-miR-885-5p could act as a tumor suppressor and should be validated as a potential prognostic biomarker in pancreatic cancer.

    CONCLUSIONS: Our miRNA functional analysis (miRFA) pipeline can serve as a valuable tool in biomarker discovery involving mature miRNAs associated with pancreatic cancer and could be extended to cover additional cancer types. Results for all mature miRNAs in the TCGA pancreatic adenocarcinoma dataset can be browsed and downloaded through a Shiny web application at https://emmbor.shinyapps.io/mirfa/.
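    The Kaplan-Meier survival analysis mentioned in the Results rests on a simple product-limit estimate, sketched here from scratch. At each distinct event time the survival estimate is multiplied by (1 - d/n), where d is the number of events and n the number of patients still at risk; censored patients leave the risk set without causing a step. The data are hypothetical, and splitting patients into high and low miRNA expression groups and comparing their curves (for example with a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as (event_time, survival) steps.

    events[i] is True for an observed event (e.g. death) and False for a
    censored observation at times[i].
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve


# Hypothetical follow-up times (months) and event indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```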

  • 17.
    Bornelöv, Susanne
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Marillet, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers (2014)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, p. 139-
    Article in journal (Refereed)
    Abstract [en]

    Background: The use of classification algorithms is becoming increasingly important in computational biology. However, not only the quality of a classification but also its biological interpretation matters. Interpretation may be eased if interacting elements can be identified and visualized, which requires appropriate tools and methods.

    Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for constructing rule networks from classifiers made of IF-THEN rules. Simulated and biological data serve to illustrate how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them with alternative methods, and validated the findings computationally.

    Conclusions: Rule networks offer a fast method for model visualization and an exploratory heuristic for interaction detection. The tool is freely available on the web and may thus be used to aid and improve rule-based classification.
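    The idea of a rule network can be sketched as follows: conditions (for example "G1=up") become nodes, and two conditions are linked when they occur together in the same IF-THEN rule. As a stated assumption, the edge weight here is simply the summed support of the rules in which the pair co-occurs, with one network per decision class; Ciruvis's own connection scoring is not described in the abstract and may differ. Rule contents and support values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rule_network(rules):
    """Build one undirected condition network per decision class.

    rules: iterable of (conditions, decision, support), where conditions is a
    collection of strings such as "G1=up". Edge weight = summed support of the
    rules in which both conditions appear (assumed scoring).
    Returns {decision: {(cond_a, cond_b): weight}} with each pair sorted.
    """
    net = defaultdict(lambda: defaultdict(float))
    for conditions, decision, support in rules:
        edges = net[decision]  # register the class even if it gets no edges
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += support
    return {decision: dict(edges) for decision, edges in net.items()}


# Hypothetical rules from a rule-based classifier.
rules = [
    ({"G1=up", "G2=down"}, "case", 12),
    ({"G1=up", "G2=down", "G3=up"}, "case", 5),
    ({"G3=down"}, "control", 9),
]
for decision, edges in rule_network(rules).items():
    print(decision, edges)
```

    Heavily weighted edges point to condition pairs that the classifier repeatedly uses together, which is what makes them candidates for interactions.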

  • 18.
    Bresell, Anders
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Persson, Bengt
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Using SVM and tripeptide patterns to detect translated introns (2007)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105
    Article in journal (Refereed)
  • 19.
    Buetti-Dinh, Antoine
    et al.
    Linnaeus University, Faculty of Health and Life Sciences, Department of Chemistry and Biomedical Sciences. Università della Svizzera italiana, Switzerland; Swiss Institute of Bioinformatics, Switzerland.
    Friedman, Ran
    Linnaeus University, Faculty of Health and Life Sciences, Department of Chemistry and Biomedical Sciences.
    Computer simulations of the signalling network in FLT3+-acute myeloid leukaemia: indications for an optimal dosage of inhibitors against FLT3 and CDK6 (2018)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, p. 1-13, article id 155
    Article in journal (Refereed)
    Abstract [en]

    Background

    Mutations in the FMS-like tyrosine kinase 3 (FLT3) are associated with uncontrolled cellular functions that contribute to the development of acute myeloid leukaemia (AML). We performed computer simulations of the FLT3-dependent signalling network to study the pathways involved in AML development and in resistance to targeted therapies.

    Results

    Analysis of the simulations revealed alternative pathways through phosphoinositide 3-kinase (PI3K) and SH2-containing sequence proteins (SHC) that could overcome inhibition of FLT3. Inhibition of cyclin-dependent kinase 6 (CDK6), a related molecular target, was also tested in the simulation but was not found to yield sufficient benefit on its own.

    Conclusions

    The PI3K pathway provided a basis for resistance to treatments. Alternative signalling pathways could not, however, restore cancer growth signals (proliferation and loss of apoptosis) to the same levels as prior to treatment, which may explain why FLT3 resistance mutations are the most common resistance mechanism. Finally, sensitivity analysis suggested the existence of optimal doses of FLT3 and CDK6 inhibitors in terms of efficacy and toxicity.
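    Why an optimal dose can exist at all is easy to illustrate with a deliberately simple toy that is not the authors' signalling-network simulation: if both efficacy and toxicity rise with dose along Hill curves, but toxicity rises later, their difference peaks at an intermediate dose. The half-maximal doses (1 and 10, arbitrary units) and the shared Hill coefficient of 2 are hypothetical. With equal Hill coefficients the optimum is analytically sqrt(EC50 * TC50), here sqrt(10), about 3.16, which the grid search below recovers.

```python
def hill(dose, half_max, n):
    """Fractional response of a Hill curve: 0 at dose 0, 0.5 at half_max."""
    return dose ** n / (dose ** n + half_max ** n)


def best_dose(ec50=1.0, tc50=10.0, n=2.0, max_dose=20.0, step=0.001):
    """Grid-search the dose that maximises efficacy minus toxicity.

    ec50, tc50 and n are hypothetical toy parameters, not fitted values.
    """
    best_d, best_score = 0.0, float("-inf")
    for k in range(int(round(max_dose / step)) + 1):
        d = k * step
        score = hill(d, ec50, n) - hill(d, tc50, n)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


print(best_dose())  # optimum near sqrt(10), about 3.16
```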

  • 20.
    Buetti-Dinh, Antoine
    et al.
    Linnaeus University, Faculty of Health and Life Sciences, Department of Chemistry and Biomedical Sciences. Università della Svizzera italiana, Switzerland; Swiss Institute of Bioinformatics, Switzerland.
    Herold, Malte
    University of Luxembourg, Luxembourg.
    Christel, Stephan
    Linnaeus University, Faculty of Health and Life Sciences, Department of Biology and Environmental Science.
    El Hajjami, Mohamed
    QNLM, China.
    Delogu, Francesco
    Norwegian University of Life Sciences, Norway.
    Ilie, Olga
    Università della Svizzera italiana, Switzerland; Swiss Institute of Bioinformatics, Switzerland.
    Bellenberg, Sören
    Linnaeus University, Faculty of Health and Life Sciences, Department of Biology and Environmental Science.
    Wilmes, Paul
    University of Luxembourg, Luxembourg.
    Poetsch, Ansgar
    Ruhr University Bochum, Germany; QNLM, China; Ocean University of China, China.
    Sand, Wolfgang
    University Duisburg-Essen, Germany; Donghua University, China; Mining Academy and Technical University Freiberg, Germany.
    Vera, Mario
    Pontificia Universidad Católica de Chile, Chile.
    Pivkin, Igor V.
    Università della Svizzera italiana, Switzerland; Swiss Institute of Bioinformatics, Switzerland.
    Friedman, Ran
    Linnaeus University, Faculty of Health and Life Sciences, Department of Chemistry and Biomedical Sciences.
    Dopson, Mark
    Linnaeus University, Faculty of Health and Life Sciences, Department of Biology and Environmental Science.
    Reverse engineering directed gene regulatory networks from transcriptomics and proteomics data of biomining bacterial communities with approximate Bayesian computation and steady-state signalling simulations (2020)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 21, no 1, p. 1-15, article id 23
    Article in journal (Refereed)
    Abstract [en]

    Background: Network inference is an important aim of systems biology, as it enables the transformation of OMICs datasets into biological knowledge. It consists of reverse engineering gene regulatory networks from OMICs data, such as RNA-seq or mass spectrometry-based proteomics data, through computational methods, and makes it possible to identify signalling pathways involved in specific biological functions. The ability to infer causality in gene regulatory networks, in addition to correlation, is crucial for several modelling approaches and allows targeted control in biotechnology applications.

    Methods: We performed simulations according to the approximate Bayesian computation method, in which the core model was a steady-state simulation algorithm used to study gene regulatory networks in systems for which only limited detail is available. The simulation outcomes were compared with experimentally measured transcriptomics and proteomics data through approximate Bayesian computation.

    Results: The structures of small gene regulatory networks responsible for regulating biological functions involved in biomining were inferred from multi-OMICs data of mixed bacterial cultures. Several causal inter- and intraspecies interactions were inferred between genes coding for proteins involved in the biomining process, such as heavy metal transport, DNA damage, replication and repair, and membrane biogenesis. The method also provided indications of the role of several uncharacterized proteins through their inferred connections in the network context.

    Conclusions: The combination of fast algorithms with high-performance computing allowed the simulation of a multitude of gene regulatory networks and their comparison with experimentally measured OMICs data through approximate Bayesian computation. This enabled the probabilistic inference of causality in gene regulatory networks of a multispecies bacterial system involved in biomining, without the need for single-cell or multiple-perturbation experiments. This information can be used to influence biological functions and control specific processes in biotechnology applications.
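    The core loop of approximate Bayesian computation can be sketched with the simplest variant, rejection sampling: draw parameters from a prior, run the simulator, and keep the draws whose simulated output lies within a tolerance epsilon of the measured data. Here the simulator is a hypothetical one-parameter toy in which a regulation strength k scales a target gene's steady-state level (k > 0 activation, k < 0 repression), so the sign of the accepted k values indicates the direction of regulation. The prior, tolerance and simulator are assumptions; the paper's steady-state algorithm and distance measure are not reproduced.

```python
import random


def simulate(k, rng, noise_sd=0.05):
    """Toy steady state: target level = k * regulator level + Gaussian noise."""
    regulator_level = 2.0
    return k * regulator_level + rng.gauss(0.0, noise_sd)


def abc_rejection(observed, n_draws=20000, epsilon=0.1, seed=0):
    """ABC rejection: draw k from a Uniform(-1, 1) prior, simulate, and keep
    the draw if the simulated level is within epsilon of the observation.
    The accepted draws approximate the posterior distribution of k."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(-1.0, 1.0)
        if abs(simulate(k, rng) - observed) < epsilon:
            accepted.append(k)
    return accepted


posterior = abc_rejection(observed=-1.2)
print(len(posterior), sum(posterior) / len(posterior))  # mass on negative k
```

    Shrinking epsilon makes the approximation tighter at the cost of fewer accepted draws, which is one reason fast simulators and high-performance computing matter for this approach.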

  • 21.
    Bylesjö, Max
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Daniel
    Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Sjödin, Andreas
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Jansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Moritz, Thomas
    Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Orthogonal projections to latent structures as a strategy for microarray data normalization (2007)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, no 207
    Article in journal (Refereed)
    Abstract [en]

    Background

    During the generation of microarray data, various forms of systematic bias are frequently introduced, which limit the accuracy and precision of the results. To properly estimate biological effects, these biases must be identified and removed.

    Results

    We introduce a normalization strategy for multi-channel microarray data based on orthogonal projections to latent structures (OPLS), a multivariate regression method. The effect of applying the normalization methodology to single-channel Affymetrix data as well as dual-channel cDNA data is illustrated. We compare the strategy with a wide range of commonly employed normalization methods with diverse properties and strengths, using sensitivity and specificity derived from external (spike-in) controls. On the illustrated data sets, the OPLS normalization strategy exhibits the highest average true negative and true positive rates among the evaluated methods.

    Conclusions

    The OPLS methodology identifies joint variation within biological samples, enabling the removal of sources of variation that are orthogonal to (uncorrelated with) the within-sample variation. This ensures that structured variation related to the underlying biological samples is separated from the remaining, bias-related sources of systematic variation. As a consequence, the methodology does not require any explicit knowledge of the presence or characteristics of particular biases. Furthermore, it does not assume that the majority of elements are non-differentially expressed, which makes it applicable to specialized boutique arrays.
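    The central OPLS operation, splitting X into variation related to a response y and variation orthogonal to it, can be sketched with the standard one-component O-PLS steps for a single response. The weight w is proportional to X'y; the part of the loading p that is not aligned with w defines an orthogonal weight, and the corresponding orthogonal score t_o (which is uncorrelated with y) is deflated from X. This is a generic, pure-Python sketch under those assumptions; how the authors define y for array normalization and how many orthogonal components they remove are not reproduced here, and the data are hypothetical.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    """Scale v to unit length; fail if it is (numerically) zero."""
    norm = _dot(v, v) ** 0.5
    if norm < 1e-12:
        raise ValueError("zero vector: nothing to normalise")
    return [x / norm for x in v]


def opls_filter(X, y):
    """Remove one y-orthogonal component from X (one-response O-PLS).

    X: list of rows (samples x variables), column-centred.
    y: one centred response value per sample.
    Returns (X_filtered, t_orth), where t_orth is the removed score vector.
    """
    n, m = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(m)]
    w = _unit([_dot(c, y) for c in cols])                # predictive weight ~ X'y
    t = [_dot(row, w) for row in X]                      # predictive score t = Xw
    p = [_dot(c, t) / _dot(t, t) for c in cols]          # loading p = X't / t't
    wp = _dot(w, p)
    w_o = _unit([pj - wp * wj for pj, wj in zip(p, w)])  # part of p orthogonal to w
    t_o = [_dot(row, w_o) for row in X]                  # orthogonal score
    p_o = [_dot(c, t_o) / _dot(t_o, t_o) for c in cols]  # orthogonal loading
    X_f = [[X[i][j] - t_o[i] * p_o[j] for j in range(m)] for i in range(n)]
    return X_f, t_o


# Hypothetical centred data: column 2 carries only y-orthogonal structure.
X = [[-1.5, 1.0, -0.5], [-0.5, -1.0, -1.5], [0.5, -1.0, -0.5], [1.5, 1.0, 2.5]]
y = [-1.5, -0.5, 0.5, 1.5]
X_f, t_o = opls_filter(X, y)
print(t_o)
```

    Because t_o is orthogonal to y, removing it leaves X'y unchanged: the filtered data keep all covariance with the response while losing one source of structured variation unrelated to it.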

  • 22.
    Bylesjö, Max
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Daniel
    Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Sjödin, Andreas
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Jansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Antti, Henrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    MASQOT: a method for cDNA microarray spot quality control (2005)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, p. 250-
    Article in journal (Refereed)
    Abstract [en]

    Background

    cDNA microarray technology has emerged as a major player in the parallel detection of biomolecules but still suffers from fundamental technical problems. Identifying and removing unreliable data is crucial to avoid misleading analysis results. Visual assessment of spot quality is still a common procedure, despite the time-consuming work of manually inspecting hundreds of thousands of spots or more.

    Results

    A novel methodology for cDNA microarray spot quality control is outlined. Multivariate discriminant analysis was used to assess spot quality based on existing and novel descriptors. The presented methodology displays high reproducibility and was found to be superior to the other evaluated methodologies in identifying unreliable data.

    Conclusion

    The proposed methodology for cDNA microarray spot quality control generates non-discrete values of spot quality which can be utilized as weights in subsequent analysis procedures as well as to discard spots of undesired quality using the suggested threshold values. The MASQOT approach provides a consistent assessment of spot quality and can be considered an alternative to the labor-intensive manual quality assessment process.
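    The two uses of a continuous quality score named above, as weights and as a filter, can be sketched for replicate spots of one gene: spots below a threshold are discarded and the remaining log-ratios are averaged with their quality scores as weights. The quality scores, the threshold of 0.5 and the function name are hypothetical; MASQOT's own descriptors, discriminant model and suggested threshold values are not reproduced.

```python
def weighted_log_ratio(log_ratios, qualities, threshold=0.5):
    """Discard spots with quality below threshold, then return the
    quality-weighted mean log-ratio of the remaining replicate spots.

    qualities are assumed to be continuous scores in [0, 1].
    """
    kept = [(r, q) for r, q in zip(log_ratios, qualities) if q >= threshold]
    if not kept:
        raise ValueError("no spot passed the quality threshold")
    total_weight = sum(q for _, q in kept)
    return sum(r * q for r, q in kept) / total_weight


# Hypothetical replicates: the third spot is an outlier with poor quality.
print(weighted_log_ratio([1.0, 1.2, 5.0], [0.9, 0.8, 0.1]))
```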

  • 23.
    Bylesjö, Max
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Rantalainen, Mattias
    Nicholson, Jeremy K
    Holmes, Elaine
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    K-OPLS package: Kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space (2008)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 1-7, article id 106
    Article in journal (Refereed)
    Abstract [en]

    Background: Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties that facilitate separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS offers enhanced interpretational capabilities, allowing the detection of unanticipated systematic variation in the data, such as instrumental drift, batch variability or unexpected biological variation.

    Results: We demonstrate an implementation of the K-OPLS algorithm for MATLAB and R, licensed under the GNU GPL and available at http://www.sourceforge.net/projects/kopls/. The package includes essential functionality and documentation for model evaluation (using cross-validation), training and prediction of future samples. It also incorporates a set of diagnostic tools and plot functions that simplify data visualisation, e.g. for detecting trends or identifying outlying samples. The utility of the package is demonstrated on a metabolic profiling data set from a biological study of hybrid aspen.

    Conclusion: The properties of the K-OPLS method are well suited for analysis of biological data, which in conjunction with the availability of the outlined open-source package provides a comprehensive solution for kernel-based analysis in bioinformatics applications.
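    Kernel methods such as K-OPLS work on a kernel matrix of pairwise sample similarities rather than on the raw variables. As a hedged illustration of that input, the sketch below builds a Gaussian (RBF) kernel matrix and centres it in feature space, a standard preprocessing step for kernel methods. The kernel choice, sigma and function names are assumptions and do not correspond to the K-OPLS package's own functions.

```python
from math import exp


def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix: K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(X)
    return [[exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def centre_kernel(K):
    """Return (I - 1/n) K (I - 1/n), the kernel of mean-centred feature vectors."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


# Hypothetical samples with two variables each.
K = rbf_kernel([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], sigma=1.0)
print(centre_kernel(K))
```

    Centring in feature space plays the role that column-centring plays in linear OPLS; after it, every row and column of the kernel matrix sums to zero.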

  • 24.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse (2010)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 362-
    Article in journal (Refereed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid rapid compound optimisation. No interactive tool exists, and most of the useful tools are quite expensive.

    Results: Here, a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented, based on annotated metabolic data described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced cheminformatics features.

    Conclusions: Owing to the speed of predictions (less than 50 ms per molecule), scientists can get real-time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require a network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.
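    One way to turn historic biotransformation data into a site-of-metabolism score is sketched below: each atom is described by its local environment (standing in for a circular fingerprint), and the score of an environment is the fraction of its occurrences in the training data that were observed as metabolic sites. The environment strings are hypothetical, and this simple ratio is an assumption for illustration rather than MetaPrint2D's exact definition. Prediction then reduces to a dictionary lookup per atom.

```python
from collections import Counter


def train(records):
    """records: iterable of (environment, was_metabolic_site) pairs taken
    from historic biotransformation data. Returns {environment: ratio}."""
    seen, reacted = Counter(), Counter()
    for env, is_site in records:
        seen[env] += 1
        if is_site:
            reacted[env] += 1
    return {env: reacted[env] / seen[env] for env in seen}


def score_atoms(model, environments):
    """One score per atom; environments never seen in training give None."""
    return [model.get(env) for env in environments]


# Hypothetical atom environments from three annotated biotransformations.
model = train([
    ("C-aromatic-OMe", True),
    ("C-aromatic-OMe", True),
    ("C-aromatic-OMe", False),
    ("C-aliphatic", False),
])
print(score_atoms(model, ["C-aromatic-OMe", "C-aliphatic", "N-amide"]))
```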

  • 25.
    Chalabi, Morteza H.
    et al.
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    Tsiamis, Vasileios
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH). Royal Inst Technol, Sch Biotechnol, KTH Sci Life Lab, Solna, Sweden.
    Vandin, Fabio
    Univ Padua, Dept Informat Engn, Padua, Italy.
    Schwammle, Veit
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    CoExpresso: assess the quantitative behavior of protein complexes in human cells (2019)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 17
    Article in journal (Refereed)
    Abstract [en]

    Background: Translational and post-translational control mechanisms in the cell result in widely observable differences between measured gene transcription and protein abundances. Protein complexes are among the most tightly controlled entities, through selective degradation of their individual proteins. They furthermore act as control hubs that regulate highly important processes in the cell and exhibit high functional diversity owing to their ability to change their composition and structure. Better understanding and prediction of these functional states demand methods for characterizing complex composition, behavior, and abundance across multiple cell states. Mass spectrometry provides an unbiased approach to directly determine protein abundances across different cell populations and thus to profile a comprehensive abundance map of proteins.

    Results: We provide a tool to investigate the behavior of protein subunits in known complexes by comparing their abundance profiles across up to 140 cell types available in ProteomicsDB. A thorough assessment of different randomization methods and statistical scoring algorithms makes it possible to determine the significance of concurrent profiles within a complex. This provides insights into the conservation of complex composition across human cell types and helps identify intrinsic structures in complex behavior, revealing which proteins orchestrate complex function. The analysis can be extended to investigate common profiles within arbitrary protein groups. CoExpresso can be accessed through http://computproteomics.bmb.sdu.dk/Apps/CoExpresso.

    Conclusions: With the CoExpresso web service, we offer a potent scoring scheme to assess proteins for their co-regulation and thereby offer insight into their potential for forming functional groups such as protein complexes.
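    The general logic of scoring a protein group against randomized groups can be sketched as follows: the group score is the mean pairwise Pearson correlation of the members' abundance profiles across cell types, and significance is estimated from random groups of the same size drawn from all proteins. This is one simple randomization scheme chosen for illustration; CoExpresso compares several randomization methods and scoring algorithms, and its exact choices are not reproduced here. Protein names and profiles are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_score(profiles, members):
    """Mean pairwise correlation of the members' abundance profiles."""
    pairs = list(combinations(members, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def empirical_p(profiles, members, n_random=1000, seed=0):
    """Compare a group's score with random groups of the same size drawn
    from all proteins; returns (observed score, empirical p-value)."""
    rng = random.Random(seed)
    observed = group_score(profiles, members)
    proteins = list(profiles)
    hits = sum(
        group_score(profiles, rng.sample(proteins, len(members))) >= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)  # +1 avoids p = 0


if __name__ == "__main__":
    # Hypothetical data: 3 co-regulated subunits and 30 unrelated proteins,
    # each profiled across 20 cell types.
    gen = random.Random(1)
    base = [gen.random() for _ in range(20)]
    profiles = {f"C{i}": [b + gen.gauss(0, 0.02) for b in base] for i in range(3)}
    profiles.update({f"P{i}": [gen.random() for _ in range(20)] for i in range(30)})
    print(empirical_p(profiles, ["C0", "C1", "C2"]))
```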

  • 26.
    Chantzi, Efthymia
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Jarvius, Malin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine. Uppsala Univ, In Vitro Syst Pharmacol Facil, SciLifeLab Drug Discovery & Dev, Uppsala, Sweden.
    Niklasson, Mia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Neuro-Oncology.
    Segerman, Anna
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Neuro-Oncology.
    Gustafsson, Mats G
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    COMBImage: a modular parallel processing framework for pairwise drug combination analysis that quantifies temporal changes in label-free video microscopy movies (2018)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, article id 453
    Article in journal (Refereed)
    Abstract [en]

    Background: Large-scale pairwise drug combination analysis has recently gained momentum in drug discovery and development projects, mainly owing to the use of advanced experimental-computational pipelines. This is fortunate, as drug combinations are often required for successful treatment of complex diseases. Furthermore, most new drugs cannot fully replace the current standard-of-care medication and must instead enter clinical use as add-on treatments. However, there is a clear lack of computational tools for label-free, temporal, image-based drug combination analysis that go beyond conventional but relatively uninformative end point measurements.

    Results: COMBImage is a fast, modular and instrument-independent computational framework for in vitro pairwise drug combination analysis that quantifies temporal changes in label-free video microscopy movies. Together with automated analyses of temporal changes in cell morphology and confluence, it performs and displays conventional cell viability and synergy end point analyses. The image processing algorithms are parallelized using Google's MapReduce programming model and optimized with respect to method-specific tuning parameters. COMBImage is shown to process time-lapse microscopy movies from 384-well plates within minutes on a single quad-core personal computer.

    This framework was employed in an ongoing drug discovery and development project focused on glioblastoma multiforme, the deadliest form of brain cancer. Interesting add-on effects of two investigational cytotoxic compounds combined with vorinostat were revealed on recently established clonal cultures of glioma-initiating cells from patient tumor samples. Therapeutic synergies, assessed using normal astrocytes as a toxicity cell model, reinforced the pharmacological interest in their potential clinical use.

    Conclusions: COMBImage enables, for the first time, fast and optimized pairwise drug combination analyses of temporal changes in label-free video microscopy movies. By providing this together with conventional cell viability-based end point analyses, it could help accelerate and guide drug discovery and development projects without the need for cell labeling or a particular live cell imaging instrument.
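    A synergy end point score compares the observed effect of a drug pair with the effect expected if the two drugs acted independently. The abstract does not say which reference model COMBImage uses, so Bliss independence is assumed here purely for illustration: with fractional inhibitions E_A and E_B, the expected combined inhibition is E_A + E_B - E_A * E_B, and a positive excess of the observed effect over that value suggests synergy. Applying the same score at every frame of a time-lapse series gives a simple temporal view; all function names and values are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional inhibition if drugs A and B act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed minus Bliss-expected effect: > 0 suggests synergy, < 0 antagonism."""
    for e in (effect_a, effect_b, effect_ab):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    return effect_ab - bliss_expected(effect_a, effect_b)


def bliss_over_time(series_a, series_b, series_ab):
    """Apply the end point score at every time point of a time-lapse series."""
    return [bliss_excess(a, b, ab) for a, b, ab in zip(series_a, series_b, series_ab)]


print(bliss_excess(0.3, 0.4, 0.75))  # about 0.17
print(bliss_over_time([0.1, 0.3], [0.2, 0.4], [0.28, 0.75]))
```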

  • 27.
    Chantzi, Efthymia
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Jarvius, Malin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Niklasson, Mia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Neuro-Oncology.
    Segerman, Anna
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Neuro-Oncology. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Gustafsson, Mats G
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    COMBImage2: a parallel computational framework for higher-order drug combination analysis that includes automated plate design, matched filter based object counting and temporal data mining (2019)
    In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 304
    Article in journal (Refereed)
    Abstract [en]

    Background: Pharmacological treatment of complex diseases using more than two drugs is commonplace in the clinic owing to better efficacy, decreased toxicity and reduced risk of developing resistance. However, many of these higher-order treatments have not undergone any detailed prior in vitro evaluation that could support their therapeutic potential and reveal disease-related insights. Despite the increased medical need for the discovery and development of higher-order drug combinations, very few reports from systematic large-scale studies in this direction exist. A major reason is the lack of computational tools that enable automated design and analysis of exhaustive drug combination experiments, in which all possible subsets of a panel of preselected drugs must be evaluated.

    Results: Motivated by this, we developed COMBImage2, a parallel computational framework for higher-order drug combination analysis. COMBImage2 extends its predecessor COMBImage in several ways. In particular, it offers automated 384-well plate design, as well as quality control that involves resampling statistics and inter-plate analyses. Moreover, it is equipped with a generic matched filter based object counting method that is currently designed for apoptotic-like cells. Furthermore, apart from higher-order synergy analyses, COMBImage2 introduces a novel data mining approach for identifying interesting temporal response patterns and disentangling higher-order effects from lower-order and single-drug effects.

    COMBImage2 was employed in a small pilot study focused on the CUSP9v4 protocol, which is currently used in the clinic to treat recurrent glioblastoma. For the first time, all 246 possible combinations of order 4 or lower of the 9 single drugs comprising the CUSP9v4 cocktail were evaluated on an in vitro clonal culture of glioma initiating cells.

    Conclusions: COMBImage2 can automatically design and robustly analyze exhaustive, and in general higher-order, drug combination experiments. Such a versatile, video-microscopy-oriented framework is likely to enable, guide and accelerate systematic large-scale drug combination studies, not only for cancer but also for other diseases.
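    A quick count helps make the figure of 246 combinations concrete. This Python sketch assumes, which the abstract does not state explicitly, that the 246 counts only combinations of two, three or four drugs (C(9,2) + C(9,3) + C(9,4)); adding the 9 single drugs would give 255. The drug names are placeholders, and nothing here reproduces COMBImage2's plate-design logic.

```python
from itertools import combinations
from math import comb

N_DRUGS = 9  # single drugs in the CUSP9v4 cocktail, per the abstract


def count_combinations(n: int, min_order: int, max_order: int) -> int:
    """Number of drug subsets with between min_order and max_order members."""
    return sum(comb(n, k) for k in range(min_order, max_order + 1))


def enumerate_combinations(drugs, min_order, max_order):
    """Explicit list of every subset, e.g. as input to a plate layout step."""
    return [c for k in range(min_order, max_order + 1)
            for c in combinations(drugs, k)]


drugs = [f"drug{i}" for i in range(1, N_DRUGS + 1)]  # hypothetical placeholder names

print(count_combinations(N_DRUGS, 2, 4))         # 36 + 84 + 126 = 246
print(count_combinations(N_DRUGS, 1, 4))         # 255 if the 9 singles are included
print(len(enumerate_combinations(drugs, 2, 4)))  # 246
```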

  • 28.
    D'Elia, Domenica
    et al.
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Gisel, Andreas
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Eriksson, Nils-Einar
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Kossida, Sophia
    Bioinformatics & Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece.
    Mattila, Kimmo
    CSC – IT Center for Science Ltd., Keilaranta 14, 02100 Espoo, Finland.
    Klucar, Lubos
    Institute of Molecular Biology, Slovak Academy of Sciences, Dubravska cesta 21, 84551 Bratislava, Slovakia.
    Bongcam-Rudloff, Erik
    Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75024 Uppsala, Sweden.
    The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, no Suppl. 6, p. S1. Article in journal (Refereed)
    Abstract [en]

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has worked to promote collaborative development of bioinformatics services and tools for the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics, with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses, such as high-throughput sequencing and data management, text and data mining, ontologies and Grid technologies. The papers selected for publication in this supplement to BMC Bioinformatics cover a broad range of the topics treated and also provide an overview of the main bioinformatics research fields in which the EMBnet community is involved.

  • 29. Duforet-Frebourg, Nicolas
    et al.
    Gattepaille, Lucie M.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Blum, Michael G. B.
    Jakobsson, Mattias
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    HaploPOP: a software that improves population assignment by combining markers into haplotypes. 2015. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 16, article id 242. Article in journal (Refereed)
    Abstract [en]

    Background: In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment studies handle independent markers, often by pruning markers in Linkage Disequilibrium (LD), thereby ignoring the information contained in the correlation among markers due to LD.

    Results: To improve the accuracy of population assignment, we present an algorithm, implemented in the HaploPOP software, that combines markers into haplotypes without requiring independence. The algorithm is based on the Gain of Informativeness for Assignment, which provides a measure for deciding whether or not a pair of markers should be combined into haplotypes in order to improve assignment. Because complete exploration of all possible solutions for constructing haplotypes is computationally prohibitive, our approach uses a greedy algorithm based on windows of fixed sizes. We evaluate the performance of HaploPOP in assigning individuals to populations using a split-validation approach. We investigate both simulated SNP data and dense genotype data from individuals from Spain and Portugal.

    Conclusions: Our results show that constructing haplotypes with HaploPOP can substantially reduce assignment error. The HaploPOP software is freely available as command-line software at www.ieg.uu.se/Jakobsson/software/HaploPOP/.
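    The kind of measure behind the Gain of Informativeness for Assignment can be illustrated with Rosenberg et al.'s informativeness for assignment, I_n. The Python sketch below is illustrative only: it assumes phased two-marker haplotypes, equally weighted populations, and a hypothetical gain defined as I_n of the haplotype minus the sum of the single-marker values; HaploPOP's exact formula and its greedy windowing are not reproduced. The toy data show the case the abstract motivates, where two markers are individually uninformative but their haplotype separates the populations.

```python
from collections import Counter
from math import log


def allele_freqs(samples):
    """Relative frequency of each allele (or haplotype) within one population."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def informativeness_for_assignment(pops):
    """Rosenberg et al. (2003) I_n for one locus, with K equally weighted populations.

    pops: list of K lists, each holding the alleles (or haplotypes) seen in one population.
    """
    freqs = [allele_freqs(p) for p in pops]
    k = len(freqs)
    info = 0.0
    for allele in set().union(*freqs):
        p_ij = [f.get(allele, 0.0) for f in freqs]
        p_bar = sum(p_ij) / k
        info -= p_bar * log(p_bar)
        info += sum(p * log(p) for p in p_ij if p > 0) / k
    return info


def marker(pops, idx):
    """Project two-marker haplotypes onto a single marker."""
    return [[h[idx] for h in pop] for pop in pops]


# Hypothetical phased data: each tuple is one chromosome's alleles at two SNPs.
pop1 = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
pop2 = [("A", "T"), ("A", "T"), ("G", "C"), ("G", "C")]
pops = [pop1, pop2]

i_m1 = informativeness_for_assignment(marker(pops, 0))
i_m2 = informativeness_for_assignment(marker(pops, 1))
i_hap = informativeness_for_assignment(pops)
gain = i_hap - (i_m1 + i_m2)  # assumed gain definition, for illustration only

print(round(i_m1, 3), round(i_m2, 3), round(i_hap, 3))  # 0.0 0.0 0.693
```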

  • 30.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    An eScience-Bayes strategy for analyzing omics data. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 282. Article in journal (Refereed)
    Abstract [en]

    Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to specific problems.

    Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data.

    Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology and permit coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

  • 31.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    The C1C2: a framework for simultaneous model selection and assessment. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 360. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on partitioning the data so that model choice and model assessment use separate data. Since the number of conceivable models is in general vast, we also investigated the use of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model.

    RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even when the independent variables were highly correlated and when the number of observations was smaller than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, its generalization error estimates were less accurate.

    CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the data used for model choice and for model assessment improves the estimates of the generalization error.
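    The core idea of separating the data used for model choice from the data used for assessment can be sketched in a few lines of Python with NumPy. This is not the C1C2 itself: it assumes a ridge penalty as the penalized linear model, chooses only the penalty by K-fold cross-validation inside one partition, and skips variable selection and the genetic-algorithm or brute-force search; all function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; data assumed centred)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))


def choose_lambda(X, y, lambdas, k=5, seed=0):
    """Pick the penalty by K-fold cross-validation on the model-choice data only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        err = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            beta = ridge_fit(X[train_idx], y[train_idx], lam)
            err += mse(X[test_idx], y[test_idx], beta)
        scores.append(err / k)
    return lambdas[int(np.argmin(scores))]


def choose_then_assess(X, y, lambdas, choice_fraction=0.5, seed=0):
    """Choose the model on one partition and estimate its error on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_choice = int(choice_fraction * len(y))
    choice, assess = idx[:n_choice], idx[n_choice:]
    lam = choose_lambda(X[choice], y[choice], lambdas, seed=seed)
    beta = ridge_fit(X[choice], y[choice], lam)
    # The assessment partition is never seen during model choice.
    return lam, mse(X[assess], y[assess], beta)


# Simulated data: 3 of 10 variables carry signal, noise standard deviation 0.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.normal(scale=0.5, size=200)

lam, gen_err = choose_then_assess(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
print(lam, round(gen_err, 3))
```

    Repeated K-fold cross-validation, by contrast, reuses the same observations for both tasks, which is the situation the abstract reports as giving less accurate generalization error estimates.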

  • 32.
    Elias, Isaac
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Fast Computation of Distance Estimators. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 89. Article in journal (Refereed)
    Abstract [en]

    Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast; e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n³). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n². Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications.

    Results: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.

    Conclusion: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
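    For reference, the naive baseline that the abstract identifies as the bottleneck looks roughly like this Python sketch: every pair of sequences is compared site by site, giving O(l·n²) work. It is not the authors' faster algorithm. The Jukes-Cantor (JC69) correction is an assumed example of a distance estimator, and sites with ambiguity symbols are simply skipped, a simplification rather than the new method the abstract describes.

```python
from math import log

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping sites where either base is not A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a, b) if x in VALID and y in VALID]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69 correction d = -3/4 ln(1 - 4p/3); infinite once p >= 3/4."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive all-pairs matrix: about n^2/2 pairs times l sites, i.e. O(l * n^2) work."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return d


seqs = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTN"]  # toy aligned sequences; N is ambiguous
for row in distance_matrix(seqs):
    print([round(x, 3) for x in row])
```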

  • 33.
    Ensterö, Mats
    et al.
    Stockholm University.
    Åkerborg, Örjan
    Stockholm University ; KTH Royal Institute of Technology.
    Lundin, Daniel
    Stockholm University.
    Wang, Bei
    Duke University, USA.
    Furey, Terrence S
    Institute for Genome Sciences and Policy (IGSP), USA ; Duke University, USA.
    Öhman, Marie
    Stockholm University.
    Lagergren, Jens
    Stockholm University ; KTH Royal Institute of Technology.
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, article id 6. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Several bioinformatic approaches have previously been used to find novel sites of ADAR-mediated A-to-I RNA editing in humans. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA-edited substrates suggest, however, that site-selective A-to-I editing is particularly important for normal brain development in mammals.

    RESULTS: We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence, since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions, to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics of known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing.

    CONCLUSIONS: Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.
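    The A-G mismatch criterion used in the explorative screen can be illustrated with a small Python sketch. It assumes the genomic and expressed sequences are already aligned without gaps on the sense strand (on the opposite strand the signature would appear as T-to-C); the sequences are toy data and the function names are hypothetical. The RNA-structure, conservation and scoring steps of the actual screen are not shown.

```python
from collections import Counter


def a_to_g_sites(genomic: str, expressed: str):
    """0-based positions where the genomic template has A but the expressed sequence has G.

    Assumes both sequences are already aligned, gap-free, on the same (sense) strand.
    """
    if len(genomic) != len(expressed):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (g, e) in enumerate(zip(genomic.upper(), expressed.upper()))
            if g == "A" and e == "G"]


def mismatch_profile(genomic: str, expressed: str) -> Counter:
    """Count every mismatch type, to see whether A>G dominates over other changes."""
    return Counter(f"{g}>{e}" for g, e in zip(genomic.upper(), expressed.upper()) if g != e)


genomic = "ATGAAGCTA"    # toy genomic template
expressed = "ATGGAGCTG"  # toy cDNA/EST read of the same region
print(a_to_g_sites(genomic, expressed))      # [3, 8]
print(mismatch_profile(genomic, expressed))  # Counter({'A>G': 2})
```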

  • 34.
    Ensterö, Mats
    et al.
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Åkerborg, Örjan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lundin, Daniel
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Wang, Bei
    Furey, Terrence S
    Öhman, Marie
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Lagergren, Jens
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, no 6. Article in journal (Refereed)
    Abstract [en]

    Background

    Several bioinformatic approaches have previously been used to find novel sites of ADAR-mediated A-to-I RNA editing in humans. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA-edited substrates suggest, however, that site-selective A-to-I editing is particularly important for normal brain development in mammals.

    Results

    We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence, since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions, to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics of known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing.

    Conclusions

    Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.

  • 35.
    Ensterö, Mats
    et al.
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Åkerborg, Örjan
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lundin, Daniel
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Wang, Bei
    Department of Computer Science, Duke University, Durham, United States.
    Furey, Terrence S.
    Department of Computer Science, Duke University, Durham, United States.
    Öhman, Marie
    Department of Molecular Biology and Functional Genomics, Stockholm University.
    Lagergren, Jens
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11. Article in journal (Refereed)
    Abstract [en]

    Background: Several bioinformatic approaches have previously been used to find novel sites of ADAR-mediated A-to-I RNA editing in humans. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA-edited substrates suggest, however, that site-selective A-to-I editing is particularly important for normal brain development in mammals. Results: We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence, since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions, to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics of known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing. Conclusions: Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.

  • 36. Flores, Samuel
    FlexOracle: predicting flexible hinges by identification of stable domains. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 215. Article in journal (Refereed)
  • 37. Flores, Samuel
    Hinge Atlas: relating sequence features to sites of structural flexibility. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 167. Article in journal (Refereed)
  • 38.
    Flores, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Gerstein, Mark
    Yale University.
    Predicting protein ligand binding motions with the Conformation Explorer. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 417. Article in journal (Refereed)
    Abstract [en]

    Background

    Knowledge of the structure of proteins bound to known or potential ligands is crucial for biological understanding and drug design. Often the 3D structure of the protein is available in some conformation, but binding the ligand of interest may involve a large-scale conformational change, which is difficult to predict with existing methods.

    Results

    We describe how to generate ligand binding conformations of proteins that move by hinge bending, the largest class of motions. First, we predict the location of the hinge between domains. Second, we apply an Euler rotation to one of the domains about the hinge point. Third, we compute a short-time dynamical trajectory using Molecular Dynamics to equilibrate the protein and ligand and correct unnatural atomic positions. Fourth, we score the generated structures using a novel fitness function that favors closed or holo structures. By iterating the second through fourth steps, we systematically minimize the fitness function, thus predicting the conformational change required for small-ligand binding in five well-studied proteins.

    Conclusions

    We demonstrate that the method in most cases successfully predicts the holo conformation given only an apo structure.
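    The second step, rotating one domain about the hinge point, can be sketched with NumPy. The z-y-z Euler convention, the toy coordinates and the function names are assumptions for illustration; the hinge prediction, Molecular Dynamics equilibration and fitness scoring of the Conformation Explorer are not reproduced.

```python
import numpy as np


def rotation_zyz(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Rotation matrix for Euler angles (radians) in the z-y-z convention."""
    def rz(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def ry(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    return rz(alpha) @ ry(beta) @ rz(gamma)


def rotate_domain(coords, domain_mask, hinge, angles):
    """Rotate only the atoms in one domain about the hinge point; other atoms stay put."""
    r = rotation_zyz(*angles)
    new = coords.copy()
    shifted = coords[domain_mask] - hinge     # put the hinge at the origin
    new[domain_mask] = shifted @ r.T + hinge  # rotate each row, then shift back
    return new


# Toy three-atom chain: atom 1 is the hinge, atom 2 belongs to the moving domain.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
mask = np.array([False, False, True])
moved = rotate_domain(coords, mask, hinge=coords[1], angles=(np.pi / 2, 0.0, 0.0))
print(np.round(moved, 3))  # atom 2 swings from (2, 0, 0) to (1, 1, 0)
```

    Because the rotation is rigid, distances within the moving domain and to the hinge are preserved; only the relative placement of the two domains changes, which is why a separate equilibration step is needed to fix strained contacts.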

  • 39.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 326. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for transferring functional annotation between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extent of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that, for homologs that have diverged beyond a certain threshold, orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of stronger selective pressure on orthologs than on paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.
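    The abstract does not give the form of the domain architecture similarity score, so the Python sketch below swaps in a simple multiset Jaccard index as a stand-in, together with a strict order-aware comparison. It is illustrative only, and the domain labels are hypothetical annotations. The Jaccard stand-in cannot detect domain shuffling, which is why the order-aware check is included.

```python
from collections import Counter


def architecture_similarity(arch_a, arch_b) -> float:
    """Multiset Jaccard index of two domain architectures (1.0 = same domain content).

    Ignores domain order, so a shuffled architecture still scores 1.0.
    """
    a, b = Counter(arch_a), Counter(arch_b)
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # two proteins with no annotated domains
    return sum((a & b).values()) / union


def same_architecture(arch_a, arch_b) -> bool:
    """Strict comparison that also respects the order of domains along the sequence."""
    return list(arch_a) == list(arch_b)


human = ["SH3", "SH2", "Pkinase"]  # hypothetical domain annotations
with_insertion = ["SH3", "SH3", "SH2", "Pkinase"]
shuffled = ["SH2", "SH3", "Pkinase"]

print(architecture_similarity(human, with_insertion))  # 0.75
print(architecture_similarity(human, shuffled), same_architecture(human, shuffled))  # 1.0 False
```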

  • 40.
    Fredriksson, Nils Johan
    et al.
    Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
    Hermansson, Malte
    Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden.
    Wilén, Britt-Marie
    Department of Civil and Environmental Engineering, Water Environment Technology, Chalmers University of Technology, Gothenburg, Sweden.
    Impact of T-RFLP data analysis choices on assessments of microbial community structure and dynamics. 2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, article id 360. Article in journal (Refereed)
    Abstract [en]

    Background: Terminal restriction fragment length polymorphism (T-RFLP) analysis is a common DNA-fingerprinting technique used for comparisons of complex microbial communities. Although the technique is well established, there is no consensus on how to treat T-RFLP data to achieve the highest possible accuracy and reproducibility. This study focused on two critical steps in T-RFLP data treatment: the alignment of the terminal restriction fragments (T-RFs), which enables comparisons of samples, and the normalization of T-RF profiles, which adjusts for differences in signal strength (total fluorescence) between samples.

    Results: Variations in the estimation of T-RF sizes were observed, and these variations were found to affect the alignment of the T-RFs. A novel method was developed that improved the alignment by adjusting for systematic shifts in the T-RF size estimations between T-RF profiles. Differences in total fluorescence were shown to be caused by differences in sample concentration and by the gel loading. Five normalization methods were evaluated, and the total fluorescence normalization procedure based on peak height data was found to increase the similarity between replicate profiles the most. A high peak detection threshold, alignment correction, normalization and the use of consensus profiles instead of single profiles increased the similarity of replicate T-RF profiles, i.e. led to increased reproducibility. The impact of different treatment methods on the outcome of subsequent analyses of T-RFLP data was evaluated using a dataset from a longitudinal study of the bacterial community in an activated sludge wastewater treatment plant. Whether or not the alignment was corrected, and whether and how the T-RF profiles were normalized, had a substantial impact on ordination analyses, assessments of bacterial dynamics and analyses of correlations with environmental parameters.

    Conclusions: A novel method for the evaluation and correction of the alignment of T-RF profiles was shown to reduce the uncertainty and ambiguity in alignments of T-RF profiles. Large differences in the outcome of assessments of bacterial community structure and dynamics were observed between different alignment and normalization methods. The results of this study can therefore be of value when considering what methods to use in the analysis of T-RFLP data.
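    Total fluorescence normalization based on peak heights, combined with a peak detection threshold, can be sketched in Python as follows. The threshold value, fragment sizes and peak heights are hypothetical, and applying the threshold to raw heights before normalizing is an assumption about the order of steps. The alignment correction method is not shown. Because the second replicate simply carries twice the signal, both replicates end up with identical relative profiles.

```python
def apply_threshold(profile, min_height):
    """Drop peaks whose height falls below a fixed detection threshold."""
    return {size: h for size, h in profile.items() if h >= min_height}


def total_fluorescence_normalize(profile):
    """Express each peak height as a fraction of the profile's total fluorescence."""
    total = sum(profile.values())
    return {size: h / total for size, h in profile.items()}


# Hypothetical replicates (T-RF size in bp -> peak height); rep2 was loaded twice as heavily.
rep1 = {62: 400, 150: 1200, 233: 400, 310: 30}
rep2 = {62: 800, 150: 2400, 233: 800, 310: 35}

THRESHOLD = 50  # hypothetical detection threshold in fluorescence units
norm1 = total_fluorescence_normalize(apply_threshold(rep1, THRESHOLD))
norm2 = total_fluorescence_normalize(apply_threshold(rep2, THRESHOLD))
print(norm1)  # {62: 0.2, 150: 0.6, 233: 0.2}
print(norm2)  # {62: 0.2, 150: 0.6, 233: 0.2}
```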

  • 41.
    Fredriksson, Nils Johan
    et al.
    Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
    Hermansson, Malte
    Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden.
    Wilén, Britt-Marie
    Department of Civil and Environmental Engineering, Water Environment Technology, Chalmers University of Technology, Gothenburg, Sweden.
    Tools for T-RFLP data analysis using Excel. 2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, article id 361. Article in journal (Refereed)
    Abstract [en]

    Background: Terminal restriction fragment length polymorphism (T-RFLP) analysis is a DNA-fingerprinting method that can be used for comparisons of the microbial community composition in a large number of samples. There is no consensus on how T-RFLP data should be treated and analyzed before comparisons between samples are made, and several different approaches have been proposed in the literature. The analysis of T-RFLP data can be cumbersome and time-consuming, and for large datasets manual data analysis is not feasible. The currently available tools for automated T-RFLP analysis, although valuable, offer little flexibility and few, if any, options regarding which methods to use. To enable comparisons and combinations of different data treatment methods, an analysis template and an extensive collection of macros for T-RFLP data analysis using Microsoft Excel were developed.

    Results: The Tools for T-RFLP data analysis template provides procedures for the analysis of large T-RFLP datasets, including application of a noise baseline threshold and setting of the analysis range, normalization and alignment of replicate profiles, generation of consensus profiles, normalization and alignment of consensus profiles, and final analysis of the samples, including calculation of association coefficients and a diversity index. The procedures are designed so that several options are available in every analysis step, from the initial preparation of the data to the final comparison of the samples. The parameters regarding analysis range, noise baseline, T-RF alignment and generation of consensus profiles are all set by the user, and several different methods are available for normalization of the T-RF profiles. In each step, the user can also choose to base the calculations on either peak height data or peak area data.

    Conclusions: The Tools for T-RFLP data analysis template enables an objective and flexible analysis of large T-RFLP datasets in a widely used spreadsheet application.
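    The sequence of steps listed in the Results (range and noise-baseline filtering, normalization, alignment of replicates, consensus profiles, then similarity and diversity) can be sketched in plain Python. This is a minimal illustration, not the Excel template or its macros: profiles are assumed to be dicts mapping T-RF size (bp) to peak height or area, the default range (50 to 500 bp), baseline (50 units) and alignment tolerance (1 bp) are arbitrary placeholders, and Bray-Curtis similarity and the Shannon index stand in as one example each of an association coefficient and a diversity index.

```python
import math

def preprocess(profile, size_range=(50.0, 500.0), baseline=50.0):
    """Keep T-RFs inside the analysis range whose signal reaches the noise baseline."""
    lo, hi = size_range
    return {size: h for size, h in profile.items() if lo <= size <= hi and h >= baseline}

def normalize(profile):
    """Express each peak as a fraction of the total signal (relative abundance)."""
    total = sum(profile.values())
    return {size: h / total for size, h in profile.items()} if total > 0 else {}

def align(profiles, tolerance=1.0):
    """Merge sizes lying within `tolerance` bp of their sorted neighbour into one bin,
    labelled by the bin's mean size. Chained merging can widen a bin beyond `tolerance`."""
    sizes = sorted({size for p in profiles for size in p})
    bins = []
    for size in sizes:
        if bins and size - bins[-1][-1] <= tolerance:
            bins[-1].append(size)
        else:
            bins.append([size])
    label = {size: round(sum(b) / len(b), 1) for b in bins for size in b}
    aligned = []
    for p in profiles:
        merged = {}
        for size, h in p.items():
            merged[label[size]] = merged.get(label[size], 0.0) + h
        aligned.append(merged)
    return aligned

def consensus(replicates, min_present=2):
    """Average T-RFs found in at least `min_present` replicates (absent counts as 0),
    then renormalize so the consensus profile sums to 1."""
    keys = {k for r in replicates for k in r}
    kept = {k: sum(r.get(k, 0.0) for r in replicates) / len(replicates)
            for k in keys if sum(k in r for r in replicates) >= min_present}
    return normalize(kept)

def bray_curtis_similarity(a, b):
    """1 minus the Bray-Curtis dissimilarity between two aligned profiles."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return 1.0 - num / den if den > 0 else 0.0

def shannon(profile):
    """Shannon diversity H' = -sum(p * ln p) over relative abundances."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)

# Two toy replicate profiles of one sample (size in bp -> peak height).
rep1 = {35.0: 500.0, 100.2: 400.0, 200.0: 300.0, 300.0: 20.0}
rep2 = {100.6: 380.0, 200.4: 320.0, 410.0: 100.0}
reps = align([normalize(preprocess(r)) for r in (rep1, rep2)])
cons = consensus(reps)
```

    With these toy replicates, the 35 bp and 300 bp peaks are removed by the range and baseline filters, the 100.2/100.6 bp and 200.0/200.4 bp peaks are aligned into shared bins, and the 410 bp peak, seen in only one replicate, is left out of the consensus profile.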

  • 42.
    Freyhult, Eva
    et al.
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
    Landfors, Mattias
    Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
    Önskog, Jenny
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Hvidsten, Torgeir R.
    Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
    Rydén, Patrik
    Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. Umeå University, Faculty of Social Sciences, Department of Statistics.
    Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, article id 503. Article in journal (Refereed)
    Abstract [en]

    Background: Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, preprocessing, involving various types of filtration and normalization procedures, can affect the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization.

    Result: We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses were evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method all have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to get good performance. Selecting genes with high standard deviation or using principal component analysis were shown to be the preferred gene selection methods. Of the clustering methods considered in this paper, hierarchical clustering using Ward's method, k-means clustering and Mclust achieved the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this area needs further study before any general conclusions can be drawn.

    Conclusions: The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.
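    Two of the ingredients named in the Results, selecting high-variance genes and scoring a clustering against known classes with the adjusted Rand index, are small enough to sketch directly. This is a plain-Python illustration (the Hubert-Arabie form of the index), not the authors' evaluation code; the function names are hypothetical, and for real analyses a tested library routine such as scikit-learn's `adjusted_rand_score` is preferable.

```python
from collections import Counter
from math import comb
from statistics import stdev

def select_high_sd_genes(expression, n_genes):
    """Return the n_genes genes with the largest standard deviation across samples.

    expression: dict mapping gene id -> list of (normalized) values, one per sample.
    """
    return sorted(expression, key=lambda g: stdev(expression[g]), reverse=True)[:n_genes]

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same (at least two) items."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))   # contingency table cells
    rows = Counter(labels_true)                      # row sums (known classes)
    cols = Counter(labels_pred)                      # column sums (clusters)
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:    # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)
```

    A value of 1 means the clustering reproduces the known classes exactly, up to relabelling; values near 0 are what a random assignment would be expected to give, and negative values are possible.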

  • 43.
    Freyhult, Eva
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Prusis, Peteris
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Lapinsh, Maris
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Moulton, Vincent
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
    Gustafsson, Mats G
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences.
    Unbiased descriptor and parameter selection confirms the potential of proteochemometric modelling. 2005. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, p. 50. Article in journal (Refereed)
    Abstract [en]

    Background

    Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information. Several reported proteochemometric models of ligand-receptor interactions have already yielded significant insights into various forms of biomolecular interactions. The proteochemometric models are multivariate regression models that predict binding affinity for a particular combination of features of the ligand and protein. Although proteochemometric models have already offered interesting results in various studies, no detailed statistical evaluation of their average predictive power has been performed. In particular, variable subset selection performed to date has always relied on using all available examples, a situation also encountered in microarray gene expression data analysis.

    Results

    A methodology for an unbiased evaluation of the predictive power of proteochemometric models was implemented and results from applying it to two of the largest proteochemometric data sets yet reported are presented. A double cross-validation loop procedure is used to estimate the expected performance of a given design method. The unbiased performance estimates (P2) obtained for the data sets that we consider confirm that properly designed single proteochemometric models have useful predictive power, but that a standard design based on cross validation may yield models with quite limited performance. The results also show that different commercial software packages employed for the design of proteochemometric models may yield very different and therefore misleading performance estimates. In addition, the differences in the models obtained in the double CV loop indicate that detailed chemical interpretation of a single proteochemometric model is uncertain when data sets are small.

    Conclusion

    The double CV loop employed offers unbiased performance estimates for a given proteochemometric modelling procedure, making it possible to identify cases where the proteochemometric design does not result in useful predictive models. Chemical interpretations of single proteochemometric models are uncertain and should instead be based on all the models selected in the double CV loop employed here.
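    The double (nested) cross-validation loop can be illustrated with a deliberately simple model. This sketch substitutes k-nearest-neighbour regression for the multivariate regression models used in proteochemometrics, treats k as the only design parameter chosen in the inner loop, and computes P2 as 1 - PRESS/SS over the outer-loop predictions, which is assumed here to match the Q2-style definition intended in the abstract. Because k is re-selected inside every outer training set, the held-out outer data never influence the choice of model.

```python
import random

def knn_predict(train, x, k):
    """Mean response of the k training points nearest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def split_folds(n, n_folds, seed):
    """Shuffle item indices and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cv_press(data, k, n_folds, seed):
    """Predicted residual sum of squares from ordinary (single-loop) cross-validation."""
    press = 0.0
    for test in split_folds(len(data), n_folds, seed):
        held_out = set(test)
        train = [d for i, d in enumerate(data) if i not in held_out]
        press += sum((data[i][1] - knn_predict(train, data[i][0], k)) ** 2 for i in test)
    return press

def double_cv_p2(data, ks=(1, 3, 5), outer_folds=5, inner_folds=4, seed=0):
    """Outer loop estimates performance; the inner loop, run on outer-training data only, picks k."""
    ys = [y for _, y in data]
    y_mean = sum(ys) / len(ys)
    press, chosen = 0.0, []
    for test in split_folds(len(data), outer_folds, seed):
        held_out = set(test)
        train = [d for i, d in enumerate(data) if i not in held_out]
        best_k = min(ks, key=lambda k: cv_press(train, k, inner_folds, seed + 1))
        chosen.append(best_k)
        press += sum((data[i][1] - knn_predict(train, data[i][0], best_k)) ** 2 for i in test)
    ss = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss, chosen

# Toy data set: one descriptor x in [0, 1], response y = 2x.
data = [((i / 39,), 2 * i / 39) for i in range(40)]
p2, chosen_ks = double_cv_p2(data)
```

    The list of chosen k values shows how the selected design can differ between outer folds, the same kind of variation that makes interpreting any single model risky.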

  • 44.
    Granholm, Viktor
    et al.
    Noble, William Stafford
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, p. S3. Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
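    The target-decoy error estimate and the cross-validation idea can be sketched together. In this illustration, which is not the authors' implementation, q-values are estimated from the ratio of decoy to target PSMs scoring at or above each threshold, and the "learning" step is a toy grid search over the weight w of a second feature in the score f1 + w*f2. The two features, the weight grid, the 1% q-value threshold and the 3-fold split are all assumptions chosen for illustration. What the sketch does show is the key property of the scheme: every PSM is scored with a weight chosen without seeing that PSM.

```python
import random

def q_values(psms):
    """psms: list of (score, is_decoy). Returns q-values in input order.

    FDR at a threshold is estimated as #decoys / #targets among PSMs scoring at or
    above it; ties are broken arbitrarily in this sketch."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # q-value: the smallest estimated FDR at which this PSM would still be accepted
    q = [0.0] * len(psms)
    running = float("inf")
    for i in reversed(order):
        running = min(running, fdr[i])
        q[i] = running
    return q

def accepted_targets(psms, threshold=0.01):
    """Number of target PSMs with q-value at or below the threshold."""
    q = q_values(psms)
    return sum(1 for (_, is_decoy), qi in zip(psms, q) if not is_decoy and qi <= threshold)

def cross_validated_scores(data, n_folds=3, weights=(0.0, 0.5, 1.0, 2.0), seed=0):
    """data: list of (f1, f2, is_decoy). Each fold is scored with a weight picked on the
    other folds only. Scores from different folds are not calibrated against each
    other here; a real pipeline would need to address that before merging them."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::n_folds] for j in range(n_folds)]
    scores = [0.0] * len(data)
    for j, test in enumerate(folds):
        train = [data[i] for f, fold in enumerate(folds) if f != j for i in fold]
        # toy "training": the weight that maximises accepted targets on the training folds
        best_w = max(weights, key=lambda w: accepted_targets(
            [(f1 + w * f2, d) for f1, f2, d in train]))
        for i in test:
            f1, f2, _ = data[i]
            scores[i] = f1 + best_w * f2
    return scores
```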

  • 45.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Noble, William Stafford
    Käll, Lukas
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, p. S3. Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.

  • 46.
    Hallén, Kristofer
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Computational Biology. Linköping University, The Institute of Technology.
    Björkegren, Johan
    Karolinska universitetssjukhuset.
    Tegnér, Jesper
    Linköping University, Department of Physics, Chemistry and Biology, Computational Biology. Linköping University, The Institute of Technology.
    Detection of compound mode of action by computational integration of whole-genome measurements and genetic perturbations. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7. Article in journal (Refereed)
    Abstract [en]

    Background

    A key problem in drug development is deciding which compounds to evaluate further in expensive clinical trials (Phase I–III). This decision is based mainly on the primary targets and mechanisms of action of the chemical compounds under consideration. Whole-genome expression measurements have been shown to be useful for this process, but current approaches suffer from requiring either a large number of mutant experiments or a detailed understanding of the regulatory networks.

    Results

    We have designed an algorithm, CutTree, that, when applied to whole-genome expression datasets, identifies the primary affected genes (PAGs) of a chemical compound by separating them from downstream, indirectly affected genes. Unlike previous methods requiring whole-genome deletion libraries or a complete map of gene network architecture, CutTree identifies PAGs from a limited set of experimental perturbations without requiring any prior information about the underlying pathways. The principle of CutTree is to iteratively separate PAGs from other recurrently active genes (RAGs) that are not PAGs. The in silico validation predicted that CutTree should be able to identify 3–4 out of 5 known PAGs (~70%). Accordingly, when we applied CutTree to whole-genome expression profiles from 17 genetic perturbations in the presence of galactose in yeast, CutTree identified four out of five known primary galactose targets (80%). An exhaustive search strategy to detect these PAGs would not have been feasible (>10^12 combinations).

    Conclusion

    In combination with genetic perturbation techniques like short interfering RNA (siRNA) followed by whole-genome expression measurements, CutTree sets the stage for compound target identification in less well-characterized but more disease-relevant mammalian cell systems.

  • 47.
    Hedlund, Joel
    et al.
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Jörnvall, Hans
    Dept of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 77 Stockholm, Sweden.
    Persson, Bengt
    Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, The Institute of Technology.
    Subdivision of the MDR superfamily of medium-chain dehydrogenases/reductases through iterative hidden Markov model refinement. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 534. Article in journal (Refereed)
    Abstract [en]

    Background: The Medium-chain Dehydrogenases/Reductases (MDR) form a protein superfamily whose size and complexity defeat traditional means of subclassification: it currently has over 15000 members in the databases, the pairwise sequence identity is typically around 25%, there are members from all kingdoms of life, the chain lengths vary, as does the oligomericity, and the members take part in a multitude of biological processes. Profile hidden Markov models (HMMs) are available for detecting MDR superfamily members, but none for determining which MDR family each protein belongs to. The current torrential influx of new sequence data enables elucidation of more and more protein families, at increasingly fine granularity. However, gathering good-quality training data usually requires manual attention by experts and has therefore been the rate-limiting step for expanding the number of available models.

    Result: We have developed an automated algorithm for HMM refinement that produces stable and reliable models for protein families. This algorithm uses relationships found in data to generate confident seed sets. Using this algorithm we have produced HMMs for 86 distinct MDR families and 34 of their subfamilies, which can be used in automated annotation of new sequences. We find that MDR forms with two Zn²⁺ ions are in general dehydrogenases, while MDR forms with no Zn²⁺ are in general reductases. Furthermore, in bacteria MDRs without Zn²⁺ are more frequent than those with Zn²⁺, while the opposite is true for eukaryotic MDRs, indicating that Zn²⁺ was recruited into the MDR superfamily after the initial separation of the kingdoms of life. We have also developed a web site, http://mdr-enzymes.org, that provides textual and numeric search against various characterised MDR family properties, as well as sequence scan functions for reliable classification of novel MDR sequences.

    Conclusion: Our method of refinement can be readily applied to create stable and reliable HMMs for both MDR and other protein families, and to confidently subdivide large and complex protein superfamilies. HMMs created using this algorithm correspond to evolutionary entities, making resolution of overlapping models straightforward. The implementation and support scripts for running the algorithm on computer clusters are available as open source software, and the database files underlying the web site are freely downloadable. The web site also makes our findings directly useful for non-bioinformaticians.
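    The general shape of iterative model refinement (build a model from a seed set, rescan the database, rebuild from the new members, and stop once membership is stable) can be sketched as follows. This is an assumption-laden illustration rather than the authors' algorithm: it substitutes a simple ungapped log-odds position profile for the profile HMMs used in the paper (in practice a package such as HMMER would build and search the models), assumes equal-length, gap-free sequences over the 20 standard amino acids, and uses a hypothetical fixed score threshold; it does not reproduce the paper's seed-selection criteria.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(seqs, pseudocount=1.0):
    """Log-odds score per position and residue, against a uniform background."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for pos in range(len(seqs[0])):
        counts = {aa: pseudocount for aa in ALPHABET}
        for s in seqs:
            counts[s[pos]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log((c / total) / background) for aa, c in counts.items()})
    return profile

def score(profile, seq):
    """Sum of per-position log-odds scores for an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(profile, seq))

def refine(seed, database, threshold, max_iter=20):
    """Rebuild the profile from its own hits until the member set stops changing."""
    members = set(seed)
    for _ in range(max_iter):
        profile = build_profile(sorted(members))
        hits = {s for s in database if score(profile, s) >= threshold}
        if not hits or hits == members:   # converged, or nothing passed the threshold
            break
        members = hits
    return members

family = refine({"ACDE"}, ["ACDE", "ACDF", "WWWW"], threshold=0.0)
```

    Here the seed recruits the related sequence "ACDF" in the first round, the rebuilt profile still rejects "WWWW", and the loop stops because the member set no longer changes.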

  • 48.
    Hooper, Sean D
    et al.
    The Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research.
    Jiao, Xiang
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Rosenlund, Magnus
    Tellgren-Roth, Christian
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Cavelier, Lucia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medical Genetics.
    Sjöblom, Tobias
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Genomics.
    Interpreting translocations detected by paired-end sequencing of cancer samples. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105. Article in journal (Refereed)
  • 49.
    Illergård, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Callegari, Simone
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MPRAP: An accessibility predictor for α-helical transmembrane proteins that performs well inside and outside the membrane. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 333. Article in journal (Refereed)
    Abstract [en]

    Background: In water-soluble proteins it is energetically favorable to bury hydrophobic residues and to expose polar and charged residues. In contrast, transmembrane proteins face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane, and an interface region rich in phospholipid head-groups. It is therefore energetically favorable for transmembrane proteins to expose different types of residues in the different regions.

    Results: Investigations of a set of structurally determined transmembrane proteins showed that the composition of solvent-exposed residues differs significantly inside and outside the membrane. In contrast, residues buried within the interior of a protein show a much smaller difference. However, in all regions exposed residues are less conserved than buried residues. Further, we found that current state-of-the-art predictors for surface area are optimized for one of the regions and perform poorly in the others. To circumvent this limitation we developed a new predictor, MPRAP, that performs well in all regions. In addition, MPRAP performs better on complete membrane proteins than a combination of specialized predictors, and acceptably on water-soluble proteins. A web server of MPRAP is available at http://mprap.cbr.su.se/

    Conclusion: By including complete α-helical transmembrane proteins in the training set, MPRAP is able to predict surface accessibility accurately both inside and outside the membrane. This predictor can aid in the prediction of 3D structure and in the identification of erroneous protein structures.
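    The finding that predictors tuned for one environment perform poorly in the others implies that accuracy should be reported separately for each region. A minimal sketch of such an evaluation, assuming each residue has already been assigned a region (hypothetical labels such as "membrane", "interface" and "water") and carries a predicted and an observed relative accessibility; Pearson correlation is used here as one possible accuracy measure, not necessarily the one used to evaluate MPRAP.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def per_region_correlation(residues):
    """residues: iterable of (region, predicted, observed) accessibility values."""
    by_region = defaultdict(lambda: ([], []))
    for region, predicted, observed in residues:
        by_region[region][0].append(predicted)
        by_region[region][1].append(observed)
    return {region: pearson(p, o) for region, (p, o) in by_region.items()}

# Toy residues: the predictor tracks accessibility in the membrane but not in water.
residues = [
    ("membrane", 0.1, 0.2), ("membrane", 0.5, 0.6), ("membrane", 0.9, 1.0),
    ("water", 0.1, 0.9), ("water", 0.5, 0.5), ("water", 0.9, 0.1),
]
region_scores = per_region_correlation(residues)
```

    A predictor can look acceptable on pooled data while scoring poorly, or even negatively, in one region; a single overall score would hide exactly the failure mode described above.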

  • 50.
    Jay, Jeremy J.
    et al.
    Jackson Lab, USA.
    Eblen, John D.
    Oak Ridge National Lab, USA.
    Zhang, Yun
    Pioneer HiBred Int Inc, USA.
    Benson, Mikael
    Linköping University, Department of Clinical and Experimental Medicine. Linköping University, Faculty of Health Sciences.
    Perkins, Andy D.
    Mississippi State University, USA.
    Saxton, Arnold M.
    University of Tennessee, USA.
    Voy, Brynn H.
    University of Tennessee, USA.
    Chesler, Elissa J.
    Jackson Lab, USA.
    Langston, Michael A.
    University of Tennessee, USA.
    A systematic comparison of genome-scale clustering algorithms. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13. Article in journal (Refereed)
    Abstract [en]

    Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph-theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices and have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph-theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae.

    Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium and large bins, and the Jaccard scores of the five top-scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method.

    Results: Clusters produced by each method were evaluated based on their positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.

    Conclusions: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
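    The BAT5 scoring described in Methods can be sketched in a few functions. The size cut-offs of the small, medium and large bins are not given in the abstract, so the bins used here are hypothetical; "best" is interpreted, as an assumption, as the best per-bin value over the parameter settings tried for one method; and the annotation sets are plain gene-id sets standing in for GO and KEGG terms.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_score(cluster, annotation_sets):
    """Highest Jaccard similarity between a cluster and any (non-empty dict of) annotation sets."""
    return max(jaccard(cluster, genes) for genes in annotation_sets.values())

def average_top5(clusters, annotation_sets, bins):
    """Average of the five highest cluster scores within each size bin.

    bins: dict bin name -> (min_size, max_size), inclusive (hypothetical cut-offs).
    Empty bins get None; bins with fewer than five clusters average what is there.
    """
    result = {}
    for name, (lo, hi) in bins.items():
        scores = sorted((cluster_score(c, annotation_sets)
                         for c in clusters if lo <= len(c) <= hi), reverse=True)
        top = scores[:5]
        result[name] = sum(top) / len(top) if top else None
    return result

def bat5(per_setting_results):
    """Best per-bin value over all parameter settings tried for one method."""
    best = {}
    for result in per_setting_results:
        for name, value in result.items():
            if value is not None and (best.get(name) is None or value > best[name]):
                best[name] = value
    return best

# Toy example: two annotation sets and one clustering result.
annotations = {"GO:A": {"a", "b"}, "KEGG:B": {"c", "d"}}
size_bins = {"small": (1, 2), "medium": (3, 10)}
setting1 = average_top5([{"a", "b"}, {"c", "d", "e"}], annotations, size_bins)
```

    In the toy example the two-gene cluster matches "GO:A" exactly (score 1.0), while the three-gene cluster shares two of its genes with "KEGG:B" (score 2/3); `bat5` would then keep the best such value per bin across all parameter settings.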
