1 - 5 of 5
  • 1.
    Ali, Raja Hashim
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Bogusz, Marcin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Whelan, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    A graph-based approach for improving the homology inference in multiple sequence alignments. Manuscript (preprint) (Other academic)
    Abstract [en]

    Multiple sequence alignment (MSA) is ubiquitous in evolutionary studies and other areas of bioinformatics. In nearly all cases MSAs are taken to be a known and fixed quantity on which to perform downstream analysis, despite extensive evidence that MSA accuracy and uncertainty affect results. Mistakes in the MSA are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to dealing with this problem is to remove (filter) specific columns in the MSA that are thought to be prone to error, either through proximity to gaps or through some scoring function. Although popular, this approach has had mixed success, and several studies have even suggested that filtering might be detrimental to phylogenetic studies. Here we present a different way of dealing with MSA accuracy and uncertainty: a graph-based approach implemented in the freely available software Divvier. The aim of Divvier is to identify clusters of characters that have strong statistical evidence of shared homology, based on the output of a pair hidden Markov model. These clusters can then be used either to filter characters out of the MSA, through a process we call partial filtering, or to represent each of the clusters in a new column, through a process we call divvying up. We validate our approach through its performance on real and simulated benchmarks, finding that Divvier substantially outperforms all other filtering software for treating MSAs by retaining more true positive homology calls and removing more false positive homology calls. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA and reduces the variation in tree estimates caused by MSA uncertainty.
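The idea of grouping the characters of one MSA column into clusters with strong evidence of shared homology can be sketched as a graph problem. This is a simplified illustration under stated assumptions, not Divvier's actual algorithm: the pairwise posterior probabilities of homology (which Divvier derives from a pair hidden Markov model) are assumed to be given, each character in a gap-free column is a node, two characters are joined when their posterior reaches a hypothetical threshold, and connected components become clusters. The helper names `cluster_column`, `divvy` and `partial_filter` and the threshold and minimum-size values are all hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_column(
    n_chars: int,
    posteriors: Dict[Tuple[int, int], float],
    threshold: float = 0.8,  # hypothetical cut-off, not a Divvier default
) -> List[List[int]]:
    """Group the characters (rows) of one gap-free column into connected components.

    posteriors maps a pair of row indices (i, j) to the assumed posterior
    probability that the two characters are homologous.
    """
    parent = list(range(n_chars))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Add an edge (merge components) for every well-supported pair.
    for (i, j), p in posteriors.items():
        if p >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups: Dict[int, List[int]] = {}
    for c in range(n_chars):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())  # sorted for deterministic output


def divvy(column: str, clusters: List[List[int]]) -> List[str]:
    """'Divvy up' a column: one new column per cluster, gaps for all other rows."""
    out = []
    for cluster in clusters:
        members = set(cluster)
        out.append("".join(ch if k in members else "-" for k, ch in enumerate(column)))
    return out


def partial_filter(column: str, clusters: List[List[int]], min_size: int = 2) -> str:
    """'Partial filtering': gap out characters whose cluster is smaller than min_size."""
    keep = {k for cl in clusters if len(cl) >= min_size for k in cl}
    return "".join(ch if k in keep else "-" for k, ch in enumerate(column))
```

For a column `"AAGG"` where only rows 0-1 and rows 2-3 are well supported as homologous, `divvy` splits it into `"AA--"` and `"--GG"`. Connected components are the simplest possible clustering rule; a real tool may require stronger support within each cluster.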

  • 2.
    Bogusz, Marcin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Evolutionary Approaches to Sequence Alignment. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Molecular evolutionary biology allows us to look into the past by analyzing sequences of amino acids or nucleotides. These analyses can be very complex, often involving advanced statistical models of sequence evolution to construct phylogenetic trees, study patterns of natural selection and perform a number of other evolutionary studies. Many of these studies require multiple sequence alignment (MSA) as a prerequisite: a technique that aims to group characters sharing a common ancestor (homology) into columns. Statistical models need this information about shared homology to describe the process of substitution and so perform evolutionary inference. Sequence alignment, however, is difficult, and MSAs often contain whole regions of wrongly aligned characters, which affect downstream analyses.

    In this thesis I use two broad groups of approaches to avoid errors in the alignment. The first group comprises analysis methods that avoid sequence alignment by explicitly modelling the processes of substitution and of insertion and deletion (indels) between pairs of sequences using pair hidden Markov models. I describe an accurate tree inference method that uses a neighbor-joining clustering approach to construct a tree from a matrix of model-based evolutionary distances.
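The tree-building step described above, neighbor joining on a matrix of evolutionary distances, can be illustrated with the textbook Saitou-Nei algorithm. This is a minimal sketch, not the thesis's implementation: the distance matrix is assumed to be given (in the thesis it comes from pair-HMM-based distance estimates; here any symmetric matrix works), and the function name `neighbor_joining` and the three-decimal Newick output are illustrative choices.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format.

    Requires at least three taxa and a symmetric distance matrix.
    """
    nodes = list(names)                 # labels of current nodes (Newick subtrees)
    d = [row[:] for row in dist]        # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]     # total distance from each node
        # Choose the pair (i, j) that minimises the Q-criterion.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                qval = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or qval < best:
                    best, bi, bj = qval, i, j
        # Branch lengths from the new internal node to bi and bj.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.3f},{nodes[bj]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new internal node to every remaining node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[p][c] for c in keep] + [new_row[x]] for x, p in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes as a trifurcation.
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({nodes[0]}:{la:.3f},{nodes[1]}:{lb:.3f},{nodes[2]}:{lc:.3f});"
```

On the common five-taxon teaching example (taxa `a` to `e`), this recovers the expected tree `(d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);`. The algorithm runs in O(n³) time, which is why distance methods scale well to many sequences.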

    Next, I develop a pairwise method of modelling how natural selection acts on substitutions and indels. I further examine the relationship between the constraints acting on these two evolutionary forces and show that natural selection affects them in a similar way.

    The second group of approaches deals with errors in existing alignments. I use a statistical model-based approach to evaluate the quality of multiple sequence alignments.

    First, I provide a graph-based tool for removing wrongly aligned pairs of residues by splitting them apart. This approach tends to produce better results than standard column-based filtering.

    Second, I provide a way to compare MSAs using a probabilistic framework. I propose new ways of scoring sequence alignments and show that popular alignment methods produce similar results.

    The overall purpose of this work is to facilitate more accurate evolutionary analyses by addressing the problem of sequence alignment in a statistically rigorous manner.

  • 3.
    Bogusz, Marcin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Ali, Raja Hashim
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Whelan, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Examining sequence alignments using a model-based approach. Manuscript (preprint) (Other academic)
    Abstract [en]

    Multiple sequence alignment (MSA) is a commonly performed procedure required for a number of evolutionary and comparative analyses. The common two-step process of sequence alignment followed by statistical phylogenetic inference depends on MSA quality. MSA is computationally difficult and, as a result, sequence alignments often contain regions of spurious homologies. These alignment errors affect downstream results, so choosing an accurate MSA is critical. Researchers often face the problem of choosing an aligner from among many multiple sequence alignment methods (MSAMs). This choice is often based on the results of benchmarks, with various popular methods claiming high accuracy scores. These methods compete to obtain the highest scores in the commonly used sum-of-pairs benchmark, which measures the fraction of true homologies recovered but ignores the fraction of false positive homologies introduced. Furthermore, these benchmarks do not account for the fact that some homologies are more difficult to recover than others. We take a probabilistic model-based approach to examine the quality of pairwise homologies returned by four popular MSAMs. We use pair hidden Markov models to break down alignment columns into pairs and obtain distributions of pairwise posterior scores for these aligners. Based on a structural benchmark and a simulation study, we find that MSAMs appear to return a sample from a confidence set defined by high posterior probabilities. Furthermore, we find that the reference alignment contains portions of pairwise homologies with low posterior probabilities, which cannot be expected to be recovered by any MSAM. Finally, we look at several possible test statistics, with and without the need for reference alignments, and ultimately suggest using positive predictive value (PPV) and mean posterior probability for MSA evaluation.
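The contrast between the sum-of-pairs score and PPV becomes concrete once an alignment is broken down into pairwise homologies. The sketch below is an illustration under simple assumptions, not the study's code: alignments are lists of equal-length strings with `-` as the gap symbol, each homology is a pair of (sequence, residue index) positions placed in the same column, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

# A homology is ((seq_a, residue_a), (seq_b, residue_b)) with seq_a < seq_b.
Homology = Tuple[Tuple[int, int], Tuple[int, int]]


def homologies(msa: List[str]) -> Set[Homology]:
    """Return every pair of residues that the MSA places in the same column."""
    counters = [0] * len(msa)           # next residue index in each sequence
    pairs: Set[Homology] = set()
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # combinations keeps sequence order, so pairs are canonical across MSAs.
        pairs.update(combinations(present, 2))
    return pairs


def sp_and_ppv(test: List[str], ref: List[str]) -> Tuple[float, float]:
    """Sum-of-pairs score (fraction of reference homologies recovered) and PPV
    (fraction of the test alignment's homologies that are in the reference)."""
    t, r = homologies(test), homologies(ref)
    shared = len(t & r)
    return shared / len(r), shared / len(t)
```

Aligning `ACG` with `ATG` gap-free against a reference that keeps `C` and `T` apart (`AC-G` / `A-TG`) gives a perfect sum-of-pairs score of 1.0 but a PPV of only 2/3: the false `C`/`T` homology is invisible to sum-of-pairs. Both ratios are undefined when an alignment contains no homologous pairs.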

  • 4.
    Bogusz, Marcin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Whelan, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking. 2017. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 66, no 2, p. 218-231. Article in journal (Refereed)
    Abstract [en]

    Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor-joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For sequences of close to moderate divergence, we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences we find that the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference. Finally, we find that the accuracy of alignment-free methods tends to decline faster than that of standard two-step methods in the presence of alignment uncertainty, and we identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods, even in the presence of substantial alignment error.
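Estimating quantities "without assuming a single alignment" means summing the probability of two sequences over every possible alignment, which a pair hidden Markov model does with the forward algorithm. The sketch below is a toy three-state pair HMM (match M, insert X, insert Y) in the style of standard textbook pair HMMs, not PaHMM-Tree's model: the DNA emission scheme and the parameter names and values `delta`, `eps`, `tau` and `p_same` are hypothetical. A distance method would additionally fit a divergence parameter inside these probabilities by maximising the likelihood; that step is not shown and may differ from PaHMM-Tree.

```python
def pair_hmm_forward(x: str, y: str, delta: float = 0.1, eps: float = 0.4,
                     tau: float = 0.05, p_same: float = 0.2) -> float:
    """P(x, y) summed over all alignments under a toy 3-state DNA pair HMM.

    delta: gap-open probability, eps: gap-extend probability,
    tau: end probability, p_same: emission probability of each identical pair.
    """
    p_diff = (1.0 - 4 * p_same) / 12    # the 16 pair emissions sum to 1
    q = 0.25                            # uniform background in the gap states
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0                      # the begin state behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = p_same if x[i - 1] == y[j - 1] else p_diff
                fM[i][j] = emit * ((1 - 2 * delta - tau) * fM[i - 1][j - 1]
                                   + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:                   # x[i-1] aligned to a gap (no Y -> X move)
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:                   # y[j-1] aligned to a gap (no X -> Y move)
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])
```

For short sequences, comparing the result with a brute-force sum over every state path is a convenient correctness check. For realistic lengths the recursion should be carried out in log space or with scaling to avoid underflow.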

  • 5.
    Bogusz, Marcin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Whelan, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Selection acting on indels and substitutions in protein coding sequences. Manuscript (preprint) (Other academic)
    Abstract [en]

    Patterns of selection acting on an expressed protein act to maintain or adapt its structure and function over time. The most widely used method for studying these selective forces is the ratio of non-synonymous to synonymous substitution rates (dN/dS), which helps distinguish between neutral, purifying (negative), and adaptive (positive) selection. This ratio, however, examines only substitutions and ignores other evolutionary forces, such as small-scale insertions and deletions (indels), that may affect protein evolution. There are currently no statistically robust methods for studying the forces acting on protein sequence indels, and the few ad hoc solutions are highly dependent on the gap patterns produced by alignment and filtering steps. This study broadens our understanding of how selection acts on indels in proteins by explicitly examining the relationship between the selective constraints acting on substitutions and indels. We present a probabilistic model that jointly estimates dN/dS and the indel rate through statistical alignment, which removes biases in both parameter estimates caused by alignment error. We apply our method to thousands of genes from human-mouse and human-chicken pairwise analyses, revealing that the indel rate and selection (dN/dS) tend to be related and demonstrating that purifying selection in proteins tends to affect non-synonymous mutations and indels in a quantifiably similar way. We also investigate how the selective forces acting on substitutions and indels vary along genes. Our findings and methods offer the opportunity to begin studying the interaction between substitutions and indels, and provide the first widely applicable tools for understanding how they affect protein evolution.
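The dN/dS ratio can be made concrete with the classic counting approach (in the spirit of Nei and Gojobori), which is much simpler than the probabilistic joint model described above and serves only as a reference point. The sketch assumes that the numbers of non-synonymous and synonymous sites and the observed differences of each kind have already been counted from a codon alignment. It applies a Jukes-Cantor correction for multiple hits, and the function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple substitutions."""
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """omega = dN / dS from pre-counted sites and differences.

    omega < 1 suggests purifying selection, ~1 neutrality, > 1 positive selection.
    """
    d_n = jukes_cantor(n_diffs / n_sites)   # non-synonymous divergence
    d_s = jukes_cantor(s_diffs / s_sites)   # synonymous divergence
    return d_n / d_s
```

For example, `dn_ds(225, 75, 9, 15)` (4% non-synonymous versus 20% synonymous differences) gives a ratio of roughly 0.18, the signature of purifying selection. The ratio is undefined when no synonymous differences are observed. Note that this counting method says nothing about indels, which is exactly the gap the abstract addresses.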
