Probabilistic inference of lateral gene transfer events
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-8034-7834
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-2791-8773
(English) Manuscript (preprint) (Other academic)
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-162935
OAI: oai:DiVA.org:kth-162935
DiVA: diva2:798169
Funder
Swedish e‐Science Research Center
Note

QS 2015

Available from: 2015-03-26 Created: 2015-03-26 Last updated: 2016-10-12 Bibliographically approved
In thesis
1. Probabilistic Reconciliation Analysis for Genes and Pseudogenes
2015 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Phylogeneticists have long studied the evolution of life from single-celled organisms to the astonishing biodiversity around us. The relationship between species is often expressed as a binary tree - the tree of life. The availability of fully sequenced genomes across species gives us the opportunity to investigate and understand the evolutionary processes, and to reconstruct gene and species phylogenies in greater detail and more accurately. However, the effect of interacting evolutionary processes, such as gene duplications, gene losses, pseudogenizations, and lateral gene transfers, makes the inference of gene phylogenies challenging.

In this thesis, probabilistic Bayesian methods are introduced to infer gene phylogenies under the guidance of a species phylogeny. The distinguishing feature of this work compared with earlier reconciliation-based methods is that evolutionary events are mapped to detailed time intervals on the evolutionary time-scale. The proposed probabilistic approach reconciles the evolutionary events to the species phylogeny by integrating gene duplications, gene losses, lateral gene transfers and sequence evolution under a relaxed molecular clock. Genome-wide gene families for vertebrates and prokaryotes are analyzed using this approach, which provides interesting insight into the evolutionary processes.

Finally, a probabilistic model is introduced that models the evolution of genes and pseudogenes simultaneously. The model incorporates a birth-death process according to which genes are duplicated, pseudogenized and lost under a sequence evolution model with a relaxed molecular clock. To model the evolutionary scenarios realistically, the model employs two different sequence evolution models for the evolution of genes and pseudogenes. The reconciliation of evolutionary events to the species phylogenies enables us to infer the evolutionary scenario with a higher resolution. Some subfamilies of two interesting gene superfamilies, i.e., olfactory receptors and zinc fingers, are analyzed using this approach, which provides interesting insights.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. vi, 58 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:03
Keyword
Phylogenetics, Evolution, Reconciliation Analysis, Bayesian Inference
National Category
Bioinformatics (Computational Biology)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-162150 (URN)
978-91-7595-488-2 (ISBN)
Public defence
2015-04-15, Air, SciLifeLab, Tomtebodavägen 23A, Stockholm, 09:00 (English)
Opponent
Supervisors
Note

Q 20150326

Available from: 2015-03-26 Created: 2015-03-23 Last updated: 2015-03-26 Bibliographically approved
2. Probabilistic Models for Species Tree Inference and Orthology Analysis
2015 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A phylogenetic tree is used to model gene evolution and species evolution using molecular sequence data. For artifactual and biological reasons, a gene tree may differ from a species tree, a phenomenon known as gene tree-species tree incongruence. Assuming the presence of one or more evolutionary events, e.g., gene duplication, gene loss, and lateral gene transfer (LGT), the incongruence may be explained using a reconciliation of a gene tree inside a species tree. Such a reconciliation has biological uses, e.g., inferring orthologous relationships between genes.

In this thesis, we present probabilistic models and methods for orthology analysis and species tree inference, while accounting for evolutionary factors such as gene duplication, gene loss, and sequence evolution. Furthermore, we use a probabilistic LGT-aware model for inferring gene trees having temporal information for duplication and LGT events.

In the first project, we present a Bayesian method, called DLRSOrthology, for estimating orthology probabilities using the DLRS model: a probabilistic model integrating gene evolution, a relaxed molecular clock for substitution rates, and sequence evolution. We devise a dynamic programming algorithm for efficiently summing orthology probabilities over all reconciliations of a gene tree inside a species tree. Furthermore, we present heuristics based on receiver operating characteristic (ROC) curves to estimate suitable thresholds for deciding orthology events. Our method, as demonstrated by synthetic and biological results, outperforms existing probabilistic approaches in accuracy and is robust to incomplete taxon sampling artifacts.

In the second project, we present a probabilistic method, based on a mixture model, for species tree inference. The method employs a two-phase approach. In the first phase, a structural expectation maximization algorithm, based on a mixture model, is used to reconstruct a maximum likelihood set of candidate species trees. In the second phase, in order to select the best species tree, each of the candidate species trees is evaluated using PrIME-DLRS: a method based on the DLRS model. The method is accurate, efficient, and scalable when compared to a recent probabilistic species tree inference method called PHYLDOG. We observe that, in most cases, the first phase alone may also be used for selecting the target species tree, yielding a fast and accurate method for larger datasets.

Finally, we devise a probabilistic method based on the DLTRS model, an extension of the DLRS model that includes LGT events, for sampling reconciliations of a gene tree inside a species tree. The method enables us to estimate gene trees having temporal information for duplication and LGT events. To the best of our knowledge, this is the first probabilistic method that takes gene sequence data directly into account for sampling reconciliations that contain information about LGT events. Based on the synthetic data analysis, we believe that the method has the potential to identify LGT highways.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. vi, 65 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 12
Keyword
phylogenetics, phylogenomics, gene tree, species tree, expectation maximization, mixture model, dynamic programming, markov chain monte carlo, PrIME, JPrIME
National Category
Bioinformatics (Computational Biology) Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-168146 (URN)
978-91-7595-619-0 (ISBN)
Public defence
2015-06-12, Conference room Air, SciLifeLab, Tomtebodavägen 23A, Solna, 13:00 (English)
Opponent
Supervisors
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20150529

Available from: 2015-05-29 Created: 2015-05-27 Last updated: 2015-05-29 Bibliographically approved
3. Computational Problems in Modeling Evolution and Inferring Gene Families.
2016 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last few decades, phylogenetics has emerged as a very promising field, providing a comparative framework to explain the genetic relationships among all living organisms on earth. These genetic relationships are typically represented by a bifurcating phylogenetic tree, the tree of life. Reconstructing a phylogenetic tree is one of the central tasks in evolutionary biology. The different evolutionary processes, such as gene duplications, gene losses, speciation, and lateral gene transfer events, make the phylogeny reconstruction task more difficult. However, the rapid developments in sequencing technologies and the availability of genome-scale sequencing data give us the opportunity to understand these evolutionary processes in a more informed manner and, ultimately, enable us to reconstruct gene and species phylogenies more accurately. This thesis is an attempt to provide computational methods for phylogenetic inference and tools to conduct genome-scale comparative evolutionary studies, such as detecting homologous sequences and inferring gene families.

In the first project, we present FastPhylo, a software package containing fast tools for reconstructing distance-based phylogenies. It implements previously published efficient algorithms for estimating a distance matrix from the input sequences and reconstructing an unrooted Neighbour Joining tree from a given distance matrix. Results on simulated datasets reveal that FastPhylo can handle hundreds of thousands of sequences in a time- and memory-efficient manner. The easy-to-use, well-defined interfaces and the modular structure of FastPhylo allow it to be used in very large bioinformatics pipelines.

In the second project, we present a synteny-aware gene homology method, called GenFamClust (GFC), that uses gene content and gene order conservation to detect homology. Results on simulated and biological datasets suggest that local synteny information combined with sequence similarity improves the detection of homologs.

In the third project, we introduce a novel phylogeny-based clustering method, PhyloGenClust, which partitions a very large gene family into smaller subfamilies. ROC (receiver operating characteristic) analysis on synthetic datasets shows that PhyloGenClust identifies subfamilies more accurately. PhyloGenClust can be used as a middle-tier clustering method between raw clustering methods, such as sequence similarity methods, and more sophisticated Bayesian phylogeny methods.

Finally, we introduce a novel probabilistic Bayesian method, based on the DLTRS model, to sample reconciliations of a gene tree inside a species tree. The method uses an MCMC framework to integrate LGTs, gene duplications, gene losses and sequence evolution under a relaxed molecular clock for substitution rates. The proposed sampling method estimates the posterior distribution of gene trees and provides temporal information on LGT events over the lineages of a species tree. Analysis on simulated datasets reveals that our method performs well in identifying the true temporal estimates of LGT events. We applied our method to genome-wide gene families for mollicutes and cyanobacteria, which gave interesting insight into potential LGT highways.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 57 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2016:24
Keyword
Evolution, Phylogenetics, Lateral Gene Transfer, Gene Families, Clustering
National Category
Bioinformatics (Computational Biology)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-193637 (URN)
978-91-7729-131-2 (ISBN)
Public defence
2016-10-18, Air, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Note

QC 20161010

Available from: 2016-10-10 Created: 2016-10-06 Last updated: 2016-10-10 Bibliographically approved

Open Access in DiVA

fulltext (259 kB), 32 downloads
File information
File name: FULLTEXT01.pdf
File size: 259 kB
Checksum (SHA-512): 218369830c31396e1cc30fb73039800efa57721c92144a3d05629c66b76a1185406a08275bc1b59588e625815440485e28b1e53315126734cf974b796a177110
Type: fulltext
Mimetype: application/pdf
fulltext (833 kB), 17 downloads
File information
File name: FULLTEXT02.pdf
File size: 833 kB
Checksum (SHA-512): 3655426c006673e9e51047cf8d9147718a69b940e6c42838aa75d4815429223a1e88ffd419bca6b4ca445fe38ec5233338b8cd6fbe0c5dad1478211a4a629f7b
Type: fulltext
Mimetype: application/pdf
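
The SHA-512 checksums listed above can be used to confirm that a downloaded full text is intact. The following is a minimal sketch, assuming the first file has been saved locally under its listed name, FULLTEXT01.pdf; the helper names `sha512_of_file` and `matches_checksum` and the local path are illustrative and not part of DiVA.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case and surrounding whitespace."""
    return sha512_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Checksum copied from the file information for FULLTEXT01.pdf in this record.
    expected = (
        "218369830c31396e1cc30fb73039800efa57721c92144a3d05629c66b76a1185"
        "406a08275bc1b59588e625815440485e28b1e53315126734cf974b796a177110"
    )
    local_copy = Path("FULLTEXT01.pdf")  # assumed local download location
    if local_copy.exists():
        print("checksum matches:", matches_checksum(local_copy, expected))
```

Reading in chunks keeps memory use constant regardless of file size; the same helper applies to FULLTEXT02.pdf with its own checksum.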

Search in DiVA

By author/editor
Khan, Mehmood; Mahmudi, Owais; Ulah, Ikram; Lagergren, Jens
By organisation
Computational Biology, CB; SeRC - Swedish e-Science Research Centre; Science for Life Laboratory, SciLifeLab
Bioinformatics (Computational Biology)

Total: 49 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 363 hits