Efficient computational methods for applications in genomics
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0002-6212-539x
2019 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

During the last two decades, advances in molecular technology have facilitated the sequencing and analysis of ancient DNA recovered from archaeological finds, contributing to novel insights into human evolutionary history. As more ancient genetic information has become available, the need for specialized methods of analysis has also increased. In this thesis, we investigate statistical and computational models for analysis of genetic data, with a particular focus on the context of ancient DNA.

The main focus is on imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also discuss preliminary results from a simulation study comparing two methods of phasing and imputation, which suggest that the parametric Li and Stephens framework may be more robust to extremely low levels of sparsity than the parsimonious Browning and Browning model.

An evaluation of methods to handle missing data in the application of PCA for dimensionality reduction of genotype data is also presented. We illustrate that non-overlapping sequence data can lead to artifacts in projected scores, and evaluate different methods for handling unobserved genotypes.
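
The kind of artifact described in the paragraph above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the simulated two-population panel, the 20% observation rate, and the two strategies compared (filling missing genotypes with the reference mean, and a least-squares projection restricted to observed loci) are not necessarily the data or methods evaluated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy panel: two populations with different allele frequencies.
n_per_pop, n_snps = 20, 2000
freqs = np.stack([rng.uniform(0.1, 0.5, n_snps),
                  rng.uniform(0.5, 0.9, n_snps)])
reference = np.vstack(
    [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]
).astype(float)

# PCA of the reference panel via SVD of the mean-centred genotype matrix.
snp_means = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - snp_means, full_matrices=False)
loadings = vt[:2].T  # shape (n_snps, 2): loadings of PC1 and PC2

# New sample from population 0; only about 20% of its genotypes are observed.
sample = rng.binomial(2, freqs[0]).astype(float)
observed = rng.random(n_snps) < 0.2
centred = sample - snp_means

# Score the sample would get if every genotype were observed.
full_score = centred @ loadings

# Strategy (a): replace missing genotypes by the reference mean,
# i.e. by zero after centring, then project as usual.
mean_imputed_score = np.where(observed, centred, 0.0) @ loadings

# Strategy (b): least-squares fit of the scores using observed loci only.
ls_score, *_ = np.linalg.lstsq(loadings[observed], centred[observed], rcond=None)

print("full data     :", full_score)
print("mean-imputed  :", mean_imputed_score)
print("least squares :", ls_score)
```

In this toy setting, mean filling pulls the projected sample toward the origin of the PC plot, because each unobserved locus contributes nothing to the score. The least-squares projection avoids that shrinkage, at the cost of a noisier estimate.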

In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The last part of this thesis addresses the use of cloud resources for facilitating such analysis. We present two different cloud-based solutions, and exemplify them on applications from genomics.

Place, publisher, year, edition, pages
Uppsala University, 2019.
Series
Information technology licentiate theses: Licentiate theses from the Department of Information Technology, ISSN 1404-5117 ; 2019-006
National Category
Computational Mathematics; Genetics
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-396409
OAI: oai:DiVA.org:uu-396409
DiVA, id: diva2:1367712
Supervisors
Projects
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2019-11-11 Bibliographically approved
List of papers
1. An empirical evaluation of genotype imputation of ancient DNA
2019 (English) Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2019-008
National Category
Computational Mathematics; Genetics
Identifiers
urn:nbn:se:uu:diva-396336 (URN)
Projects
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2019-11-11 Bibliographically approved
2. Evaluation of methods handling missing data in PCA on genotype data: Applications for ancient DNA
2019 (English) Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2019-009
National Category
Computational Mathematics; Genetics
Identifiers
urn:nbn:se:uu:diva-396346 (URN)
Projects
eSSENCE
Available from: 2019-11-04 Created: 2019-11-04 Last updated: 2019-11-11 Bibliographically approved
3. BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data
2018 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, p. 240:1-11, article id 240. Article in journal (Refereed) Published
National Category
Software Engineering; Genetics
Identifiers
urn:nbn:se:uu:diva-360033 (URN)
10.1186/s12859-018-2241-z (DOI)
000436517200001 ()
29940842 (PubMedID)
Projects
eSSENCE
Available from: 2018-06-26 Created: 2018-09-09 Last updated: 2019-11-11 Bibliographically approved
4. SWEEP: Accelerating scientific research through scalable serverless workflows
2019 (English) In: Companion Proc. 12th International Conference on Utility and Cloud Computing, New York: ACM Press, 2019, p. 43-50. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2019
National Category
Software Engineering
Identifiers
urn:nbn:se:uu:diva-396405 (URN)
10.1145/3368235.3368839 (DOI)
978-1-4503-7044-8 (ISBN)
Conference
UCC 2019
Projects
eSSENCE
Available from: 2019-12-02 Created: 2019-11-04 Last updated: 2019-12-05 Bibliographically approved

Open Access in DiVA

fulltext (507 kB)
File information
File name: FULLTEXT02.pdf
File size: 507 kB
Checksum (SHA-512): e5af0468d2dcd81a8ba3be4fa7e0c0b89d07346dd22aca8162861e168446ab0e7b845a2dbbac18606a38c8c74b7c4c6f3709712af6413a17a648222d68f0bf15
Type: fulltext
Mimetype: application/pdf
By author/editor
Ausmees, Kristiina