Coverage Analysis in Clinical Next-Generation Sequencing
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
2019 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

With the advent of next-generation sequencing (NGS), new tools had to be developed to work with new data formats, to handle data sizes larger than those of previous techniques, and to check the accuracy of the data. Coverage analysis is one important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes, every base of interest must be quality controlled, as a single wrong base call could affect the patient negatively. Software tools that perform coverage analysis with enough accuracy and detail for clinical applications are scarce. Several tools, such as Samtools, can calculate coverage values but do not process this information further to produce a QC report for each base pair of interest. The aim of my master's thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses this data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians there. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all subregions below a chosen coverage depth threshold. The low-coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be selected by passing different flags when calling the tool. Such settings include generating pie charts of each region's coverage values, filtering reads and bases by quality, or writing your own entry to be used for the coverage calculation by Samtools. The tool has been shown to find these low-coverage regions reliably. Most of the low-coverage regions found are also found by ExCID. Some differences did occur, however, and every such region was verified in IGV. The coverage values shown in IGV coincided with those found by CAR tool. CAR tool is written to find all low-coverage regions even if they are only one base pair long, while ExCID instead seems to generate larger low-coverage regions and does not take very short ones into account. To read more about the functions and how to use CAR tool, I refer to the User instructions in the appendix and on GitHub at the repository anod6351.
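
The two core calculations described in the abstract, coverage breadth per region and extraction of subregions below a depth threshold, can be illustrated with a minimal sketch. This is not the CAR tool's actual code. It assumes that per-base depths for one region of interest are already available as a list of integers (for example parsed from Samtools output), that positions are reported as inclusive coordinates, and that "low" means strictly below the threshold. The names `LowRegion`, `coverage_breadth` and `low_coverage_regions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LowRegion:
    """One low-coverage subregion (hypothetical report row)."""
    name: str          # name of the region of interest
    start: int         # first low-coverage position (inclusive)
    stop: int          # last low-coverage position (inclusive)
    length: int        # number of bases in the subregion
    mean_depth: float  # mean coverage depth over the subregion


def coverage_breadth(depths: Sequence[int], threshold: int) -> float:
    """Fraction of positions covered at a depth of at least `threshold`."""
    if not depths:
        return 0.0
    return sum(d >= threshold for d in depths) / len(depths)


def _make_region(name, region_start, depths, run_start, run_end) -> LowRegion:
    # run_end is exclusive, so the reported stop is run_end - 1.
    run = depths[run_start:run_end]
    return LowRegion(
        name=name,
        start=region_start + run_start,
        stop=region_start + run_end - 1,
        length=len(run),
        mean_depth=sum(run) / len(run),
    )


def low_coverage_regions(
    name: str, region_start: int, depths: Sequence[int], threshold: int
) -> List[LowRegion]:
    """Collect every run of consecutive positions with depth < threshold.

    Runs of a single base pair are kept, matching the behaviour the
    abstract describes for one-base-pair low regions.
    """
    regions = []
    run_start = None
    for offset, depth in enumerate(depths):
        if depth < threshold:
            if run_start is None:
                run_start = offset  # a new low-coverage run begins
        elif run_start is not None:
            regions.append(_make_region(name, region_start, depths, run_start, offset))
            run_start = None
    if run_start is not None:  # the run extends to the end of the region
        regions.append(_make_region(name, region_start, depths, run_start, len(depths)))
    return regions


# Made-up depths for a region starting at position 100, threshold 20.
example_depths = [30, 25, 8, 5, 40, 9, 35, 22]
breadth = coverage_breadth(example_depths, 20)
low = low_coverage_regions("exon1", 100, example_depths, 20)
print(f"breadth at 20x: {breadth:.3f}")
for region in low:
    print(region)
```

In this made-up example, five of the eight positions reach the threshold, and two low subregions are reported: positions 102 to 103 and the single position 105.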

Place, publisher, year, edition, pages
2019.
Series
UPTEC X ; 18 034
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-379228
OAI: oai:DiVA.org:uu-379228
DiVA, id: diva2:1296166
Educational program
Molecular Biotechnology Engineering Programme
Supervisors
Examiners
Available from: 2019-03-19. Created: 2019-03-14. Last updated: 2019-03-19. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 578 kB
Checksum (SHA-512): 461ca9503a8fce36889dc794aca6a698b93c773d24564d54aac686a997692b4a696c6c866a8689fb79ec383a4b1020b0695c4a1e1b31e4085bf46d728444db43
Type: fulltext
Mimetype: application/pdf
