A strand specific high resolution normalization method for chip-sequencing data employing multiple experimental control measurements
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Genomics. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Uppsala University, Science for Life Laboratory, SciLifeLab.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Uppsala University, Science for Life Laboratory, SciLifeLab.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medical Genetics. Uppsala University, Science for Life Laboratory, SciLifeLab.
2012 (English). In: Algorithms for Molecular Biology, ISSN 1748-7188, E-ISSN 1748-7188, Vol. 7, p. 2-. Article in journal (Refereed), Published
Abstract [en]

Background: High-throughput sequencing is becoming the standard tool for investigating protein-DNA interactions or epigenetic modifications. However, the data generated will always contain noise due to, e.g., repetitive regions or non-specific antibody interactions. The noise will appear in the form of a background distribution of reads that must be taken into account in the downstream analysis, for example when detecting enriched regions (peak-calling). Several reported peak-callers can take experimental measurements of background tag distribution into account when analysing a data set. Unfortunately, the background is only used to adjust peak calling and not as a preprocessing step that aims at discerning the signal from the background noise. A normalization procedure that extracts the signal of interest would be of universal use when investigating genomic patterns.

Results: We formulated such a normalization method based on linear regression and made a proof-of-concept implementation in R and C++. It was tested on simulated data as well as on publicly available ChIP-seq data on binding sites for two transcription factors, MAX and FOXA1, and two control samples, Input and IgG. We applied three different peak-callers to (i) raw (un-normalized) data using statistical background models, (ii) raw data with control samples as background, and (iii) normalized data without additional control samples as background. The fraction of called regions containing the expected transcription factor binding motif was largest for the normalized data, and evaluation with qPCR data for FOXA1 suggested higher sensitivity and specificity with normalized data than with raw data with experimental background.
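The motif-based evaluation mentioned above (the fraction of called regions that contain the expected binding motif) can be illustrated with a minimal sketch. This is not the authors' evaluation pipeline: the function names, the example sequences, and the use of a plain regular expression in place of a position weight matrix are all assumptions made for illustration. The E-box consensus CACGTG is used here as a stand-in motif for MAX.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def fraction_with_motif(region_sequences, motif_pattern):
    """Fraction of regions whose sequence matches the motif on either strand.

    region_sequences: list of DNA strings, one per called region (hypothetical input).
    motif_pattern: regular expression standing in for the binding motif (assumption).
    """
    if not region_sequences:
        return 0.0
    pattern = re.compile(motif_pattern)
    hits = 0
    for seq in region_sequences:
        seq = seq.upper()
        # A binding site may lie on either strand, so test both orientations.
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits += 1
    return hits / len(region_sequences)

# Made-up sequences for illustration only.
called_regions = ["TTCACGTGAA", "GGGGGGGG", "AACACGTGTT", "ATATATAT"]
ebox_fraction = fraction_with_motif(called_regions, "CACGTG")
```

Comparing this fraction across peak sets called from raw and from normalized data gives one number per setting, which is the kind of comparison the abstract reports.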

Conclusions: The proposed method can handle several control samples, allowing for correction of multiple sources of bias simultaneously. Our evaluation on both synthetic and experimental data suggests that the method is successful in removing background noise.
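The general idea of regressing a ChIP signal on several control tracks can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: it fits one ordinary least squares model over fixed positions, ignores strand and resolution, and clips negative residuals to zero. All function names and the example values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control tracks are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_background(chip, controls):
    """Least squares fit of chip ~ intercept + controls via the normal equations.

    Returns [intercept, weight_for_control_1, weight_for_control_2, ...].
    """
    rows = [[1.0] + [ctrl[i] for ctrl in controls] for i in range(len(chip))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, chip)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def normalize(chip, controls):
    """Subtract the fitted background; negative residuals are clipped to zero (assumption)."""
    coef = fit_background(chip, controls)
    normalized = []
    for i, y in enumerate(chip):
        predicted = coef[0] + sum(w * ctrl[i] for w, ctrl in zip(coef[1:], controls))
        normalized.append(max(0.0, y - predicted))
    return normalized

# Made-up read counts per position for two control tracks.
input_track = [4, 6, 5, 8, 7, 3, 9, 5]
igg_track = [2, 1, 3, 2, 4, 1, 2, 3]
# Synthetic background built from known weights, plus one artificial peak at index 3.
background = [2 + 0.5 * a + 1.5 * b for a, b in zip(input_track, igg_track)]
chip = background[:]
chip[3] += 20
signal = normalize(chip, [input_track, igg_track])
```

Because each control enters the model as its own regressor, adding a further control track only adds one column to the fit, which is what lets several sources of bias be corrected in one step.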

Place, publisher, year, edition, pages
2012. Vol. 7, p. 2-
National Category
Biochemistry and Molecular Biology
Identifiers
URN: urn:nbn:se:uu:diva-169968
DOI: 10.1186/1748-7188-7-2
ISI: 000300106500001
OAI: oai:DiVA.org:uu-169968
DiVA, id: diva2:508390
Available from: 2012-03-08. Created: 2012-03-07. Last updated: 2017-12-07. Bibliographically approved.

Authors
Enroth, Stefan; Andersson, Claes R.; Wadelius, Claes; Gustafsson, Mats G.; Komorowski, Jan