Digitala Vetenskapliga Arkivet

BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0001-8010-4755
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-3174-2096
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. ORCID iD: 0000-0003-1842-0882
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-5957-627X
2023 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 51, no 22, p. 114-114. Article in journal (Refereed). Published
Abstract [en]

Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-Seq and stLFR. Running BLR on DBS linked reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13,616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
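
The two phasing metrics named in the abstract, phase block N50 and switch error rate, can be illustrated with a minimal Python sketch. This is not BLR's implementation: the function names `n50` and `switch_error_rate` and the 0/1 encoding of haplotypes at heterozygous sites are assumptions made for illustration, and phasing tools may define N50 slightly differently (for example, relative to genome length rather than to the total phased length).

```python
def n50(block_lengths):
    """Return the N50 of phase block lengths (hypothetical helper).

    N50 here is the length L such that blocks of length >= L
    together cover at least half of the total phased length.
    """
    ordered = sorted(block_lengths, reverse=True)
    if not ordered:
        raise ValueError("block_lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def switch_error_rate(truth, phased):
    """Fraction of adjacent heterozygous site pairs where the phase flips.

    Both inputs are equal-length sequences of 0/1 giving the allele on
    haplotype 1 at each heterozygous site within one phase block
    (an assumed encoding, chosen for illustration).
    """
    if len(truth) != len(phased):
        raise ValueError("inputs must have the same length")
    if len(truth) < 2:
        raise ValueError("need at least two sites")
    # True where the predicted allele agrees with the truth set.
    agree = [t == p for t, p in zip(truth, phased)]
    # A switch occurs whenever agreement changes between neighbours.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    print(n50([10, 20, 30, 40]))
    print(switch_error_rate([0, 1, 0, 1, 1, 0], [0, 1, 0, 0, 0, 1]))
```

Under these definitions, a single switch flips the phase of every downstream site in the block but counts as one error, which is why the abstract can report long phase blocks and low switch error rates as separate measures.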

Place, publisher, year, edition, pages
Oxford University Press (OUP) , 2023. Vol. 51, no 22, p. 114-114
National Category
Genetics and Genomics
Identifiers
URN: urn:nbn:se:kth:diva-341944
DOI: 10.1093/nar/gkad1010
ISI: 001101836300001
Scopus ID: 2-s2.0-85180312128
OAI: oai:DiVA.org:kth-341944
DiVA, id: diva2:1824757
Note

QC 20240108

Available from: 2024-01-08 Created: 2024-01-08 Last updated: 2025-02-07. Bibliographically approved
In thesis
1. Exploring human variations by droplet barcoding
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Biological variations are being explored at ever-increasing rates through the rapid advancement of analytical techniques. Techniques like massively parallel sequencing empower scientists to accurately differentiate individuals’ genetic compositions, cellular functionalities, and healthy tissue from diseased. The knowledge gained from these techniques brings us ever closer to grasping the complexities of life, contributing to human development. Still, to fully elucidate biological variations in different samples requires novel sensitive and high-throughput techniques, capable of placing everything in its correct context. One such technique gaining promise is droplet barcoding.

Droplet barcoding leverages emulsion droplets to segregate samples into their functional components, coupled with barcodes that can group tagged molecules following sequencing. This technique constitutes a versatile tool for studying biological variations in both the phenotype and genotype. This thesis leverages droplet barcoding to explore variations relating to human biology. 

Droplet barcoding was used to study phenotype variations, looking at protein compositions in single extracellular vesicles (Paper I) and single cells (Paper II). Paper I studies extracellular vesicles which are naturally released from cells. They carry heterogeneous protein signatures that can inform about their cellular origin. Tens of thousands of extracellular vesicles were profiled, including approximately 25,000 from lung cancer patients. From these protein profiles, extracellular vesicles could be grouped into putative subtypes. Paper II presents a novel method for studying single cells which was used to characterize blood-derived immune cells. The method enabled the identification of most major immune cell lineages. 

Haplotype-resolved genetic variations were analyzed using a linked-read sequencing method based on droplet barcoding. Linked-read sequencing conserves long-range information from short-read sequencing by co-barcoding subsections of long DNA fragments. Paper III presents an open-source pipeline (BLR) for whole genome haplotyping using linked reads. BLR generates accurate and continuous haplotypes, outperforming PacBio HiFi-based diploid assembly. We further show that integration with low-coverage long-read data can improve phasing accuracy in tandem repeats. With 10X Genomics linked reads, BLR generated more continuous haplotypes compared to other workflows. Paper IV applies linked-read sequencing to reveal the haplotype complexities of cancer genomes. In two patients with colorectal cancer, we identified several large-scale aberrations impacting cancer-related genes. Additionally, several short somatic variants were found to impact nearly all oncogenic networks identified by TCGA. Demonstrating the importance of haplotype-resolved analysis for cancer genomics, one patient exhibited two nonsense mutations on separate haplotypes in the well-known colorectal cancer gene APC.

Abstract [sv] (English translation)

Biological variations are being explored at an ever faster pace, driven by the rapid development of analytical techniques. Techniques such as massively parallel sequencing enable researchers to accurately distinguish individuals' genetic compositions, the various functions of cells, and healthy tissue from diseased. The knowledge these techniques bring gives us ever deeper insight into the complexity of life forms, which promotes human development. Despite these advances, elucidating biological variations in different samples requires new sensitive, high-capacity techniques capable of placing information in its proper context. One particularly promising technique is droplet barcoding.

Droplet barcoding exploits the ability of emulsion droplets to separate samples into their functional components, combined with DNA barcodes to group tagged molecules after sequencing. This technique constitutes a versatile tool for studying biological variations in both phenotype and genotype. This thesis explores techniques based on droplet barcoding to analyze these variations in relation to human biology.

Droplet barcoding was used to analyze phenotype variations by studying protein signatures of single extracellular vesicles (Paper I) and single cells (Paper II). Paper I studies extracellular vesicles, which are particles naturally released from cells. These vesicles carry heterogeneous protein signatures that can inform about their cellular origin. The study examines protein signatures from tens of thousands of extracellular vesicles, including approximately 25,000 from lung cancer patients. Based on these signatures, extracellular vesicles could then be grouped into potential subtypes. Paper II presents a new method for studying single cells, which was used to characterize immune cells from blood. The method enabled identification of most major immune cell populations.

Haplotype-resolved genotype variations were analyzed using a linked-read sequencing method based on droplet barcoding. Linked-read sequencing makes it possible, when sequencing with short read lengths, to preserve information across long genomic distances by DNA barcoding small parts of long DNA fragments. Paper III presents an open-source pipeline (BLR) for whole-genome haplotyping that uses linked-read sequencing data. BLR generates haplotypes with high accuracy and continuity, surpassing diploid genome assembly with PacBio HiFi data. We also show that integration with long reads at limited genome coverage improves haplotyping in tandem-repeat regions of the genome. With 10X Genomics linked reads, BLR generated more continuous haplotyping compared with other analysis workflows. Paper IV applies linked-read sequencing to reveal the haplotype complexity of cancer genomes. In two patients with colorectal cancer, several large-scale variations overlapping cancer-related genes were identified. In addition, several short somatic variants were found that affected genes in nearly all oncogenic networks identified by TCGA. One patient exhibited two nonsense mutations on separate haplotypes in the well-known colorectal cancer gene APC, demonstrating the importance of haplotype-resolved analysis for cancer genomics.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2024. p. 99
Series
TRITA-CBH-FOU ; 2024:7
Keywords
droplets, linked-read sequencing, DNA barcoding, proteomics, genomics, single cell, single extracellular vesicle, single exosome, pipelines
National Category
Bioinformatics and Computational Biology; Cell Biology; Genetics and Genomics
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-343460
ISBN: 978-91-8040-840-0
Public defence
2024-03-15, Inghesalen, Widerströmska huset, Tomtebodavägen 18a, via Zoom: https://kth-se.zoom.us/j/69346261396, Solna, 10:00 (English)
Note

QC 2024-02-15

Available from: 2024-02-15 Created: 2024-02-14 Last updated: 2025-02-05. Bibliographically approved

By author/editor
Höjer, Pontus; Frick, Tobias; Siga, Humam; Pourbozorgi, Parham; Aghelpasand, Hooman; Ahmadian, Afshin