Publication: Article, Other literature type
Date: 01 Nov 2021
Language: English
Publisher: Springer Science and Business Media LLC
Journal: Nature Methods, volume 18, pages 1322-1332 (issn: 1548-7091, eissn: 1548-7105)

Funded by: NIH | The construction and util..., NIH | Comprehensive, Flexible a..., NIH | The WashU-UCSC-EBI Human ... +1 projects

Authors: Kishwar Shafin; Trevor Pesout; Pi-Chuan Chang; Maria Nattestad; Alexey Kolesnikov; Sidharth Goel; Gunjan Baid; +7 Authors

doi: 10.1038/s41592-021-01299-w

pmid: 34725481

pmc: PMC8571015

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads


Abstract

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
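The assembly accuracies quoted above (Q35+ and Q40+) are presumably Phred-scaled quality values, where Q = -10 log10(p) and p is the per-base error probability. Under that assumption, the following minimal sketch converts between the two; the function names are illustrative and do not come from the PEPPER-Margin-DeepVariant codebase.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Assumed reading of the abstract: Q35+ (nanopore-polished), Q40+ (HiFi-polished).
    for q in (35, 40):
        rate = phred_to_error_rate(q)
        print(f"Q{q}: ~{rate:.2e} errors per base, about 1 error per {1 / rate:,.0f} bases")
```

Read this way, Q40 corresponds to roughly one error per 10,000 bases and Q35 to roughly one error per 3,000 bases.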

Related Organizations

Google (United States)
United States
University of California, San Francisco
United States
Google Inc
United States
Chan Zuckerberg Initiative (United States)
United States
University of California, Santa Cruz
United States

Keywords

570; Technology; Bioinformatics and Computational Biology; Bioengineering; Medical and Health Sciences; Polymorphism, Single Nucleotide; Article; Nanopores; Genetics; Nanotechnology; 2.1 Biological and endogenous factors; Humans; Genome, Human; Human Genome; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Sequence Analysis, DNA; Biological Sciences; Genes; Haplotypes; Generic health relevance; Software; Developmental Biology

Related research products (10 of 13 shown; relation: IsRelatedTo)

nanopore_assembly_and_polishing_assessment, software on GitHub
shasta, software on GitHub
hpp_production_workflows, software on GitHub
FlyPN, software on GitHub
CHM13, software on GitHub
whatshap, software on GitHub
genomics_scripts, software on GitHub
pepper, software on GitHub
hifiasm, software on GitHub
yak, software on GitHub


Impact by BIP!

citations: 213. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 0.1%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 1%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 0.1%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.