Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
- Google (United States) United States
- University of California, San Francisco United States
- Google Inc United States
- Chan Zuckerberg Initiative (United States) United States
- University of California, Santa Cruz United States
570, Technology, Bioinformatics and Computational Biology, Bioengineering, Medical and Health Sciences, Polymorphism, Single Nucleotide, Article, Nanopores, Genetics, Nanotechnology, 2.1 Biological and endogenous factors, Humans, Genome, Human, Human Genome, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, DNA, Sequence Analysis, Biological Sciences, Genes, Haplotypes, Generic health relevance, Software, Developmental Biology
- Citations: 213
- Popularity: Top 0.1%
- Influence: Top 1%
- Impulse: Top 0.1%
