Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models
Abstract
As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward to compare gene expression levels across datasets or to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for the joint representation and analysis of cohorts of single-cell RNA-seq datasets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations, for instance when only one dataset in a cohort is annotated, or when only a few cells in a single dataset can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that, unlike existing methods, scVI and scANVI represent the integrated datasets with a single generative model that can be directly used for any probabilistic decision-making task, using differential expression as our case study. scVI and scANVI are available as open-source software and can be readily used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.
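To illustrate what "using a single generative model for probabilistic decision making" can look like for differential expression, here is a minimal sketch of a Bayes-factor test for one gene. It assumes you already have posterior samples of that gene's normalized expression in two cell populations (for example, drawn from a fitted model); how those samples are obtained is not shown. The function name `log_bayes_factor` and its arguments are hypothetical illustrations, not the scvi-tools API or the authors' exact procedure.

```python
import math
import random


def log_bayes_factor(samples_a, samples_b, eps=1e-8):
    """Hypothetical helper: estimate log [p(H1) / p(H2)] for one gene, where
    H1: expression in population A exceeds expression in population B, and
    H2: the complement. Inputs are paired posterior samples (assumed given)."""
    if not samples_a or len(samples_a) != len(samples_b):
        raise ValueError("need two non-empty sample lists of equal length")
    n = len(samples_a)
    # Monte Carlo estimate of p(rho_a > rho_b) from paired posterior draws.
    p_up = sum(a > b for a, b in zip(samples_a, samples_b)) / n
    # Clip away from 0 and 1 so the log-odds stays finite.
    p_up = min(max(p_up, eps), 1.0 - eps)
    return math.log(p_up) - math.log(1.0 - p_up)


if __name__ == "__main__":
    # Synthetic stand-ins for posterior samples (assumption: log-normal draws).
    random.seed(0)
    a = [random.lognormvariate(0.5, 0.2) for _ in range(1000)]
    b = [random.lognormvariate(0.0, 0.2) for _ in range(1000)]
    print(log_bayes_factor(a, b))
```

A large positive value favors higher expression in population A, a large negative value favors B, and values near zero mean the posterior does not separate the two. Any decision threshold on this score is a user choice and is not specified here.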
- Massachusetts Institute of Technology United States
- Harvard University United States
- Massachusetts General Hospital United States
- Centre de Mathématiques Appliquées de l'Ecole polytechnique France
- University of Michigan–Ann Arbor United States
Medicine (General), QH301-705.5, Sequence Analysis, RNA, Gene Expression Profiling, Computational Biology, Molecular Sequence Annotation, Articles, differential expression, R5-920, annotation, harmonization, Databases, Genetic, Humans, Supervised Machine Learning, Biology (General), Single-Cell Analysis, scRNA‐seq, variational inference
Citations: 474; popularity: top 0.1%; influence: top 1%; impulse: top 0.1%
