Mapping single-cell data to reference atlases by transfer learning
Abstract
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce single-cell architectural surgery (scArches), a deep learning strategy for mapping query datasets on top of a reference. scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
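The abstract states that scArches maps query data onto a trained reference by transfer learning while optimizing far fewer parameters than de novo integration, and without needing the reference's raw data. The pure-Python sketch below illustrates that general idea for a single layer only, under a loudly stated assumption: a hypothetical conditional model whose first layer takes gene expression plus a one-hot batch label. Mapping a query appends new batch-weight rows, keeps every reference weight frozen, and updates only the appended rows. All function names and layer sizes are invented for illustration; this is not the scArches package API, and the parameter ratio it produces describes this toy layer, not the figures reported in the paper.

```python
import random


def init_reference_layer(n_genes, n_ref_batches, n_hidden, seed=0):
    """Hypothetical first encoder layer of an already-trained reference model.

    Gene expression enters through w_genes; a one-hot batch label enters
    through w_batch (one row of weights per reference batch).
    """
    rng = random.Random(seed)
    w_genes = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_genes)]
    w_batch = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_ref_batches)]
    return {"w_genes": w_genes, "w_batch": w_batch, "trainable_rows": []}


def add_query_batches(ref_layer, n_query_batches, seed=1):
    """Illustrative 'surgery': append one new batch-weight row per query batch.

    Reference weights are reused unchanged and stay frozen; only the
    appended rows are marked trainable.
    """
    rng = random.Random(seed)
    n_hidden = len(ref_layer["w_genes"][0])
    n_ref = len(ref_layer["w_batch"])
    new_rows = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_query_batches)]
    return {
        "w_genes": ref_layer["w_genes"],
        # A new outer list, so the reference layer itself is left untouched.
        "w_batch": ref_layer["w_batch"] + new_rows,
        "trainable_rows": list(range(n_ref, n_ref + n_query_batches)),
    }


def apply_gradients(layer, batch_row_grads, lr=0.01):
    """Plain SGD on batch-weight rows, silently skipping every frozen row."""
    for row, grad in batch_row_grads.items():
        if row in layer["trainable_rows"]:
            old = layer["w_batch"][row]
            layer["w_batch"][row] = [w - lr * g for w, g in zip(old, grad)]


def count_parameters(layer):
    """Return (total, trainable) weight counts for this single layer."""
    n_hidden = len(layer["w_genes"][0])
    total = (len(layer["w_genes"]) + len(layer["w_batch"])) * n_hidden
    trainable = len(layer["trainable_rows"]) * n_hidden
    return total, trainable


ref = init_reference_layer(n_genes=2000, n_ref_batches=5, n_hidden=128)
query = add_query_batches(ref, n_query_batches=2)
print(count_parameters(query))  # (256896, 256)
```

In this toy layer, query training touches 256 of 256,896 weights. Because only the appended rows receive updates, fitting them requires the frozen reference weights and the query data, but not the reference's raw data. That separation is the property the abstract highlights for decentralized reference building.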
- University of California System United States
- Northwestern University United States
- University of Münster Germany
- Northeastern University United States
- Harvard University United States
SARS-CoV-2, 1.1 Normal biological development and functioning, Data Science, Bioinformatics and Computational Biology, Neurosciences, 500, 610, COVID-19, Datasets as Topic, Biological Sciences, Reference Standards, Mice, Deep Learning, Networking and Information Technology R&D (NITRD), Organ Specificity, Machine Learning and Artificial Intelligence, Animals, Humans, Single-Cell Analysis, Analysis
- Citations: 433
- Popularity: top 0.1%
- Influence: top 1%
- Impulse: top 0.01%
