Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes
Abstract
Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require quadratic time or worse in the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the next few years. HogProf is available at https://github.com/DessimozLab/HogProf.
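To illustrate why MinHash with locality-sensitive hashing avoids all-against-all comparison, here is a minimal pure-Python sketch. It assumes each phylogenetic profile is simply the set of taxa in which a gene family is retained; it ignores how HogProf actually builds its profiles from hierarchical orthologous groups, and it is not HogProf's code or API. All names (`stable_hash`, `make_hash_params`, `minhash_signature`, `estimate_jaccard`, `BandedLSHIndex`, `FAM_A`, `taxon_01`, ...) are hypothetical, chosen for illustration only.

```python
import hashlib
import random
from collections import defaultdict

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the random linear hash functions


def stable_hash(token: str) -> int:
    """64-bit hash that is stable across runs (the built-in hash() is salted)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_hash_params(num_perm: int, seed: int = 1) -> list[tuple[int, int]]:
    """Draw (a, b) for h(x) = (a*x + b) mod p; each pair simulates one permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(num_perm)]


def minhash_signature(profile: set[str], params) -> tuple[int, ...]:
    """Keep the minimum hashed taxon under each simulated permutation."""
    if not profile:
        raise ValueError("profile must contain at least one taxon")
    hashed = [stable_hash(taxon) for taxon in profile]
    return tuple(min((a * h + b) % MERSENNE_PRIME for h in hashed) for a, b in params)


def estimate_jaccard(sig1, sig2) -> float:
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


class BandedLSHIndex:
    """Banding LSH: profiles whose signatures agree on any whole band are candidates."""

    def __init__(self, bands: int, rows: int):
        self.bands, self.rows = bands, rows
        self.buckets = defaultdict(set)

    def _band_keys(self, sig):
        if len(sig) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for i in range(self.bands):
            yield (i, sig[i * self.rows:(i + 1) * self.rows])

    def insert(self, key: str, sig) -> None:
        for band_key in self._band_keys(sig):
            self.buckets[band_key].add(key)

    def query(self, sig) -> set[str]:
        candidates = set()
        for band_key in self._band_keys(sig):
            candidates |= self.buckets.get(band_key, set())
        return candidates


# Usage with hypothetical gene families -> taxa in which each family is retained.
params = make_hash_params(num_perm=128)  # 128 = 32 bands x 4 rows
profiles = {
    "FAM_A": {"taxon_01", "taxon_02", "taxon_03", "taxon_04"},
    "FAM_B": {"taxon_01", "taxon_02", "taxon_03", "taxon_05"},
    "FAM_C": {"taxon_06", "taxon_07"},
}
sigs = {name: minhash_signature(taxa, params) for name, taxa in profiles.items()}
index = BandedLSHIndex(bands=32, rows=4)
for name, sig in sigs.items():
    index.insert(name, sig)

# Only families sharing at least one band bucket are compared in full.
hits = index.query(sigs["FAM_A"])
ranked = sorted(hits, key=lambda k: estimate_jaccard(sigs["FAM_A"], sigs[k]), reverse=True)
print(ranked)
```

The design point is that a query touches only `bands` buckets rather than every stored profile, so retrieval does not grow quadratically with the number of families. With b bands of r rows, two profiles with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so b and r trade recall against the number of false candidates.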
- University College London United Kingdom
- National University of General San Martín Argentina
- National Scientific and Technical Research Council Argentina
- University College London, Bartlett School of Planning United Kingdom
- University of Lausanne Switzerland
QH301-705.5, Global Diversity of Microbial Eukaryotes and Their Evolution, Genetic Structure, Evolutionary biology, Forests, Gene, Database, Genomic Data Integration, Computational biology, Biochemistry, Genetics and Molecular Biology, Sexual reproduction, Genetics, Protein interaction networks, Cluster Analysis, phylogenetic tree, Biology (General), RNA Sequencing Data Analysis, Kinetochores, Molecular Biology, Biology, Cluster Analysis; Computational Biology/methods; Eukaryota/classification; Eukaryota/genetics; Kinetochores/metabolism; Models, Statistical; Phylogeny; Reproduction/genetics, Phylogeny, Phylogenetic network, Phylogenetic analysis, Models, Statistical, Genome, Reproduction, Scalability, Computational Biology, Eukaryota, Life Sciences, Phylogenetic Analysis, Genomics, Computer science, Profiling (computer programming), Analysis of Gene Interaction Networks, Phylogenetics, Operating system, Phylogeography, FOS: Biological sciences, Fungal evolution, Population Genetic Structure and Dynamics, Research Article, Phylogenetic tree
